This was originally posted at the now-defunct Baseballin’ on a Budget on December 22, 2010.
Note: This is for fun. It’s an estimate. It’s December and the A’s haven’t done anything this week. There will be reactions to this that I don’t understand. “HOW COULD YOU SUGGEST KOUZ WILL ONLY WALK THAT MUCH?” “Brett Anderson WILL STRIKE OUT EVERYONE!” My estimates are wrong. They might even be really wrong. But this took me probably 30 minutes to do on FanGraphs, so if you really disagree or have 30 minutes to spare, I ask you only to try it for yourself. I bet our answers aren’t that different. Post your results in the comments and let’s have some civil discourse. Also, sorry for defending myself before you read anything. Maybe no one will care that much.
I love FanGraphs*. I love the amount of information. I love how many ways I can cut it up. I love that it can summarize a season in a number or two, but also lets me dig into all the minutiae I want. And they keep adding to it.
*I suppose it’s somewhat strange, given the name of the website, that one of the things I’m not in love with is the graphs. Not that I don’t find them useful. But I usually don’t go to FanGraphs looking for pictures. Just pictures that the numbers paint ZING! Ahh, digressions…
One of the features I like is the fans’ projections. I like seeing what a large group of (presumably) knowledgeable fans think about particular players and teams. Of greater interest to me for this exercise though are the WAR results for each player that I project. Therefore, if I projected every player on a given roster, I should get an estimate of how good a team might be.
A couple of notes first. I was conservative about talent levels, or at least tried to be. Nothing too much better than what a player had accomplished previously. The only thing that I was a little aggressive on (which, as an A’s fan, might scare you) was playing time. I don’t think that I was too bad about it though. I also was very conservative with defense, mostly because I don’t fully trust the metrics yet. I rated no player more than 5 runs above or below average.
I got about 1,350 innings pitched by the entire team, and I estimated roughly 6,300 plate appearances. A typical major league team will pitch roughly 1,450 innings in a season and have about 6,100 plate appearances. I’m assuming that the remaining 100 innings will be replacement-level for the pitchers, and you can knock off somewhere in the neighborhood of 0.5 WAR for the extra plate appearances that will be given to replacement level players.
So without further ado, I bring you your 2011 Oakland Athletics (if you click on each graphic, you should be able to see larger versions):
(I HAVE NO IDEA WHERE THESE GRAPHICS WENT. SORRY)
Upon looking at the tables, I see where I had extra plate appearances. There’s about an extra 500 in the outfield and an extra 500 for designated hitters. If you want to make that adjustment, take away from Ryan Sweeney, Conor Jackson, and Chris Carter. I’ve also got 166 starts by pitchers in there, but there’s only so much I can do about that.
Other things to note from my predictions: Josh Willingham and Hideki Matsui, the two hitters the A’s signed last week, will be their best hitters. I like David DeJesus, but if he struggles adjusting to rightfield, he’s going to have to hit a lot to be an average rightfielder. And a re-birth for Kurt Suzuki? He really can’t be as bad as he was last season, and it’s not hard to be an average catcher.
I don’t trust Gio Gonzalez to stop walking people. Despite what we’ve said about the team BABIP possibly being sustainable with the defensive talent, I tried to bring all of the ERAs up to more reasonable levels (except for Andrew Bailey, who is, of course, dreamy). And I really like Brett Anderson. Too much.
In the tables above, the hitters account for 19.6 WAR (of which 1.3 is accumulated defensively). The total for the pitchers (which is conservative talent wise but aggressive playing time wise) is also 19.6 WAR. With 39.2 WAR, that leaves us with about an 85-win team, based on my back-calculations of replacement level last year.
That feels about right to me, particularly given my conservative estimates for the defense and pitching. Last season, this was an 81-win team that accumulated 35.6 WAR. To describe the WAR components in the chart below, the 2010 position players generally were average hitters, caught the ball really well, used the DH, and fielded a full lineup for every game. The starting pitchers were good and the bullpen was ok. (Side note: the 2010 Pittsburgh Pirates were really awful).
(I HAVE NO IDEA WHERE THESE GRAPHICS WENT. SORRY)
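For anyone who wants to check the arithmetic behind the 85-win figure, here is a minimal sketch. It assumes the "back-calculation" of replacement level is simply last season's wins minus last season's WAR, which is a reading of the method rather than a confirmed one; the variable names are made up for illustration.

```python
# Back out replacement level from 2010, then add the projected 2011 WAR.
# Assumption: replacement level = team wins - team WAR.
wins_2010 = 81
war_2010 = 35.6
replacement_wins = wins_2010 - war_2010  # wins a replacement-level roster would get

hitters_war = 19.6
pitchers_war = 19.6
projected_war = hitters_war + pitchers_war

projected_wins = replacement_wins + projected_war
print(round(replacement_wins, 1), round(projected_war, 1), round(projected_wins, 1))
```

Knocking off the 0.5 WAR or so for the extra plate appearances would pull the estimate down slightly, which is why "about 85" is as precise as this gets.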
I wanted to do one last check on my predictions. I checked each individual’s WAR estimate above with their WAR totals from last season, pro-rated to match the playing time.
Clearly I’m bullish on the pitchers. The red above is where I’m fairly high on the WAR estimate (per playing time) and the yellow is where I’m low. As I said, I was conservative about defense, and having that drive so much of the position players’ value, I was bound to have some lower estimates. The other thing I notice is that I’m expecting too much from the 5th starter candidates. They all could be good pitchers, but if Bobby Cramer, Tyson Ross, Brandon McCarthy, and Josh Outman combine for 2 WAR, that’s probably pretty good.
For cautiously optimistic (passively pessimistic?) fans, the takeaway is that this team has improved this offseason, but mostly on the margins. A half-win here, a half-win there. Those matter, don’t get me wrong. And it appears they’ll have every shot starting in April. Texas hasn’t added anything yet (though they’ll still be good), Anaheim’s big spending so far was on a guy who will pitch 70 innings, and Seattle stinks.
But this hasn’t become a juggernaut overnight, nor do I think anyone was/is expecting it to. If the A’s are going to make the playoffs this year, it’s probably going to involve a painful regular season (see: Giants, San Francisco, 2010). That’s better than a painless regular season though, because at least it matters.
The Indians gave up 23 runs on the first two days of the season on their way to an 0-2 start. Their ace gave up 10 runs in 3+ innings on Opening Day and the Wahoo Warriors were down 14-0 in the fourth. Through two-and-a-half weeks, their best two hitters (Shin-Soo Choo and Carlos Santana) are hitting .206/.281/.327. Whatever’s left of Grady Sizemore just played his first game Sunday. And yet the Indians are 11-4, a game ahead of Kansas City and four north of Chicago through a tenth of the season.
Most projection systems had the Tribe winning somewhere between 68 and 75 games in 2011. I too thought that the Indians would win 70-75 games. Obviously the hot start is just that, only the beginning of a long season, but it’s hard not to get excited. After all, isn’t that what April baseball is all about? If their true talent level suggests that they’ll only win 72 games though, what are the chances that at some point during a 162-game season they’d have a stretch like this, winning 11 of 15?
In any given set of 15 games, they have a 2.28% chance of winning at least 11 games (1.72% of winning exactly 11). They have 148 sets of 15 games (games 1 to 15, 2 to 16…148 to 162), though the sets are interrelated. If they had 148 distinct chances to win 11 of 15, they’d find a stretch like this about 96.7% of the time. If we want to take the 15-game subset very literally, there are 11 (I’m rounding), giving them only a 22.4% chance of pulling off this feat. We also know that a run like this would likely start with a win; as a 72-win team, they’d have about 60 chances to do this, increasing the chances to 75.0%.
So there’s a decent enough chance that during some half-month the Indians would play this well. Doing it at the beginning of the season does two things though. One, it allows the Indians to possibly run into another stretch like this at some point. The odds that the Indians run into a streak like this in any given 15 games remain the same (still 2.28%), and the number of opportunities to do so is only slightly smaller. Two, even if the Indians play at their assumed talent level of 72 wins, they’ll be viewed as a contender into the summer, and the Fans of the Feather will have something for which to root.
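Those percentages come from a plain binomial calculation, sketched below. Two assumptions are baked in: the per-game win probability is 72/162 rounded to .444 (the rounded value appears to reproduce the numbers above; the unrounded fraction gives about 2.30% instead of 2.28%), and every 15-game window is treated as independent, which overlapping windows are not. The function names are illustrative.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of winning at least k of n games."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def prob_at_least_once(p_window, windows):
    """Chance that at least one of `windows` independent tries succeeds."""
    return 1 - (1 - p_window) ** windows

p_win = 0.444  # assumption: 72/162 rounded to three decimals

p_exactly_11 = comb(15, 11) * p_win**11 * (1 - p_win) ** 4
p_window = prob_at_least(11, 15, p_win)
print(f"exactly 11 of 15:  {p_exactly_11:.2%}")
print(f"at least 11 of 15: {p_window:.2%}")

# 148 overlapping windows, 11 back-to-back windows, ~60 windows starting with a win
for windows in (148, 11, 60):
    print(windows, f"{prob_at_least_once(p_window, windows):.1%}")
```

Because the overlapping windows share games, the 148-window figure overstates the true chance; the 11-window figure understates it, since it ignores streaks that straddle two blocks.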
Let’s assume that beginning with the upcoming four-game series with Kansas City (for AL Central supremacy; who’d have thought that?) the Indians play like a 72-win team. Over their next 65 games we’d expect them to win approximately 29, making them 40-40 after 80 games. As they approach the trade deadline, the people of Cleveland are not talking about who to trade away but what spare parts to acquire.
What if we adjust the Indians’ “talent level” based on the hot start? A team that wins 72 of 162 should win roughly 65 of 147. Add the 11 wins already banked and that’s a 76-win season. If we make the Indians a 76-win team based on talent, they’d be 42-38 after 80 games, definitely in the mix.
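The two 80-game records can be reproduced with a few lines. This is a sketch under the assumption that expected wins are rounded to the nearest whole game and that the adjusted talent level is carried unrounded (about 76.3 wins, not a flat 76, which lands right on the edge between 41 and 42 wins); `record_after` is a made-up helper name.

```python
def record_after(games, wins, losses, talent_wins, season=162):
    """Expected record after `games`, playing at a `talent_wins` pace from here on."""
    remaining = games - (wins + losses)
    expected_wins = round(remaining * talent_wins / season)
    return wins + expected_wins, losses + remaining - expected_wins

# 72-win talent for the next 65 games, starting from 11-4
base = record_after(80, 11, 4, 72)

# Adjusted talent: the 11 banked wins plus a 72-win pace over the other 147 games
adjusted_talent = 11 + 147 * 72 / 162
adjusted = record_after(80, 11, 4, adjusted_talent)

print(base, round(adjusted_talent, 1), adjusted)
```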
This is all just math though. While Michael Brantley and Asdrubal Cabrera have played over their heads so far, Carlos Santana and Shin-Soo Choo have done almost nothing and should come around. Travis Hafner won’t hit .350 all year either, but the guy finally seems healthy, was the best hitter in the American League in 2005-2006, and has been semi-productive while hurt the last few seasons. Sizemore homered and doubled in his first game in almost a year Sunday. The starters have pitched well so far (enough to give hope even when the BABIPs go up and LOB percentages come down) and the bullpen (especially the back end of Rafael Perez, Tony Sipp, and Chris Perez) has been terrific.
The Sons of the Cuyahoga have posted a +29 run differential so far, allowing the third-fewest runs and scoring the third-most runs in the American League. No one was expected to run away with this division (90 wins seems unlikely for any of these teams). Given that the White Sox and Tigers are off to mediocre starts and the major problems Minnesota is facing, the Erie Warriors and their fans might find themselves in a pennant race this summer earlier than they expected.
Season in and season out, Cain gives up a lesser percentage of home runs on fly balls than the average pitcher, both at home and away. Why? I can’t say with confidence, but Dave Pinto at Baseball Musings has come up with a theory that’s as good as any I’ve read: he surmised that Cain’s fastball is dropping less than the hitter expects, resulting in the ball being struck somewhere below the center of the ball, and either going straight into the air (as his 12.9% career infield fly ball percentage (IFFB%) might suggest) or at least somewhere inside the park*.
*Quick aside that I had not considered at any point until NOW: wouldn’t Oakland be ideal for him, given the enormous foul ground territory? *Checks* In a small sample, granted, he’s pitched to a 1.16 ERA in 3 starts, his best ERA at any park where he’s had two or more starts. This could be coincidence, but I’m intrigued.
Do more infield pop ups matter more for pitchers in Oakland? I too was intrigued, so I tried to look it up. Not for Matt Cain. But for every Oakland pitcher to pitch during the period for which we have the data.
Batted ball data has been recorded for nine seasons (2002-2010), creating a sample of 99 pitchers and 190 pitcher-seasons. For all these pitchers, I created a database with about 40 columns, with everything from wins and losses to balls and strikes thrown to FIP and tERA. That might seem like a lot of data, but after it’s parsed, there’s very little left.
Another item to note is that the batted balls are recorded only when they are hits or outs. Many foul balls that are caught in Oakland go unnoticed in other ballparks and result in something else (or another foul ball). There should be more fly balls recorded in Oakland because of these additional outs.
Of the 99 pitchers, only 38 of them threw at least 100 innings for the Athletics (I only looked at time with the Athletics for this study), allowing for roughly 50 innings in the Coliseum and 50 on the road. If I further parse the data, only 15 pitchers threw both 100 innings at home and on the road while with the A’s in the last nine seasons.
At first, I checked to see if there was any correlation between infield fly ball percentage (IFFB%) and ERA, RA, BABIP, FIP, xFIP, and ERA-FIP using the home and road stats. More infield pop ups would lead to more outs thanks to the additional foul territory.
I used all three sets of pitchers (all 99, the 38, and the 15); if the theory has any weight, we would expect to see a negative correlation with the first three statistics (they go down as IFFB% goes up) for the home data, and there should be no correlation for the road data. FIP and xFIP are theoretically supposed to normalize batted ball noise, and I’m really not sure what to expect from those, though I suppose there should be no difference. If ERAs are lower though, then ERA-FIP should also be lower as IFFB% goes up, and that would show up as a negative correlation too.
Here are a few sets of results:
I could analyze them, but I realize I made a mistake. IFFB% is the percentage of fly balls that are infield fly balls. What I really needed was the percentage of batted balls that are infield fly balls. To get that I multiplied IFFB% by the fly ball percentage (FB%), to get what I called TOTAL_IFFB%. Here are those results:
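The conversion is a single multiplication. A minimal sketch, using a hypothetical pitcher (the 10% and 40% inputs are invented for illustration, not taken from the database) and a made-up function name:

```python
def total_iffb_pct(iffb_pct, fb_pct):
    """Infield flies as a percent of ALL batted balls.

    iffb_pct: percent of fly balls that are infield flies (IFFB%)
    fb_pct:   percent of batted balls that are fly balls (FB%)
    """
    return iffb_pct * fb_pct / 100

# Hypothetical pitcher: 10% of his fly balls are pop ups, 40% of batted balls are flies
example = total_iffb_pct(10.0, 40.0)
print(f"TOTAL_IFFB% = {example:.1f}%")
```

The point of the change is that two pitchers with the same IFFB% can have very different pop-up totals if one of them is a ground ball pitcher.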
I also did R-squared tests with the same data:
I would call it “mixed results,” and there aren’t any real patterns, at least not that I can see. More importantly, this still isn’t what I really want. What I want to know is whether having a higher TOTAL_IFFB% means something in the Coliseum. For this I divided the pitchers into three groups: high TOTAL_IFFB% (above 6%), medium TOTAL_IFFB% (between 3 and 6%), and low TOTAL_IFFB% (below 3%). For reference, Cain’s career TOTAL_IFFB% is 5.9%. Here are some results:
As a control, here is the same test for the groups on the road (grouped again by Home TOTAL_IFFB%):
I included K/9, BB/9, and HR/9 for each group to determine if maybe one set of pitchers was simply inferior to the others. All of these statistics are weighted by the innings pitched by each pitcher, so there’s a lot more Barry Zito in the >6% than Seth Etherton. There are approximately as many innings in the 3-6% group as in the >6% and <3% groups combined.
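The grouping and innings weighting described above might look something like this. It is only a sketch: the four pitchers and their numbers are invented placeholders, not rows from the actual database, and the field names are assumptions.

```python
def iffb_group(total_iffb_pct):
    """Bucket a pitcher by home TOTAL_IFFB%: above 6, between 3 and 6, below 3."""
    if total_iffb_pct > 6:
        return "high"
    if total_iffb_pct >= 3:
        return "medium"
    return "low"

def ip_weighted(rows, stat):
    """Average of `stat` across pitchers, weighted by innings pitched."""
    total_ip = sum(row["ip"] for row in rows)
    return sum(row[stat] * row["ip"] for row in rows) / total_ip

# Hypothetical pitchers, for illustration only
pitchers = [
    {"name": "Pitcher A", "ip": 200.0, "total_iffb": 6.5, "era": 3.50},
    {"name": "Pitcher B", "ip": 50.0, "total_iffb": 7.0, "era": 5.00},
    {"name": "Pitcher C", "ip": 100.0, "total_iffb": 4.0, "era": 4.00},
    {"name": "Pitcher D", "ip": 100.0, "total_iffb": 2.0, "era": 4.50},
]

groups = {}
for row in pitchers:
    groups.setdefault(iffb_group(row["total_iffb"]), []).append(row)

for name, rows in groups.items():
    print(name, round(ip_weighted(rows, "era"), 2))
```

With the weighting, the 200-inning pitcher counts four times as much as the 50-inning pitcher in the same group, which is the "more Barry Zito than Seth Etherton" effect.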
Obviously there is less scoring in the Coliseum because it is harder to hit home runs, but we also see a decrease in walks (perhaps because pitchers are less worried about mistakes).
Quick aside and important point: holy crap does the Coliseum depress BABIP. And it has to be the Coliseum, because the defenses, pitchers, and hitters should be relatively the same in both sets. I checked and checked and checked this, and I’m pretty sure it’s right. I’m not sure where I would have messed that up. Plus everything else makes sense.
The pitchers combined to have a 4.03% TOTAL_IFFB% at home and a 3.95% TOTAL_IFFB% on the road. The 0.08% difference is for roughly 20,100 batted balls at home and 19,500 batted balls on the road. That tenth of a percentage point, when applied to 20,000 batted balls, is roughly 20 extra infield flies over nine seasons, or about 2 per season. Seems so wrong. Then I found this:
One of the more interesting effects I found is that parks have a strong affect [sic] on the proportion of batted balls that are infield flies.
Team           FlysIF
Brewers        1.15
Mariners       1.12
Marlins        1.12
Reds           1.08
Devil Rays     1.07
---
Phillies       0.93
Royals         0.93
Indians        0.92
Diamondbacks   0.92
Giants         0.90
A player is 28 percent more likely to hit an infield fly in Milwaukee than he is in San Francisco. Why is that? My guess is that it has to do with foul territory. Since infield flies are only recorded when the ball is put into play, parks with a lot of foul territory are more likely to see foul pop-ups stay in and get caught, whereas ballparks with little foul territory will see a lot of pop-ups land in the stands and go unrecorded.
The problem with that theory is that while the parks at the bottom of the list do tend to be a little smaller in terms of foul territory, those at the top seem to be pretty average on the whole. Perhaps my data source is off, but there may be some other variable I’m not thinking of.
The A’s aren’t on either side of the list. While a couple of years old, it hasn’t changed. The park factor for infield flies in the Coliseum is close to neutral. I have to agree with David’s last sentence: “Perhaps my data source is off, but there may be some other variable I’m not thinking of.”
There’s nothing there to suggest that getting more infield fly balls makes you a better pitcher in the Coliseum. Matt Cain’s ability to limit home runs is what would continue to make him a very good pitcher in Oakland. However, there’s so much bias in the manner that I carved up the data that I can’t be sure it’s correct. Would love some reviewers to take a look…
Despite losing importance in the evaluation of players, I like batting average. I like it because it describes a series of events in definitive and simple terms: in what percentage of at-bats does a hitter get a hit? I understand that it doesn’t tell me much about a hitter’s value offensively. But it’s part of the equation. Even the formula is simple: batting average = hits / at-bats.
Same for on-base percentage. The percentage of plate appearances in which a player does not make an out. Obviously valuable, since outs are the game’s most important limited resource.
I’ve never liked slugging percentage. Slugging percentage is the total number of bases a player hits for divided by the number of at-bats (average bases per at-bat). Because the numerator is a binary choice (the event either happened or it didn’t) for the first two metrics, they can be expressed as percentages. It’s tougher to do that with slugging percentage (despite its name), because the maximum would be 400% (a home run every at-bat) and the numerator can increase by 1, 2, 3, or 4, depending on the event. And I think that’s why it bothers me. It’d be like having a statistic for runs per game; that number really wouldn’t mean too much (obviously runs matter, but the rate statistic wouldn’t tell us anything).
The statistic “isolated slugging percentage” attempts to capture a player’s power only. It’s measured by subtracting batting average from slugging percentage. It’s valuable because, for example, there are many ways to have any given slugging percentage. In 10 at-bats, you could hit 5 singles or you could hit a home run and a single for a 0.500 slugging percentage. Same slugging percentage, different batting averages. ISO shows that. The units are still total bases per at-bat, which is an issue for me, and again, the number itself doesn’t mean anything. I’ve gone over and over this to make sure I define it correctly, so here I go: the numerator in ISO is the number of bases above one a batter achieves on each hit. Singles and outs are 0, doubles 1, triples 2, and home runs 3. Therefore, ISO groups singles and outs together, and I don’t like that. Dividing by the number of at-bats gives the number of “extra bases” a batter achieves per at-bat.
Instead of just complaining about it though, I tried to develop something that would make more sense. To do this I divided slugging percentage by batting average, making the units total bases per hit. For one, this makes the total number of events in the numerator and denominator the same. I’ve eliminated all at-bats that result in zero bases in my calculation. Because of this, the scale is similar to total bases: the minimum is 1.000 and the maximum is 4.000.
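To make the comparison concrete, here is a small sketch that runs the two 10 at-bat examples from above (five singles versus a home run and a single) through all four measures. The function and key names are invented for illustration; total bases per hit is left undefined for a hitter with no hits.

```python
def slugging_rates(singles, doubles, triples, homers, at_bats):
    """Return BA, SLG, ISO, and total bases per hit (SLG / BA) for a stat line."""
    hits = singles + doubles + triples + homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    ba = hits / at_bats
    slg = total_bases / at_bats
    return {
        "BA": ba,
        "SLG": slg,
        "ISO": slg - ba,  # extra bases per at-bat
        # SLG / BA reduces to total bases / hits; undefined with zero hits
        "TB_per_hit": total_bases / hits if hits else None,
    }

five_singles = slugging_rates(5, 0, 0, 0, at_bats=10)
homer_and_single = slugging_rates(1, 0, 0, 1, at_bats=10)

for label, line in (("5 singles", five_singles), ("HR + single", homer_and_single)):
    print(label, {k: round(v, 3) for k, v in line.items()})
```

Both lines slug .500, but total bases per hit separates them cleanly: 1.000 for the singles hitter and 2.500 for the hitter with the home run.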
Next, I wanted to test to see how well it described a player’s value on offense. I plotted BA, OBP, SLG, ISO, and SLG/BA against wOBA (weighted on-base average), my favorite all-inclusive offensive statistic, for the 2010 season. wOBA is a statistic based on linear weights designed to measure a player’s overall offensive contributions per plate appearance. It takes the observed run values of various offensive events (e.g. each single is worth 0.72 runs, each out is worth -0.28 runs), totals them for each player, divides by the player’s plate appearances, and scales the result to the average on-base percentage. Here is the plot for batting average (I’ve also marked Jose Bautista’s spot on all the charts, since he’s a major outlier – I thought this might help make the point…of course, as we’re about to see, I’m dumb):
Here is the same plot three more times, against slugging percentage, isolated slugging percentage, and my new total bases per hit statistic.
The closer all of those points are to the line, the more the statistic correlates with wOBA. It’s easy to see from the plots that slugging percentage is the best, and isolated slugging percentage correlates fairly well. Here are the actual measures:
Damning evidence. For those without the statistics background, low numbers are bad. My statistic sucks at actually telling us how good a player is offensively. Important point here. I started writing this post in November. I had what you see above figured out two hours into writing, and then I was stuck. I couldn’t figure out how to make the statistic matter. And then recently I had a revelation. It matters because it makes sense. It doesn’t have to tell us anything about a player’s overall offensive value. The statistic itself tells us something, total bases per hit. And from there I kept going with what you see below. But first, I wanted to make sure I put this to bed. I checked the stats for 2008 to 2010, just to make sure 2010 wasn’t a weird year.
And 2010 was generous to me. My statistic is even more meaningless in 2008 and 2009. But from here on out, it will be my preferred measure of the damage done by a hitter. Until some smart commenter tells me why I’m wrong (which I assuredly am).
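For anyone who wants to run the same kind of check, the comparison against wOBA boils down to a correlation per statistic. The sketch below assumes a plain Pearson correlation (squared to get R-squared); the exact measure used for the tables is not spelled out, and the five stat lines are invented placeholders, not real 2010 players.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2

# Hypothetical hitters, for illustration only
slg = [0.380, 0.410, 0.450, 0.500, 0.560]
tb_per_hit = [1.45, 1.80, 1.50, 1.70, 1.60]
woba = [0.305, 0.325, 0.350, 0.380, 0.410]

print("SLG vs wOBA R^2:       ", round(r_squared(slg, woba), 3))
print("TB per hit vs wOBA R^2:", round(r_squared(tb_per_hit, woba), 3))
```

A value near 1 means the points hug the line; a value near 0 means the statistic says almost nothing about wOBA.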
Now, instead of finding a better statistic about slugging that told me how good a hitter was, I found a better statistic about slugging. But this did not stop my search for an easily-calculated statistic that more perfectly aligns with wOBA. First attempt: slugging percentage, but with walks. Basically, the denominator becomes plate appearances, and all walks and hit-by-pitches become singles. It basically gives the batter credit for a base on walks. Here’s the formula:
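Read literally, that description works out to (total bases + walks + hit-by-pitches) / plate appearances. A minimal sketch, with a made-up function name and an invented stat line:

```python
def slugging_with_walks(total_bases, walks, hit_by_pitch, plate_appearances):
    """Slugging with each walk and hit-by-pitch counted as a single, per plate appearance."""
    return (total_bases + walks + hit_by_pitch) / plate_appearances

# Hypothetical season: 250 total bases, 60 walks, 5 HBP in 600 plate appearances
example = slugging_with_walks(250, 60, 5, 600)
print(round(example, 3))
```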
I’ll show the results in a minute, but I also wanted to test what would happen to my statistic* if I included walks. Obviously, this gets away from the point of the measure (to better reflect slugging), but it’s possible that it could more accurately reflect a hitter’s value.
*I keep calling this “my statistic.” Someone tell me if this has been done before. I always feel like everything I create has been done before; if it made sense to me, it had to have made sense to someone years ago. Thanks for the ego check, everyone.
Formula and graph:
Completely and utterly useless. Summary:
BREAKING NEWS: Of these measures, slugging percentage and on-base percentage are most closely correlated with wOBA. Every other measure is markedly worse. My attempt to shake the baseball world = FAIL.
Over the next two weeks, Major League Baseball will announce the award winners for the 2010 season. There’s always a lot of debate about these awards; most of them, we’ll forget about quickly.
I don’t know enough about how much impact managers have and I don’t care enough about Gold Gloves to have too much of an opinion about those. But here would be my official ballots for the other awards. 3 spots on the ballot for rookies, 5 for pitchers, 10 for the MVP award. Without further ado:
American League Rookie of the Year
1. Austin Jackson
2. Brian Matusz
3. Danny Valencia
This crop of rookies isn’t great, but these guys had good seasons. The argument for Jackson centers around his 675 plate appearances and .293 batting average (helped by a .396 BABIP). Playing a good centerfield for an entire season puts him over Matusz for me, who went 10-12 with a 4.30 ERA in the very difficult AL East. He made 32 starts, pitched 175 innings, and is already Baltimore’s best starting pitcher. Valencia hit .311 at third base for the Twins and solidified a position that was killing them in the first half of the season. Valencia played just 85 games, which just isn’t enough playing time for me; it’s also my argument against Neftali Feliz, who might have been the best (and be the most talented) of all these guys, but who pitched just 69 innings to get his 40 saves. Whoever wins will be fine with me.
National League Rookie of the Year
1. Jason Heyward
2. Buster Posey
3. Jaime Garcia
Unlike the American League, the National League had a host of good rookies. In addition to the three above, Mike Stanton, Stephen Strasburg, Starlin Castro, Ike Davis, Alcides Escobar, Gaby Sanchez, Jhoulys Chacin, and Mike Leake all had promising starts to their careers. Jaime Garcia pitched 163 innings with a 2.70 ERA and still should finish a distant third. I’m not going to complain about either of the first two guys winning. Posey played the more difficult position and was the heart of his team’s offense. Heyward was slightly better, much younger (doing what he did at age 20 is very rare), and played the entire season. I know that Posey missing the first two months wasn’t his fault, but Heyward played, and played well, and thus gets the edge.
American League Cy Young
1. Felix Hernandez
2. Francisco Liriano
3. Cliff Lee
4. CC Sabathia
5. Justin Verlander
Apologies to David Price, Jered Weaver, and Jon Lester. Hernandez wins because he finished in the top three in the American League in the following categories: ERA, hits per 9 innings, WHIP, strikeouts, complete games, innings pitched, FIP, xFIP, and WAR. Yeah, that’ll do. He finished 13-12, which will be tough for some voters to choose over the 21-7 Sabathia, but he’s the only answer.
National League Cy Young
1. Roy Halladay
2. Adam Wainwright
3. Josh Johnson
4. Tim Lincecum
5. Ubaldo Jimenez
Hernandez’s season was good; Halladay’s was better. With a 2.44 ERA in 250 innings, he struck out 219 hitters and walked only 30. He threw 9 complete games and 4 shutouts in 33 starts. He finished second in the NL in WHIP (1.04), third in ERA, and first in xFIP. All of the guys on this list were really good this season, but Halladay was the best. 219 K to 30 BB in 250 innings? Legit.
American League Most Valuable Player
1. Josh Hamilton
2. Evan Longoria
3. Miguel Cabrera
4. Jose Bautista
5. Adrian Beltre
6. Robinson Cano
7. Felix Hernandez
8. Joe Mauer
9. Cliff Lee
10. Shin-Soo Choo
Despite missing the final month of the season, Hamilton was clearly the best player in the American League this season. Playing only 133 games is the one reason that this award wouldn’t be Hamilton’s, and no one else makes a really strong case. Hamilton can play defense (both left and center), run a little, and flat-out mash.
National League Most Valuable Player
1. Joey Votto
2. Albert Pujols
3. Ryan Zimmerman
4. Carlos Gonzalez
5. Troy Tulowitzki
6. Roy Halladay
7. Matt Holliday
8. Adam Wainwright
9. Adrian Gonzalez
10. Aubrey Huff
Votto and Pujols had basically the same season. Look at the top two rows of this table. I’m going with Votto because Cincinnati was better. I’m generally not a big fan of “guy on better team should win MVP,” but these two guys are so close that it’s the only tiebreaker left. Also, I think it’s fair to note that this was the second worst season of Prince Albert’s career, and I’m putting him second on my fake MVP ballot.
I just wanted to let everyone (the 8 of you that read this) know that posts might be sparse for the next few weeks. The reason? I’m beginning another project (also baseball-related) set to start just after the World Series. You can check back here for more details once it’s up and running. Rest assured, loyal readers, that Knuckleballs is not dying. This new project will take some effort to get going, but Knuckleballs will still be fluttering in from time to time. Once the other project is going, Knuckleballs will go back to its current operating procedures: also sparse posts (the only difference now is that I’m flat-out telling you I’ll only be writing once a week). For anyone who is curious, topics currently in the queue:
- Roster construction: how many different ways to get to the goal? (H/T to Loyal Reader Mac, whose silly banter has provided the idea for many of these posts)
- AL and NL Award Ballots (MVP, Cy Young, and Rookie of the Year): I will publish these before the awards are announced (I have my lists, but I want to offer some analysis with them as well, so be patient).
- I’ll be going back through the first 70 posts and checking all of my predictions. All of them: the good (David Wright and Ubaldo Jimenez), the bad (Scott Baker is a sleeper Cy Young candidate), and the ugly (HEY SAN FRANCISCO GIANTS).
Thanks again to everyone for reading and helping to make this “realer.”