Archive for November 2010
Although it has lost importance in the evaluation of players, I like batting average. I like it because it describes a series of events in definitive and simple terms: in what percentage of at-bats does a hitter get a hit? I understand that it doesn’t tell me much about a hitter’s offensive value. But it’s part of the equation. Even the formula is simple: hits divided by at-bats (H / AB).
Same for on-base percentage: the percentage of plate appearances in which a player does not make an out. It’s obviously valuable, since outs are the game’s most important limited resource.
I’ve never liked slugging percentage. Slugging percentage is the total number of bases a player hits for divided by the number of at-bats (average bases per at-bat). For the first two metrics, each plate appearance adds either 0 or 1 to the numerator (the event either happened or it didn’t), so they can be expressed as percentages. It’s tougher to do that with slugging percentage (despite its name), because the numerator can increase by 1, 2, 3, or 4 depending on the event, so the maximum would be 400% (a home run every at-bat). And I think that’s why it bothers me. It’d be like having a statistic for runs per game; that number really wouldn’t mean much (obviously runs matter, but the rate statistic wouldn’t tell us anything).
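For reference, here’s a minimal Python sketch of the three rate stats discussed so far, computed from counting stats. The OBP function uses the official formula, which puts sacrifice flies in the denominator and ignores sacrifice bunts, so it is close to, but not exactly, “plate appearances without an out.” The function and parameter names are hypothetical, chosen for illustration.

```python
# Standard rate stats from counting stats (a sketch; the function
# names here are hypothetical, not from any particular library).

def batting_average(hits: int, at_bats: int) -> float:
    """Share of at-bats that end in a hit: H / AB."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hbp: int,
                       at_bats: int, sac_flies: int) -> float:
    """Official OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def total_bases(singles: int, doubles: int, triples: int, home_runs: int) -> int:
    """Each hit adds 1, 2, 3, or 4 bases; an out adds 0."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging_percentage(singles: int, doubles: int, triples: int,
                        home_runs: int, at_bats: int) -> float:
    """Total bases per at-bat: TB / AB, so it ranges from 0 to 4."""
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# A home run in every at-bat gives the 4.000 ("400%") ceiling.
print(slugging_percentage(0, 0, 0, 10, at_bats=10))  # → 4.0
```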
The statistic “isolated slugging percentage” attempts to capture a player’s power only. It’s measured by subtracting batting average from slugging percentage. It’s valuable because, for example, there are many ways to have any given slugging percentage. In 10 at-bats, you could hit 5 singles or you could hit a home run and a single for a 0.500 slugging percentage. Same slugging percentage, different batting averages. ISO shows that. The units are still total bases per at-bat, which is an issue for me, and again, the number itself doesn’t mean anything. I’ve gone over and over this to make sure I define it correctly, so here I go: the numerator in ISO is the number of bases above one a batter achieves on each hit. Singles and outs are 0, doubles 1, triples 2, and home runs 3. Therefore, ISO groups singles and outs together, and I don’t like that. Dividing by the number of at-bats gives the number of “extra bases” a batter achieves per at-bat.
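The 10-at-bat example above is easy to check in code. This sketch uses a hypothetical `BattingLine` container and computes AVG, SLG, ISO, and the “extra bases per at-bat” reading of ISO side by side; the last two always agree, because total bases minus hits is exactly the number of bases beyond the first on each hit.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A hypothetical container for one hitter's at-bat outcomes."""
    at_bats: int
    singles: int = 0
    doubles: int = 0
    triples: int = 0
    home_runs: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.home_runs


def avg(line: BattingLine) -> float:
    return line.hits / line.at_bats


def slg(line: BattingLine) -> float:
    return line.total_bases / line.at_bats


def iso(line: BattingLine) -> float:
    """Isolated power: SLG minus AVG."""
    return slg(line) - avg(line)


def extra_bases_per_ab(line: BattingLine) -> float:
    """Bases beyond the first on each hit (singles and outs both count 0)."""
    return (line.doubles + 2 * line.triples + 3 * line.home_runs) / line.at_bats


five_singles = BattingLine(at_bats=10, singles=5)
homer_and_single = BattingLine(at_bats=10, singles=1, home_runs=1)

for name, line in [("5 singles", five_singles), ("HR + 1B", homer_and_single)]:
    print(f"{name}: AVG {avg(line):.3f}  SLG {slg(line):.3f}  "
          f"ISO {iso(line):.3f}  extra bases/AB {extra_bases_per_ab(line):.3f}")
```

Both lines print a .500 SLG, but the ISO column reads .000 for the five singles and .300 for the homer and single, matching the extra-bases column.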
Instead of just complaining about it, though, I tried to develop something that would make more sense. To do this, I divided slugging percentage by batting average. Both are divided by at-bats, so the at-bats cancel and the units become total bases per hit. For one, this means the numerator and denominator now count the same events: hits. Every at-bat that produces zero bases drops out of the calculation. Because of this, the scale matches the total bases for a single hit: the minimum is 1.000 (all singles) and the maximum is 4.000 (all home runs).
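Total bases per hit can be computed either from raw counts or from the two published rate stats. Here’s a small sketch of both (the function names are hypothetical); the docstring spells out why the at-bats cancel.

```python
def total_bases_per_hit(singles: int, doubles: int, triples: int, home_runs: int) -> float:
    """SLG / AVG = (TB / AB) / (H / AB) = TB / H; at-bats cancel out."""
    hits = singles + doubles + triples + home_runs
    if hits == 0:
        raise ValueError("undefined for a hitter with no hits")
    tb = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return tb / hits


def slg_over_avg(slg: float, avg: float) -> float:
    """The same number, computed from the two published rate stats."""
    return slg / avg


print(total_bases_per_hit(5, 0, 0, 0))  # → 1.0 (all singles: the floor)
print(total_bases_per_hit(1, 0, 0, 1))  # → 2.5 (5 total bases on 2 hits)
print(slg_over_avg(0.500, 0.200))       # same hitter, from SLG and AVG
```

Raising an error for a hitter with no hits is a design choice; the ratio simply has no meaning there, since there are no hits to average over.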
Next, I wanted to test how well it describes a player’s value on offense. I plotted BA, OBP, SLG, ISO, and SLG/BA against wOBA (weighted on-base average), my favorite all-inclusive offensive statistic, for the 2010 season. wOBA is a statistic based on linear weights, designed to measure a player’s overall offensive contribution per plate appearance. To calculate it, you weight each of a player’s offensive events by its observed run value (e.g., each single is worth 0.72 runs and each out is worth -0.28 runs), divide the total by the player’s plate appearances, and scale the result to the league-average on-base percentage. Here is the plot for batting average (I’ve also marked Jose Bautista’s spot on all the charts, since he’s a major outlier; I thought this might help make the point…of course, as we’re about to see, I’m dumb):
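The wOBA recipe described above can be sketched in a few lines. This is a minimal illustration, not FanGraphs’ published calculation: the run values are passed in by the caller, and the event names (`"single"`, `"out"`) and dictionary layout are hypothetical. Following the usual wOBA construction, it first shifts every run value so an out is worth zero, then picks a scale factor so the league as a whole comes out at league-average OBP.

```python
from typing import Mapping


def woba(counts: Mapping[str, int], plate_appearances: int,
         run_values: Mapping[str, float],
         league_counts: Mapping[str, int], league_pa: int,
         league_obp: float) -> float:
    """Linear-weights rate stat scaled to league OBP (a sketch).

    run_values maps a hypothetical event name (e.g. "single", "out") to
    its run value; every event in the counts must appear in it.
    """
    # Shift the weights so an out is worth exactly zero.
    out_value = run_values["out"]
    above_out = {event: value - out_value for event, value in run_values.items()}

    def raw(c: Mapping[str, int], pa: int) -> float:
        # Weighted run value per plate appearance.
        return sum(above_out[event] * n for event, n in c.items()) / pa

    # Choose the scale so the league as a whole lands on league OBP.
    scale = league_obp / raw(league_counts, league_pa)
    return raw(counts, plate_appearances) * scale


# Toy example using only the two run values quoted above.
weights = {"single": 0.72, "out": -0.28}
league = {"single": 100, "out": 300}  # toy league: 400 PA, .250 OBP
hitter = {"single": 30, "out": 70}    # toy hitter: 100 PA
print(round(woba(hitter, 100, weights, league, 400, 0.250), 3))
```

With singles as the only positive event, the toy hitter’s wOBA comes out at .300, the same as his on-base rate, which is what the scaling is meant to produce. Real wOBA uses a specific list of events (walks, hit-by-pitches, each hit type) with their own run values, which this sketch does not reproduce.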
Here is the same plot three more times, against slugging percentage, isolated slugging percentage, and my new total bases per hit statistic.
The closer the points are to the line, the more closely the statistic correlates with wOBA. It’s easy to see from the plots that slugging percentage correlates best, and isolated slugging percentage correlates fairly well. Here are the actual measures:
Damning evidence. For those without a statistics background, lower numbers mean a weaker relationship with wOBA. My statistic sucks at actually telling us how good a player is offensively. Here’s an important point. I started writing this post in November. I had everything you see above figured out two hours into writing, and then I was stuck: I couldn’t figure out how to make the statistic matter. Then, recently, I had a revelation. It matters because it makes sense. It doesn’t have to tell us anything about a player’s overall offensive value; the statistic tells us something on its own: total bases per hit. From there I kept going with what you see below. But first, I wanted to put this to bed, so I checked the stats for 2008 to 2010, just to make sure 2010 wasn’t a weird year.
And 2010 was generous to me: my statistic correlates even less with wOBA in 2008 and 2009. But from here on out, it will be my preferred measure of how much damage a hitter does when he gets a hit. Until some smart commenter tells me why I’m wrong (which I assuredly am).
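The post doesn’t say exactly which correlation measure is in its tables. Assuming it is the R² of a straight-line fit (for a single predictor, the square of the Pearson correlation), here’s a sketch of how the candidate stats could be ranked against wOBA. The function names are hypothetical, and the input lists would be one entry per hitter, in the same order for every stat.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Share of the variance in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2


def rank_by_fit(stats: Dict[str, Sequence[float]],
                woba: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort candidate stats from best to worst R^2 against wOBA."""
    scores = {name: r_squared(values, woba) for name, values in stats.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

R² runs from 0 (no linear relationship) to 1 (every point on the line), which matches the reading that low numbers are bad.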
Now, instead of finding a better statistic about slugging that told me how good a hitter was, I found a better statistic about slugging. But this did not stop my search for an easily calculated statistic that aligns more closely with wOBA. First attempt: slugging percentage, but with walks. The denominator becomes plate appearances, and all walks and hit-by-pitches become singles, so the batter gets credit for one base on each walk. Here’s the formula: (TB + BB + HBP) / PA, where TB is total bases.
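As a sketch, the walk-inclusive slugging formula looks like this in Python (the function name is hypothetical). Walks and hit-by-pitches each add one base to the numerator, exactly like a single, and the denominator is plate appearances rather than at-bats.

```python
def slugging_with_walks(singles: int, doubles: int, triples: int,
                        home_runs: int, walks: int, hbp: int,
                        plate_appearances: int) -> float:
    """(TB + BB + HBP) / PA: walks and HBP count as one base, like singles."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return (total_bases + walks + hbp) / plate_appearances


# Toy line: 10 PA with a home run, a walk, and 8 outs.
print(slugging_with_walks(0, 0, 0, 1, walks=1, hbp=0, plate_appearances=10))  # → 0.5
```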
I’ll show the results in a minute, but I also wanted to test what would happen to my statistic* if I included walks. Obviously, this gets away from the point of the measure (to better reflect slugging), but it’s possible that it could more accurately reflect a hitter’s value.
*I keep calling this “my statistic.” Someone tell me if this has been done before. I always feel like everything I create has been done before; if it made sense to me, it had to have made sense to someone years ago. Thanks for the ego check, everyone.
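The walk-inclusive version of total bases per hit isn’t spelled out in the text, so this sketch rests on an assumption: that it is the walk version of slugging divided by the matching walk version of batting average, which after plate appearances cancel is (TB + BB + HBP) / (H + BB + HBP). The function name is hypothetical.

```python
def bases_per_time_on_base(singles: int, doubles: int, triples: int,
                           home_runs: int, walks: int, hbp: int) -> float:
    """(TB + BB + HBP) / (H + BB + HBP): an assumed form of the walk version."""
    hits = singles + doubles + triples + home_runs
    times_on_base = hits + walks + hbp
    if times_on_base == 0:
        raise ValueError("undefined for a hitter who never reached base")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    # Each walk or HBP adds 1 to both numerator and denominator.
    return (total_bases + walks + hbp) / times_on_base


# A home run and a walk: 5 bases over 2 times on base.
print(bases_per_time_on_base(0, 0, 0, 1, walks=1, hbp=0))  # → 2.5
```

Because every walk adds 1 to both numerator and denominator, walks pull the number toward the 1.000 floor, the same value a single contributes.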
Formula and graph:
Completely and utterly useless. Summary:
BREAKING NEWS: Of these measures, slugging percentage and on-base percentage are most closely correlated with wOBA. Every other measure is markedly worse. My attempt to shake the baseball world = FAIL.
Over the next two weeks, Major League Baseball will announce the award winners for the 2010 season. There’s always a lot of debate about these awards; most of them, we’ll forget about quickly.
I don’t know enough about how much impact managers have, and I don’t care enough about Gold Gloves, to have much of an opinion on those awards. But here is what my official ballots would look like for the other awards: 3 spots on the ballot for rookies, 5 for pitchers, and 10 for MVP. Without further ado:
American League Rookie of the Year
1. Austin Jackson
2. Brian Matusz
3. Danny Valencia
This crop of rookies isn’t great, but these guys had good seasons. The argument for Jackson centers on his 675 plate appearances and .293 batting average (helped by a .396 BABIP). Playing a good center field for an entire season puts him ahead of Matusz for me; Matusz went 10-12 with a 4.30 ERA in the very difficult AL East. He made 32 starts, pitched 175 innings, and is already Baltimore’s best starting pitcher. Valencia hit .311 at third base for the Twins and solidified a position that was killing them in the first half of the season. But Valencia played just 85 games, which isn’t enough playing time for me; that’s also my argument against Neftali Feliz, who might have been the best (and may be the most talented) of all these guys, but who pitched just 69 innings to get his 40 saves. Whoever wins will be fine with me.
National League Rookie of the Year
1. Jason Heyward
2. Buster Posey
3. Jaime Garcia
Unlike the American League, the National League had a host of good rookies. In addition to the three above, Mike Stanton, Stephen Strasburg, Starlin Castro, Ike Davis, Alcides Escobar, Gaby Sanchez, Jhoulys Chacin, and Mike Leake all had promising starts to their careers. Jaime Garcia pitched 163 innings with a 2.70 ERA and still should finish a distant third. I’m not going to complain about either of the first two guys winning. Posey played the more difficult position and was the heart of his team’s offense. Heyward was slightly better, much younger (doing what he did at age 20 is very rare), and played the entire season. I know that Posey missing the first two months wasn’t his fault, but Heyward played, and played well, and thus gets the edge.
American League Cy Young
1. Felix Hernandez
2. Francisco Liriano
3. Cliff Lee
4. CC Sabathia
5. Justin Verlander
Apologies to David Price, Jered Weaver, and Jon Lester. Hernandez wins because he finished in the top three in the American League in the following categories: ERA, hits per 9 innings, WHIP, strikeouts, complete games, innings pitched, FIP, xFIP, and WAR. Yeah, that’ll do. His 13-12 record will make him a tough choice for some voters over the 21-7 Sabathia, but he’s the only answer.
National League Cy Young
1. Roy Halladay
2. Adam Wainwright
3. Josh Johnson
4. Tim Lincecum
5. Ubaldo Jimenez
Hernandez’s season was good; Halladay’s was better. He posted a 2.44 ERA in 250 innings, struck out 219 hitters, and walked only 30. He threw 9 complete games and 4 shutouts in 33 starts. He finished second in the NL in WHIP (1.04), third in ERA, and first in xFIP. All of the guys on this list were really good this season, but Halladay was the best. 219 K to 30 BB in 250 innings? Legit.
American League Most Valuable Player
1. Josh Hamilton
2. Evan Longoria
3. Miguel Cabrera
4. Jose Bautista
5. Adrian Beltre
6. Robinson Cano
7. Felix Hernandez
8. Joe Mauer
9. Cliff Lee
10. Shin-Soo Choo
Despite missing the final month of the season, Hamilton was clearly the best player in the American League. Playing only 133 games is the only reason this award wouldn’t be his, and no one else makes a really strong case. Hamilton can play defense (both left and center), run a little, and flat-out mash.
National League Most Valuable Player
1. Joey Votto
2. Albert Pujols
3. Ryan Zimmerman
4. Carlos Gonzalez
5. Troy Tulowitzki
6. Roy Halladay
7. Matt Holliday
8. Adam Wainwright
9. Adrian Gonzalez
10. Aubrey Huff
Votto and Pujols had basically the same season. Look at the top two rows of this table. I’m going with Votto because Cincinnati was better. I’m generally not a big fan of “guy on better team should win MVP,” but these two guys are so close that it’s the only tiebreaker left. Also, I think it’s fair to note that this was the second-worst season of Prince Albert’s career, and I’m still putting him second on my fake MVP ballot.