Uncategorized | Knuckleballs

Archive for the ‘Uncategorized’ Category

Fun with WAR Projections

This was originally posted at the now-defunct Baseballin’ on a Budget on December 22, 2010.

Note: This is for fun. It’s an estimate. It’s December and the A’s haven’t done anything this week. There will be reactions to this that I don’t understand. “HOW COULD YOU SUGGEST KOUZ WILL ONLY WALK THAT MUCH?” “Brett Anderson WILL STRIKE OUT EVERYONE!” My estimates are wrong. They might even be really wrong. But this took me probably 30 minutes to do on FanGraphs, so if you really disagree or have 30 minutes to spare, I ask you only to try it for yourself. I bet our answers aren’t that different. Post your results in the comments and let’s have some civil discourse. Also, sorry for defending myself before you read anything. Maybe no one will care that much.

I love FanGraphs*. I love the amount of information. I love how many ways I can cut it up. I love that it can summarize a season in a number or two, but also lets me dig into all the minutiae I want. And they keep adding to it.

*I suppose it’s somewhat strange, given the name of the website, that one of the things I’m not in love with is the graphs. Not that I don’t find them useful. But I usually don’t go to FanGraphs looking for pictures. Just the pictures that the numbers paint. ZING! Ahh, digressions…

One of the features I like is the fans’ projections. I like seeing what a large group of (presumably) knowledgeable fans think about particular players and teams. Of greater interest to me for this exercise, though, are the WAR results for each player that I project. Therefore, if I projected every player on a given roster, I should get an estimate of how good a team might be.

A couple of notes first. I was conservative about talent levels, or at least tried to be. Nothing too much better than what a player had accomplished previously. The only thing that I was a little aggressive on (which, as an A’s fan, might scare you) was playing time. I don’t think that I was too bad about it though. I also was very conservative with defense, mostly because I don’t fully trust the metrics yet. I rated no player more than 5 runs above or below average.

I got about 1,350 innings pitched by the entire team, and I estimated roughly 6,300 plate appearances. A typical major league team will pitch roughly 1,450 innings in a season and have about 6,100 plate appearances. I’m assuming that the remaining 100 innings will be replacement-level for the pitchers, and you can knock off somewhere in the neighborhood of 0.5 WAR for the extra plate appearances that will be given to replacement level players.

So without further ado, I bring you your 2011 Oakland Athletics (if you click on each graphic, you should be able to see larger versions):

(I HAVE NO IDEA WHERE THESE GRAPHICS WENT. SORRY)

Upon looking at the tables, I see where I had extra plate appearances. There’s about an extra 500 in the outfield and an extra 500 for designated hitters. If you want to make that adjustment, take away from Ryan Sweeney, Conor Jackson, and Chris Carter. I’ve also got 166 starts by pitchers in there, but there’s only so much I can do about that.

Other things to note from my predictions: Josh Willingham and Hideki Matsui, the two hitters the A’s signed last week, will be their best hitters. I like David DeJesus, but if he struggles adjusting to rightfield, he’s going to have to hit a lot to be an average rightfielder. And a re-birth for Kurt Suzuki? He really can’t be as bad as he was last season, and it’s not hard to be an average catcher.

I don’t trust Gio Gonzalez to stop walking people. Despite what we’ve said about the team BABIP possibly being sustainable with the defensive talent, I tried to bring all of the ERAs up to more reasonable levels (except for Andrew Bailey, who is, of course, dreamy). And I really like Brett Anderson. Too much.

In the tables above, the hitters account for 19.6 WAR (of which 1.3 is accumulated defensively). The total for the pitchers (which is conservative talent wise but aggressive playing time wise) is also 19.6 WAR. With 39.2 WAR, that leaves us with about an 85-win team, based on my back-calculations of replacement level last year.

That feels about right to me, particularly given my conservative estimates for the defense and pitching. Last season, this was an 81-win team that accumulated 35.6 WAR. To describe the WAR components in the chart below, the 2010 position players generally were average hitters, caught the ball really well, used the DH, and fielded a full lineup for every game. The starting pitchers were good and the bullpen was ok. (Side note: the 2010 Pittsburgh Pirates were really awful).
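The back-calculation behind the 85-win figure can be checked in a few lines of Python. This is a minimal sketch assuming the simple model wins = replacement-level wins + team WAR, with replacement level taken from the 2010 A’s (81 wins, 35.6 WAR). The function names are hypothetical, and the optional 0.5 WAR haircut is the playing-time adjustment mentioned earlier.

```python
def replacement_wins(actual_wins: float, team_war: float) -> float:
    """Back out replacement-level wins: what the team would win with 0 WAR."""
    return actual_wins - team_war


def projected_wins(repl_wins: float, projected_war: float, haircut: float = 0.0) -> float:
    """Add projected WAR to replacement level, minus an optional playing-time haircut."""
    return repl_wins + projected_war - haircut


repl = replacement_wins(81, 35.6)  # 2010 A's: about 45.4 wins at replacement level
hitters, pitchers = 19.6, 19.6     # projected WAR totals from the tables
print(round(projected_wins(repl, hitters + pitchers), 1))               # 84.6
print(round(projected_wins(repl, hitters + pitchers, haircut=0.5), 1))  # 84.1
```

Either way the projection rounds to roughly 84 or 85 wins, which is where the estimate above lands.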

(I HAVE NO IDEA WHERE THESE GRAPHICS WENT. SORRY)

I wanted to do one last check on my predictions, so I compared each individual’s WAR estimate above with his WAR total from last season, prorated to match the projected playing time.

Clearly I’m bullish on the pitchers. The red above is where I’m fairly high on the WAR estimate (per playing time) and the yellow is where I’m low. As I said, I was conservative about defense, and with defense driving so much of the position players’ value, I was bound to have some lower estimates. The other thing I notice is that I’m expecting too much from the 5th starter candidates. They all could be good pitchers, but if Bobby Cramer, Tyson Ross, Brandon McCarthy, and Josh Outman combine for 2 WAR, that’s probably pretty good.
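The prorating in that check amounts to scaling last season’s WAR by the ratio of projected to actual playing time. A sketch, assuming playing time is measured in plate appearances for hitters and innings for pitchers; the function names and the sample pitcher are illustrative, not taken from the original tables.

```python
def prorated_war(last_war: float, last_playing_time: float, projected_playing_time: float) -> float:
    """Scale last season's WAR to the projected playing time (PA or IP)."""
    if last_playing_time <= 0:
        raise ValueError("need positive playing time from last season")
    return last_war * projected_playing_time / last_playing_time


def projection_gap(projected_war: float, last_war: float,
                   last_playing_time: float, projected_playing_time: float) -> float:
    """Positive means the projection is more bullish than last season's rate."""
    return projected_war - prorated_war(last_war, last_playing_time, projected_playing_time)


# Hypothetical pitcher: 2.0 WAR in 150 IP last year, projected for 3.0 WAR in 180 IP.
print(round(prorated_war(2.0, 150, 180), 2))         # 2.4
print(round(projection_gap(3.0, 2.0, 150, 180), 2))  # 0.6
```

A positive gap corresponds to the “red” cells described above and a negative one to the “yellow” cells.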

As cautiously optimistic (passively pessimistic?) fans, we can say this team has improved this offseason, but mostly on the margins. A half-win here, a half-win there. Those matter, don’t get me wrong. And it appears they’ll have every shot starting in April. Texas hasn’t added anything yet (though they’ll still be good), Anaheim’s big spending so far was on a guy who will pitch 70 innings, and Seattle stinks.

But this hasn’t become a juggernaut overnight, nor do I think anyone was/is expecting it to. If the A’s are going to make the playoffs this year, it’s probably going to involve a painful regular season (see: Giants, San Francisco, 2010). That’s better than a painless regular season though, because at least it matters.

Written by Dan Hennessey

November 17, 2011 at 7:26 PM

Posted in Uncategorized

Expectations Rising in Cleveland

Infield Fly Balls and the Coliseum


In his most recent entry, the acclaimed sabermetrician Paapfly does a lot of research into what makes Matt Cain good. The whole entry is worth a read, but one part caught my eye:

Season in and season out, Cain gives up a lesser percentage of home runs on fly balls than the average pitcher, both at home and away. Why? I can’t say with confidence, but Dave Pinto at Baseball Musings has come up with a theory that’s as good as any I’ve read: he surmised that Cain’s fastball is dropping less than the hitter expects, resulting in the ball being struck somewhere below the center of the ball, and either going straight into the air (as his 12.9% career infield fly ball percentage (IFFB%) might suggest) or at least somewhere inside the park*.

*Quick aside that I had not considered at any point until NOW: wouldn’t Oakland be ideal for him, given the enormous foul ground territory? *Checks* In a small sample, granted, he’s pitched to a 1.16 ERA in 3 starts, his best ERA at any park where he’s had two or more starts. This could be coincidence, but I’m intrigued.

Do infield pop-ups matter more for pitchers in Oakland? I was intrigued too, so I tried to look it up. Not for Matt Cain, but for every pitcher who pitched for Oakland during the period for which we have the data.

Batted ball data has been recorded for nine seasons (2002-2010), creating a sample of 99 pitchers and 190 pitcher-seasons. For all these pitchers, I created a database with about 40 columns, with everything from wins and losses to balls and strikes thrown to FIP and tERA. That might seem like a lot of data, but after it’s parsed, there’s very little left.

Another item to note is that batted balls are recorded only when they result in hits or outs. Many foul pop-ups that are caught in Oakland’s large foul territory would land in the stands at other ballparks, go unrecorded, and leave the at-bat to end some other way (or in another foul ball). There should be more fly balls recorded in Oakland because of these additional outs.

Of the 99 pitchers, only 38 of them threw at least 100 innings for the Athletics (I only looked at time with the Athletics for this study), allowing for roughly 50 innings in the Coliseum and 50 on the road. If I further parse the data, only 15 pitchers threw both 100 innings at home and on the road while with the A’s in the last nine seasons.

First, I checked whether there was any correlation between infield fly ball percentage (IFFB%) and ERA, RA (runs allowed per nine innings, R/9 in the tables), BABIP, FIP, xFIP, and ERA-FIP, using the home and road stats separately. The theory: more infield pop-ups should lead to more outs at home, thanks to the additional foul territory.

I used all three sets of pitchers (all 99, the 38, and the 15). If the theory has any weight, we would expect a negative correlation with the first three statistics (they go down as IFFB% goes up) for the home data, and no correlation for the road data. FIP and xFIP are theoretically supposed to strip out batted-ball noise, and I’m really not sure what to expect from those, though I suppose there should be no difference. If ERAs are lower while FIP stays put, though, then ERA-FIP should also be lower, which would show up as a negative correlation as well.
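Each number in the tables that follow is a correlation coefficient between IFFB% and one statistic. The sketch below is a plain Pearson correlation, which I’m assuming is what was used (spreadsheet CORREL functions compute Pearson r). The toy inputs are made up purely to show the sign the theory predicts: a higher IFFB% paired with a lower ERA gives a negative r.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (not real A's pitchers): ERA falls exactly as IFFB% rises.
iffb = [0.08, 0.10, 0.12, 0.14]
era = [4.40, 4.20, 4.00, 3.80]
print(round(pearson_r(iffb, era), 3))  # -1.0
```

Real pitcher data is far noisier, so values near zero, like most of those below, mean IFFB% explains very little.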

Here are the correlations between IFFB% and each statistic (E-F is ERA minus FIP):

HOME	N	ERA	R/9	BABIP	FIP	XFIP	E-F
ALL	99	-0.078	-0.041	-0.127	-0.031	-0.006	-0.064
100+ IP	38	-0.049	-0.094	-0.175	-0.123	-0.196	0.066
200+ IP	15	0.110	0.203	-0.100	-0.035	0.032	0.174

AWAY	N	ERA	R/9	BABIP	FIP	XFIP	E-F
ALL	99	-0.166	-0.143	0.010	0.201	0.056	-0.244
100+ IP	38	-0.031	-0.044	-0.305	0.026	0.162	-0.071
200+ IP	15	-0.279	-0.279	-0.134	-0.500	-0.088	0.094

I could analyze them, but I realized I had made a mistake. IFFB% is the percentage of fly balls that are infield fly balls. What I really needed was the percentage of all batted balls that are infield fly balls. To get that, I multiplied IFFB% by the fly ball percentage (FB%) to get what I called TOTAL_IFFB%. Here are those results:
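The correction is a single multiplication. IFFB% is a share of fly balls, and FB% is the share of batted balls that are fly balls, so their product is infield flies as a share of all batted balls. A minimal sketch; the function name is mine and the rates are invented for illustration.

```python
def total_iffb(iffb_rate: float, fb_rate: float) -> float:
    """Infield flies per batted ball = (IF flies / fly balls) * (fly balls / batted balls)."""
    return iffb_rate * fb_rate


# Hypothetical pitcher: 12% of fly balls stay in the infield, 40% of batted balls are fly balls.
print(f"{total_iffb(0.12, 0.40):.1%}")  # 4.8%
```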

HOME	N	ERA	R/9	BABIP	FIP	XFIP	E-F
ALL	99	-0.127	-0.081	-0.166	-0.021	0.010	-0.130
100+ IP	38	-0.019	-0.086	-0.210	-0.062	-0.051	0.041
200+ IP	15	0.027	0.120	-0.280	-0.035	0.085	0.077

AWAY	N	ERA	R/9	BABIP	FIP	XFIP	E-F
ALL	99	-0.164	-0.132	0.017	0.152	0.104	-0.223
100+ IP	38	-0.096	-0.131	-0.434	0.014	0.281	-0.153
200+ IP	15	-0.257	-0.272	-0.210	-0.423	0.058	0.048

I also calculated R-squared values from the same TOTAL_IFFB% data (each is the square of the corresponding correlation above):

HOME	N	ERA	R/9	BABIP	FIP	XFIP	E-F
ALL	99	0.016	0.007	0.027	0.000	0.000	0.017
100+ IP	38	0.000	0.007	0.044	0.004	0.003	0.002
200+ IP	15	0.001	0.014	0.078	0.001	0.007	0.006

AWAY	N	ERA	R/9	BABIP	FIP	XFIP	E-F
ALL	99	0.027	0.017	0.000	0.023	0.011	0.050
100+ IP	38	0.009	0.017	0.188	0.000	0.079	0.023
200+ IP	15	0.066	0.074	0.044	0.179	0.003	0.002
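Because each R-squared value is the square of the matching TOTAL_IFFB% correlation, the R-squared tables can be reproduced from the correlation tables. This short check uses four values copied from those tables. The dictionary layout is just for illustration.

```python
# Correlations between TOTAL_IFFB% and each statistic, copied from the tables above.
correlations = {
    ("HOME", "ALL", "ERA"): -0.127,
    ("HOME", "200+ IP", "BABIP"): -0.280,
    ("AWAY", "100+ IP", "BABIP"): -0.434,
    ("AWAY", "200+ IP", "FIP"): -0.423,
}

for key, r in correlations.items():
    # For a one-variable linear fit, R-squared is the square of Pearson r.
    print(key, f"r = {r:+.3f}", f"R^2 = {r * r:.3f}")
```

The printed R-squared values (0.016, 0.078, 0.188, 0.179) match the corresponding cells of the R-squared tables.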

I would call it “mixed results,” and there aren’t any real patterns, at least not that I can see. More importantly, this still isn’t what I really want. What I want to know is whether having a higher TOTAL_IFFB% means something in the Coliseum. For this, I divided the pitchers into three groups: high TOTAL_IFFB% (above 6%), medium TOTAL_IFFB% (between 3 and 6%), and low TOTAL_IFFB% (below 3%). For reference, Cain’s career TOTAL_IFFB% is 5.9%. Here are some results:

HOME	N	ERA	R/9	K/9	BB/9	HR/9	FB%	BABIP
>6%	28	3.87	4.25	6.91	3.32	1.04	44.4%	0.273
3-6%	35	3.56	3.90	6.90	3.04	0.83	37.2%	0.277
<3%	36	3.56	3.90	5.54	2.99	0.72	28.9%	0.281
ALL	99	3.62	3.97	6.48	3.08	0.84	36.1%	0.277

As a control, here is the same test for the groups on the road (grouped again by Home TOTAL_IFFB%):

AWAY	N	ERA	R/9	K/9	BB/9	HR/9	FB%	BABIP
>6%	28	4.17	4.44	7.19	3.79	1.12	42.9%	0.281
3-6%	35	4.16	4.53	6.89	3.35	0.96	37.0%	0.295
<3%	36	4.66	5.15	5.56	3.22	1.02	28.8%	0.305
ALL	99	4.31	4.69	6.57	3.41	1.01	35.9%	0.295

I included K/9, BB/9, and HR/9 for each group to determine whether one set of pitchers was simply inferior to the others. All of these statistics are weighted by the innings pitched by each pitcher, so there’s a lot more Barry Zito in the >6% group than Seth Etherton. There are approximately as many innings in the 3-6% group as in the >6% and <3% groups combined.
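The grouping and innings weighting can be sketched like this. It assumes each pitcher record carries innings, earned runs, and TOTAL_IFFB%. The field names, the handling of exactly 3% and 6% (both placed in the middle group), and the three sample pitchers are all invented for illustration. Weighting ERA by innings is the same as pooling a group’s earned runs and innings, which is why a high-innings pitcher dominates his group’s line.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    name: str
    ip: float          # innings pitched (home, in this example)
    er: int            # earned runs allowed
    total_iffb: float  # IFFB% * FB%, as a fraction


def bucket(total_iffb: float) -> str:
    """Assign the >6% / 3-6% / <3% groups; exact 3% and 6% go to the middle (assumption)."""
    if total_iffb > 0.06:
        return ">6%"
    if total_iffb >= 0.03:
        return "3-6%"
    return "<3%"


def weighted_era(lines: list[PitcherLine]) -> float:
    """Innings-weighted ERA, equivalent to 9 * total ER / total IP."""
    total_ip = sum(p.ip for p in lines)
    return 9 * sum(p.er for p in lines) / total_ip


def group_eras(lines: list[PitcherLine]) -> dict[str, float]:
    groups: dict[str, list[PitcherLine]] = {}
    for p in lines:
        groups.setdefault(bucket(p.total_iffb), []).append(p)
    return {label: weighted_era(members) for label, members in groups.items()}


sample = [
    PitcherLine("Pitcher A", 200.0, 80, 0.065),  # workhorse, 3.60 ERA
    PitcherLine("Pitcher B", 20.0, 16, 0.070),   # brief stint, 7.20 ERA
    PitcherLine("Pitcher C", 100.0, 40, 0.040),
]
print({k: round(v, 2) for k, v in group_eras(sample).items()})  # {'>6%': 3.93, '3-6%': 3.6}
```

An unweighted average of Pitchers A and B would give a 5.40 ERA for the >6% group. Weighting by innings pulls it to 3.93, close to the pitcher who threw most of the innings.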

Obviously there is less scoring in the Coliseum because it is harder to hit home runs, but we also see a decrease in walks (perhaps because pitchers are less worried about mistakes).

Quick aside and important point: holy crap does the Coliseum depress BABIP. And it has to be the Coliseum, because the defenses, pitchers, and hitters should be relatively the same in both sets. I checked and checked and checked this, and I’m pretty sure it’s right. I’m not sure where I would have messed that up. Plus everything else makes sense.

The pitchers combined for a 4.03% TOTAL_IFFB% at home and a 3.95% TOTAL_IFFB% on the road. That 0.08-percentage-point difference covers roughly 20,100 batted balls at home and 19,500 on the road. Even rounded up to a tenth of a percentage point and applied to 20,000 batted balls, the difference is roughly 20 infield flies over nine seasons, or about 2 per season. Seems so wrong. Then I found this:

One of the more interesting effects I found is that parks have a strong affect [sic] on the proportion of batted balls that are infield flies.
Team            FlysIF
Brewers         1.15
Mariners        1.12
Marlins         1.12
Reds            1.08
Devil Rays      1.07
---
Phillies        0.93
Royals          0.93
Indians         0.92
Diamondbacks    0.92
Giants          0.90
A player is 28 percent more likely to hit an infield fly in Milwaukee than he is in San Francisco. Why is that? My guess is that it has to do with foul territory. Since infield flies are only recorded when the ball is put into play, parks with a lot of foul territory are more likely to see foul pop-ups stay in and get caught, whereas ballparks with little foul territory will see a lot of pop-ups land in the stands and go unrecorded.

The problem with that theory is that while the parks at the bottom of the list do tend to be a little smaller in terms of foul territory, those at the top seem to be pretty average on the whole. Perhaps my data source is off, but there may be some other variable I’m not thinking of.

The A’s aren’t at either end of the list. While the list is a couple of years old, I don’t think that has changed: the park factor for infield flies in the Coliseum is close to neutral. I have to agree with David’s last sentence: “Perhaps my data source is off, but there may be some other variable I’m not thinking of.”

There’s nothing there to suggest that getting more infield fly balls makes you a better pitcher in the Coliseum. Matt Cain’s ability to limit home runs is what would continue to make him a very good pitcher in Oakland. However, there’s so much bias in the way I carved up the data that I can’t be sure it’s correct. Would love some reviewers to take a look…
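For reviewers checking the home/road infield-fly arithmetic from earlier, here it is spelled out. The rates and batted-ball counts are the ones quoted in this post. The only added step is dividing by nine seasons. With the unrounded 0.08-point gap, the extra infield flies come to about 16 over nine seasons rather than 20, which is still about 2 per season.

```python
home_rate, road_rate = 0.0403, 0.0395  # TOTAL_IFFB% at home and on the road
home_bip, road_bip = 20_100, 19_500    # approximate batted balls in each split
seasons = 9                            # 2002-2010

gap = home_rate - road_rate            # 0.0008, i.e. 0.08 percentage points
extra_home_iff = gap * home_bip        # home infield flies beyond the road rate
print(f"{gap:.2%} gap -> {extra_home_iff:.0f} extra infield flies, "
      f"{extra_home_iff / seasons:.1f} per season")
# 0.08% gap -> 16 extra infield flies, 1.8 per season
```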


Written by Dan Hennessey

February 2, 2011 at 9:54 PM

Posted in Uncategorized

On Slugging Percentage


Even though it has lost importance in the evaluation of players, I like batting average. I like it because it describes a series of events in definitive and simple terms: in what percentage of at-bats does a hitter get a hit? I understand that it doesn’t tell me much about a hitter’s offensive value. But it’s part of the equation. Even the formula is simple: BA = H / AB (hits divided by at-bats).

Same for on-base percentage: the percentage of plate appearances in which a player does not make an out. Obviously valuable, since outs are the game’s most important limited resource.

I’ve never liked slugging percentage. Slugging percentage is the total number of bases a player reaches on hits divided by the number of at-bats (average bases per at-bat). For the first two metrics, each event in the numerator is binary (it either happened or it didn’t), so they can be expressed as percentages. That’s tougher to do with slugging percentage (despite its name), because the numerator can increase by 1, 2, 3, or 4 depending on the event, so the maximum would be 400% (a home run in every at-bat). And I think that’s why it bothers me. It’d be like having a statistic for runs per game; that number really wouldn’t mean too much (obviously runs matter, but the rate statistic wouldn’t tell us anything).
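The three rate stats discussed so far translate directly into code. This sketch uses the standard definitions. OBP counts hits, walks, and hit-by-pitches over at-bats plus walks, hit-by-pitches, and sacrifice flies. The `Batting` container and its field names are my own, not from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Batting:
    ab: int        # at-bats
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int = 0    # walks
    hbp: int = 0   # hit by pitch
    sf: int = 0    # sacrifice flies

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def batting_average(b: Batting) -> float:
    return b.hits / b.ab


def on_base_percentage(b: Batting) -> float:
    return (b.hits + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)


def slugging(b: Batting) -> float:
    """Average bases per at-bat; ranges from 0 to 4 (the '400%' ceiling)."""
    return b.total_bases / b.ab


line = Batting(ab=10, singles=1, doubles=0, triples=0, hr=1, bb=2)
print(f"{batting_average(line):.3f} {on_base_percentage(line):.3f} {slugging(line):.3f}")
# 0.200 0.333 0.500
```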

The statistic “isolated slugging percentage” attempts to capture a player’s power only. It’s measured by subtracting batting average from slugging percentage. It’s valuable because, for example, there are many ways to have any given slugging percentage. In 10 at-bats, you could hit 5 singles or you could hit a home run and a single for a 0.500 slugging percentage. Same slugging percentage, different batting averages. ISO shows that. The units are still total bases per at-bat, which is an issue for me, and again, the number itself doesn’t mean anything. I’ve gone over and over this to make sure I define it correctly, so here I go: the numerator in ISO is the number of bases above one a batter achieves on each hit. Singles and outs are 0, doubles 1, triples 2, and home runs 3. Therefore, ISO groups singles and outs together, and I don’t like that. Dividing by the number of at-bats gives the number of “extra bases” a batter achieves per at-bat.
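The two descriptions of ISO in that paragraph, slugging minus batting average and extra bases per at-bat, are algebraically the same. This small check uses the 10-at-bat example from the text. The function names are illustrative.

```python
def iso_from_rates(ab: int, singles: int, doubles: int, triples: int, hr: int) -> float:
    """ISO as slugging percentage minus batting average."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab - hits / ab


def iso_from_extra_bases(ab: int, doubles: int, triples: int, hr: int) -> float:
    """ISO as extra bases (beyond the first) per at-bat: singles and outs both count 0."""
    return (doubles + 2 * triples + 3 * hr) / ab


# Five singles in 10 AB vs. a homer and a single in 10 AB: both slug .500.
print(round(iso_from_rates(10, 5, 0, 0, 0), 3), round(iso_from_extra_bases(10, 0, 0, 0), 3))  # 0.0 0.0
print(round(iso_from_rates(10, 1, 0, 0, 1), 3), round(iso_from_extra_bases(10, 0, 0, 1), 3))  # 0.3 0.3
```

The second function makes the complaint concrete: a single and an out contribute the same zero to the numerator.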

Instead of just complaining about it though, I tried to develop something that would make more sense. To do this I divided slugging percentage by batting average, making the units total bases per hit. For one, this makes the total number of events in the numerator and denominator the same. I’ve eliminated all at-bats that result in zero bases in my calculation. Because of this, the scale is similar to total bases: the minimum is 1.000 and the maximum is 4.000.
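The proposed statistic reduces to total bases divided by hits, because the at-bats cancel when SLG is divided by BA. This sketch includes a guard for hitless lines, where the statistic is undefined. The function name is mine.

```python
def total_bases_per_hit(singles: int, doubles: int, triples: int, hr: int) -> float:
    """SLG / BA = (TB / AB) / (H / AB) = TB / H; always between 1.0 and 4.0."""
    hits = singles + doubles + triples + hr
    if hits == 0:
        raise ValueError("undefined for a batter with no hits")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / hits


print(total_bases_per_hit(5, 0, 0, 0))  # 1.0  (five singles)
print(total_bases_per_hit(1, 0, 0, 1))  # 2.5  (a single and a homer)
```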

Next, I wanted to test how well it described a player’s value on offense. I plotted BA, OBP, SLG, ISO, and SLG/BA against wOBA (weighted on-base average), my favorite all-inclusive offensive statistic, for the 2010 season. wOBA is a statistic based on linear weights designed to measure a player’s overall offensive contribution per plate appearance. Taking the observed run values of various offensive events (e.g., each single is worth 0.72 runs, each out is worth -0.28 runs), applying them to each of a player’s events, dividing by the player’s plate appearances, and scaling the result to the average on-base percentage gives you wOBA. Here is the plot for batting average (I’ve also marked Jose Bautista’s spot on all the charts, since he’s a major outlier – I thought this might help make the point…of course, as we’re about to see, I’m dumb):

Here is the same plot three more times, against slugging percentage, isolated slugging percentage, and my new total bases per hit statistic.

The closer all of those points are to the line, the more the statistic correlates with wOBA. It’s easy to see from the plots that slugging percentage is the best, and isolated slugging percentage correlates fairly well. Here are the actual measures:

Damning evidence. For those without the statistics background, low numbers are bad. My statistic sucks at actually telling us how good a player is offensively. Important point here. I started writing this post in November. I had what you see above figured out two hours into writing, and then I was stuck. I couldn’t figure out how to make the statistic matter. And then recently I had a revelation. It matters because it makes sense. It doesn’t have to tell us anything about a player’s overall offensive value. The statistic itself tells us something, total bases per hit. And from there I kept going with what you see below. But first, I wanted to make sure I put this to bed. I checked the stats for 2008 to 2010, just to make sure 2010 wasn’t a weird year.

And 2010 was generous to me. My statistic is even more meaningless in 2008 and 2009. But from here on out, it will be my preferred measure of the damage done by a hitter. Until some smart commenter tells me why I’m wrong (which I assuredly am).

Now, instead of finding a better statistic about slugging that told me how good a hitter was, I found a better statistic about slugging. But this did not stop my search for an easily calculated statistic that aligns more closely with wOBA. First attempt: slugging percentage, but with walks. The denominator becomes plate appearances, and all walks and hit-by-pitches become singles, which gives the batter credit for a base on each walk. Here’s the formula: (TB + BB + HBP) / PA

And the graph:

I’ll show the results in a minute, but I also wanted to test what would happen to my statistic* if I included walks. Obviously, this gets away from the point of the measure (to better reflect slugging), but it’s possible that it could more accurately reflect a hitter’s value.

*I keep calling this “my statistic.” Someone tell me if this has been done before. I always feel like everything I create has been done before; if it made sense to me, it had to have made sense to someone years ago. Thanks for the ego check, everyone.

Formula and graph:

Completely and utterly useless. Summary:

BREAKING NEWS: Of these measures, slugging percentage and on-base percentage are most closely correlated with wOBA. Every other measure is markedly worse. My attempt to shake the baseball world = FAIL.
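The two walk-inclusive variants can be written out as below. This sketch follows my reading of the text. Walks and hit-by-pitches count as one base each, and slugging-with-walks divides by plate appearances. The walk version of total bases per hit adds those same events to the hit count in the denominator, which is an assumption, since the original formula image is missing.

```python
def slugging_with_walks(total_bases: int, bb: int, hbp: int, pa: int) -> float:
    """Slugging with walks and HBP treated as singles, per plate appearance."""
    return (total_bases + bb + hbp) / pa


def tb_per_hit_with_walks(total_bases: int, hits: int, bb: int, hbp: int) -> float:
    """Total bases per 'hit', counting walks and HBP as singles (assumed form)."""
    return (total_bases + bb + hbp) / (hits + bb + hbp)


# Invented line: 600 PA, 150 hits for 250 total bases, 60 walks, 5 HBP.
print(round(slugging_with_walks(250, 60, 5, 600), 3))    # 0.525
print(round(tb_per_hit_with_walks(250, 150, 60, 5), 3))  # 1.465
```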

Written by Dan Hennessey

November 24, 2010 at 4:58 PM

Posted in Uncategorized
