Brian Sabean’s “Speed” Kool-Aid
8 March 2008
From the first time I heard Brian Sabean state that he’s rebuilding the Giants based on “pitching, speed, and defense”, I assumed something was amiss. To me, Sabean’s assertion sounded a little too much like the recently dumped boyfriend who insists his newfound, lonely bachelorhood is a good thing. Just because the Giants currently have some pitching, speed, and defense doesn’t mean that’s what a successful team should be built on.
Since “speed” is the only factor Sabean mentioned that plays any part in scoring runs, I want to examine it first. Now, it’s been a long time since I’ve dusted off the old data-analysis tool in Excel, but this snowy Saturday afternoon is the perfect opportunity.
What follows is a simple regression where Y (the dependent variable) is team runs scored and the two independent variables are:
X1 = Times on base (hits+bb+hbp)
X2 = Stolen base attempts
Using a fairly large sample (the 1,048 team-seasons in MLB since 1969), a team’s total runs scored in a season (Y) can be predicted by the following equation:
Y = -252.045 + 0.493 X1 – 0.105 X2
Or
Runs Scored = -252.045 + 0.493 (H+BB+HBP) – 0.105 (SB attempts)
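For anyone who would rather not punch this into a calculator, here is a minimal Python sketch of the equation above. The function name and the sample inputs are my own placeholders, not a real team’s totals; only the three coefficients come from the regression.

```python
# Coefficients as reported by the regression in this post.
INTERCEPT = -252.045
COEF_TIMES_ON_BASE = 0.493   # per time on base (H + BB + HBP)
COEF_SB_ATTEMPTS = -0.105    # per stolen base attempt


def predict_runs(hits, walks, hit_by_pitch, sb_attempts):
    """Predicted season runs scored from the fitted equation."""
    times_on_base = hits + walks + hit_by_pitch
    return (INTERCEPT
            + COEF_TIMES_ON_BASE * times_on_base
            + COEF_SB_ATTEMPTS * sb_attempts)


# Hypothetical season line (placeholder numbers, not a real team).
print(round(predict_runs(hits=1450, walks=500, hit_by_pitch=50, sb_attempts=150), 1))
```

Swap in a real team’s full-season totals and compare the result with its actual runs scored.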
My R-squared (the share of the variation in runs scored that is explained by X1 and X2) is 79.49%, so my two independent variables account for nearly 80% of that variation. Not too shabby. Give it a try… plug in full season H+BB+HBP and SB attempts for a team and see how close the outcome comes to the team’s total runs scored.
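I ran this in Excel, but the same kind of fit can be sketched in plain Python. What follows is not my actual workbook, just a rough illustration of a two-variable least-squares fit and its R-squared. The function name is made up, and the demo rows are placeholders whose runs column is generated from the equation above (so the fit should simply hand the coefficients back with an R-squared of essentially 1). Real data would give a noisier result.

```python
def fit_two_variable_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, r_squared)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    # Solve the 2x2 normal equations for the slopes (Cramer's rule).
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2

    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return b0, b1, b2, 1.0 - ss_res / ss_tot


# Placeholder inputs, NOT real team data.
x1_demo = [1800, 1900, 2000, 2100, 2200]   # times on base
x2_demo = [200, 120, 180, 90, 150]         # stolen base attempts
y_demo = [-252.045 + 0.493 * a - 0.105 * b for a, b in zip(x1_demo, x2_demo)]

b0, b1, b2, r2 = fit_two_variable_ols(x1_demo, x2_demo, y_demo)
print(b0, b1, b2, r2)
```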
This is a very, very simple example and definitely makes some questionable assumptions… but what it does allow me to say is:
If you assume that runs scored are determined entirely by (1) # of times reaching base and (2) # of attempted steals, then reaching base affects runs scored positively and stolen base attempts affect runs scored negatively.
For you stat-geeks, yes, I can make the previous assertion because the 95% confidence interval for the coefficient on X2 is [-0.157, -0.053], which lies entirely below zero.
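If you want to see what that interval implies, here is a rough back-of-the-envelope sketch. It assumes the interval is symmetric around the estimate and was built with the large-sample critical value of about 1.96. That is my assumption for illustration, not something read off the Excel output, so treat the implied standard error and t-statistic as approximate.

```python
# Interval reported in this post for the coefficient on X2 (SB attempts).
CI_LOW, CI_HIGH = -0.157, -0.053

# Assumption: symmetric 95% interval using the normal critical value.
Z_95 = 1.96

point_estimate = (CI_LOW + CI_HIGH) / 2      # midpoint of the interval
half_width = (CI_HIGH - CI_LOW) / 2
implied_std_error = half_width / Z_95
implied_t_stat = point_estimate / implied_std_error

# The coefficient differs from zero at the 5% level when zero
# falls outside the interval.
excludes_zero = not (CI_LOW <= 0.0 <= CI_HIGH)

print(round(point_estimate, 3), round(implied_std_error, 4),
      round(implied_t_stat, 2), excludes_zero)
```

The midpoint matches the -0.105 coefficient in the equation, and the whole interval sits below zero, which is all the claim needs.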
Now, my 80% R-squared means that about 20% of the variation in team runs scored is explained by something other than reaching base and stolen base attempts (I’d bet power has a lot to do with it). But it’s doubtful that whatever lies in that other 20% would turn the negative effect of attempted steals seen here into something significantly positive.
What does that mean? To put it simply: An offensive strategy based entirely on “speed” is not a good idea, and I’m not drinking Sabean’s crazy Kool-Aid.