Feb 22, 2012

2011 Projections Flashback - Predicting Hitter Stats




Your fantasy baseball draft preparation will only be as good as the projections that you are working from because good input equals good output. There are a multitude of great projection systems out there right now but which one should you choose? Let's look at last year's results to help answer that question.

Disclaimer: For my casual baseball fans, this is going to be a lengthy data-heavy article so feel free to sit this one out. For my analytical friends, keep in mind that I'm not a statistician and there are likely better methods to use than I chose here. For the people in between, enjoy!

Step 1
I gathered data from 7 different 2011 projections and eliminated the players that weren't shared by all of them. I also averaged the projections from the 4 free options (Marcel, Steamer, ZiPS, and Cairo) to create an 8th projection as well. The systems were:
  • Marcel: The most basic forecasting system around - it takes three years of player data, weights the most recent years heaviest and regresses the players towards a mean (age factor included)
  • Oliver: Similar concept to Marcel with a few wrinkles about how minor league stats are calculated (park/league factors included)
  • PECOTA: A bit more complicated in that it finds comparable players to each projected player and bases the projections on the history of those comparable players
  • RotoValue: Projects a stat, then applies age regression and historical skill regression
  • Steamer: I haven't found a full explanation of their current model but historically it takes three years of player data and regresses certain stats more heavily than others without an aging factor
  • ZiPS: Does a little of the weighted regression like Marcel but for four years and does a bit of the comparable player regression based on aging trends
  • Cairo: It's like Marcel but with more bells and whistles (stat-specific regression and position-specific regression for instance)
  • MSZC: Averaging the four free projections for a player into one projection
For fantasy purposes, I eliminated players who ended up with fewer than 300 AB in 2011, as most of them would be part-time players who weren’t necessarily relevant to projecting for fantasy baseball. There were 214 players left over at this point.
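
For anyone who wants to replicate Step 1, here is a minimal Python sketch of the filtering and averaging. The data layout (a dict of systems, each mapping player to a stat line) and the function names are assumptions made for illustration, and the plain unweighted mean is a guess at how the four free projections were combined.

```python
from statistics import mean

def shared_players(systems):
    """Return the set of players that every projection system includes."""
    return set.intersection(*(set(players) for players in systems.values()))

def average_projections(systems, names):
    """Average each stat across the named systems for every shared player.

    Assumption: a simple unweighted mean per stat (the article does not
    say whether AVG was weighted by projected AB).
    """
    chosen = {name: systems[name] for name in names}
    combined = {}
    for player in shared_players(chosen):
        stats = next(iter(chosen.values()))[player]
        combined[player] = {
            stat: mean(proj[player][stat] for proj in chosen.values())
            for stat in stats
        }
    return combined

def filter_by_ab(players, actual_ab, min_ab=300):
    """Keep only players whose actual AB reached the cutoff."""
    return {p: line for p, line in players.items()
            if actual_ab.get(p, 0) >= min_ab}
```

Calling `average_projections(systems, ["Marcel", "Steamer", "ZiPS", "Cairo"])` would produce the MSZC-style blend, and `filter_by_ab` applies the 300 AB cutoff afterwards.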

Step 2
From those 8 projections and actual 2011 stats, I only included the 5 stats that are most commonly used for fantasy baseball hitters: AVG, HR, R, RBI and SB. As a separate sixth stat, I converted each of those stats for a player to a z-score and then added the five z-scores to create a “total 5x5 value” which gave us 6 points of comparison between the projections and the results.
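
The “total 5x5 value” can be sketched roughly like this. The function names and data layout are hypothetical, and using the sample standard deviation is an assumption since the article doesn't say which version was used.

```python
from statistics import mean, stdev

STATS = ["AVG", "HR", "R", "RBI", "SB"]

def zscores(values):
    """Standardize a list of numbers: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def total_5x5(players):
    """players: dict of name -> dict of stat -> value.

    Returns name -> sum of the five per-stat z-scores.
    """
    names = list(players)
    totals = {name: 0.0 for name in names}
    for stat in STATS:
        zs = zscores([players[name][stat] for name in names])
        for name, z in zip(names, zs):
            totals[name] += z
    return totals
```

Because each stat is standardized before adding, a stolen base and a point of batting average end up on a comparable scale, which is the whole reason for converting to z-scores first.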

Step 3
I threw all of the projections into a magical machine, which spat out the correlation to 2011 results (r-value) and the root mean squared error (RMSE). The correlation value helps us see how well the systems ranked the players in each statistic. RMSE shows us how badly the projections missed the mark (with larger errors receiving extra punishment). Basically, in this instance, larger correlation values are better while smaller error values (RMSE) are ideal.
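
The “magical machine” isn't specified, but the two measures themselves are standard. Here is a small, dependency-free Python sketch of both; the function names are mine, not anything from a particular tool.

```python
import math

def pearson_r(projected, actual):
    """Pearson correlation between projected and actual values."""
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

def rmse(projected, actual):
    """Root mean squared error: squaring punishes big misses extra."""
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

The two numbers answer different questions. A system that projects everyone exactly 5 HR too low, such as `[10, 20, 30]` against actuals of `[15, 25, 35]`, still orders the players perfectly (r of 1) while carrying an RMSE of 5, which is why both are reported.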

Results – Actual 2011 Stats vs. 2011 Projections (300+ AB)
Correlation AVG Rk HR Rk Runs Rk RBI Rk SB Rk 5x5 Rk
Marcel 0.42 6 0.71 7 0.50 7 0.57 7 0.78 6 0.51 8
Oliver 0.47 4 0.73 5 0.53 5 0.64 3 0.81 2 0.57 3
PECOTA 0.49 2 0.73 4 0.49 8 0.61 6 0.80 5 0.57 4
RotoValue 0.41 8 0.69 8 0.51 6 0.57 8 0.70 7 0.53 7
Steamer 0.45 5 0.74 3 0.59 2 0.67 1 0.81 1 0.60 1
ZiPS 0.49 1 0.75 1 0.57 3 0.63 4 0.63 8 0.56 5
Cairo 0.42 7 0.72 6 0.55 4 0.61 5 0.81 4 0.55 6
MSZC 0.47 3 0.74 2 0.61 1 0.66 2 0.81 3 0.58 2

RMSE AVG Rk HR Rk Runs Rk RBI Rk SB Rk 5x5 Rk
Marcel .026 6 6.94 6 19.1 4 20.5 6 7.1 5 3.38 7
Oliver .026 4 7.04 7 19.8 6 20.7 7 6.8 3 3.26 4
PECOTA .025 1 6.82 4 19.9 7 19.8 4 7.1 4 3.26 3
RotoValue .030 8 7.61 8 22.8 8 24.1 8 8.4 7 3.39 8
Steamer .026 5 6.72 3 19.0 3 19.4 2 6.7 1 3.16 1
ZiPS .025 2 6.50 1 18.7 2 19.5 3 11.4 8 3.26 5
Cairo .027 7 6.88 5 19.2 5 20.2 5 6.7 2 3.30 6
MSZC .026 3 6.50 2 17.3 1 18.5 1 7.1 6 3.16 2

In terms of correlation, Steamer did quite well across the board here while RotoValue, Cairo and Marcel lagged behind. ZiPS did great with the exception of stolen bases which were so bad that they also hurt the correlation to the 5x5 total roto value stat.

When factoring in the frequency and size of the errors, Steamer and the combination of free projections seem to be kicking the most butt thus far. Towards the end here, we’ll come up with a definitive ranking. But, first, there’s more work to do…

Step 4
Comparing projections to actual results brings back some good information. However, it should be noted that forecasters tend to start by projecting base stats and then adjusting for playing time at the end. We've already compared to that final result but I also want to know how well each system does before playing time is factored in. So, I took all of the projections and actual stats for each player and adjusted them onto the same 500 AB scale (though it could be any amount and the results would be the same). Would the projections change? Are some projections good at predicting player output but not as good with getting playing time correct?
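
Putting everything on a common playing-time scale is simple arithmetic. Here is a rough Python sketch, assuming each stat line is a dict with an `AB` key (a layout I made up for illustration):

```python
COUNTING_STATS = ("HR", "R", "RBI", "SB")

def scale_to_ab(line, target_ab=500):
    """Rescale a stat line's counting stats to a common number of AB.

    AVG is a rate stat, so it is carried over unchanged.
    """
    factor = target_ab / line["AB"]
    scaled = {"AB": target_ab, "AVG": line["AVG"]}
    for stat in COUNTING_STATS:
        scaled[stat] = line[stat] * factor
    return scaled
```

For example, a line of 10 HR in 250 AB becomes 20 HR in 500 AB. Since AVG doesn't move, the AVG correlation and RMSE values come out the same with or without the playing-time adjustment.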

Results – Adjusted 2011 Stats vs. Adj. 2011 Projections (300+ AB)
Correlation AVG Rk HR Rk Runs Rk RBI Rk SB Rk 5x5 Rk
Marcel 0.42 6 0.77 6 0.59 3 0.65 7 0.82 6 0.58 5
Oliver 0.47 3 0.78 3 0.61 2 0.70 2 0.83 5 0.63 1
PECOTA 0.49 2 0.78 4 0.46 8 0.68 4 0.83 4 0.62 3
RotoValue 0.41 8 0.74 8 0.56 7 0.61 8 0.79 7 0.58 6
Steamer 0.45 5 0.78 1 0.58 5 0.71 1 0.84 2 0.62 2
ZiPS 0.49 1 0.78 5 0.59 4 0.67 5 0.66 8 0.57 7
Cairo 0.42 7 0.77 7 0.57 6 0.66 6 0.84 1 0.57 8
MSZC 0.47 4 0.78 2 0.63 1 0.69 3 0.84 3 0.61 4

RMSE AVG Rk HR Rk Runs Rk RBI Rk SB Rk 5x5 Rk
Marcel .026 6 5.65 4 11.3 3 14.6 5 6.1 4 2.71 5
Oliver .026 4 5.73 7 11.0 1 14.0 4 6.2 5 2.71 4
PECOTA .025 1 5.65 5 12.4 8 14.0 3 6.1 3 2.71 3
RotoValue .030 8 6.05 8 12.2 7 16.0 8 6.7 7 2.79 7
Steamer .026 5 5.51 2 11.8 5 13.5 1 6.0 2 2.65 1
ZiPS .025 2 5.63 3 11.8 4 14.8 6 11.1 8 2.74 6
Cairo .027 7 5.70 6 12.2 6 14.8 7 5.8 1 2.82 8
MSZC .026 3 5.51 1 11.2 2 13.9 2 6.4 6 2.65 2

The results are somewhat similar to what we saw with playing time included, except that Oliver improves quite a bit in this scenario. But, let's break this down and see who the actual winners are...

Step 5
We have a ton of funky numbers on all sorts of different scales and we still don't have an answer on which system does the best for fantasy baseball hitters. If I were to add up the rankings for each projection, we would have an answer but it wouldn't recognize those times when 1st, 2nd and 3rd were a virtual tie and when last place was far, far behind the others. To account for that, I converted each system's correlation and RMSE values to standardized z-scores to show how far above or below average each projection was for each stat (with the sign flipped for RMSE so that a positive score is always better than average). So, in comparison to the actual 2011 statistics (playing time included), here are the overall results for correlation, RMSE and the combination of the two:

Correlate AVG HR Runs RBI SB 5x5 Corr. Total
MSZC 0.5 0.8 1.5 1.0 0.6 1.0 5.3
Steamer -0.1 0.7 1.2 1.3 0.7 1.3 5.1
PECOTA 1.1 0.3 -1.1 -0.2 0.4 0.4 1.0
Oliver 0.5 0.3 -0.3 0.5 0.7 0.5 2.2
ZiPS 1.2 1.1 0.5 0.4 -2.1 -0.1 1.1
Cairo -1.0 -0.5 0.1 -0.2 0.6 -0.5 -1.5
Marcel -0.9 -1.0 -1.0 -1.4 0.2 -1.6 -5.7
RotoValue -1.3 -1.8 -0.8 -1.4 -1.0 -1.0 -7.5
RMSE AVG HR Runs RBI SB 5x5 RMSE Total
MSZC 0.5 1.1 1.4 1.1 0.3 1.2 4.4
Steamer 0.2 0.4 0.3 0.6 0.6 1.3 2.1
PECOTA 0.8 0.2 -0.3 0.3 0.4 0.2 1.3
Oliver 0.3 -0.5 -0.2 -0.2 0.5 0.2 0.0
ZiPS 0.8 1.1 0.5 0.5 -2.3 0.2 0.5
Cairo -0.2 0.0 0.2 0.1 0.6 -0.4 0.7
Marcel -0.1 -0.2 0.3 -0.1 0.3 -1.3 0.3
RotoValue -2.3 -2.1 -2.1 -2.3 -0.5 -1.4 -9.2
All
MSZC 9.7
Steamer 7.2
PECOTA 2.3
Oliver 2.2
ZiPS 1.6
Cairo -0.8
Marcel -5.4
RotoValue -16.7
The combination of the free projections is the winner here mainly because of how much better those projections are at minimizing the size of the errors as seen by that great RMSE z-score. That shouldn't be all too surprising since any extreme projection is brought closer to normal when projections are combined with each other. It takes some of the crazier data and brings it all closer to a safe middle ground.
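
For the curious, the z-score standardization from Step 5 can be sketched in Python as below. The input numbers are the rounded HR correlation and HR RMSE values from the Step 3 results (playing time included), so the output will only approximate the z-scores in the tables here. The function name and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev

# Rounded HR values copied from the Step 3 results tables.
HR_CORR = {"Marcel": 0.71, "Oliver": 0.73, "PECOTA": 0.73, "RotoValue": 0.69,
           "Steamer": 0.74, "ZiPS": 0.75, "Cairo": 0.72, "MSZC": 0.74}
HR_RMSE = {"Marcel": 6.94, "Oliver": 7.04, "PECOTA": 6.82, "RotoValue": 7.61,
           "Steamer": 6.72, "ZiPS": 6.50, "Cairo": 6.88, "MSZC": 6.50}

def standardize(metric_by_system, higher_is_better=True):
    """Convert one metric across systems into z-scores.

    For error metrics like RMSE, pass higher_is_better=False so the sign
    is flipped and a positive score always means better than average.
    """
    values = list(metric_by_system.values())
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation
    sign = 1 if higher_is_better else -1
    if sigma == 0:
        return {name: 0.0 for name in metric_by_system}
    return {name: sign * (value - mu) / sigma
            for name, value in metric_by_system.items()}

z_corr = standardize(HR_CORR)
z_rmse = standardize(HR_RMSE, higher_is_better=False)
combined = {name: z_corr[name] + z_rmse[name] for name in HR_CORR}
```

Unlike raw rankings, this keeps virtual ties close together and lets a distant last place (RotoValue on HR, for instance) show up as a large negative number.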

Now, when we look at the results that remove playing time from the equation, the rankings end up shifting around quite a bit with Oliver and Marcel taking huge leaps while ZiPS takes a huge drop:

300 Adj. Corr. Total RMSE Total All
MSZC 4.0 3.5 7.5
Oliver 4.3 2.3 6.6
Steamer 3.5 2.6 6.2
PECOTA 0.9 0.7 1.7
Marcel -2.0 1.1 -0.9
ZiPS -1.5 -1.9 -3.4
Cairo -2.5 -1.1 -3.6
RotoValue -6.8 -7.4 -14.2

When all is said and done, Steamer handily wins when it comes to actual results yet Oliver narrowly wins when playing time isn't factored in. However, neither can beat the power of a simple combined projection system in this experiment.

Step 6
Maybe I picked the wrong AB cutoff for my filter, though. Perhaps the results would shift around if I included players with less playing time. Well, let's see! I ran the same experiment on all shared players with more than 100 AB in 2011 (321 of them). Here are the final z-score rankings in that case:

100 Actual Corr. Total RMSE Total All
MSZC 5.5 4.0 9.6
Steamer 5.0 2.0 7.0
PECOTA 1.7 2.4 4.1
Oliver 2.9 0.8 3.7
Marcel -0.6 3.5 3.0
Cairo -1.5 -1.3 -2.8
ZiPS -1.7 -3.1 -4.7
RotoValue -11.3 -8.5 -19.8

100 Adj. Corr. Total RMSE Total All
MSZC 3.8 2.5 6.2
Oliver 3.6 2.1 5.8
PECOTA 2.0 2.9 4.9
Steamer 2.7 2.1 4.7
Cairo 0.4 0.3 0.7
Marcel -0.9 1.0 0.1
ZiPS -1.4 -2.4 -3.7
RotoValue -10.2 -8.5 -18.7

The gaps aren't quite as wide but the standings are similar with Steamer doing well when it comes to actual results but Oliver doing better when playing time isn't a factor. However, it should be noted that Marcel does markedly better here when taking into account these players who got less playing time.

When it comes to 2011 forecasts for hitters for fantasy baseball purposes, Steamer gets the gold medal, with Oliver and PECOTA taking silver and bronze. Despite that, I still bow down to the power of combining projections to help reduce the size of any projection errors.

2 comments:

  1. The brand new 2012 version of Steamer Projections are now available, btw, for free at www.steamerprojections.com

  2. Thank ya, Dash. Unfortunately, only the pitchers are posted at this time but he says the hitters should be up by the end of the week too. At that time, I'll get them added in to the cheatsheets here!
