How We Developed Our 2015 Projections
By Tanner Bell
Sep 1, 2014; Atlanta, GA, USA; Atlanta Braves right fielder Jason Heyward (22) steals third base against the Philadelphia Phillies in the first inning at Turner Field. Mandatory Credit: Brett Davis-USA TODAY Sports
We are about to begin releasing our team previews for the upcoming 2015 season and each team preview will include projected starting lineups and a full projection for each player.
If you are like me, when you see a projection, you are probably wondering, “How do I know if I can trust this projection?” There are likely hundreds of fantasy baseball sites littering the landscape of the web, and each has its own set of projections and rankings. But few will share the approach they use to come up with their projections.
We are about to embark on a six-month-long fantasy baseball season. You have bragging rights and probably at least a few dollars on the line here. Do you really want to rely on a set of projections when you do not understand how they were developed?
I believe in transparency. What follows is a detailed look at the methodology used to develop the 2015 projections you will see on the site.
Inputs Into the Projections
At a high level, here are the ingredients used in developing our projections:
- Three-year averages of component stats like K%, BB%, batted ball percentages, and more.
- Regression
- Human judgment
Let’s take a closer look at each.
1. Three-Year Averages of Component Stats
Baseball is a series of big events that can be broken down into smaller events that are easier to predict. Let’s start with a home run, for example. We could take a simple approach: for a player who has hit 20, 26, and 23 home runs in the past three seasons, average those out and estimate he will hit 23 again. But that is awfully simplistic.
I think it makes a lot more sense to estimate how many times he will come to the plate this season by taking his age, spot in the batting order, and injury history into account. Out of all those plate appearances, we can look at the hitter’s K% and BB% in recent seasons to estimate how many strikeouts and walks he will have. If he does not strike out or walk, then he will put the ball into play. Once we know that, we can use his batted ball history for the past three seasons to estimate how many fly balls he will hit.
After we have an estimate of how many fly balls he will hit, we can also use the player’s history to determine the percentage of his fly balls that become home runs.
We can more accurately predict home runs by breaking things down and looking at the player’s skills. This will help us to minimize the impact of “luck” that might be included in a home run total.
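The chain of estimates described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual spreadsheet behind the projections: the function names, the hypothetical hitter, and every number in it are made up, and it ignores rare outcomes such as hit-by-pitches.

```python
def three_year_average(values):
    """Simple mean of a component stat over the last three seasons."""
    return sum(values) / len(values)


def project_home_runs(pa, k_rate, bb_rate, fb_rate, hr_per_fb):
    """Walk a plate appearance estimate down to a home run estimate.

    All rates are fractions (0.20 means 20%). Hit-by-pitches, sacrifices,
    and other rare outcomes are ignored to keep the sketch short.
    """
    strikeouts = pa * k_rate
    walks = pa * bb_rate
    batted_balls = pa - strikeouts - walks  # everything that is not a K or BB
    fly_balls = batted_balls * fb_rate
    return fly_balls * hr_per_fb


# Hypothetical hitter: every number below is invented for illustration.
projected_pa = 600  # would come from age, lineup spot, and injury history
k_rate = three_year_average([0.21, 0.20, 0.19])
bb_rate = three_year_average([0.09, 0.10, 0.11])
fb_rate = three_year_average([0.38, 0.40, 0.42])       # fly balls per batted ball
hr_per_fb = three_year_average([0.120, 0.130, 0.125])  # home runs per fly ball

projected_hr = project_home_runs(projected_pa, k_rate, bb_rate, fb_rate, hr_per_fb)
print(f"Projected HR: {projected_hr:.1f}")
```

For this made-up hitter the chain works out to about 21 home runs: 600 plate appearances, 420 batted balls, 168 fly balls, and one in eight of those leaving the park.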
These same principles can be used to project batting average and stolen bases for hitters and to project strikeouts, wins, ERA, and WHIP for pitchers.
“How did you project runs and RBIs?” you ask. That is covered in the next section.
2. Regression
“Regression to the mean” or “regression to league average” is a staple principle in many of the most well-recognized projection models out there (PECOTA, Steamer, Marcel). I will do my best to give an explanation if you are not familiar with the concept.
Until we have a history of performance for a given player, we should assume that a player is only “league average”. As we obtain more Major League statistics for a player we begin to assume their future performance will more closely mirror their past performance.
The more Major League history we have for the player, the more credit we give that history and the less we assume they are just “league average.” Eventually, a player may have three full seasons of Major League data and we can strongly rely on that and only include a small component assuming they will return to “league average.”
There is math to support all of this, and Marcel (a deliberately simple projection system built on little more than weighted averages and regression) performs admirably when stacked up against much more complex models.
I have yet to find an objective method of projecting R and RBI, and there are no component stats that break R and RBI down into more digestible parts (like the HR example above). So in my mind, the best way to estimate runs and RBIs was to take a pure regression approach to the last three years of actual results.
To do this, I took each player’s raw run and RBI totals and converted them into per plate appearance numbers. These per PA figures were then run through the Marcel calculation method, which calculates a weighted average of the last three years. Then, depending on how much MLB experience the player has, it regresses that weighted average towards the MLB average.
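A minimal sketch of that kind of calculation is below. The 5/4/3 season weights and the 1,200 plate appearances of league-average “ballast” are the values commonly cited for Marcel, and I am using them here as assumptions. The function name, the league rate, and the two example players are hypothetical, and the age adjustment that Marcel also applies is left out.

```python
def marcel_style_rate(seasons, league_rate, weights=(5, 4, 3), ballast_pa=1200):
    """Weighted three-year per-PA rate, regressed toward the league rate.

    seasons: list of (stat_total, plate_appearances) tuples, most recent
             season first. Fewer than three seasons is fine.
    league_rate: league-average stat per plate appearance.
    weights / ballast_pa: assumed Marcel-style constants (see note above).
    """
    weighted_stat = 0.0
    weighted_pa = 0.0
    for (stat_total, pa), weight in zip(seasons, weights):
        weighted_stat += weight * stat_total
        weighted_pa += weight * pa

    # Regression step: blend in ballast_pa plate appearances of league-average
    # production. The less MLB history a player has, the more this dominates.
    return (weighted_stat + ballast_pa * league_rate) / (weighted_pa + ballast_pa)


LEAGUE_RUNS_PER_PA = 0.11  # hypothetical league rate, for illustration only

# Hypothetical players: (runs, plate appearances), most recent season first.
veteran = [(90, 650), (80, 620), (85, 640)]
rookie = [(20, 150)]

veteran_rate = marcel_style_rate(veteran, LEAGUE_RUNS_PER_PA)
rookie_rate = marcel_style_rate(rookie, LEAGUE_RUNS_PER_PA)

# Convert the per-PA rate back into a season total for a playing time estimate.
projected_pa = 600
print(f"Veteran: {veteran_rate * projected_pa:.0f} runs")
print(f"Rookie:  {rookie_rate * projected_pa:.0f} runs")
```

Both players scored at roughly the same per-PA pace, but the rookie’s projection is pulled much closer to the league rate because there is far less history behind it.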
This method makes me a lot more comfortable than simply trying to guess how many runs Billy Hamilton is going to score this season. Over time, Marcel has proven to be an accurate projection system, and I have no reason to believe I can more accurately project stats like R and RBI.
A special note on regression and averaging in past seasons: this approach devalues players who had strong “breakout” seasons last year, such as Devin Mesoraco, Hector Rondon, or Jake Arrieta. Before 2014, Devin Mesoraco had hit 16 Major League home runs in 175 games. He hit 25 HR in 114 games last season. From past experience, we know that it would be a mistake to project him to continue to hit home runs at his 2014 rate. We are much better off assuming some kind of an average of his past three seasons.
3. Human Judgment
Let me tell you some more about bullet number one “Three-Year Averages” from above. While looking at a player’s recent history, I look at each player one at a time. It takes a long time to do this, but looking at each player individually allows me to spot trends and oddities that a simple average cannot.
That is what I mean by human judgment, or “human touch.”
The biggest element of the human touch comes in projecting a player’s playing time. Take Jason Heyward for example. If we take an average of his games played for the last three years of 158, 104, and 149, then we would come up with an estimate of 137 games played. But if you inject the human element into your projection, then you would realize that he missed time because of an appendectomy (missed nearly one month) and a broken jaw when being hit by a pitch (again, nearly a month).
Should a fluke health problem and a fluke hit-by-pitch really be held against him to the same extent as if he had experienced recurring hamstring injuries? I do not think so. I think it is highly unlikely that he will experience another one of these freak events (especially since his appendix is already removed), so I might project him for 155 games played.
I think there is an argument to be made that estimating playing time is one of the most important elements in the projection process, and I believe that it is the component where human judgment adds the most accuracy.
We can take a look at depth charts, player history, playing time battles, and Minor Leaguers on the verge of getting called up and incorporate all of that into our estimate of playing time. A projection model based only on three-year averages and regression to the mean would miss out on some of this.
Playing time matters. If you take 50-100 PAs away from a full-time player’s projection, it makes roughly an 8-16% difference in all of his counting statistics.
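The arithmetic behind that range is simple enough to check. The sketch below assumes a full-time baseline of 625 plate appearances, which is my assumption (it is the figure that makes 50 and 100 PA come out to exactly 8% and 16%), and the per-PA rates are hypothetical.

```python
def share_of_stats_lost(baseline_pa, pa_removed):
    """Fraction of every counting stat lost when playing time is cut."""
    return pa_removed / baseline_pa


def scale_counting_stats(per_pa_rates, projected_pa):
    """Turn per-PA rates into season totals for a given playing time estimate."""
    return {stat: rate * projected_pa for stat, rate in per_pa_rates.items()}


BASELINE_PA = 625  # assumed full-time workload, not a figure from the article

# Hypothetical per-PA rates for one hitter.
rates = {"HR": 0.035, "R": 0.130, "RBI": 0.120}

full_season = scale_counting_stats(rates, BASELINE_PA)
reduced_season = scale_counting_stats(rates, BASELINE_PA - 50)

for removed in (50, 100):
    print(f"Losing {removed} PA costs {share_of_stats_lost(BASELINE_PA, removed):.0%}")
```

Because every counting stat is a per-PA rate multiplied by plate appearances, the same percentage comes off home runs, runs, and RBIs alike. Rate stats such as batting average are not affected by the cut.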
You might have also noticed that I have not mentioned saves yet. That is because there is no objective way to project them! I am not proud of the method I use to project saves. It is 100% human touch.
“This guy is a reliable closer that’s firmly entrenched in the job on a good team, so I will give him 35 saves.” Or, “This guy is a strong reliever, but he does not have a proven track record and there are two other solid candidates to close. There is a decent probability he could lose the job at some point, so I will put him down for 20 saves.” It is not pretty, but you deserve to know where it came from.
There You Have It.
This process is truly an amalgamation of several different projection approaches, but I feel it’s a strong one with merit.
A Couple More Notes
The projected depth charts and playing time projections were developed as of late December. If there’s been a trade or signing since then, I’ve tried to incorporate it, but I might have missed something. I relied upon RosterResource.com for many of the projected batting orders, but I did make some small adjustments for situations I disagreed with (there are only a few of those).
Want to Make Your Own Projections?
If you’re interested in following this process to make your own projections, you can do this! It’s easier than you might think, and the process of developing your own projections will help take your fantasy game to the next level.