In defense of WAR: My response to Jeff Passan
By JJ Keller
Over the past weekend, Yahoo MLB writer Jeff Passan published a piece outlining his problems with the sabermetric statistic Wins Above Replacement, or WAR. The piece caused quite a stir on social media, drawing out both the reactionary fans who are dead-set against new-age metrics such as WAR and the "sabermetricians" who came to its defense. I was one of the latter.
Now, before we get too far, let me preface something. I respect Mr. Passan and appreciate his work, and that includes the article in question. Though I don’t always agree with his sentiments (as you will see in just a moment), he does often advocate and defend the analytics community, and uses them at times himself. That is more than can be said for most national writers, and he deserves to be acknowledged for that.
Unfortunately, that respect doesn't change the fact that some of the points he made about WAR were incorrect. I, along with Dave Cameron, Sean Forman, and many others, disagree with various points he made in what was ultimately a well-meaning and otherwise fair argument.
For those of you who are unfamiliar with either Cameron or Forman, know that they hold a tremendous amount of authority when it comes to advanced stats. Cameron is the lead editor at FanGraphs, in addition to contributing to a multitude of other sites. Forman founded Baseball-Reference, a go-to stat site for readers and bloggers alike. Both are clearly authorities on the matter, and you should check out what they had to say about the situation.
With that said, here is my take. Passan was most likely well-meaning, and he did bring up an idea that is always worthy of discussion: improvement. The entire purpose of the sabermetric movement is to expand and enhance our knowledge of the game; what makes for a valuable player, what an individual player has control over, and a general objectivity about the sport. As such, bringing up potential points of weakness in a metric with the intent to fix them should never be frowned upon. Far too often, these numbers are criticized for the sole purpose of criticism, with no real intent to make improvements.
But at the same time, the improvements being posited have to actually be correctable problems in the first place. Bringing up aspects of the metrics that have already been addressed, likely due to a misunderstanding of the metric or what it measures, isn’t all that helpful. In order for actual progress to be made, everyone involved in the process must fully grasp the metric. There cannot be incorrect assumptions about it, because then you run the risk of either trying to fix something that isn’t broken, or just not fix anything at all.
Passan calls for many changes and points out what he sees as problems with WAR, but there are clear gaps in his understanding. He calls for improvements in areas that don't really need them, or perceives problems we aren't even sure exist. I won't comment on everything I disagreed with in the piece; honestly, that would require a 2,500-3,000 word response. Instead, I will highlight two points I found particularly worthy of a reply. The second has drawn little attention from other responders, despite striking me as the biggest "mistake" in the piece.
Our first example of this is his outright condemnation of the defensive metrics that go into WAR.
"Not having wholesale data – where a fielder started, where he went, how fast he did so – caused those who calculate defensive metrics to compromise. They essentially grouped plays made in areas, without any context as to whether a player was helped or hurt by his positioning. … Questionable, too, is how a pair of metrics considering the exact same plays can so often come to such different conclusions. With 10 “runs” considered a “win” – each of which are supposed to be worth somewhere in the $6 million range – and a player like Adam Eaton a +12 in DRS and -2.4 in UZR, that disagreement covers nearly $10 million in value. Perhaps the most difficult part to reconcile is how defensive metrics take the subjective and present it as objective. In lieu of a widespread camera-and-radar-tracking system, BIS uses video scouts who watch games, plot batted-ball locations and manually time a ball in the air to estimate how hard it was hit."
Are defensive metrics perfect? Of course not. There is a level of subjectivity, they often take multiple seasons to fully stabilize, and they can vary between systems. But the effects of these three "problems" are overblown here. The video scouts do not just pick and choose what is a line drive, or where the ball should be grouped. There are strict criteria that they follow, which renders the subjectivity fairly moot.
The variance between Defensive Runs Saved and Ultimate Zone Rating could be somewhat of a concern, but consider this: they aren't the same stat. They measure similar things in different ways. We have multiple stats for offense, and no one is worried about that. More often than not, the two numbers come out reasonably similar, and together they give us a good idea of what kind of defender each player is.
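For what it's worth, the "nearly $10 million" figure in Passan's quote is easy to reproduce. Here is a quick sketch using the conversions he cites — 10 runs to a win, a win worth "somewhere in the $6 million range" (I use $6.5M as a rough midpoint; that exact number is my assumption, not his):

```python
# Illustrative arithmetic behind the "nearly $10 million" disagreement,
# using the conversion factors quoted above (10 runs ~ 1 win, ~$6M+ per win;
# both are rough rules of thumb, not exact constants).
RUNS_PER_WIN = 10
DOLLARS_PER_WIN = 6_500_000  # assumed midpoint of the "$6 million range"

drs = 12.0   # Adam Eaton's Defensive Runs Saved
uzr = -2.4   # Adam Eaton's Ultimate Zone Rating

run_gap = drs - uzr                # 14.4 runs of disagreement
win_gap = run_gap / RUNS_PER_WIN   # about 1.44 wins
dollar_gap = win_gap * DOLLARS_PER_WIN

print(f"{run_gap:.1f} runs -> {win_gap:.2f} wins -> ${dollar_gap / 1e6:.1f}M")
```

At exactly $6M per win the gap comes out closer to $8.6M, so "nearly $10 million" implies a per-win value toward the upper end of that range — the point stands either way.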
The only real criticism I could see myself agreeing with is that defensive metrics take multiple years to become stable, but Passan almost completely skipped over that. It would have been his best case, as even the creators of the metrics admit this flaw, but instead he harped on other things that aren't really a huge concern.
And despite all of these supposed flaws, despite how young and incomplete our current defensive metrics are, WAR correlates with actual wins better with the defensive metrics included than without them. With defense included, the R² comes out to about .88, meaning WAR explains roughly 88% of the variance in team wins. Without defense? That number drops to about .68, or 68% of the variance. That leads me to believe that defensive metrics do hold at least some value, as they make WAR more accurate than it otherwise would be, and that is the entire purpose. WAR is designed to give you a snapshot of a player's all-around value and contribution to his team as expressed in wins, and it does that more accurately when you take defense into account.
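To make the "variance explained" language concrete, here is a minimal sketch of the calculation, with made-up (team WAR, actual wins) pairs standing in for the real league data behind the .88 figure:

```python
# R-squared is the fraction of variance in actual wins accounted for by a
# simple least-squares fit on team WAR. The data below are hypothetical
# stand-ins for illustration only, not the real numbers behind the .88.

def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs
    (equal to the squared Pearson correlation)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical (team WAR, actual wins) pairs:
war = [30, 35, 40, 45, 50]
wins = [72, 80, 84, 93, 95]
print(round(r_squared(war, wins), 2))
```

The closer that number sits to 1.0, the more of the team-to-team spread in wins the metric accounts for — which is exactly the sense in which WAR-with-defense (.88) beats WAR-without (.68).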
This doesn't mean that WAR is perfect, or that nothing could be improved. It very well could. But as it stands, the information we have doesn't suggest that defensive metrics are dragging WAR down, as Passan argues. What Passan sees as a subjective, compromised system seems to help a whole lot more than it hurts. Defensive metrics can (and will, if we gain access to Field F/X or MLBAM tracking) improve, and with them, WAR will become even more accurate. But even in their infancy, defensive metrics play a positive role in WAR, and that should, in my estimation, outweigh most every other concern.
The next piece I had a particular problem with, and the one I will end on, is the conclusion Passan draws from the fact that some players near the top of WAR leaderboards compile most of their value from defense. Rather than examining all of the possible explanations, he went with the one that, presumably, best supported his argument. Passan made the point on Twitter, and then wrote in his article:
"Still, defense almost alone places Gordon behind only Mike Trout among position players in fWAR (FanGraphs’ version of WAR) and boosts him to a tie for sixth in rWAR (Baseball-Reference’s) despite an OPS more than 100 points lower than Trout’s (and 150-plus points lower than Giancarlo Stanton‘s)."
He reiterates this sentiment with Jason Heyward and Juan Lagares, both of whom find themselves toward the top of WAR leaderboards due to their defensive excellence.
Essentially, Passan is saying that because these players are not some of the best in the league offensively (all rank outside of the top 40 in wRC+), there is no way they are actually as valuable as WAR says they are. I have a major issue with this line of thinking.
The entire purpose of analytics in baseball — and in everything — is not to feed our biases. We shouldn't be looking only for the data that proves our point and acting as though anything that doesn't must be wrong. Not if we are actually looking for the objective truth, as Passan seems to be. If objective, falsifiable evidence doesn't jibe with our preconceived notions, we don't get to just brush off the data and label it false or misguided. More often than not, the data will be correct, and we will have been wrong.
This becomes a common theme throughout Passan's argument: the idea that because a player like Alex Gordon is considered as valuable as Mike Trout in terms of WAR, WAR must be overrating the value of defense. He never once considers that maybe, just maybe, he has been underrating defense all along, and WAR had it right.
Does this mean that WAR necessarily was correct? Well, no. Again, it isn't perfect, and there is a chance Passan could be right in wanting to modify the influence defense has on WAR. But there isn't much real evidence to suggest that is a viable course of action, apart from gut feeling, which renders the crux of his argument subjective and, at this point, mostly baseless.
On the contrary, we can again look at that .88 fit with actual wins and be pretty darn sure that WAR is telling us the truth, or at least something close to it. So while improvements are surely possible, there also need to be modifications to the way we think. Any chance to make these changes and have an honest discussion goes out the window when, right off the bat, one side is unwilling to even consider the possibility that its initial belief was wrong and WAR was correct.
Wins Above Replacement isn't perfect. It isn't the be-all-end-all. Most people, myself included, don't advise using it as a conversation ender, but rather as a conversation starter. This especially applies to small gaps; a 3.5 WAR player isn't automatically better than a 3.1 WAR player. You need to dig deeper, as even a stat as solid as WAR leaves room for error.
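As a toy illustration of that last point — assuming, purely hypothetically, an error bar of about ±0.5 WAR on a single-season figure — a 3.5 vs. 3.1 comparison falls well inside the noise:

```python
# A toy sketch of why small WAR gaps aren't conclusive. The +/-0.5 margin
# is a hypothetical error bar chosen for illustration, not an official one.

def clearly_better(war_a, war_b, margin=0.5):
    """True only when the WAR gap exceeds the combined error bars."""
    return (war_a - war_b) > 2 * margin

print(clearly_better(3.5, 3.1))  # gap of 0.4: inside the noise, not conclusive
print(clearly_better(6.0, 3.1))  # gap of 2.9: a real difference
```

The exact threshold is debatable; the point is simply that tenths of a win shouldn't end an argument.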
Maybe there is something to his idea of scaling back how much weight is given to defense (though Dave Cameron points out problems with that as well in his response), and there is even an off chance that, as we gain access to better defensive analysis, Passan's arguments could ring somewhat true.
As stated earlier, there is a discussion to be had about what improvements can be made going forward. Unfortunately, Passan's article doesn't look to be the igniter of that movement, as he may have hoped. It was a worthy idea, it certainly made me (and plenty of others) rethink some things, and I sincerely hope I don't come off as attacking Passan personally. As I said, I respect his work and appreciate the goal he had in mind. But ultimately, the piece was mostly a rehash of past critiques, and there remains too much misinformation in it for his call to action to really break much ground.