
Freelance Friday is a semi-regular series at Nylon Calculus where we solicit contributions from the wider community. This week's selection comes from Bo Schwartz Madsen, discussing some of the inputs to ESPN's "Real Plus/Minus" stat. Bo writes about the NBA and is co-host of the Danish NBA podcast "Under Kurven" when he's not studying climate science as part of his PhD. Follow him on Twitter @BoSchwartz.
The Mighty Prior

ESPN's Real Plus Minus (RPM) is often cited when comparing players.[1. For example.] While not designed as a definitive ranking of players, RPM is often used as such. Especially when comparing players of the same position, people have become fond of using RPM in just that manner.
To some degree, this "RPM-as-rankings" habit represents a misinterpretation of how RPM is created and what it is intended to measure. RPM is presented as "every player's contribution on-court, where teammates and opponents have been accounted for", which is both accurate and slightly incomplete. To better explain, a brief description of how RPM is produced is in order.
The basis for RPM is called Regularized Adjusted Plus Minus (RAPM). RAPM adjusts the raw on-court plus/minus numbers of each player to account for teammates and opponents. It does so using a technique called ridge regression, rather than ordinary linear regression, to reduce the effects of collinearity and noise.[2. That's a lot of terminology in one place, but please read on, as the rest of the article is in much more layman's terms.]
This adjustment for context is what a lot of people find attractive about RAPM and adjusted plus/minus models in general. But RPM is not RAPM. Real Plus Minus is a more advanced version of RAPM, where each player is not expected to be at league average from the outset. Instead they are given a "prior": an estimate of value based on box score statistics. This box score prior can have a major effect on the final RPM rating, and that effect should not be understated.
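To make the ridge regression idea a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the four players, the stint data, the point margins, and the penalty values are invented, and the helper names `solve_linear` and `ridge_fit` are hypothetical. It is not the actual RAPM implementation, only a small instance of the same technique.

```python
# Toy RAPM-style ridge regression on invented stint data (not real NBA numbers).

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Coefficients minimizing ||y - X b||^2 + lam * ||b||^2."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# One row per stint. Columns are hypothetical players A, B, C, D:
# +1 = on court for the home side, -1 = on court for the away side, 0 = off.
# A and B always play together, so ordinary least squares cannot separate them.
X = [
    [1, 1, -1, 0],
    [1, 1, 0, -1],
    [1, 1, -1, -1],
    [0, 0, 1, -1],
]
y = [8.0, 4.0, 6.0, -2.0]  # invented home-side point margins per stint

coef_light = ridge_fit(X, y, lam=1.0)   # mild penalty
coef_heavy = ridge_fit(X, y, lam=10.0)  # strong penalty, pulled closer to zero

for name, light, heavy in zip("ABCD", coef_light, coef_heavy):
    print(f"{name}: lam=1 -> {light:+.2f}, lam=10 -> {heavy:+.2f}")
```

Two things are worth noticing. With no penalty the system for A and B would have no unique solution, while the ridge penalty simply splits the shared credit evenly between them. And the larger the penalty, the more every estimate is pulled towards zero, which is the "league average from the outset" assumption that RPM's prior replaces.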
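One common way to use a prior in a ridge-style model is to shrink each estimate towards the prior instead of towards zero. The sketch below shows that idea for a single player, where the estimate has a simple closed form. How RPM actually combines its prior with the plus/minus data is not public, so this is only an assumption about the general mechanism; the stint margins, the penalty, the prior value of -2.0, and the function name `shrink_toward_prior` are all invented for illustration.

```python
# One-player illustration of shrinking towards a prior rather than towards zero.

def shrink_toward_prior(x, y, lam, prior=0.0):
    """Estimate b minimizing sum((y_i - x_i * b)^2) + lam * (b - prior)^2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return (sxy + lam * prior) / (sxx + lam)


# Invented data: ten stints with the player on court (x = 1) and noisy
# point margins that average +1.2.
x = [1.0] * 10
y = [3.0, -1.0, 2.0, 0.0, 4.0, -2.0, 1.0, 2.0, 0.0, 3.0]
lam = 30.0  # hypothetical penalty strength

rapm_like = shrink_toward_prior(x, y, lam)              # shrunk towards 0
rpm_like = shrink_toward_prior(x, y, lam, prior=-2.0)   # shrunk towards a poor box score prior

print(f"raw average margin: {sum(y) / len(y):+.2f}")
print(f"shrunk towards zero: {rapm_like:+.2f}")
print(f"shrunk towards prior of -2.0: {rpm_like:+.2f}")
```

With little data and a strong penalty, the prior dominates: the same on-court results produce a small positive estimate when shrunk towards zero and a clearly negative one when shrunk towards a negative prior.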
For example, Klay Thompson has a Defensive RAPM of +1.2, but his Defensive RPM is -1.6.[5. According to data from Jeremias Engelmann, one of the creators of RPM, and ESPN's website.] DeAndre Jordan has a Defensive RAPM of +1.15 and a Defensive RPM of +5.84. In other words, the box score prior has a very large effect. Klay's box score numbers make him seem like a bad defender and give him a very low prior that his seemingly positive on-court contributions cannot overcome. The reverse is true for DeAndre. His box score numbers make him look like a world-class defender, and so his Defensive RPM ends up being among the best in the NBA.
Effects of the prior can be seen below[1. Min 250 minutes played.]:

The difference in the range of values along the axes shows that DRAPM and DRPM are on two different scales, with a much wider distribution for DRPM.[3. This is a feature, not a bug, as the inclusion of a prior is in part meant to combat ridge regression's tendency to pull estimates towards the mean, which is zero in this case.] Tim Duncan has a DRAPM of +3.08 and a DRPM of +6.78. That seems like a big difference, but he actually leads the league in both measures. To make a better comparison, here is the same graph with the scales equalized:

Players like Thompson or Cory Joseph look like bad defenders because of the box score prior, while gaudy block and rebound numbers help Hassan Whiteside and DeAndre Jordan look like good defenders.
The prior is built from a Statistical Plus Minus model akin to Box Plus Minus, developed by Daniel Myers. Much like Nylon Calculus's own DRE methodology, this model is fit by regressing a long multi-year RAPM data set on box score stats. Precisely what factors are included in RPM's prior is not public information, but as shown below BPM provides a reasonable estimate.
A similar impact can be seen on the offensive side as well:

The appearance of quite a few Warriors as best in Offensive RAPM should not be a huge surprise. As Daniel Myers noted on Twitter:
@SethPartnow @BoSchwartz That said, short term RAPM will overly share value on good defensive teams just to avoid outliers. Truth in between
- Daniel Myers (@DSMok1) February 17, 2016
This is just the offensive equivalent. The Warriors are really, really good at offense (especially with those players playing together) and RAPM will spread the value among them to avoid outliers.[3. This is probably where synergy and a lineup being better than the sum of its parts can play tricks on adjusted plus/minus models like RAPM.]
Though the statistical prior for RPM isn't publicly available, Box Plus Minus numbers for all players can be found. So I constructed a simple linear model trying to predict RPM from RAPM and BPM.[11. The relationship between the adjustment based on plus/minus and the prior in RPM is of course not a simple linear relationship. This is only an exploratory exercise.] It is a toy model, meant to show whether there are any patterns in the outliers.
Actually, I split it into predicting ORPM and DRPM separately and combined the two. The model for DRPM also improved when height was added as a variable. Height is not used in BPM, but it is used in Engelmann's prior.[2. The model also improved slightly if PER was used to predict Offensive RPM, but I did not use that here.]

Overall, a linear combination of RAPM and BPM is quite good at predicting RPM. BPM is actually a better predictor than RAPM, which shows in part how important the prior is.
If we look at the residuals, i.e. the difference between RPM and the linear model RPM, there is a pattern in the outliers:

Aside from Rubio, there certainly seems to be a similarity between the players with a high positive residual. My guess is that the way the prior is constructed favors big men more than BPM does. That is good to know when we use the results from RPM: we cannot rely on BPM alone to explain discrepancies between RAPM and RPM. So exactly how the prior, and thereby the box score, affects RPM remains cloudy.
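For readers who want to try the same kind of exercise, the toy model and the residual check described above can be sketched roughly as follows. This is not the code behind the charts in this article. The six players and all of their RAPM, BPM, and RPM values are invented placeholders, the helper names `solve_linear` and `fit_ols` are hypothetical, and the sketch fits one combined model rather than separate offensive and defensive ones.

```python
# Toy version of "predict RPM from RAPM and BPM, then inspect the residuals".

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, target):
    """Least-squares fit of target ~ intercept + predictors (normal equations)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, target)) for i in range(p)]
    return solve_linear(xtx, xty)


# Invented placeholder data: (name, RAPM, BPM, RPM).
players = [
    ("Guard A", 1.0, -0.5, -0.2),
    ("Guard B", 0.5, 1.0, 1.1),
    ("Wing C", -0.5, 0.0, -0.3),
    ("Wing D", 2.0, 3.0, 3.4),
    ("Big E", 1.0, 2.0, 4.0),
    ("Big F", -1.0, -2.0, -2.1),
]

intercept, w_rapm, w_bpm = fit_ols(
    [(rapm, bpm) for _, rapm, bpm, _ in players],
    [rpm for _, _, _, rpm in players],
)

# Residual = actual RPM minus the RPM predicted by the linear model.
residuals = {
    name: rpm - (intercept + w_rapm * rapm + w_bpm * bpm)
    for name, rapm, bpm, rpm in players
}

for name in sorted(residuals, key=residuals.get, reverse=True):
    print(f"{name}: residual {residuals[name]:+.2f}")
```

Players at the top of the sorted list are the ones RPM rates higher than RAPM and BPM together would suggest, which is where a pattern such as "mostly big men" would show up with real data.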

I think RPM is a decent stat. Really, I do. It's one of our tools, but like any tool it has uses and misuses. Sometimes it is stretched a bit too far and used for something it cannot do. It has trouble describing players who are outliers, but that goes for all one-number stats. It is also not meant to describe every single player accurately. It is meant to be correct overall, and overall it does really well. But when it is used to rank players, one has to be very, very careful.