Nylon Calculus: Reinventing PER

LOS ANGELES, CA - MARCH 09: Kevin Garnett

Well over a decade ago, John Hollinger developed the omnipresent NBA metric Player Efficiency Rating, or PER, which is an attempt to stuff all the available information into a single statistic to help judge and compare players. It’s a wonder the metric is still used, as we’ve had years of basketball analysis to update beliefs and add useful information. It’s a metric from another era, unable to reap the benefits of the big data revolution or any new statistical techniques.

Despite its shortcomings, PER is still used frequently, and in fact may still be the most cited NBA metric. This is probably due to Hollinger’s overall influence as the advanced stat media guru for years, ESPN’s ubiquity, and PER’s subsequent appearance on Basketball-Reference.

If the metric isn’t going away any time soon, perhaps it’s time to update the stat.

Let’s reinvent PER.

How PER works

There’s a narrow spectrum of understanding for the metric. Most people who know PER see it as a way of summarizing box-score stats into one number, and when they try to learn more they’re referred to the formula, where they become more confused and seek simpler explanations. PER, in fact, has been deconstructed and rebuilt as a linear weights formula multiple times. That’s how far people go to understand PER, because the formula itself eludes easy digestion. But if you break it down piece by piece, it’s not impossible to comprehend.

Most PER guides out there are too basic and miss the important details and assumptions. There are exceptions, like this article, which is a handy breakdown I’d recommend reading too. But the best way to understand PER is to calculate it, which isn’t too difficult when you know how to manipulate a lot of data quickly[1.]. Crunching the numbers, you actually see there are a few levels of stats, and they’re there for a reason: there are the individual player stats, of course; then there are team stats, so you can adjust some of those individual stats based on the team’s average; and then there are league-wide stats, because the value of every individual stat is tied to league-average expected rates.

If you want to see how it’s calculated, here’s the R code to do so. It just plucks the player data from Basketball-Reference for a specified season, calculates the league stats for said season, and then cycles through every player building PER[2.]. Note that I, unfortunately, couldn’t get 100% accuracy in matching the PER displayed on ESPN and Basketball-Reference, but the error is less than a tenth, which I think is fine[3.]. The discrepancy may be due to the assumption about how a possession is calculated, or they’re just rounding errors.

Here’s what PER is doing: it’s looking at net points created by a player per minute, which is then adjusted for pace and league averages. Points, obviously, are easy to translate into points, but things like rebounds and steals are transformed by the ‘Value of a Possession’ statistic. Since a steal is a net possession, the player is credited with the league average point value of a possession. It’s a little more convoluted, but it’s done with missed shots too. A missed shot, or free throw, is a loss of a possession, which is also adjusted by league averages for defensive rebound rates. Player fouls are transformed too, and it’s done so by league-wide rates of free throws made and total fouls. Think of it this way: it’s an estimate of the damage caused by one foul based on the proportions of free throws made and attempted to all other fouls adjusted by the value of a possession.
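To make those league-level conversions concrete, here is a minimal Python sketch of three league inputs as they appear in the PER formula published on Basketball-Reference: the Value of a Possession (VOP), the league defensive rebound rate, and the expected cost of a personal foul. The function names are mine, and the league totals are made-up round numbers for illustration, not real season data.

```python
def value_of_possession(lg_pts, lg_fga, lg_orb, lg_tov, lg_fta):
    """League-average points per possession (VOP)."""
    possessions = lg_fga - lg_orb + lg_tov + 0.44 * lg_fta
    return lg_pts / possessions


def defensive_rebound_rate(lg_trb, lg_orb):
    """Share of all rebounds grabbed by the defense (DRB%)."""
    return (lg_trb - lg_orb) / lg_trb


def foul_cost(lg_ft, lg_fta, lg_pf, vop):
    """Expected net points given up per personal foul."""
    return lg_ft / lg_pf - 0.44 * (lg_fta / lg_pf) * vop


# Made-up league totals, for illustration only (not a real season).
lg = {"pts": 250_000, "fga": 208_000, "orb": 25_500, "tov": 35_000,
      "fta": 57_000, "ft": 43_000, "trb": 107_000, "pf": 50_000}

vop = value_of_possession(lg["pts"], lg["fga"], lg["orb"], lg["tov"], lg["fta"])
drb_pct = defensive_rebound_rate(lg["trb"], lg["orb"])

steal_value = vop               # a steal gains one possession
missed_fg_cost = vop * drb_pct  # a miss only costs the possession if the defense rebounds it
pf_cost = foul_cost(lg["ft"], lg["fta"], lg["pf"], vop)

print(f"VOP={vop:.3f}  DRB%={drb_pct:.3f}  "
      f"miss={missed_fg_cost:.3f}  foul={pf_cost:.3f}")
```

The point of the sketch is only that every non-scoring event is priced in points through the same two league rates, so a change in league scoring or rebounding shifts the value of every steal, miss, and foul at once.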

Calculating PER

[Figure: calculating-per]

Finally, we get to the tougher segments of PER. The second and third factors (in order) are actually related. Players get credit for assists, but consequently they get debited for assisted shots, where a major assumption is made: since Hollinger had no data on assisted shots, he used team assist rates applied equally to every player. The same adjustment is applied for free throws; it’s just more complicated because assumptions had to be made about how often free throws end a play.

This leads to a controversial consequence of PER: the idea that the field goal percentage break-even point for raising one’s PER is well below average, and thus the metric awards gunners unfairly. First of all, it isn’t true that all a player needs to do is take shots, even if they’re just misses. You may also see people complain about how the metric is all about usage rate, but that’s missing the point. You can see this with a bit of calculus, and it’s simple enough to see in the formula for yourself. The only term with a missed field goal is [- VOP * DRB% * (FGA – FG)], and as you can see it’s a negative number since it leads with a negative sign and both VOP and DRB% are positive. I mentioned calculus because it’s all about rates of change, and the change to PER[4.] with respect to a missed shot is negative. But there are two terms associated with field goals, ignoring 3-pointers for now; so is it actually true that there’s a break-even point, and is it surprisingly low? With a bit of algebra, we can set the two field goal terms equal to each other and solve for that break-even field goal percentage[5.].

Break-even FG% for PER = VOP*DRB% / [ 2 – factor*TmAST/TmFG + VOP*DRB% ]
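As a rough check on that expression, here is a small Python sketch that computes the break-even percentage and confirms that the two field goal terms cancel at exactly that rate. The inputs are round, illustrative values I picked to be in a plausible range; they are not the actual 2016 league or team figures, and the function name is my own.

```python
def break_even_fg_pct(vop, drb_pct, factor, tm_ast, tm_fg):
    """FG% at which the made-shot credit equals the missed-shot penalty."""
    made_credit = 2 - factor * tm_ast / tm_fg  # credit per made 2-pointer
    miss_cost = vop * drb_pct                  # penalty per missed shot
    return miss_cost / (made_credit + miss_cost)


# Illustrative inputs only (assumed, not taken from a real season).
vop, drb_pct, factor = 1.06, 0.76, 0.55
tm_ast, tm_fg = 22.0, 38.0

be = break_even_fg_pct(vop, drb_pct, factor, tm_ast, tm_fg)

# Net credit per attempt at the break-even rate should be zero.
made_credit = 2 - factor * tm_ast / tm_fg
net_per_attempt = made_credit * be - vop * drb_pct * (1 - be)

print(f"break-even FG% = {be:.3f}, net credit per attempt = {net_per_attempt:.6f}")
```

Because the assist ratio only shrinks the made-shot credit, a team that assists on more of its baskets has a slightly higher break-even point under the original formula.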

For the 2016 season, using inputs from the average team, that field goal percentage is about 32.4 percent. This means that if a player only increased his 2-point field goal attempts, even at a field goal percentage of just 33 percent, his PER would increase, all other things being equal. Let’s take an example: say, oh I don’t know, Brook Lopez had decided to take 400 more midrange shots last season, because he’s funny like that, and we’ll assume this didn’t affect his free throw totals. We’ll also assume his team’s ratio of assists to field goals didn’t change. Out of those 400, he made 140, for a field goal percentage of 35 percent on those shots. What would happen to his PER?

Running through the numbers, his PER would increase slightly from 21.7 to 22.2. I know that’s not much, but there’s no penalty either for some awful efficiency. His true shooting percentage would fall from 56.2 percent to 51.3 percent, and his team would have almost certainly lost more games. Of course, it’s better to be efficient; if he had made 60 percent of those shots, his PER would be 27.3.
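The direction of that result can be sketched without a full stat line. The Python snippet below computes only the net unadjusted credit from a batch of extra 2-point attempts, before the per-minute, pace, and league adjustments. It does not reproduce the PER values quoted above, since that would require the player’s complete season data, and the VOP, DRB%, factor, and assist ratio used here are assumed round numbers rather than real inputs.

```python
def net_credit_from_extra_shots(attempts, makes, vop, drb_pct, factor, ast_ratio):
    """Net unadjusted PER credit from extra 2-point attempts."""
    misses = attempts - makes
    made_credit = (2 - factor * ast_ratio) * makes
    miss_penalty = vop * drb_pct * misses
    return made_credit - miss_penalty


# Assumed, illustrative league and team inputs.
VOP, DRB_PCT, FACTOR, AST_RATIO = 1.06, 0.76, 0.55, 0.58

scenarios = {"30%": 120, "35%": 140, "60%": 240}  # makes out of 400 attempts
results = {
    label: net_credit_from_extra_shots(400, makes, VOP, DRB_PCT, FACTOR, AST_RATIO)
    for label, makes in scenarios.items()
}

for label, credit in results.items():
    print(f"400 extra shots at {label}: net credit {credit:+.1f}")
```

Under these assumptions the 35 percent scenario still nets a small positive credit, which is the behavior the example above is complaining about.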

The issue here is that the Player Efficiency Rating awards based on points, and the metric suffers from one key blind spot: if a player doesn’t take a field goal, or just score points in general, that doesn’t mean those points vanish into thin air. Another player on the team could use that shot, and while we can debate the expected value of those shots from the theoretical teammates, that break-even point around 33 percent is almost certainly far too low. This is why having a monster usage rate, as long as your efficiency isn’t abhorrent, is so strongly correlated with PER. It’s about points.

But are there any solutions?

Updating PER

The first reinvention here concerns assisted field goals. As I discussed earlier, every player on a specific team is assumed to have the same assisted field goal ratio. But we can count assisted field goals directly from play-by-play logs and replace the team ratio with each player’s own. The same team ratio is used in a free-throw factor as well, and it’s an easy substitution to change it so it’s reflective of a player’s actual assist to field goal ratio.
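Here is a small Python sketch of what that substitution does to the made field goal term. It compares the credit two hypothetical teammates receive when the shared team ratio is replaced by each player’s own assisted share. The players, their assisted shares, and the factor value are invented for illustration.

```python
def fg_credit(fg_made, factor, assisted_ratio):
    """Credit for made field goals, discounted by the share that were assisted."""
    return (2 - factor * assisted_ratio) * fg_made


FACTOR = 0.55          # assumed league assist factor
TEAM_AST_RATIO = 0.58  # assumed team AST / FG, applied to everyone in the original PER

# Hypothetical teammates with the same number of makes.
players = {
    "spot_up_shooter": {"fg": 300, "assisted_share": 0.85},
    "shot_creator": {"fg": 300, "assisted_share": 0.25},
}

original = {name: fg_credit(p["fg"], FACTOR, TEAM_AST_RATIO)
            for name, p in players.items()}
updated = {name: fg_credit(p["fg"], FACTOR, p["assisted_share"])
           for name, p in players.items()}

for name in players:
    print(f"{name}: team ratio {original[name]:.1f} -> own ratio {updated[name]:.1f}")
```

With the team ratio, both players get identical credit; with their own ratios, the creator gains and the spot-up shooter loses, which is one reason the spread of ratings widens.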

Free throws are a little more complicated, however, and it’s because through basic stats we don’t know which free throws end a possession — and that’s how PER works; it’s built on possessions, ending or creating them. But with play-by-play data, I can calculate (with decent accuracy) how many of those free throws end a possession (or play), which is a crucial change for the sixth term in the PER formula.

I want to do something else to free throws though, and it might be controversial: I’m ignoring all points derived from technical fouls. The guys who take them often had nothing to do with creating the free throws, and it’s as replaceable as an action can be in basketball. They should not be treated the same as free throws that are created by the players who take them.

Additionally, I noticed that 3-point field goals are not adjusted at all, meaning that a point from a 3-point field goal is essentially more valuable than a point from a normal field goal in PER. Since 3-point percentages are so chaotic, I thought it wouldn’t do any harm to apply the assisted factor to these shots. To balance the sheets, I gave proper credit to 3-pointer assists as well. Thus, with every assist going to a teammate behind the arc, the passer gets an additional one-third of a point[6.].

One of the bigger assumptions in PER is the one linked with personal fouls. With a league-average ratio of points to personal fouls, and factoring in the use of a possession, you can get an expected number of negative points from a single foul. I’ve done the same but with shooting fouls, and in the style of PER it’s done with respect to league average expected values. I should add here that some of these new variables have the possibility of the dreaded division by zero, as sometimes players will, say, go without a field-goal for the season. My solution was three factors that also serve to regress ratios to the league average: an assist/field goal ratio, a free throw ratio showing the proportion of free throws that use a possession, and a 3-point assist/3-point ratio. The basic form was ( AstFG + LgAvgRatio*50 )/(FG + 50).
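The regression form at the end of that paragraph is easy to sketch in Python. The function below implements (AstFG + LgAvgRatio*50)/(FG + 50) directly; the parameter names and the example numbers are mine, and the league-average ratio shown is an assumed value.

```python
def regressed_ratio(numerator, denominator, lg_avg_ratio, prior=50):
    """Ratio shrunk toward the league average by `prior` phantom attempts."""
    return (numerator + lg_avg_ratio * prior) / (denominator + prior)


LG_AVG_AST_FG = 0.58  # assumed league-average assisted share

# A player with no made field goals falls back to the league average
# instead of triggering a division by zero.
no_shots = regressed_ratio(0, 0, LG_AVG_AST_FG)

# Small samples are pulled strongly toward the league average...
small_sample = regressed_ratio(9, 10, LG_AVG_AST_FG)

# ...while large samples stay close to the observed ratio.
large_sample = regressed_ratio(540, 600, LG_AVG_AST_FG)

print(f"{no_shots:.3f} {small_sample:.3f} {large_sample:.3f}")
```

The constant 50 acts like 50 league-average field goals added to every player, so the same line handles both the zero-denominator case and noisy small samples.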

Here’s another statistic I can tweak: rebounds. Field goal and free throw rebounds are fundamentally different, so I’ve split those wherever appropriate and have two different league average rebounding rates on defense. It’s a pretty easy input when you have the data, and it makes a significant difference for players who pad their totals with easy free throw rebounds.

The last minor revision has to do with blocks. In PER, the value of a block is only about how it can potentially end a possession — note the factors used in front of blocks in the formula. Thus, I can just replace this term with the “Russell,” which is just a block that ends in a defensive board.

We can add variables to PER too. It’s not too difficult — I can just follow the previously established rules. My favorite miscellaneous variable, offensive fouls drawn, can be treated like a steal, so I just multiply that by VOP. Then there’s personal fouls drawn, which are calculated the same as personal fouls, as in the original formula, but with a positive sign attached[7.]. Goaltending violations are important too, and I’ll assume those are worth two points subtracting out a VOP.

Another term I can add has to do with a change in VOP: after steals, offenses are much more efficient, so I gave a slight boost to steals and a penalty for live-ball turnovers. The boost is based on the historic percentage difference in how teams perform compared to the average. Via inpredictable, the coefficient is 0.14, meaning players get a total of 1.14*VOP for steals[8.]. In concert, this also hurts passers who commit a lot of live-ball turnovers. Lastly, I included a term for spacing, which is a product of 3-pointers attempted and the proportion of made 3-pointers that were assisted. The power of spacing can’t be ignored. This is for role players who don’t create their own shots but help others; thus, it’s weighted by assisted shots, as I’m mirroring the added benefit of catch-and-shoot players.
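A minimal Python sketch of how these added event terms could be priced is below. The 1.14*VOP steal value, the VOP credit for an offensive foul drawn, and the "two points minus a VOP" goaltending value come from the text; treating the live-ball turnover penalty as the mirror image of the steal boost, and charging goaltending as a negative to the defender, are my assumptions. The function name and inputs are hypothetical.

```python
STEAL_BOOST = 0.14  # coefficient cited in the text, via inpredictable


def extra_event_points(vop, off_fouls_drawn=0, steals=0,
                       live_ball_tov=0, dead_ball_tov=0, goaltends=0):
    """Net points credited for the added play-by-play events."""
    credit = vop * off_fouls_drawn                   # treated like a steal
    credit += (1 + STEAL_BOOST) * vop * steals       # boosted steal value
    # Assumption: live-ball turnovers cost the same 1.14*VOP a steal earns.
    credit -= (1 + STEAL_BOOST) * vop * live_ball_tov
    credit -= vop * dead_ball_tov                    # ordinary turnover cost
    # Assumption: a goaltend is charged to the defender as 2 points minus a VOP.
    credit -= (2 - vop) * goaltends
    return credit


VOP = 1.06  # assumed league value of a possession

one_steal = extra_event_points(VOP, steals=1)
steal_and_giveaway = extra_event_points(VOP, steals=1, live_ball_tov=1)
one_goaltend = extra_event_points(VOP, goaltends=1)

print(f"{one_steal:.3f} {steal_and_giveaway:.3f} {one_goaltend:.3f}")
```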

Finally, that brings me to a major revision: I need to do something about how points are overvalued — and, with my changes, players who create assists and unassisted shots are overvalued too. It doesn’t make sense that a player with a 35 field goal percentage, and with no outside shots or free throws, can increase his PER by taking more shots. Instead of completely overhauling the metric, I decided to make an adjustment based on the “spirit” of PER. I included a new factor, FGCoef, that is applied to all field goals and free throws made, as well as assists. It’s based on this equilibrium point: assisted field goals should not increase one’s PER unless the net value is greater than the league average Value of Possession[9.]. This means inefficient scorers get penalized, but shot creators on bad teams operating only a little under the league average will not — their own individual field-goal assist rates move that equilibrium point downwards.

Results

You can see the results below or through this link directly. What you’ll note first is that the distribution of this new PER is much wider: the highest PERs are now in the upper 40s, instead of around 31, and players regularly dip into the negatives. This is largely because assisted field-goal ratios are no longer assumed to be the same across teammates, so high-scoring playmakers can get huge boosts to their PERs, which were already quite good because a lot of credit is given to points and assists.

You can view the top seasons in a summary table below. And yeah, this is all too perfect: Kevin Garnett has the highest rated season, just a shade below a pretty and memorable number, 50. I have extolled the virtues of Garnett relentlessly, and I’m all the more delighted because it’s his underrated 2005 season, when I believe he should have won MVP over the controversial pick, Steve Nash. It’s a decent list too, represented by LeBron James, Chris Paul’s early monster seasons in New Orleans, and a few legends. PER 2.0 does seem to love high-rebounding playmakers, though, which is actually true of most modern statistical plus/minus metrics; look at those late-prime Charles Barkley seasons ranking high.

Table: top PER 2.0 seasons, 1997-2017, min. 1500 MP

Player             Season  Team  PER   PER 2.0
Kevin Garnett      2005    MIN   28.2  49.9
Chris Paul         2009    NOH   30.0  48.1
Kevin Garnett      2004    MIN   29.4  47.0
Charles Barkley    1999    HOU   23.1  46.5
LeBron James       2009    CLE   31.7  45.5
Charles Barkley    1997    HOU   23.0  45.1
Tracy McGrady      2003    ORL   30.3  44.8
Charles Barkley    1998    HOU   21.6  43.4
Kevin Garnett      2003    MIN   26.4  42.9
LeBron James       2013    MIA   31.6  42.8
Chris Paul         2008    NOH   28.3  42.8
Tim Duncan         2005    SAS   27.0  42.6
Grant Hill         1997    DET   25.5  42.0
David Robinson     1998    SAS   27.8  42.0
LeBron James       2010    CLE   31.1  41.3
Elton Brand        2002    LAC   23.6  41.3
Ben Wallace        2002    DET   18.6  41.3
Tim Duncan         2002    SAS   27.0  40.7

If you’re wondering who has the worst rating in this time frame, it’s John Amaechi by a country mile. Few seasons by players with a large number of minutes are actually rated below 0, but Amaechi managed to rate far into the negative range. As a center, he rarely scored, scored inefficiently when he managed to, rebounded like a guard, and received additional penalties from the play-by-play, like his low rate of blocks rebounded by the defense. Players with low PERs are usually soft big men who don’t rebound or score, or non-scoring, non-assisting perimeter players, but some of those guys are defensive stalwarts whose contributions aren’t measured by these countable stats.

Table: worst PER 2.0 seasons, 1997-2017, min. 1500 MP

Player             Season  Team  PER   PER 2.0
John Amaechi       2001    ORL   8.7   -12.8
Junior Harrington  2003    DEN   6.4   -6.7
Rodney White       2003    DEN   9.8   -6.6
Chris Kaman        2004    LAC   9.6   -5.3
Tony Battie        2005    ORL   8.6   -4.2
Marcus Fizer       2001    CHI   11.0  -3.9
Dickey Simpkins    2000    CHI   5.5   -3.9
Pat Garrity        2001    ORL   10.1  -3.7
Sean Rooks         2001    LAC   11.5  -3.7
Ron Mercer         2002    TOT   10.2  -3.5
Jim Jackson        2001    TOT   10.7  -3.5
Malik Allen        2003    MIA   9.9   -3.4
Rasual Butler      2003    MIA   9.5   -3.3
Doug West          1997    MIN   8.5   -2.9
George McCloud     2002    DEN   9.8   -2.7
Felton Spencer     1997    GSW   9.1   -2.7
Adam Morrison      2007    CHA   7.9   -2.5
Ron Mercer         2002    CHI   10.8  -2.3
Felton Spencer     1997    TOT   9.2   -2.1

If you’re wondering how much “better” this version is, I found that PER 2.0’s correlation with RAPM was higher than PER’s for seasons 2013 through 2015: a correlation coefficient of 0.490 compared to 0.417[10.]. That’s a good sign, though I would not recommend using this for prediction; it’s more of an experiment.
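For readers who want to run the same kind of comparison on their own data, here is a minimal Python sketch of a Pearson correlation between a rating and RAPM. I am assuming a plain Pearson coefficient, since the text does not say which correlation was used, and the sample values are invented placeholders, not actual PER, PER 2.0, or RAPM figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder values for five hypothetical player-seasons.
rating = [31.0, 22.5, 15.0, 9.5, 18.0]
rapm = [5.1, 1.2, 0.4, -2.0, 2.3]

r = pearson_r(rating, rapm)
print(f"correlation with RAPM: {r:.3f}")
```

Running the same function once with PER and once with PER 2.0 against the same RAPM column gives the two coefficients to compare.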

Remember that John Hollinger’s own explanation of the metric includes a warning that it’s just a way to summarize all the available information on a player and we should, then, make further subjective adjustments based on what’s missing. PER, even this advanced form, is not an accurate reflection of defense, and many of the subtleties of the game are lost in league-average transformations and basic box-score stats.

I don’t think this version of PER fixes all the issues people have. While I think my attempt at fixing the inefficient scorer problem was a valiant one, it all comes down to the formula’s fundamental structure: how do we distribute credit based on possessions? A rebound isn’t an additional possession, and from box-score stats, or the play-by-play, we can’t understand if that rebound was just an uncontested one any teammate could have grabbed. Defense, in particular, is tougher to parse, and the possession model of building a metric will not work well with the available data — defending is about more than just controlling possessions; ignoring made field-goals, for one, is a massive oversight. Imagine if you had to grade a team’s defense just with steals, blocks, and rebounds: it’s madness.

I can understand wanting to value a player by building up his score, brick-by-brick, through possession counting and weighing, but the data available is not fine-grained enough. That’s why statistical plus-minus stats are so successful — we need approximations and estimates. The box-score is not enough, even with miscellaneous stats thrown in. But, hey, if we’re going to use PER, we might as well use this more advanced version. And like the old PER, mine has plenty of warts too.


[1. The greatest issue I had was not knowing I had written FT instead of FG in one place, which caused me to curse John Hollinger’s name and wonder if he had placed some tricky adjustment that wasn’t explained in the online formulas.]

[2. Note that most of the code is for data cleaning, which is pretty typical. Also, yes, I like loops, and I didn’t put in the effort for something more elegant because it was still fairly quick for an entire season.]

[3. Hollinger’s early articles actually display a slightly different PER than the ones currently displayed — the discrepancies are because of what team and league data are used.]

[4. Technically, just uPER. But the rates of change do correspond ultimately with PER.]

[5. For those curious, start with (2 – factor * (team_AST / team_FG)) * FG = VOP * DRB% * (FGA – FG). Expand the right side: VOP * DRB% * FGA – VOP * DRB% * FG. Then move terms so FG is isolated on one side and FGA on the other: FG = VOP * DRB% / ((2 – factor * (team_AST / team_FG)) + VOP * DRB%) * FGA. Then you can create the FG% with simple division, as FG% can be cleverly rewritten as FG/FGA.]

[6. In the original PER, assists get two-thirds the value of a field goal, which is assumed to be two points, so yes, the math checks out.]

[7. Unfortunately, it’s not possible to separate personal fouls drawn by team, so I had to use season totals and prorate based on minutes played for guys who switched teams midseason. Plus, the data is only available going back a decade, though that’s true of offensive fouls drawn too. But I thought it was interesting enough to include, and the league average calculation will make it possible to compare values from 2016 to a season without fouls drawn, like 1998. If you think that’s not truly PER, I’ll refer you to PER from the 60s, where they didn’t have steals, blocks, turnovers, or rebounds separated by offense or defense.]

[8. I may change this in the future by using season league averages, but this is pretty stable and much more computationally efficient.]

[9. I calculated the formula for the coefficient with some algebra, so it changes year by year.]

[10. Why am I comparing it to RAPM? It’s a player rating system that’s not formed by any of the variables PER has, and it’s all about how the player is associated with his team outscoring his opponent, which is what truly matters. It’s how the best public metrics today are built too.]