During the offseason, I released a metric covering all the “most impactful players” of 2013-14 that attempted to measure how “consistent” players are (and please, feel free to look at the resulting data HERE). Fans talk all the time about players who are more consistent than others, and yet, to this point, there has been no real metric used to define that consistency, despite the fact that it’s an incredibly basic statistical principle.
For as long as statisticians have been using averages to try to understand the nature of large sets of information, they’ve been using the variance to understand how closely the data clusters around those averages. Variance is an essential tool for telling us how close a set of data comes to its average, or, in other words, how consistent a data set really is.
None of this is to say that basketball statistics don’t use variance: things like RPM, the Margin of Error on any stats, significance in regressions, etc., all use variance in their calculations because it’s such an important part of interpreting data.
What I do find interesting is that, despite using consistency as a talking point for players, no one has decided that variance would be interesting or valuable to know for players in and of itself. I suspect that the devaluing of variance as a tool of information in and of itself lies fundamentally in the fact that it’s not a tool we can use to “rank” players. Players who have greater variance than others aren’t really better or worse; they just have some really good games and then some worse ones.
Despite the fact that the general desire is for information that can be used to rank players, we here at Nylon Calculus like having data that helps us learn more about the game of basketball, independent of its implementation in terms of calling players “better” or “worse.” Knowing more about what makes players more or less “consistent” and what happens if a player is more or less consistent would seem to be important to understanding why certain things happen on a basketball court.
Finding the variance itself was simple enough: I took the game logs of each player, found the game-to-game variance for each statistical category of interest, and aggregated the variances along the lines of a typical fantasy-sports aggregation[1. I actually used standard deviation, the square root of variance, for ease of interpretation. The aggregation was also a linear transform that didn’t affect the results in any significant way. I biased the “Variance” score a tiny bit toward points per game over the other categories, as it seems to be what most fans are most interested in, but otherwise it was close to an average of the variances, with the necessary weights to make things like 3pt% and TS% comparable to PPG]. As variance increases, a player is “less consistent.”
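As a rough sketch of that pipeline (the game logs, the category weights, and the exact size of the PPG bias here are my own placeholders, not the article’s published values):

```python
import numpy as np

# Hypothetical 10-game log for one player. The article uses full game logs
# across many categories; three are enough to show the mechanics.
game_logs = {
    "PTS": [22, 31, 18, 27, 25, 12, 30, 24, 19, 28],
    "REB": [5, 7, 4, 6, 8, 3, 6, 5, 7, 6],
    "AST": [6, 4, 8, 5, 7, 3, 6, 5, 4, 7],
}

# Illustrative weights only: a slight extra weight on points (per the
# footnote's PPG bias), with the rest treated equally.
weights = {"PTS": 1.2, "REB": 1.0, "AST": 1.0}

def variance_score(logs, weights):
    """Weighted average of per-category game-to-game standard deviations."""
    total_weight = sum(weights[cat] for cat in logs)
    return sum(np.std(vals, ddof=1) * weights[cat]
               for cat, vals in logs.items()) / total_weight

print(round(variance_score(game_logs, weights), 2))
```

A higher score means bigger game-to-game swings, i.e. a “less consistent” player by this definition.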
Variance is certainly not the be-all and end-all of consistency, however, despite the fact that it’s what we traditionally think of when we call a player consistent or inconsistent. It also has some issues that, in constructing a “consistency” stat, I did my best to address.
For one: variance increases linearly with volume, which makes some natural amount of sense. The more a player scores, rebounds, shoots, or passes, the larger the gap between his bad games and good games is naturally going to be. You can view this graph as a demonstration:
This isn’t necessarily a problem per se: higher-volume players really are less “consistent,” in that they have a harder time keeping to a small range of points, rebounds, shooting percentage, and so on. It runs counter to our intuitive sense of consistency, however, because we naturally understand that better players have a different “range” within which they’re likely to score. What we’re more interested in is how often, and by how much, players perform outside of that range.
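One way to see why raw spread tracks volume: if a higher-volume scorer’s game log looks like a lower-volume scorer’s log scaled up (an assumption for illustration, with invented numbers), his standard deviation scales up by exactly the same factor.

```python
import numpy as np

# Toy illustration: a high-volume scorer modeled as a low-volume scorer
# whose every game is scaled up by 2. Spread (SD) scales by 2 as well,
# since sd(c * X) = c * sd(X).
low_volume = np.array([12, 15, 9, 18, 14, 11, 16, 13, 17, 10], dtype=float)
high_volume = 2.0 * low_volume  # same "shape" of game log, twice the volume

sd_low = np.std(low_volume, ddof=1)
sd_high = np.std(high_volume, ddof=1)
print(round(sd_high / sd_low, 2))  # -> 2.0: spread doubled with volume
```

This is exactly why the Consistency measure below divides each game by the player’s own standard deviation: it removes that mechanical volume effect.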
For that, my co-writer here at Nylon Calculus, Krishna Narsu, brilliantly suggested a model for “consistency” that I then executed, where consistency is slightly different from variance, though the two are closely related. For consistency, instead of taking the raw variance for each measure, we took the difference between each game’s result and the player’s average and divided it by the player’s standard deviation (or, the player’s usual “range”)[2. For the more mathematically oriented, I took the absolute value of the z-score (Z = (Game − Avg)/St. Deviation) of every game played so far. The average of those scores is my “consistency” measure. You may notice that the z-score is itself expressed in units of standard deviations, so in essence I have just found the normalized variance for each measure for each player.]. Consistency, rather than telling us the raw spread for a player on any given statistic, instead tells us how likely a player is to perform beyond or below the expectations we typically have for that player. Without further ado, here’s the consistency for the first 9-to-12 games of the NBA season for all players (some were excluded for lack of a sufficient sample to calculate the variance; the small-sample-size warning applies to all players).
The stat can also be found on the “Our Stats” page and will be updated through the year. I would recommend, if you plan on seriously perusing the data, opening the chart in a new tab or downloading it:
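The mechanics of footnote 2 can be sketched like this. The game logs are invented, and the published stat aggregates these per-category scores across several weighted categories, so the scale here won’t match the table:

```python
import numpy as np

def consistency_score(games):
    """Mean absolute z-score of a player's game-by-game results for one
    category: average distance from his own mean, in units of his own SD."""
    games = np.asarray(games, dtype=float)
    mean = games.mean()
    sd = games.std(ddof=1)
    if sd == 0:
        return 0.0  # a perfectly flat game log never leaves its "range"
    return np.abs((games - mean) / sd).mean()

# Two hypothetical players with the same 20 PPG average:
steady  = [20, 21, 19, 20, 22, 18, 20, 21, 19, 20]
erratic = [35, 8, 28, 12, 30, 10, 25, 14, 22, 16]
print(consistency_score(steady), consistency_score(erratic))
```

Because each game is normalized by the player’s own standard deviation, this measures how far a player strays relative to his usual range rather than the raw size of that range, which is why it is far less volume-dependent than variance.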
Variance: How much a player is likely to perform above or below his average in any given game. Represented by a value such that about 68% of games will fall within a range of the average +/- the variance. Higher variance = lower consistency. Highly dependent on scoring volume. About 75% of players fall between a variance of 2.1 and 6.8, so anyone outside of that range can be considered exceptionally consistent or exceptionally inconsistent. The average is 4.8.
Consistency: How likely a player is to perform outside the typical “range” set by his variance. A higher consistency score means the player is more likely to play within the range set by his variance; lower means his performance is more erratic. Far less dependent on volume, but the extreme highs and lows are less dramatic. About 75% of players fall between a consistency of 1.1 and 3.6, so anyone outside of that range can be considered exceptionally consistent or exceptionally inconsistent. The average is 2.3.
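To make the “68% of games within Average +/- Variance” reading concrete, here’s a quick simulation under an assumed normal model. The 20 PPG average is invented; the 4.8 is the league-average Variance quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: a player averaging 20 PPG whose game-to-game scoring is
# roughly normal with SD 4.8 (the league-average "Variance").
avg, sd = 20.0, 4.8
games = rng.normal(avg, sd, size=100_000)

# Fraction of games inside Average +/- Variance -- should land near 68%,
# i.e. roughly two of every three games between 15.2 and 24.8 points.
inside = np.mean(np.abs(games - avg) <= sd)
print(round(inside, 3))
```

Real game logs aren’t exactly normal, so the 68% figure is an approximation rather than a guarantee.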
In the definitions above I have deliberately noted that either Variance or Consistency can be used to describe a player’s “consistency,” depending on how you choose to define being “consistent.” There’s a bit more to be said about consistency, if you’ll bear with me:
- High or low consistency (or variance) has little to nothing to do with a player being “good” or “bad,” even though it’s really tempting to conflate inconsistency with inefficiency. Really, really good players have exceptional games where they play much better than their average all. the. time. There’s a reason LeBron has a relatively low consistency and a high variance: he has awesome games just as often as he has standard ones. See, as a demonstration, this graph comparing TS% consistency with actual TS%. There’s little relationship, AND it’s an insignificant one:
Although, as variance increases (“less consistency”), TS% also appears to increase, and the relationship is highly significant, if loose:
If I were a betting man, I would speculate that this is because as variance increases, so does the usage of a player, and — over the whole league — the talent level increases with increasing usage. Thus, TS% increases as variance increases.
- Minute distributions play a really interesting role in player consistency. Despite the fact that you’d imagine consistency in PPG or REB or AST would be heavily determined by minutes played, the relationship between minute consistency and PPG consistency is less strong than the relationship between PPG consistency and AST consistency, TS% consistency, and so on. For example: