Dec 9, 2014; Los Angeles, CA, USA; Sacramento Kings forward Rudy Gay (8) takes a shot in the first half of the game against the Los Angeles Lakers at Staples Center. Mandatory Credit: Jayne Kamin-Oncea-USA TODAY Sports
Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series we’re proud to present this guest post from Michael Murray. Michael is a college kid with dreams of the association while he should be doing his homework. You can follow him on Twitter, @michaelmurrays.
At the time of writing, most teams have played 18 games or more, and the closing credits of the Small Sample Size Theater of the early NBA season seem to be starting to roll. Knicks fans are losing patience with the Triangle, Cavaliers fans are joining an ever-louder chorus of “SEE!?”, and Lakers fans are staring at the horizon thinking about the fragility of human life. For some, the fate of their team seems to have solidified for the season; for others, their team seems to be just getting started.
We know teams “regress” to the mean, but how soon? To answer this, I calculated cumulative moving averages (the average after each game played) for three efficiency measures (TS%, 3pt%, and FT%) for all teams last season. Teams with large changes in performance (think Westbrook and Durant coming back from injury) will take longer to converge to their true mean, while very consistent teams will take less time. Since these cumulative moving averages eventually converge to the final year average, they can give us a feel for how long it takes before we can be reasonably confident the current average is close to what the year average will be.
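The cumulative moving average is just the running mean of a stat through game n. Here's a minimal sketch with made-up per-game TS% values (real inputs would come from game logs; note that a team's true TS% over a stretch is possession-weighted, so the simple per-game mean used here is an approximation):

```python
import numpy as np

def cumulative_moving_average(per_game):
    """Running mean through each game: CMA_n = (x_1 + ... + x_n) / n."""
    per_game = np.asarray(per_game, dtype=float)
    return np.cumsum(per_game) / np.arange(1, len(per_game) + 1)

# Hypothetical per-game TS% for one team over five games
ts_pct = [0.48, 0.55, 0.51, 0.60, 0.53]
print(cumulative_moving_average(ts_pct))
```

Each entry of the output is what the team's season average would look like if the season ended after that game, which is exactly the quantity tracked below.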
The table below shows how far away averages at five points in the 13/14 season were from the final season average. Three-point shooting is a high-variance event, which explains why it is generally farther from the year-end mean than the other metrics. By the 75th game, and earlier for some teams, there is less than a 1% difference between the cumulative average and the year-end average. It's not groundbreaking news that the 75-game average is robust; at this point in the season there is more eulogy than prophecy for teams.
| Games Played | 15 | 30 | 45 | 60 | 75 |
|---|---|---|---|---|---|
| TS% | 0.026854 | 0.019434 | 0.011867 | 0.007658 | 0.003153 |
| 3pt% | 0.064709 | 0.036409 | 0.022018 | 0.015084 | 0.007981 |
| FT% | 0.029585 | 0.015261 | 0.010019 | 0.007744 | 0.003305 |
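The table's checkpoint values can be reproduced by taking, at each checkpoint, the absolute gap between a team's cumulative average and its final average, then averaging across the 30 teams. A sketch with randomly generated per-game data standing in for the 2013-14 game logs (the printed numbers are illustrative, not the table's actual values):

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams, n_games = 30, 82
# Stand-in per-game TS% values; real inputs would be 2013-14 game logs
per_game = rng.normal(0.54, 0.04, size=(n_teams, n_games))

cma = np.cumsum(per_game, axis=1) / np.arange(1, n_games + 1)  # cumulative averages
final = cma[:, -1:]                                            # year-end average per team

for gp in (15, 30, 45, 60, 75):
    gap = np.abs(cma[:, gp - 1] - final[:, 0]).mean()          # mean |CMA - final| across teams
    print(f"Games Played = {gp}: mean gap = {gap:.6f}")
```

Even on pure noise the gaps shrink toward zero, since the cumulative average at game 75 shares most of its games with the year-end average; the interesting part is how fast real teams get there.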
Breaking these out by metric and visualizing them gives better insight into when the variance in the averages levels out. Looking first at TS% among eight teams, for clarity's sake:
The general trends reflect obvious stuff. Some teams start higher, some start lower, and they all end up somewhere in the middle. In general, once some of the small-sample-size variance is ironed out, teams get better: systems get refined, rookies and trades get integrated. Even though it still looks like a high-variance mess, early-season (15-20 games played) TS% correlates with end-of-season TS% at between 0.79 and 0.82. Increase that to 30 games played and the correlation jumps to 0.88. The correlation between cumulative average and year-end average looks like this:
Big jumps in correlation come early. Three times the correlation jumps by more than 0.05: after the 2nd, 3rd, and 11th games. The correlation crosses the 0.90 threshold after the 31st game and the 0.95 mark after the 41st game. Beyond that, it's a tale of diminishing marginal returns.
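The correlation curve is built by correlating, across all 30 teams, the cumulative average after game n with the year-end average, for each n. A sketch with simulated team data in place of real game logs (so the threshold games it prints will differ from the article's 31 and 41):

```python
import numpy as np

rng = np.random.default_rng(1)
n_teams, n_games = 30, 82
# Give each team its own "true" TS% so there is real signal to correlate
true_ts = rng.normal(0.54, 0.02, size=(n_teams, 1))
per_game = true_ts + rng.normal(0.0, 0.05, size=(n_teams, n_games))

cma = np.cumsum(per_game, axis=1) / np.arange(1, n_games + 1)
final = cma[:, -1]

# Pearson r between the game-n cumulative average and the year-end average
curve = np.array([np.corrcoef(cma[:, n], final)[0, 1] for n in range(n_games)])

# First game where the curve crosses 0.90 (illustrative; real data gives game 31)
first_above_090 = int(np.argmax(curve >= 0.90)) + 1
print(first_above_090)
```

By construction the curve ends at exactly 1.0 at game 82, since the game-82 cumulative average is the year-end average; everything before that measures how much of the final ordering of teams is already visible.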
To see if these patterns are consistent, I wanted to look at other measures of efficiency. Here's FT%, which is a component of TS%:
This doesn't look too different from the TS% graph. You've got Small Sample Size Theater on the left with lots of variance, a pretty snappy convergence to the mean, and minor fluctuations otherwise. Because the y-axes of the two graphs are on different scales, it appears that team FT% falls closer to the league average. However, there is a 14.4% difference between the league's lowest TS% (76ers) and the highest (Heat), while there is an 18% difference in FT% between the Pistons and Trail Blazers.
Everyone knows the three-pointer is a high-variability shot: live by the three, die by the three. But 3pt% regresses to the mean just like any other metric. You can see that variability in the first 20 or so games, as 3pt% fluctuates and resists the mean slightly longer than TS% and FT%. Other than that, there doesn't seem to be much new information here, so I want to look at the correlation curve against the season average.
Now this is different: there is an obvious gap between this correlation curve and the one for TS%. The grain of salt to take with anything said here is the league-wide increase in three-point attempts. It is difficult to say whether the shape of this curve is driven by more threes being taken by players who aren't comfortable with the shot, or simply by the natural variance of the three. There is a big slump between the 6th and 11th games that is probably just noise; if I expanded the data beyond one season, it would probably disappear. The correlation of cumulative 3pt% with the season average passes 0.90 after the 44th game of last season and passes 0.95 after the 58th.
What cumulative averages for 3pt% show more than anything is that we are still a ways from having a robust sample size from which to make sweeping generalizations. These efficiency metrics are baked into a lot of the big, one-number metrics we use, such as ORtg and DRtg. At the same time, you don't really need to wait until current stats are nearly perfectly correlated with what their season average will be. The samples we have now for teams definitely aren't “small,” but we're not out of the weeds yet.