Freelance Friday: When Does a Small Sample Size Stop Being Small?


Photo: Houston Rockets forward Trevor Ariza during a Dec 22, 2014 game against the Portland Trail Blazers at the Toyota Center. Credit: Troy Taormina-USA TODAY Sports

Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series we’re proud to present this guest post from Johannes Becker. Johannes is interested in basketball and statistics and is a PhD student in bioinformatics. You can follow him on Twitter @SportsTribution and at his blog, SportsTribution.blogspot.ch.

Introduction

As we all know, life is about small sample sizes. Of course, if we accept this fact, we could just as well go back to trusting our good old gut feeling. Identifying the dreaded small sample size is a keystone of analytics, and there are several articles that try to find the point at which a statistic like three-point percentage or batting average stabilizes[1. Examples: https://fansided.com/2014/08/29/long-take-three-point-shooting-stabilize/, http://www.fangraphs.com/blogs/stabilizing-statistics-interpreting-early-season-results/, http://www.baseballprospectus.com/article.php?articleid=17659]. Most of these articles use methods which try to find one general value (a sample size) at which a statistic stabilizes.

I have a more gory explanation of why I think this is unnecessarily complicated on my blog, but I promised Ian to compress my points so that the number of people who get bored to death stays small[2. Note the irony in writing an unnecessarily complicated blog post about something you find unnecessarily complicated]. The point is that methods based on the work of Prof. Dr. Pizza Cutter compare all players with a big enough sample size against each other, trying to figure out how separable they are. This can be misleading for a stat like three-point percentage, because in general only players who are good at shooting threes are allowed to take a lot of them (insert Josh Smith joke here).

This of course leads to the problem that most of the players with large enough sample sizes are relatively good shooters and therefore barely separable. For example, during the 2013-14 season, 50% of players with at least 50 attempts shot between 32.7% (Pero Antic) and 38.7% (John Salmons) from three-point range. This means that the margin of error needed to separate those players is relatively small. So, if recent methods state that it takes on average 750 shots for three-point percentage to stabilize, this means that we need a lot of shots from a lot of players before we can say anything with certainty.

But life is not an average player. We are more interested in outliers with relatively small sample sizes. So, a more common question would be: Luke Babbitt shot 49.3% on 75 attempts. How likely is it that he is an above-average three-point shooter?

In the following, I will explain a simple method that assigns a ‘stabilization value’ to each player individually. Because life is too short to wait until Luke Babbitt has taken 750 shots.

A Story About Players and Coins

The method I’ll try to explain in the following is relatively simple and works for anything that is a yes/no situation (like ‘did the shot go in?’). To calculate the probability that a specific player is above or below average, we simply use the league-average shooting percentage as our probability for ‘yes’ and assume that the player’s results follow a binomial distribution around this percentage. Then we look at how likely it is to get at least as many ‘yes’ outcomes as the player actually got (I hope this was not too confusing. Alternatively, read this paragraph again in the voice of Jeff Goldblum).

Going back to Luke Babbitt and the average shooting percentage for three-pointers this season (around 34.8%), this is like asking: Luke Babbitt flipped, 75 times, a coin that comes up heads 34.8% of the time. How likely is it that this coin shows heads at least 37 times? If you ignore the caveats that I mention at the end of this article, the result is a likelihood of just 1.3% that Luke Babbitt is merely an average three-point shooter. So, very unlikely.
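If you want to reproduce this coin-flip calculation yourself, here is a minimal R sketch using the numbers quoted above (37 makes on 75 attempts against a 34.8% league average). The exact value will depend a bit on the rounded inputs and on whether you run a one- or two-sided test.

```r
# A minimal sketch of the coin-flip question above, using the numbers from the
# text: 37 makes on 75 attempts (49.3%) against a 34.8% league average.
makes    <- 37     # Babbitt's made threes (0.493 * 75, rounded)
attempts <- 75     # Babbitt's three-point attempts
p_league <- 0.348  # league-average 3P% used as the 'coin'

# P(X >= makes) for X ~ Binomial(attempts, p_league):
# how likely is an average 'coin' to come up heads at least this often?
pbinom(makes - 1, size = attempts, prob = p_league, lower.tail = FALSE)

# The same tail probability via an exact one-sided binomial test
binom.test(makes, attempts, p = p_league, alternative = "greater")$p.value
```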

What You Can Do With All The Coins

In short, looking at statistical stabilization for each player individually gives you a good idea for a lot of players without asking them to shoot 750 three-pointers first. The following plots show the results of this method for three-point shots in the 2013-14 season, focusing on players with at least 50 attempts.

Both values for correlation and linear fit include only players with more than 50 attempts (dotted vertical line). The dotted curves indicate the 2.5th and 97.5th percentiles of the assumed binomial distributions. For three-pointers, 10.9% of players fall outside of this 95% confidence interval. That is not much more than the 5% of outliers the binomial distribution would produce by random chance alone. Two obvious factors reduce the number of outliers for three-pointers. The first is that bad shooters are usually quickly asked to stop shooting. The second is that good shooters are usually asked or required to take more difficult shots, something that, for example, would not be the case for free throws (see Howard, Dwight).
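To reconstruct those dotted envelope curves and the 10.9% figure, a sketch along these lines works. The `shots` data frame (one row per player, with `makes` and `attempts` columns) is a hypothetical stand-in for the actual shot data, not something from the original post.

```r
# Sketch of the dotted 95% envelope: for each number of attempts, the 2.5th and
# 97.5th percentile of makes an average 'coin' would produce, as a percentage.
p_league      <- 0.348
attempts_grid <- 50:500

lower <- qbinom(0.025, size = attempts_grid, prob = p_league) / attempts_grid
upper <- qbinom(0.975, size = attempts_grid, prob = p_league) / attempts_grid

plot(attempts_grid, upper, type = "l", lty = 3, ylim = c(0, 0.6),
     xlab = "three-point attempts", ylab = "3P%",
     main = "95% envelope for an average shooter")
lines(attempts_grid, lower, lty = 3)

# Share of players outside the envelope (10.9% in the post), given a
# hypothetical data frame 'shots' with columns 'makes' and 'attempts':
# outside <- with(shots, makes < qbinom(0.025, attempts, p_league) |
#                        makes > qbinom(0.975, attempts, p_league))
# mean(outside)
```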

Our null hypothesis is that every shooter is a coin flipping at the average three-point percentage. If we now want to see the p-value (the probability of seeing a result at least this extreme if the null hypothesis were true) for each of our three-point shooters, we can use a slightly different plot:

The horizontal dotted lines mark the points beyond which there is only a 5% probability that the result occurred by chance under the null hypothesis. So, if you want to read this plot for a specific player, you could say, for example: disregarding effects like differences in defensive pressure, there is an exp(-6.82) = 0.13% probability that Josh Smith was in reality an average three-point shooter during the 2013-14 season, and likewise an exp(-12.9) = 0.00051% probability that Kyle Korver was. Neither of those statements needed 750 attempts.
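The per-player log p-values in this plot can be computed with a small helper like the one below, taking the tail in the direction of each player's deviation from the league average. Again, the `shots` data frame in the comment is a hypothetical stand-in, not the post's actual data set.

```r
# Per-player log p-value under the null hypothesis of an average 'coin',
# taking the tail in the direction of the player's deviation.
p_league <- 0.348

log_pval <- function(makes, attempts, p = p_league) {
  if (makes / attempts >= p) {
    # above average: log P(X >= makes) for an average shooter
    pbinom(makes - 1, attempts, p, lower.tail = FALSE, log.p = TRUE)
  } else {
    # below average: log P(X <= makes) for an average shooter
    pbinom(makes, attempts, p, lower.tail = TRUE, log.p = TRUE)
  }
}

log_pval(37, 75)  # the Luke Babbitt example from above, on the log scale

# Flag everyone beyond the dotted 5% lines (hypothetical 'shots' data frame):
# with(shots, mapply(log_pval, makes, attempts)) < log(0.05)
```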

In short, when your rookie shoots 50% on 25 three-point attempts, it is probably still a little bit early to pee your pants. But when he does the same over around 50 attempts, you should probably start ordering some Pampers…
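A quick sanity check of the rookie example, under the same 34.8% league-average assumption:

```r
# A quick check of the rookie example, assuming a 34.8% league average.
p_league <- 0.348

# P(an average shooter makes at least 13 of 25), i.e. roughly 50% of 25:
pbinom(12, size = 25, prob = p_league, lower.tail = FALSE)   # not yet alarming

# P(an average shooter makes at least 25 of 50):
pbinom(24, size = 50, prob = p_league, lower.tail = FALSE)   # Pampers territory
```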

Where To Go With This?

This method is relatively straightforward to apply to all kinds of yes/no, coin-flip-like questions. For example, Heat fans would probably like to get an idea about this: Hassan Whiteside allows only 40.2% on shots at the rim but has played just 287 minutes so far. How likely is it that his rim protection is better than that of an average player like Timofey Mozgov? Or: some defenses allow a much lower percentage on uncontested three-pointers. Is this more likely luck or voodoo?
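As a rough sketch of how the rim-protection question would look, here is the same test with the tail flipped, since allowing a lower percentage is better. The post only gives Whiteside's 40.2% allowed, so the number of shots defended and the league-average rim percentage below are purely hypothetical placeholders.

```r
# The rim-protection question with the tail flipped (lower is better).
# The 40.2% comes from the text; the attempts defended and the league-average
# rim FG% are hypothetical placeholders.
p_rim_league <- 0.55                      # assumed league-average FG% at the rim
defended     <- 100                       # hypothetical shots defended by Whiteside
allowed      <- round(0.402 * defended)   # makes allowed at 40.2%

# How likely is an average rim defender to allow at most this many makes?
pbinom(allowed, size = defended, prob = p_rim_league, lower.tail = TRUE)
```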

You can find the respective R code (less than 100 lines) for the calculations and visualizations here. Feel free to use it for your own enjoyment. Before I leave you alone for today, I want to preemptively address some possible points of criticism.

A List of Caveats

First of all, I know that life is not a coin flip. While this is of course true, there are several advantages to treating it as if it were. The biggest one is that it is by far the simplest way to look at the problem. There are certainly more realistic models, but they would force you to add further, very complicated assumptions in the subsequent steps, making any statements about likelihoods much harder to make.

Speaking of much more complicated: the thing is that good shooters are usually asked or required to take more difficult shots. So the fact that Kobe and Shawn Marion have almost the same career three-point percentages does not mean that they are similarly good shooters. You therefore have to read the result as something like: Player A takes context-specific shots that lead to him being a significantly above/below-average shooter. But that doesn’t really roll off the tongue, right?

The second thing is that stabilization per se completely neglects the idea of player development, injuries, or just general fatigue. But this is also the case for the methods I mentioned previously, and arguably more so, because those methods often require much larger sample sizes, where slumps, injuries, and everything else get mushed into one sample. For example, Nicolas Batum has a torn ligament in his shooting hand that explains the difference between this season and last. So we usually want to say: during this time interval, Player B was a significantly above/below-average shooter. If you combine this point with the last one about context specificity, you are starting to get a tongue twister.

The last thing we need to address is something called the ‘multiple comparisons problem’. For example, if we look at 300 players and set a threshold of 5% for statistical significance, around 15 players would be expected to appear significantly above or below average just by chance. A possible example of this phenomenon could be Trevor Ariza: looking at his yearly three-point percentages, it could well be that his good shooting percentage last year was basically a lucky outlier. Because someone has to get lucky when it comes to coin flips.
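To see roughly how many ‘significant’ shooters pure luck produces, here is a small simulation of 300 perfectly average coin-flip shooters. The 100 attempts per player are an illustrative choice, not a figure from the post.

```r
# 300 perfectly average 'coin flip' shooters, 100 attempts each: how many land
# outside the 95% envelope purely by chance?
set.seed(42)
p_league  <- 0.348
n_players <- 300
attempts  <- 100

makes   <- rbinom(n_players, size = attempts, prob = p_league)
outside <- makes < qbinom(0.025, attempts, p_league) |
           makes > qbinom(0.975, attempts, p_league)
sum(outside)  # on the order of 5% of 300, i.e. roughly a dozen 'lucky' outliers
```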