NBA Positions by Clustering
By Justin
Positions are notoriously difficult to pin down in the NBA, unlike in sports with fixed positions such as baseball. They are fluid and bleed into each other, with definitions that vary based on whom you ask. Even some of the best position estimates, like the ones derived from play-by-play logs at 82games.com, basketball-reference, and now NylonCalculus, depend on some initial ordered set of players to determine who’s at which designated position in every lineup. You either have to choose that order subjectively, which is prone to bias and can be time-consuming depending on the number of players, or you use stats like assists or height, which can lead to obvious errors for players with unusual skills, like LeBron James with his passing, or for short centers like Chuck Hayes. Building on my philosophy that position is mostly about who you guard, I used a statistical clustering tool to group players together based on the shots they’ve guarded, according to data from stats.NBA.com.
Method
To prepare the data for clustering, I needed to form it into something usable. I created a list of players from 2015 who were listed as the nearest defender for at least one shot, and then I calculated the average height and weight of the shooters that particular defender “defended.” I did not want a stat like height to muddy the results, but this approach skirts the real issue: calling a player a center because of his own height can be incorrect because that’s only one data point and says little about how he actually plays. Pooling all the players he defended and taking an average, however, is surprisingly accurate, and using both weight and height is even better.
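As a rough illustration of that preparation step, here is a minimal Python sketch (the analysis itself was done in R). The shot log, player names, and sizes are made up for the example; only the idea of averaging the height and weight of the shooters each defender was nearest to comes from the text.

```python
from collections import defaultdict

# Hypothetical shot log: one (nearest defender, shooter) pair per field goal attempt.
shots = [
    ("Defender A", "Shooter X"),
    ("Defender A", "Shooter Y"),
    ("Defender B", "Shooter Z"),
]
# Hypothetical shooter sizes: (height in inches, weight in pounds).
sizes = {"Shooter X": (75, 190), "Shooter Y": (77, 210), "Shooter Z": (83, 250)}


def defended_profile(shots, sizes):
    """Return {defender: (avg height, avg weight, shots defended)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for defender, shooter in shots:
        height, weight = sizes[shooter]
        running = totals[defender]
        running[0] += height
        running[1] += weight
        running[2] += 1
    return {
        d: (t[0] / t[2], t[1] / t[2], t[2])
        for d, t in totals.items()
    }


profiles = defended_profile(shots, sizes)
```

Each shot counts once, so a shooter who was defended many times pulls the average harder than one defended once. The shot count is kept alongside the averages because small samples need different handling later.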
With the kmeans function in R, you can set a number of cluster groups and quickly receive a list of positions for every single player[1. A different clustering method, like one where group sizes are set to be equal, might be more appropriate, but this method worked surprisingly well right away.]. For the more granular results (i.e. not rounding all the positions to whole numbers), I just computed the distance from the centroid, which is useful for hybrid players who find time at multiple positions. The results were surprisingly accurate compared to a more colloquial understanding of positions, even for players with just a handful of defended shots. Still, I regressed the stats to the mean[2. The weight was 15, so a player needs 15 field goals attempted to have an even split between listed position and calculated.] so a player with only one field goal defended would mostly just be labeled by his listed position (via basketball-reference).
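To make the clustering and the regression step concrete, here is a small Python stand-in for R's kmeans. It is a hand-rolled version of Lloyd's algorithm, not the R implementation, and several things are assumptions for illustration: the evenly spaced starting centroids, the unscaled height and weight inputs (the text does not say whether the data was standardized), the made-up data, and the function names. The shrinkage formula follows the footnote, with a weight of 15 giving an even split at 15 shots.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm on numeric tuples; needs 2 <= k <= len(points)."""
    pts = sorted(points)
    # Assumption: deterministic start spread across the sorted data.
    centroids = [pts[(i * (len(pts) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)  # smallest average opponent first


def position_of(point, centroids):
    """Whole-number position: 1 for the smallest cluster, counting upward."""
    return 1 + min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def regress_to_listed(calculated, listed, n_shots, weight=15):
    """Shrink a calculated position toward the listed one for small samples."""
    return (n_shots * calculated + weight * listed) / (n_shots + weight)


# Hypothetical (avg height, avg weight) of the shooters each defender faced.
profiles = [(74, 185), (75, 190), (79, 220), (80, 225), (83, 250), (84, 255)]
centroids = kmeans(profiles, k=3)
raw = position_of((76, 195), centroids)
even_split = regress_to_listed(calculated=3, listed=1, n_shots=15)
```

With only a shot or two defended, `regress_to_listed` returns something very close to the listed position, which matches the behavior described above. Three clusters are used here only to keep the toy data small.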
Naturally, you can reverse that path and calculate positions that are essentially offensive — the average cluster position of the players defending your own field goals. The results can be obtained quickly, and they become stable a lot faster than expected — very useful with new data and when a season is ongoing.
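The reversed calculation can be sketched in a few lines of Python. The position values, names, and the 1-to-5 scale (1 for point guard, 5 for center) are assumptions for the example; the idea of averaging the cluster positions of the players who defended your shots is from the text.

```python
from statistics import mean

# Hypothetical defensive positions from the clustering step (1 = PG ... 5 = C).
def_pos = {"Defender A": 1.0, "Defender B": 2.0, "Defender C": 5.0}
# Hypothetical shot log: one (shooter, nearest defender) pair per attempt.
shot_log = [
    ("Shooter X", "Defender A"),
    ("Shooter X", "Defender B"),
    ("Shooter Y", "Defender C"),
]


def offensive_positions(shot_log, def_pos):
    """Average the defensive position of whoever was nearest on each shot."""
    faced = {}
    for shooter, defender in shot_log:
        faced.setdefault(shooter, []).append(def_pos[defender])
    return {shooter: mean(values) for shooter, values in faced.items()}


off_pos = offensive_positions(shot_log, def_pos)
```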
Lastly, because effective position is about who you play with in a lineup, I grabbed lineup data from stats.NBA.com and calculated positions using the initial set of clustering positions for lineup ordering.
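One plausible reading of that lineup step, sketched in Python: sort each five-man unit by the clustering position and hand out slots 1 through 5 in that order, adding up the minutes each player spends in each slot. The players, positions, and minutes are invented, and the exact tie-breaking and lineup format of the original analysis are not stated, so treat this as an assumption.

```python
# Hypothetical clustering positions used only to order players within a lineup.
base_pos = {"A": 1.1, "B": 2.0, "C": 2.9, "D": 3.6, "E": 4.9, "F": 3.1}
# Hypothetical lineup data: (five players, minutes played together).
lineups = [
    (("A", "B", "C", "D", "E"), 10.0),
    (("A", "C", "F", "D", "E"), 5.0),
]


def slot_minutes(lineups, base_pos):
    """Return {player: {slot: minutes}} with slot 1 the smallest role on the floor."""
    out = {}
    for players, minutes in lineups:
        ordered = sorted(players, key=lambda p: base_pos[p])
        for slot, player in enumerate(ordered, start=1):
            by_slot = out.setdefault(player, {})
            by_slot[slot] = by_slot.get(slot, 0.0) + minutes
    return out


minutes_at = slot_minutes(lineups, base_pos)
```

In this toy data, player C fills slot 3 in the first lineup but slides to slot 2 when B sits, which is the kind of teammate-dependent shift the lineup step is meant to capture.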
Results
The results are packed into the table below. There’s a lot of information included because this is more of an exploratory mission. Total shots refers to the sum of field goals taken and field goals “defended” (i.e. being listed as the nearest defender to a field goal). The next five columns are the results from the clustering function, where “position” is just the average of OffPos and DefPos. Defensive positions are based on who you’re defending: if you’re the nearest defender largely for centers, then you’re a center. Likewise, offensive positions are based on who’s defending you. The last columns have minutes played at each position with a final weighted average for position[3. A careful observer will note the minutes don’t accurately reflect the actual season totals for every player; this is due to imperfect lineup data.].
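For readers who want the arithmetic behind two of those columns, here is a short Python sketch. The function names and the sample numbers are hypothetical; the definitions (position as the average of OffPos and DefPos, and a minutes-weighted average across positions) follow the description above.

```python
def blended_position(off_pos, def_pos):
    """The table's "position": a simple average of OffPos and DefPos."""
    return (off_pos + def_pos) / 2


def minutes_weighted_position(minutes_by_slot):
    """Average position across lineup slots, weighted by minutes in each slot."""
    total = sum(minutes_by_slot.values())
    return sum(slot * m for slot, m in minutes_by_slot.items()) / total


# Hypothetical player: guarded like a 2, defends like a 3,
# and logs 5 minutes at slot 2 and 10 minutes at slot 3.
position = blended_position(2.0, 3.0)
lineup_position = minutes_weighted_position({2: 5.0, 3: 10.0})
```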
Most of the results conform to conventional wisdom, but numbers that show you nothing new aren’t very valuable, and there were a few surprising results too. Interestingly, Chris Paul is listed as more of a shooting guard on offense than JJ Redick; that says something about who guards Paul and his ability to play bigger despite his listed height. The Clippers were fascinating in this respect. DeAndre Jordan was treated more like a power forward on offense, which is either an error due to his low usage and how often he shoots out of the pick and roll, or a sign of how teams use a bulkier, longer post player on Griffin.
DeAndre didn’t have the largest disparity between his offensive and defensive positions, however. That distinction belongs to Nikola Pekovic, whose nearest defender was on average much larger than the players he defended on the other end of the court. His teammate Kevin Garnett was the opposite: because he’s only a spot-up shooter now on offense, he’s treated like a power forward but on defense he looks more like a center. A few guards like Aaron Brooks, Cory Joseph, and Dennis Schroder among others also had large disparities; cross-matching is common in the backcourt.
LaMarcus Aldridge is another good example of how positions can change slightly between offense and defense: teams often use bigger defenders on him because of how long he is, but he’s nimble enough to match up against power forwards and sometimes even smaller players. And if you question Charlotte’s implication that Batum will sometimes be used at shooting guard, a look at the numbers shows that he was already a part-time shooting guard in some respects.[1. Ed. This might also have something to do with the fact that Portland’s titular two-guards last season, Wes Matthews and Arron Afflalo, were both far more likely to play in the post than Batum and thus were possibly guarded by the bigger of the opposing wing players a fair amount.]
As a final note, the extreme ends are mostly predictable but Pau Gasol’s high position number was intriguing. He’s famously a power forward, but at this point in his career there’s very little evidence he’s anything other than a center. The numbers are also probably influenced by Pau’s inability to guard outside the paint and his tendency to drop back, but that’s just more evidence he’s not a power forward anymore.
Conclusion
The results here are solid despite the limited data: this is only using nearest defender data, which does not accurately reflect who’s guarding whom for most of a possession. With full SportVU data, position labeling can be more accurate and enlightening. Even now, there are a few obvious errors on major sites, like assuming Harrison Barnes is the power forward when he plays with Draymond Green, or crediting LeBron James with time as a “power forward” when Battier did not receive due credit for battling larger players inside. For decades, we’ve depended on manually listing positions and arguing over the details, but here’s an objective method that can quickly sort through hundreds of players without human bias.
And Popovich, you can’t hide the secret any longer — Tim Duncan is a center.