Feb 21, 2015; Houston, TX, USA; Houston Rockets forward Josh Smith (5) shoots the ball during the first quarter against the Toronto Raptors at Toyota Center. Mandatory Credit: Troy Taormina-USA TODAY Sports
Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series, we’re proud to present this guest post from Johannes Becker. Johannes is a PhD student in bioinformatics with an interest in basketball and statistics. You can follow him on Twitter (@SportsTribution) and on his blog, SportsTribution.blogspot.ch.
The (semi-publicly[1. If you are a little familiar with programming, check out these snippets for Python or R]) available data on all kinds of NBA stats is becoming more and more amazing. Starting with the 2013-14 season, it is relatively straightforward to get the distance of every shot an NBA player took. The question at hand is how to make use of this wealth of information.
Two of my favorite techniques at the moment are heatmaps and hierarchical clustering[2. You can find another example on my blog, where I made a case for leaving the Pick and Roll ball handler alone]. They are intuitively understandable and allow you to look at a lot of information at once (the danger, of course, is that if you look at them for too long, you start seeing things that do not exist). They are perfect for anything that has a distance or temporal component, as you will see below.
Here, I was simply interested in clustering players by how often they shoot from each distance. In the following plot, red denotes a low shooting frequency and green a high one. The leftmost column indicates the number of attempts, again with red meaning fewer and green meaning more. I only show players with at least 300 attempts so far this season. That way you get a good first overview of all types of players without things becoming too messy.
It is still a bit messy. On the plus side, you are now looking at more than 30 observations for each of 208 players. Or, as John von Neumann said, ‘If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.’
The two white vertical lines mark the shots that are only possible as corner threes. They show that Josh Smith likes 22-foot two-pointers much more than 22-foot threes (this was my obligatory Josh Smith joke of the day). I am rounding the available data from NBA.com (example page). I don’t know why dunks do not count as 0-foot shots, but that’s not my decision. In case you are color-blind, just shoot me a message and I’ll make a black-and-white version 😉
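For readers who want to build a similar table themselves, here is a minimal Python sketch of the preprocessing described above: round each shot distance to the nearest foot, keep twos and threes from the same distance in separate columns (which is what gives the corner threes their own columns), turn counts into per-player frequencies and drop players with fewer than 300 attempts. The input format, one `(distance_in_feet, is_three)` tuple per attempt, and all function names are hypothetical; the article does not say how the NBA.com data was actually processed.

```python
import math
from collections import Counter

# Assumption: each player's shots are a list of (distance_in_feet, is_three) tuples.
MIN_ATTEMPTS = 300  # the article only plots players with at least 300 attempts


def shot_profile(shots):
    """Turn a list of (distance, is_three) shots into relative frequencies.

    Distances are rounded half-up to the nearest foot. Twos and threes from the
    same rounded distance go into separate bins, so a 22-foot two and a 22-foot
    corner three land in different columns.
    """
    counts = Counter(
        (math.floor(dist + 0.5), "3PT" if is_three else "2PT") for dist, is_three in shots
    )
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def build_profiles(shots_by_player, min_attempts=MIN_ATTEMPTS):
    """Keep players with enough attempts; return (profiles, attempt counts)."""
    profiles, attempts = {}, {}
    for player, shots in shots_by_player.items():
        if len(shots) >= min_attempts:
            profiles[player] = shot_profile(shots)
            attempts[player] = len(shots)
    return profiles, attempts


def profile_matrix(profiles):
    """Align all profiles on one sorted list of bins (missing bins become 0.0)."""
    bins = sorted(set().union(*profiles.values()))
    players = sorted(profiles)
    rows = [[profiles[p].get(b, 0.0) for b in bins] for p in players]
    return players, bins, rows
```

Each row of the resulting matrix is one line of the heatmap, and the attempt counts correspond to the extra leftmost column.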
We can now split players into groups with similar shot distances. You can cluster the players into as many groups as you like, and there are also rules for choosing that number. I used the dendrogram as a rough guide for where to draw the group boundaries and settled on 10 groups. Here they are:
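The clustering step itself can be sketched as agglomerative clustering that stops at a chosen number of groups. The article does not say which linkage method or distance measure was used, so this minimal, pure-Python sketch assumes average linkage with Euclidean distance between the frequency vectors. In practice one would more likely use `scipy.cluster.hierarchy.linkage(X, method="average")` followed by `fcluster(Z, t=10, criterion="maxclust")`, or `hclust` and `cutree(..., k = 10)` in R; the loop below just makes the merging explicit.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(vectors, k):
    """Naive agglomerative clustering with average linkage, stopped at k clusters.

    Returns one cluster label (0..k-1) per input vector. Runs in roughly O(n^3),
    which is fine for a couple of hundred players but not for large data sets.
    """
    n = len(vectors)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of vectors")
    # Pairwise distances between individual players, computed once.
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # every player starts in their own cluster

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because b > a

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Stopping the merge loop at k clusters corresponds to cutting the dendrogram at a height that leaves k branches, which is why eyeballing the dendrogram is a reasonable way to pick k.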
1. “We are bangers or pick and roll rollers and only shoot from further than 6 feet if you twist our arm”
I guess the lengthy group name says it all. It is interesting to note that, apart from Andre Drummond and Greg Monroe, none of the true ‘only close-range players’ has many more than 300 attempts. There is a limit to how often you can go point-blank per game. Enes Kanter and Taj Gibson have a few more attempts, but…
2. “We bang, but there is no need to twist our arm for a long two”
… they are on the edge of being in this group. Here you find the typical high-scoring big men (Blake, the Brow, Nene, Z-Bo…). (Side note: it is interesting to see the few jumper-less guards who made it into these two clusters: Tony Allen, Elfrid Payton, Rondo and MKG.)
3. “We really like the long two more than we like the three”
This group is at the rim less often than the previous two. It consists mostly of guards who don’t shoot many threes and of bigs like Aldridge, Marc Gasol and Serge Ibaka who are decent enough midrange shooters.
4. “We have an old school post-up game and we are not afraid to use it”
This group can easily get lost if you don’t use the right distance resolution. Compared with the bangers, though, their peak frequency is around five feet instead of three. Players you see here include Brook Lopez, Roy Hibbert, Al Jefferson and old man Duncan. I did not necessarily expect Kenneth Faried or Giannis in this group. I have no idea what Tony Wroten is doing here, other than that 76ers guards probably shoot floaters far more often than other players.
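The point about distance resolution can be illustrated with a small sketch: average the profiles within a group, find the distance with the highest frequency, and then check what happens when 1-foot bins are merged into coarser ones. For simplicity the profiles here are keyed by distance only (no two/three split), the helper names are hypothetical, and the numbers are made-up toy values, not the real data.

```python
def mean_profile(profiles):
    """Average several {distance_ft: frequency} profiles bin by bin."""
    bins = set().union(*profiles)
    return {d: sum(p.get(d, 0.0) for p in profiles) / len(profiles) for d in bins}


def peak_distance(profile):
    """Distance (in feet) with the highest shot frequency."""
    return max(profile, key=profile.get)


def rebin(profile, width):
    """Merge 1-foot bins into coarser bins of `width` feet, keyed by bin start."""
    out = {}
    for d, f in profile.items():
        start = (d // width) * width
        out[start] = out.get(start, 0.0) + f
    return out


# Toy profiles (made-up numbers): "bangers" peak at 3 ft, post-up players at 5 ft.
bangers = [{1: 0.2, 3: 0.5, 5: 0.3}, {1: 0.3, 3: 0.4, 5: 0.3}]
post_up = [{3: 0.3, 5: 0.5, 15: 0.2}, {3: 0.2, 5: 0.4, 15: 0.4}]

print(peak_distance(mean_profile(bangers)), peak_distance(mean_profile(post_up)))  # 3 5
# With 8-foot bins both groups peak in the 0-7 ft bin and the difference disappears.
print(peak_distance(rebin(mean_profile(bangers), 8)), peak_distance(rebin(mean_profile(post_up), 8)))  # 0 0
```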
5. “We have probably the most normal shot distribution”
Lots of threes, some drives to the rim and a sprinkle of midrange. There are many different ways to arrive at this shot distribution, and there are definitely subgroups in here, such as the cluster running from Bojan Bogdanovic to Mirza Teletovic, which shows a higher frequency of corner threes.
6. “We are similar to 5, but without the drive”
A group of shooting guards and secondary ball handlers. Jamal Crawford and Caldwell-Pope are the only players here with a lot of attempts (Nick Young missed a few games).
7. “We only allow future Hall of Famers into this group”
Those three combine for a lot of 15-footers. Speaking of 15-footers: looking at the data, it seems that 50% of all shots between 13 and 17 feet this season were taken by Nowitzki, Paul, Bryant, Marc Gasol, Dwyane Wade and LaMarcus Aldridge. Something to keep in mind if you want to measure shot quality by distance.
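The 13-to-17-foot observation boils down to a simple share calculation. Here is a minimal sketch, assuming a hypothetical `{player: {distance_ft: attempts}}` table and treating both ends of the window as inclusive (the article does not specify this):

```python
def share_of_window(attempts_by_player, players, lo=13, hi=17):
    """Fraction of all attempts with lo <= distance <= hi taken by `players`.

    attempts_by_player maps player -> {distance_ft: attempts}.
    Returns 0.0 if nobody shot from that distance window.
    """
    total = 0
    by_group = 0
    for player, by_dist in attempts_by_player.items():
        in_window = sum(a for d, a in by_dist.items() if lo <= d <= hi)
        total += in_window
        if player in players:
            by_group += in_window
    return by_group / total if total else 0.0
```

If a handful of players account for that much of a distance window, the league-wide field-goal percentage from that range largely reflects how well those particular players shoot, so the shooter mix matters before reading it as a measure of shot quality.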
8. “We are a bit everywhere”
This is a very mixed group, and a lot of these players could just as well be in group 5 or 6. Let’s just say that’s what happens with clustering.
9. “We put ourselves in the corner”
The group with the most corner threes. Interestingly, only two players in this group have a particularly high number of attempts (Arron Afflalo, who still takes a lot of midrange shots, and Trevor ‘I am the definition of MoreyBall’ Ariza). You can only shoot so many corner threes per game…
10. “We are Channing Frye”
The definition of a stretch four. Almost seven feet tall. 73 percent of his shots are threes. Pro: he makes 39% of them. Con: he has an offensive rebound percentage of 1.3%.
I hope these plots are understandable and that you enjoyed them. I don’t know if they taught us anything, but I think they are a good starting point for summarizing and understanding certain information, for example which players have a similar playing style. The number of follow-up questions is (as always) practically endless, and I will try to tackle a few of them in the coming weeks.