Not Everything That Can Be Counted Counts. A Quick Note On Meaning And Misleading Stats


Jan 11, 2014; Washington, DC, USA; Washington Wizards point guard John Wall (2) shoots the ball over Houston Rockets shooting guard James Harden (13) in the first quarter at Verizon Center. Mandatory Credit: Geoff Burke-USA TODAY Sports

Nobody likes to be #WellActually’d[1. Though some of us have a habit of dishing them out with some frequency. Guilty as charged. Sorry, not sorry?], and I’m not trying to do that to Mr. Jacobsen here. Rather, I’m using him as an example of something that crops up with some frequency in stats-based discussion of the NBA: the use of highly flawed and radically misleading data. We’ve talked a fair amount about the dangers of reading too much into stats like individual shooting percentage “allowed” on jump shots. Data from human sources like Synergy needs to be taken with a grain of salt, simply because of the number of ambiguous interpretations that go into coding plays. Was that play a spot up or off a screen? Who was the primary defender? In many cases those things are open to interpretation, and while neither reading is wrong, the need for a judgment call all but ensures some inconsistencies.

In many cases, the over-reliance on stats which don’t quite say what we think they do is understandable. Especially on the defensive side of the ball, some numbers have to be better than no numbers at all, right? Well, actually[2. DAMNIT!], an incorrect, misinterpreted or misleading number can easily be worse. This is especially true on defense, where the skill and luck elements that go into the outcome of most possessions are difficult enough to unwind at the team level before we even get into the sticky wicket of assigning individual credit. Strict adherence to partially understood metrics could easily lead to claims like Tyreke Evans being a shutdown defender[3. I don’t think so, and neither should you.] or Nick Young being a great wing stopper[4. Nope, nope, nope.].

But those are cases of misinterpretation. In some cases, the data we end up with is straight-up wrong. For example, the “detailed” shooting stats available from NBA.com are borderline useless. Take James Harden. He is indeed listed as having hit 26 of 43 “Step Back Jump Shots” from three-point range. He’s also listed at 19-36 (52.8%) on “Pullup Jump Shots” from deep, not to mention 1-1 on three-point “Pullup Bank Shots.” Meanwhile, on regular old jump shots he’s 116 of 358 from three, for 32.4%. So either the numbers are off, or Harden should always take a dribble before shooting from deep.
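If you want to see how implausible those splits are side by side, the arithmetic takes a few lines. The made/attempted figures below are simply the ones quoted above from NBA.com; the point is that the fancy-descriptor buckets come out 20-plus points hotter than the plain jump shot bucket:

```python
# Shot-type splits for Harden from deep, as quoted in the article.
# (shot type -> (made, attempted))
splits = {
    "Step Back Jump Shot": (26, 43),
    "Pullup Jump Shot": (19, 36),
    "Pullup Bank Shot": (1, 1),
    "Jump Shot": (116, 358),
}

for shot_type, (made, attempted) in splits.items():
    pct = 100 * made / attempted
    print(f"{shot_type}: {made}-{attempted} ({pct:.1f}%)")
# The step backs come out to 60.5% and the pullups to 52.8%,
# against 32.4% on shots tagged as plain jump shots.
```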

What’s really going on here is that the underlying data for these shot types is being pulled from the official play-by-play feed, which is still compiled manually by several scorers sitting at a table, assigning credit or blame as needed. It’s simply a fact that these “descriptors” are more consistently added on made shots than misses.[5. Though they can be missed in the other direction as well — this play was scored as a jump shot and not a pullup jumper for Harden, for example. Without diving in and watching every shot, how are we to know any possible rhyme or reason for the inclusion or not of the descriptors? And if that level of audit is needed, the data isn’t actually useful, is it?] Which is understandable: a shot, a miss, a rebound and a possession starting the other way is a lot to keep track of, whereas the pause for breath after a made bucket gives the scorekeeper a little time to add some flavor to the official record. The end result is that “fancier” plays end up looking more effective than they actually are if one goes strictly off the PBP feed.
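That tagging bias is easy to demonstrate with a toy simulation. All the parameters here are hypothetical, just chosen to illustrate the mechanism: assume a shooter is genuinely a 35% pull-up shooter, but the scorekeeper attaches the “pullup” descriptor to 90% of makes and only 40% of misses. The shooting percentage among *tagged* shots then lands far above the true rate:

```python
import random

random.seed(1)

TRUE_PCT = 0.35   # assumed true pull-up 3P% (hypothetical)
TAG_MAKE = 0.90   # chance a make gets the "pullup" descriptor (hypothetical)
TAG_MISS = 0.40   # chance a miss gets the descriptor (hypothetical)
N = 100_000       # simulated pull-up attempts

tagged_makes = tagged_misses = 0
for _ in range(N):
    made = random.random() < TRUE_PCT
    # The descriptor is attached more often on makes than misses.
    tagged = random.random() < (TAG_MAKE if made else TAG_MISS)
    if tagged:
        if made:
            tagged_makes += 1
        else:
            tagged_misses += 1

observed = tagged_makes / (tagged_makes + tagged_misses)
print(f"true rate: {TRUE_PCT:.1%}, rate among tagged shots: {observed:.1%}")
# The tagged-shot percentage lands around 55%, even though the
# shooter's true rate is 35% -- roughly the kind of gap seen in
# the NBA.com pullup splits above.
```

Nothing about the individual shots changed; the only thing skewed was which shots got the label, and that alone is enough to manufacture a 50%-plus “pull-up shooter.”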

So when I say the numbers are wrong, I’m not suggesting NBA.com has a bug, or that NBA Savant is collating information incorrectly. Rather, the underlying data is flawed to the point of unusability for analytic purposes. Knowing this can prevent us from saying something profoundly silly like “John Wall is shooting over 80% on pull-up jumpers this season!” And when you run across a stat which confounds prior expectation and observation to that degree, re-checking the underlying data is a better first step than announcing to the world that you’ve discovered cold fusion.