Deconstructing a New SportVU Metric: Part 2

Feb 6, 2015; Houston, TX, USA; Houston Rockets guard James Harden (13) boxes out Milwaukee Bucks guard Khris Middleton (22) in the second half at Toyota Center. Rockets won 117 to 111. Mandatory Credit: Thomas B. Shea-USA TODAY Sports

Last week I began unveiling the rationale for, and elements of, a new metric I’ve been building. That post focused on the offensive side of the ball. Here we will examine the defensive elements and start looking at some of the composite numbers.

Defense

Shot defense

For most of NBA history, we have had pretty terrible estimates of shot defense. We only had blocked shots, which are often misleading for guys who leave their feet too much and basically worthless for perimeter defenders. Thus, the new SportVU data is a massive improvement on a previous blind spot. In fact, a recent Sloan conference paper uses the full tracking data to assign defenders more accurately and divvy up credit based on who significantly affects field-goal percentage and volume. I have something more basic, using the shotlogs to see who the closest defender is for every field goal, how far away they are, and what the shooter’s usual percentage is from that zone.

But this is far from perfect. The SportVU shotlogs are filled with errors, like field goals miscoded as three-pointers, and using the nearest defender often means the player who gets the block doesn’t get the credit. Another issue is that there’s a lot of noise in shooting percentages. The results depend on how you slice the data, but looking only at shots where the defender is within 5 feet, you can see strange results like David Lee ranking better than David West, well-respected defenders like Trevor Ariza and Mike Conley rating below average, and Amare Stoudemire looking like one of the best shot defenders in the league. The problem is that shot defense is so indirect. You can contest well but the player can still hit the shot. Does the defender deserve the blame?

For a comparison, I’d point to the BABIP statistic in baseball: batting average on balls in play. It tells you how often a ball hit in play (not home runs) falls in for a hit. This is a telling stat for pitchers, as BABIP is usually pretty stable over a long period of time but can have pretty strong short-term fluctuations. If a previously mediocre pitcher is having a great season but his BABIP is unusually low, one may chalk this up to luck. Conversely, pitchers with high strikeout rates are even more valued now because strikeouts are more stable and the pitcher generally has more control over them.

I fear shot defense in the form of opposing FG% is more like BABIP. We might be putting too much stock into noise, and it could take another year of data to understand the patterns and figure out which players, and why, control opposing FG%. Our old standby steals, and perhaps blocks, are more like strikeouts. Players influence those rates more directly and they have a high degree of correlation even when switching teams[1. Except, arguably, for a case with blocks where a player switches positions.]. There’s an additional issue analyzing these shotlogs. Players with a healthy opponent FG% will correlate well to a measure like RAPM because it includes a missed shot, which helps a player’s defensive RAPM, but that doesn’t necessarily tell you that the defender deserves the credit.

Nevertheless, there is probably some amount of value in this shot defense measure. While there are a few wacky results at the top like Amare and Quincy Acy, the list is mainly guys like Duncan, Hibbert, Noah, and Gobert. Based on my initial analysis, this shot defense measure works best by layering the results: points saved via who’s the nearest defender, points saved where the defender is within five feet, and points saved where the defender is within three feet. Using my conservative method, each of those has a corresponding coefficient of roughly 0.15 or more. What does this mean in practice? You get dinged a little when a guy hits a shot and you’re far away. But if you’re within 3 feet, you get 45-50% of the blame, broadly speaking. This helps players who stay close to their man and it helps interior defenders because most shots where the defender is close are near the basket. Lastly, in one version of the model I included a fourth SportVU variable: defender is within 5 feet and the shot is within 9 feet of the basket. I’m still not entirely sure if defenders should get more credit if the shot is near the basket.
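To make the layering concrete, here is a minimal Python sketch of how the three layers stack. It is only an illustration: the name `blame_share` and the flat 0.15 weight per layer are assumptions based on the rough figures quoted above, not the fitted coefficients.

```python
# Illustrative only: a flat 0.15 per layer is an assumption taken from the
# "roughly 0.15 or more" figure; the fitted coefficients are not given here.
LAYERS = [
    (float("inf"), 0.15),  # nearest defender, at any distance
    (5.0, 0.15),           # nearest defender within 5 feet
    (3.0, 0.15),           # nearest defender within 3 feet
]


def blame_share(defender_distance_ft, layers=LAYERS):
    """Sum the weights of every layer this shot qualifies for."""
    return sum(weight for cutoff, weight in layers
               if defender_distance_ft <= cutoff)


for distance in (8.0, 4.0, 2.0):
    print(distance, round(blame_share(distance), 2))
```

With these assumed weights, a shot with the defender 8 feet away carries a 0.15 share, one at 4 feet carries 0.30, and one at 2 feet carries 0.45, which is the low end of the 45-50% range.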

For other relevant variables, I didn’t find the rim protection metric you see on stats.NBA.com to be any better than the ones I created. What helps, I think, is that my stats include shots all over the court and I calculate it based on the shooter’s typical field-goal percentage. Blocks are the old-world stat for rim protection and shot defense, but they don’t appear to be very useful when you have shotlog data. I didn’t use pure blocks per possession at all because in most forms of the model they were negatively correlated with better defense. But I don’t trust those results, and I think it’s because blocks are more likely to be rebounded by the offense. However, shot defense FG%, like I discussed, could be fool’s gold and blocks are controlled by the defender more directly.

I did include one block-related stat in the final models, and it’s an interaction variable that’s held up well in long-term models too: blocks divided by personal fouls. Years ago, I regularly checked the leaderboards on ESPN for a few miscellaneous stats like Blk/PF. It might be a good proxy for rim protection: the blocks show you’re defending at the rim and the low foul rate suggests that you’re doing so with skill. But it’s a lot less valuable than the shot defense stats; we’ll see how that holds up the rest of the season.

For an example, here’s the top ten this season (minimum 1000 minutes) before the all-star break for points saved per 100 possessions when the defender is within 5 feet of the shooter. Points saved is calculated from the difference between the shooter’s usual FG% in a zone and the real-world result when the listed defender is the nearest one. Whiteside and Bogut didn’t play enough minutes to qualify; they were at 4.26 and 4.04 points saved, respectively. Also, this is a pretty noisy stat and the extremely low-minute players were all over the place.

Player	PtsSaved100 within 5 ft.	PtsSaved100 overall	Blk/PF
Rudy Gobert	4.35	4.15	1.01
Roy Hibbert	3.15	3.23	0.61
Draymond Green	3.09	3.36	0.44
Tim Duncan	2.64	2.54	0.89
Anthony Davis	2.51	2.80	1.33
Marc Gasol	2.44	1.78	0.66
Serge Ibaka	2.39	2.70	0.75
Josh Smith	2.37	1.63	0.52
Taj Gibson	2.37	1.98	0.46
Alex Len	2.33	2.09	0.49
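
The bookkeeping behind a column like PtsSaved100 can be sketched in a few lines of Python. This is a rough illustration of the definition above, not the actual pipeline: the field names (`defender`, `defender_dist_ft`, `expected_fg`, `made`, `points`), the sample shots, and the possession counts are all made up, and the real shotlog fields and zone definitions may differ.

```python
from collections import defaultdict


def points_saved_per_100(shots, possessions, max_dist_ft=None):
    """Points saved per 100 possessions, credited to the nearest defender.

    shots: list of dicts with hypothetical keys
        defender, defender_dist_ft, expected_fg (0-1), made, points
    possessions: dict mapping defender -> defensive possessions played
    max_dist_ft: if given, only count shots with the defender this close
    """
    saved = defaultdict(float)
    for shot in shots:
        if max_dist_ft is not None and shot["defender_dist_ft"] > max_dist_ft:
            continue
        # Expected points from the shooter's usual FG% in that zone
        expected_pts = shot["expected_fg"] * shot["points"]
        actual_pts = shot["points"] if shot["made"] else 0
        saved[shot["defender"]] += expected_pts - actual_pts
    return {name: 100.0 * total / possessions[name]
            for name, total in saved.items()}


# Made-up shots, purely for illustration
shots = [
    {"defender": "Big A", "defender_dist_ft": 2.5,
     "expected_fg": 0.50, "made": False, "points": 2},
    {"defender": "Big A", "defender_dist_ft": 6.0,
     "expected_fg": 0.40, "made": True, "points": 3},
    {"defender": "Guard B", "defender_dist_ft": 4.0,
     "expected_fg": 0.60, "made": False, "points": 2},
]
possessions = {"Big A": 40, "Guard B": 60}

print(points_saved_per_100(shots, possessions))                   # overall
print(points_saved_per_100(shots, possessions, max_dist_ft=5.0))  # within 5 ft
```

In this toy data, "Big A" looks worse overall than within 5 feet because the made three-pointer from 6 feet away drops out of the tighter filter, which is the kind of gap you see between the two PtsSaved100 columns.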

Turnover creation

Turnovers are an important component of most successful defenses, but for most of NBA history we’ve only considered one class of turnovers: live-ball turnovers (or steals). Of course, steals are quite valuable, and they correlate reasonably well to other aspects of defense like deflections. I don’t use pure steals though; it’s an interaction variable where I take the product of height and steals. Essentially, this makes a steal from a big man slightly more valuable. (Some people take issue with using non-basketball information like height. I might actually use position next time, but even if height isn’t perfect, it’s still useful, and no NBA stats are perfect.)

But there’s one more completely trackable stat that most people ignore, and it’s a different type of forced turnover: drawing an offensive foul, which can include but is not limited to charges. This was the most surprisingly valuable stat, and even with harsh constraints on the coefficient there’s a one-to-one value in drawing an offensive foul. Actually, this is a stat that explains much of the value from the so-called “no stats all-stars.” The most cited players who fit that description are Shane Battier, Nick Collison, and Jason Collins. What do they have in common? They drew an extremely high number of offensive fouls, and there are a few plus-minus mavericks in today’s game who do the same, from Ricky Rubio and Patrick Beverley to Nene and Wesley Matthews. As an added benefit, remember they’re not just forcing a turnover but putting an additional foul on an opponent, who is often the team’s most valuable playmaker. As a side note, when I split these fouls into charges and non-charges, charges were less valuable. I’m not sure why; I don’t know what’s correlating with non-charge offensive fouls drawn. Plus, players are more in control of charges than other fouls. Further improvements can be made in the turnover realm, like forcing a pass or player out of bounds and shot clock violations, but including offensive fouls explains a lot of what people were missing about a wide swath of underrated defenders.

Rebounding

I had the same issues with defensive rebounding that I had with offensive rebounding. I’m not sure if this is a recent trend or a sample size issue. And I couldn’t find any higher value for contested rebounds than uncontested ones, or a significant variable using rebounding chances. However, based on my previous research, I have a version where I assume contested rebounds are worth three times as much as uncontested ones. This is where the testing for the rest of the season will be helpful. Also, there are other issues here, like the diminishing returns effect on strong rebounding clubs and missing information on box-outs. In most older metrics, defensive rebounding is overrated partly because we don’t have the full picture but also partly because it’s a team-wide effort and largely depends on your defensive responsibilities.
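The three-to-one assumption is easy to write down. The sketch below is only illustrative: the function name `weighted_def_rebounds` and the two sample stat lines are made up, and the 3.0 weight is the assumed ratio from that one version of the model, not a fitted value.

```python
def weighted_def_rebounds(contested, uncontested, contested_weight=3.0):
    """Rebound value with contested boards weighted more heavily.

    contested_weight=3.0 is the assumed ratio, not a fitted coefficient.
    """
    return contested_weight * contested + uncontested


# Two made-up players with the same raw total of 8 defensive rebounds
print(weighted_def_rebounds(contested=4, uncontested=4))
print(weighted_def_rebounds(contested=1, uncontested=7))
```

Under this weighting the first player's eight rebounds count for 16 and the second player's for 10, so the mix matters as much as the total.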

Miscellaneous

One glaring criticism I have of the metric PER, and a few others like it, is that it debits fouls. On a surface level, it makes sense: if you foul someone, you can put your team into the penalty or give up foul shots. However, there are many fouls that stop plays like easy layups or transition opportunities. Plus, high foul players are typically big men, and extreme low foul players are too often inactive on defense. The effect in my model is small, and there’s actually another nonlinear term that involves fouls, but models shouldn’t simply subtract fouls and we could get even more useful info by using foul types or foul location. Stopping DeAndre Jordan at the rim, for instance, by fouling him is one type of rim protection.

As discussed earlier, MPG has some benefits, but the effects are even smaller on defense. I assume this is because bench players are usually better on defense than offense. However, there’s another reason one should take MPG into consideration: it means that, generally, you play against better players (i.e., other starters). Lineups are mixed in the NBA so the effect is small, but that’s another cause. (You can also try games started divided by games played.) Lastly, I found an odd little non-box-score stat that appears to be a decent proxy for punishing gambling defenders. It has a modest effect too, but it also worked in a multi-season model.

Team adjustments

Even with all that information, the defensive metric is still, frankly, a mess. It’s really tough to judge the worth of conservative defenders who don’t pick up a lot of steals or other flashy stats. Defense is extraordinarily tricky because it’s partly about indirect effects. You don’t control how well your man shoots; you just contest and hope he misses. It’s a team-wide activity as well. If you screw up a rotation, your man can drive to the basket and you won’t get blamed via the numbers if that leads to a layup because someone else will be the nearest defender. The pick-and-roll is the most popular play in the league, and defending it is all about how well you orchestrate things with your team, and if you do things well you likely won’t see the benefits in the basic stats. There is a long list of effects that are tough to capture and seemingly impossible with public data, like denying touches to a star scorer or forcing someone to take a midrange shot instead of a more efficient option.

These limitations force people to use a team adjustment of some kind for defense. The box score metrics like Win Shares, BPM, and WARP all do it, and ESPN’s RPM is a team adjustment in a different sort of way. I use a team adjustment too, but I’ve found a couple tweaks that work well in giving out the credit more appropriately. There are overfit concerns here, and there’s the “David Lee” problem of a poor defensive big man on an elite defensive team who’s usually overrated, but the performance of the metric is so much better that I have to consider it. This is where better out-of-sample testing will be required, by the way, and I’ll see what effect this adjustment has on the rest of the season.

Applying the metric

I have a couple versions of the metric I’ll test with a set of numbers for everyone. Since this is a work in progress and there are a lot of moving parts, I’ll just provide examples from a couple of teams for what I’m doing and what the numbers look like.

Below are the minutes for every player from this game with the Warriors in Milwaukee as well as their estimated ratings. A few notes on these ratings. Giannis might only need a couple dribbles to get across the court, but he needs to smooth out the rough edges, like reducing his turnovers. Michael Carter-Williams looks worse, though: he was the lead point guard on a historically awful offensive team, and I’m only using numbers before the all-star break. Middleton actually looks like their star. (Knight, by the way, has a rating around +1.)

Milwaukee Bucks

Player	MP	Off	Def	Tot
Giannis Antetokounmpo	41	-0.99	-0.16	-1.15
Michael Carter-Williams	30	-2.45	1.55	-0.91
Jared Dudley	30	0.41	0.75	1.16
Khris Middleton	30	0.64	2.71	3.34
Zaza Pachulia	25	-2.01	2.57	0.57
Ersan Ilyasova	35	-0.30	1.61	1.31
Jerryd Bayless	25	-0.28	0.12	-0.16
Tyler Ennis	15	-0.03	-0.36	-0.39
John Henson	9	-0.60	2.35	1.75

The Warriors have virtually no holes in their rotation. (I’d say their biggest weakness, depending on how you value Ezeli, is not having a dependable big defensive center on the bench since Bogut regularly gets injured.) Curry has the best rating in the league; his high steal rate helps his defensive score. Green, via my metric, was the most valuable defensive player before the All-Star break, given his rating and high minutes played. The metric likes Iguodala on offense because of his passing efficiency and he’s still a capable stopper. Justin Holiday’s rating would have been higher if I hadn’t used a simple adjustment for players with minutes under 700 — they have a wealth of talent. David Lee’s defensive rating, however, seems like an error. It’s tough to parse out who’s a bad defender among the high rebounding guys on good defensive teams through the numbers.

Golden State Warriors

Player	MP	Off	Def	Tot
Draymond Green	40	0.01	4.06	4.07
Stephen Curry	37	5.93	1.37	7.29
Klay Thompson	36	3.23	-0.26	2.97
Harrison Barnes	32	0.39	-0.76	-0.37
Andrew Bogut	20	-0.99	4.97	3.97
Andre Iguodala	30	0.06	0.87	0.93
Shaun Livingston	27	-0.43	0.80	0.37
David Lee	7	0.40	1.82	2.22
Justin Holiday	6	0.29	1.65	1.94
Festus Ezeli	5	-0.54	-0.48	-1.03

Based on those numbers and the actual results, along with a simple homecourt advantage estimate using just the efficiency differential between home and away teams before the break, I have an expected offensive rating of 93.8 for the Bucks and 102.6 for the Warriors. The actual numbers, via stats.NBA.com, were 93.9 and 104.1, respectively. That’s a total error of 1.4. Of course, I’m not endorsing this as a method to predict games (knowing the minutes distribution is huge, for one), but it’s a way to check what I’m doing and, most importantly, figure out how and where the metric fails. (I can also do this with lineup data.) There’s an axiom in science that failure is important because you learn more from it.
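
A rough Python sketch of this kind of check follows. Both pieces are assumptions rather than the exact method: `team_rating` uses a plain minutes-weighted sum (each player's weight is MP / 48), and `margin_error` reads the quoted 1.4 as the gap between the predicted and actual efficiency margin, which is one interpretation that reproduces the number. Turning the team sums into expected offensive ratings would also need a league-average baseline and the homecourt estimate, which are left out here.

```python
def team_rating(players, column):
    """Minutes-weighted sum of per-100-possession player ratings.

    Assumes a 48-minute game with five players on the floor, so each
    player's weight is MP / 48. This weighting is an assumption.
    """
    return sum(p["mp"] * p[column] for p in players) / 48.0


def margin_error(pred_a, pred_b, actual_a, actual_b):
    """Actual margin (b minus a) minus the predicted margin."""
    return (actual_b - actual_a) - (pred_b - pred_a)


# Minutes and offensive ratings copied from the Bucks table above
bucks = [
    {"name": "Giannis Antetokounmpo", "mp": 41, "off": -0.99},
    {"name": "Michael Carter-Williams", "mp": 30, "off": -2.45},
    {"name": "Jared Dudley", "mp": 30, "off": 0.41},
    {"name": "Khris Middleton", "mp": 30, "off": 0.64},
    {"name": "Zaza Pachulia", "mp": 25, "off": -2.01},
    {"name": "Ersan Ilyasova", "mp": 35, "off": -0.30},
    {"name": "Jerryd Bayless", "mp": 25, "off": -0.28},
    {"name": "Tyler Ennis", "mp": 15, "off": -0.03},
    {"name": "John Henson", "mp": 9, "off": -0.60},
]

# Team offensive rating relative to average, under the assumed weighting
print(round(team_rating(bucks, "off"), 2))

# Predicted 93.8 (Bucks) and 102.6 (Warriors) versus actual 93.9 and 104.1
print(round(margin_error(93.8, 102.6, 93.9, 104.1), 1))  # 1.4
```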

For some more numbers, here’s the top 30 (sorted using a simple method: minutes multiplied by rating) for the 2015 season before the all-star break.

Player	MP	Off	Def	Tot
Stephen Curry	1695	5.93	1.37	7.29
James Harden	1940	5.29	0.28	5.57
Chris Paul	1871	5.07	0.73	5.80
John Wall	1929	2.96	1.32	4.29
Kyle Lowry	1858	3.09	0.99	4.08
Damian Lillard	1925	2.83	0.62	3.46
Anthony Davis	1645	2.34	1.96	4.31
LeBron James	1649	4.66	-0.37	4.28
Draymond Green	1631	0.01	4.06	4.07
Monta Ellis	1849	2.23	0.95	3.18
Jeff Teague	1573	3.26	0.61	3.87
Jimmy Butler	1927	2.23	0.15	2.38
DeMarcus Cousins	1381	0.11	3.94	4.04
Kyrie Irving	1961	2.85	-0.59	2.26
Russell Westbrook	1276	4.21	0.19	4.40
Wesley Matthews	1801	1.73	0.76	2.49
Klay Thompson	1623	3.23	-0.26	2.97
Danny Green	1577	0.94	2.00	2.95
Paul Millsap	1764	0.70	1.55	2.25
Eric Bledsoe	1795	1.95	0.14	2.09
Khris Middleton	1353	0.64	2.71	3.34
Marc Gasol	1796	0.76	1.26	2.02
Gordon Hayward	1865	2.38	-0.53	1.85
Kawhi Leonard	1121	1.13	3.11	4.24
Tony Allen	1134	-0.16	4.22	4.07
Rudy Gobert	1160	0.34	3.37	3.71
Ty Lawson	1886	2.62	-1.34	1.28
Tyson Chandler	1597	1.20	0.65	1.86
Kyle Korver	1761	1.97	-0.50	1.46
Jrue Holiday	1247	2.18	0.64	2.83

It’s a table dominated by point guards, but there are still a few high-quality forwards and centers who just didn’t have the playing time to rank higher. The list, however, is full of well-respected all-stars and other high-caliber players like Matthews and Danny Green. Monta Ellis is probably the biggest outlier; the model loves his ability to draw offensive fouls and get steals. I have two last relevant notes: Chandler was probably a more deserving all-star selection from Dallas because Dirk’s defense looks quite poor now, and Gordon Hayward is one of the more overlooked players in the league.

Conclusion

There’s still a lot of work to do with SportVU data. One hugely important point is that we don’t even know how consistent some of these new stats are. If there’s no correlation year-to-year between things like the rim protection field-goal percentage, then we have no use for it. Thankfully, there is some consistency, but we need to lock this down with precision and study more examples of what happens when a defender changes teams.
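
A consistency check of this kind can be as simple as correlating each player's number in one season with the same number in the next. The sketch below is a minimal version under stated assumptions: the helper names `pearson_r` and `year_to_year_r` are made up, the player labels and values are placeholders rather than real data, and a plain Pearson correlation is only one of several reasonable choices.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def year_to_year_r(season_a, season_b):
    """Correlate a stat across two seasons for players who appear in both."""
    shared = sorted(set(season_a) & set(season_b))
    return pearson_r([season_a[p] for p in shared],
                     [season_b[p] for p in shared])


# Placeholder values, not real data: opponent FG% at the rim by player
year_1 = {"Player A": 0.45, "Player B": 0.52, "Player C": 0.49, "Player D": 0.55}
year_2 = {"Player A": 0.47, "Player B": 0.50, "Player C": 0.51, "Player E": 0.48}

print(round(year_to_year_r(year_1, year_2), 2))
```

Only players who appear in both seasons are compared, so a real version would also want a minutes or shots-defended cutoff to keep small samples from dominating the result.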

One possible breakthrough we could have with the data is an understanding of how a player’s value can change with a different number of touches and shot attempts. Some guys can see their value stay the same, or even increase, when they’re traded to a better team, like J.R. Smith to the Cavaliers, based on their particular skillset. Guys who need the ball in their hands can have issues playing next to other playmakers, like Rondo with the Mavericks. With this detail from the tracking data, we can pin down how offenses work structurally and which players carry the burden.

Defense is tougher to solve. It’s the indirect nature that creates so much noise. It’s tough to tell through the data if a player misses a shot because he was well-defended or it was simply luck. The public defensive data uses nearest defender, which is fundamentally problematic because the nearest defender could be behind the shooter or it could be a help defender who doesn’t deserve the blame. Nevertheless, it’s still somewhat useful, and it gives a huge boost to interior defenders.

The challenge here is integrating these new stats with our previous information. It’s tough figuring out what’s important when we still don’t even have two years of data. Plus, the public stats community hasn’t even fully utilized the extra information available in play-by-play logs, like charges. If you look at something like PER and break it down component by component it looks intimidating, but box score stats are only the tip of the iceberg; so much more is possible.

I’m not arguing for the unquestioned status of this metric. Rather I think it’s an important tool to figure out what’s important and what the scale is between different types of actions on the court. You can see the tradeoff between inside post-scorers with no range and stretch 4’s or why unselfish play (passing) matters for your top scorers. People who look at the results of a metric usually look at the top players and then immediately write off the entire metric because they don’t agree with the rank of a couple of players. That’s no way to learn or to be properly skeptical. Instead we should look at the results and ask why. What does a player do well, or not do well, that the metric is picking up? Are there exceptions? What are the numbers missing? And how important is that missing information?

I’ll be tracking how well the metric does (in a few different forms) the rest of the season. Luckily, the frenzied trade deadline provides me with a more interesting dataset. And hopefully I can learn enough by the end of the season that I can completely disassemble what I’ve done so far and build anew.