Nylon Calculus Week 20 in Review: Projecting young players
By Justin
We’re nearing the home stretch for the NBA season, which is unfortunate — I don’t really want this season to end. There’s a lot to consider, and I don’t think the season was as predictable as people feared.
The Eastern Conference is a mess, and I hope people are at least entertaining the notion that we won’t see LeBron James in the Finals again. We could get a surprise. And once again, the Western Conference is marching towards some monster playoff series — I hope this time we get Golden State versus San Antonio. But first we have to finish the regular season, and with that, let’s look at the past week in basketball.
Time to talk about the MVP race
Every year, this inevitable argument appears, and there’s no quelling it: player X should not be considered for the MVP because his team is mediocre or below average. The argument does match historical precedent, as we’ve only had two MVPs from a losing team. First of all, we really should not be deciding how we vote on the MVP award now based on past results. If we believe the game can evolve and grow over the years, so can the award. Nothing in the voting rules says we have to emulate past results; doing so is very limiting.
Secondly — and this is something I’ve ranted about before — this is the data-ball era and we have access to an unbelievable amount of video and data. You can watch plays from every relevant player and look up an incredible amount of statistics. We don’t have to boil it down to, “Hey, this great player is on a great team, so he must be the greatest.” We don’t have to rely on how people voted decades ago when Finals games weren’t even shown live.
We should also absolutely disregard this notion that the most valuable player can’t be on a mediocre team. Individual players aren’t worth 50 wins on their own; we should all understand that. Kevin Garnett was kicked out of the MVP race from 2005 to 2007 because his teams missed the playoffs; then suddenly he reappeared in the race in 2008 because he got a teammate upgrade. We have enough information now that we should be able to separate a great player from poor teammates — let’s not vote based on our ignorance.
I’m stating these things because Russell Westbrook is having a historic box-score season, but his team isn’t near the top of the standings. I wish we could have a real discussion of his value, because we have never seen a season like this in the modern era, instead of people resorting to the boring argument, “Well, James Harden’s team is so much better, so let’s play it safe with him.”
By my metric HBox, the MVP race is a close one, with James Harden in the lead and Russell Westbrook just behind him. (Remarkably, you can see in that Google spreadsheet that Westbrook is using 41 percent of his team’s shots while his true shooting percentage is below average. That’s a weird combination that’s tough to quantify and compare to players on great teams.) However, Westbrook is the runaway leader by Basketball-Reference’s Box Plus-Minus while being far behind in Dredge, which tabulates a bunch of miscellaneous stats from play-by-play logs. I’m not quite sure what to make of Russell’s bizarro stats, and it makes for a great discussion. I’d also love it if we could discuss Giannis Antetokounmpo; he should at least get top 5 consideration, but his team isn’t in contention. The NBA is complex and deep; we shouldn’t discard a player’s value just because it’s the way things have been done. Let’s be better than that.
Rookie race update
This is one of the most depressing Rookie of the Year award races ever, and there’s a lot of discussion out there about how to properly value Joel Embiid’s abbreviated season. He was dominant, and no other rookie has played anywhere near his level, which was legitimately All-Star tier, but he’s only played 786 minutes. How do you compare him to steady hands like, say, Malcolm Brogdon? Well, I actually built an MVP Index for this very problem, and this is the ideal situation for it: you can see the Index here on the updated Dredge and HBox tabs. The index is nonlinear; it gives very little credit for playing at a mediocre level or below, but substantially more credit for playing at a star level.
To save people some scrolling and reading, yeah, Embiid is first in HBox for rookies: he’s rated +1.9 by the metric per 100 possessions and, adjusting for his minutes, no other rookie is close to his MVP index. Dario Saric and Davis Bertans (the San Antonio guy with flukishly high shooting percentages) are the rookies under him. In Dredge it’s the same story: he’s rated +0.78, and Brogdon is second, many rows below. Those metrics are probably underrating Embiid too, because they regress to the mean based on a player’s minutes. So if you want pure, elite value, The Process is a sane choice backed up by statistics.
Clutch: Repeatable skill?
If someone ever asks why we shouldn’t take a team’s clutch performance at face value, my go-to example will now be this season’s edition of the Warriors. Last year, they were one of the greatest teams in close games ever — arguably the greatest ever. In clutch situations, as defined on stats.NBA.com, the team had a net rating over 38 — that’s bonkers. It means they were outscoring teams by 38 points per 100 possessions at the end of close games. This season, however, their clutch net rating is near the league average, which is actually below average for a team of their caliber: it’s less than their net rating at other points of the game by a large amount. So yes, I don’t believe “clutch” is an innate ability and something teams or players can replicate, with only a few exceptions. Unless we have years of evidence showing otherwise, if a team is outperforming expectations in close games, we should regard it as randomness.
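For anyone unfamiliar with the number, here is a minimal sketch of what a net rating per 100 possessions means. The function name and the sample totals are made up for illustration, and it assumes both teams used the same number of possessions, which is a simplification rather than the site's exact calculation.

```python
def net_rating(points_for, points_against, possessions):
    """Point differential scaled to 100 possessions.

    Assumption: both teams had the same number of possessions.
    """
    return 100 * (points_for - points_against) / possessions


# Hypothetical clutch totals, not real Warriors data.
example = net_rating(points_for=138, points_against=100, possessions=100)
print(example)
```

Clutch samples cover very few possessions, so a handful of made or missed shots can swing this number a lot, which is part of why it bounces around from year to year.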
Nurkic fever
After a trade deadline swap, a team has been renewed by virtue of a new center, and no, surprisingly, it’s not the New Orleans Pelicans: Jusuf Nurkic has seemingly transformed the Portland Trail Blazers. They’ve had a disappointing season with the second biggest payroll in the league at nearly $120 million (Nurkic is only making $2 million). Portland has Nurkic fever, and it has people second-guessing his issues in Denver. Beyond small sample size theater, I think the fever is signaling something real here. He has security as their lead big man, and they regularly look to create through and for him. That alone can aid a player in rediscovering his mojo.
More specifically, beyond his renewed effort on defense, what’s distinct about Nurkic’s game now is his assist rate. He’s not even shooting the ball much more often than usual; it’s just that his assists have doubled. I wouldn’t mark this as an aberration yet either. The same thing happened with Mason Plumlee: his assist rate doubled in his first season as a Trail Blazer after two years with the Nets. Coach Terry Stotts stated that he wanted to run some of the same sets they used to run with LaMarcus Aldridge, and Nurkic is filling the same role Plumlee had, where Damian Lillard and CJ McCollum can play off-ball instead of being full-time ballhandlers. He’s a nice immovable mass, and he can shuffle one-hand passes to shooters using his screen. I don’t think Nurkic’s assist rate will remain quite this high, but it’s something to keep in mind, since he could be in an ideal environment right now. He really does have a nice touch.
Recency bias is what makes this a surprise: as I discussed before, Jusuf Nurkic had great translated ratings from Europe, and he was outstanding in his rookie season. So when he’s healthy and not in an awkward frontcourt arrangement, you can see how he can perform so well; this isn’t shocking. Portland’s performance will regress some, of course, but he’s a nice find for a team that struck out on the market last summer, and it’s all the better that they’re fighting his former team for the last playoff spot. (Yes, I wrote this before Portland’s awful loss on Tuesday night. I wasn’t exactly surprised by their regression-to-the-mean game, but we should still be optimistic about Nurkic.)
Hey now: The return of Larry Sanders
After Andrew Bogut broke his leg in under a minute, the Cleveland Cavaliers had to be creative to find a defensive replacement — they signed Larry Sanders, who last played in December of 2014 and hasn’t had a full season since 2013. It’s a high-risk move with the potential for a pretty great return given the contract, but Cleveland should be valuing stability over anything else. Kevin Love is out, and they only have Tristan Thompson and Channing Frye as their big men for now. But I understand the gamble. A player without risk wouldn’t move the needle, and Larry Sanders at his best was Defensive Player of the Year-worthy. Back during the crowded 2013 race for the award, he was lost among the names of Marc Gasol, LeBron James, and Roy Hibbert, but he had a credible argument. So I get why they’re taking a chance with him; it just has a high chance of failing.
Cellar-dwelling in Brooklyn
The Brooklyn Nets are locking in on the worst record in the league, and with a seemingly great draft coming up there’s usually a silver lining: hey, you could end up with a young superstar. But the Celtics have the rights to swap first round picks, and, of course, they own Brooklyn’s 2018 first round pick with no protections. That sounds like a dire situation for a team rebuilding, but as I’ve discussed before, they are actually on the right track. It’s just hard to see. Brooklyn is hard at work at building something from nothing.
I know most fans ignore this team, but this is a fun and interesting challenge, and one that captures my interest. Since there’s nothing to gain from tanking this season and the next, there’s no incentive to play poorly. Losing will only grow the apathy for the team. You can invest right away in your coaching staff and veteran players to fill out the roster and not worry about losing a chunk of your lottery odds. You could also become the marketplace for bad contracts, gaining draft picks and other assets to pay guys who are too expensive for others for the next two to three years. If you strike out in free agency, I think this should be a legitimate option. Brooklyn has the lowest payroll in the league with few long-term contracts. They need to hit the salary floor somehow, and I doubt they’ll land a franchise-altering star in free agency anyhow.
Defense wins championships — and so does offense
I’ve seen some discussion about Houston’s title chances, as well as Cleveland’s, and how they’re dependent on their offensive rankings. In the SportVU era, I would hope we could tackle the topic more appropriately. First of all, a team’s rank is not the best method of evaluating how strong a team is. It’s easy to find an offensive rating on Basketball-Reference or another site, and such numbers can be adjusted too.
Secondly, I’ve looked into offense versus defense in the playoffs, and there is a bit of truth there: defensive rating is more important than offensive rating. But the difference isn’t enough to completely override a team’s chances. We don’t need to fall victim to the rare event fallacy, where we place way too much faith in the patterns gleaned from a small set of observations. So yes, an elite offensive team that’s only mediocre on defense can win a title. Maybe they won’t have the same chances as their defensive twin, but the odds shouldn’t be nonexistent.
Rest for the weary
We have few nationally televised games on network TV. They’re a precious commodity, and those of us who read about basketball online probably don’t need to wait until ABC airs a game to watch the NBA. So I understand the league’s frustration when their marquee game — a match between the Warriors and the Spurs — is derailed when every notable player sits. Kevin Durant was injured, and Golden State was on the end of a back-to-back, so they opted to rest Stephen Curry, Klay Thompson, Draymond Green and Andre Iguodala. They had no idea Kawhi Leonard would be injured, or that LaMarcus Aldridge would be sidelined indefinitely due to a heart arrhythmia. Tony Parker was out too due to his back.
This is nothing new if you’ve been following the NBA recently. The science of rest and how to deal with a brutal schedule — and Golden State is a team that’s especially vulnerable because they’re isolated on the west coast — is a growing area of research, and if your primary concerns are with winning a title, it makes sense to keep your guys fresh for the playoffs. The result was an awkward game of benches versus benches for an ABC game, but, hey, it wasn’t completely star-less: both teams had past All-Stars in David West, Pau Gasol, David Lee, and Manu Ginobili. For basketball die-hards, seeing guys like Kevon Looney, Patrick McCaw, Dewayne Dedmon, and Kyle Anderson start is a treat, and that’s related to the silver lining here over the effects of aggressive resting: teams will develop deeper rosters, and we’ll essentially have more relevant NBA players than otherwise. This new normal isn’t something we should be fighting; change is okay sometimes.
Projecting young players
One area of study I’ve always wanted to explore in depth is how to predict a young player’s future or peak performance from his early NBA seasons. This is akin to using a player’s college stats for pro projections; the difference is that I want to do this within the same league to see if there are any key indicator variables we should all be paying attention to. So these are just long-term player projections, and they may not be entirely useful, but it’s the kind of experiment I like to do here.
For the data, I’m looking at every season for a player listed at 22 years old or younger with at least 1000 minutes in a season. Then I match that season with the player’s peak Box Plus-Minus (the average BPM from age 26 to 29), so I can look at the link between BPM at a young age and BPM in the theoretical peak age range. However, there’s an issue that’s probably obvious to those paying attention: what happens when a player has few minutes when he’s older, or worse yet, doesn’t play at all?
This is getting into an area known as survival analysis. More specifically, if I just ignored players who dropped out of the league when they got older, the model would have a survivorship bias. In other words, the data set would be overly dependent on guys who stuck around and would most likely be too optimistic. To counter this, I included every player in the analysis, and for those with under 1500 minutes in their peak seasons, I regressed their peak BPM to -2, which is roughly the BPM of a borderline NBA player. The minutes are important for the weighting in the models, so here’s what I assumed for players who didn’t play much in their peak age range: 750 + minutes/2, which means that a guy who didn’t play at all would be listed with 750 minutes and a peak BPM of -2. I realize that many of these guys are playing overseas and are better than that, but many are quite a bit worse, so I think it’s a decent estimate.
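The adjustment for players who dropped out can be sketched in a few lines. The weight formula (750 + minutes/2) and the -2 floor come from the text above; the linear blend used to pull the observed BPM toward -2 is only my assumption here, since the exact regression step isn't spelled out, and the function names are hypothetical.

```python
REPLACEMENT_BPM = -2.0  # roughly a borderline NBA player, per the text
FULL_MINUTES = 1500     # below this, the peak is regressed
BASE_MINUTES = 750      # weight given to a player with no peak minutes


def model_weight(peak_minutes):
    """Minutes used as the model weight: 750 + minutes/2 below the threshold."""
    if peak_minutes >= FULL_MINUTES:
        return float(peak_minutes)
    return BASE_MINUTES + peak_minutes / 2


def regressed_peak_bpm(peak_bpm, peak_minutes):
    """Pull the observed peak BPM toward -2 for low-minute players.

    Assumption: a linear blend by share of the 1500-minute threshold.
    """
    if peak_bpm is None or peak_minutes <= 0:
        return REPLACEMENT_BPM  # did not play at all in the peak window
    if peak_minutes >= FULL_MINUTES:
        return peak_bpm
    share = peak_minutes / FULL_MINUTES
    return share * peak_bpm + (1 - share) * REPLACEMENT_BPM


print(model_weight(0), regressed_peak_bpm(None, 0))
print(model_weight(1000), regressed_peak_bpm(1.0, 750))
```

One nice property of the weight formula is that it is continuous: at exactly 1500 minutes, 750 + 1500/2 equals 1500, so there is no jump between the adjusted and unadjusted players.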
Using my favorite regression package (glmnet in R, which combines ridge regression, the R in RAPM, with a lasso-style penalty that can drop irrelevant variables), I used the data to build a prediction model for young players and their estimated peak. This includes every player from 1978 and on who was at least 28 years old during the 2017 season. The results are, honestly, underwhelming. What’s the best way to predict a player’s peak BPM? It’s, well, BPM. To break things down further, I used offensive BPM and defensive BPM separately, and I created a number of non-standard statistics, like TRB%*AST%, which is a component in BPM itself, and TS% minus league average TS%. It’s interesting to see that offensive BPM is more important for future success than the defensive component (corroborated by this study about which metrics are stable), and there were a number of key stats that were a part of BPM but were still important on their own. But there’s nothing truly ground-breaking.
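To make the ingredients concrete, here is a small sketch of the derived variables and of the elastic-net penalty that glmnet minimizes alongside the squared error. The dictionary keys, the sample season, and the coefficient values are hypothetical; the penalty formula follows glmnet's documented form, where alpha = 0 is pure ridge and alpha = 1 is pure lasso.

```python
def derived_features(season, league_avg_ts):
    """Build the interaction and league-relative terms named in the text."""
    return {
        "OBPM": season["obpm"],
        "DBPM": season["dbpm"],
        "TRB*AST": season["trb_pct"] * season["ast_pct"],
        "USG*AST": season["usg_pct"] * season["ast_pct"],
        "TS% - lg avg": season["ts_pct"] - league_avg_ts,
    }


def elastic_net_penalty(coefs, lam, alpha):
    """lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|))."""
    ridge_part = sum(b * b for b in coefs)   # shrinks every coefficient
    lasso_part = sum(abs(b) for b in coefs)  # can push coefficients to zero
    return lam * ((1 - alpha) / 2 * ridge_part + alpha * lasso_part)


# Made-up season line for illustration only.
season = {"obpm": 2.1, "dbpm": 0.4, "trb_pct": 15.0, "ast_pct": 20.0,
          "usg_pct": 22.0, "ts_pct": 0.58}
features = derived_features(season, league_avg_ts=0.55)
penalty = elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.5)
print(features, penalty)
```

The lasso share of the penalty is what lets weak predictors fall out of the model entirely instead of just being shrunk.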
For an alternative method, I decided to turn to gradient boosting (specifically, the “gbm” method in R along with the caret package to tune the parameters). I prefer using multiple methods because the results are more believable when they’re found by multiple paths. And what’s gradient boosting? It’s an ensemble of a number of models that are weak on their own but more powerful together. Even a great model will have an error rate, so if you can find a few stable models that can explain some of the remaining error, you can stitch together a better ensemble with some appropriate weights. It’s like using the wisdom of the crowd to guess the number of jellybeans in a jar! Of course!
Fry: Usually on the show, they came up with a complicated plan, then explained it with a simple analogy.
Leela: Hmmm… If we can re-route engine power through the primary weapons and configure them to Melllvar’s frequency, that should overload his electro-quantum structure.
Bender: Like putting too much air in a balloon!
Fry: Of course! It’s all so simple!
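For the less balloon-inclined, here is a bare-bones sketch of the boosting idea for squared error: start from the average, then repeatedly fit a tiny one-split model to whatever error is left and add a fraction of it. This is a toy with one input and made-up numbers, not what the gbm package does internally, and all the names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Pick the one-split rule on x that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a small step."""
    baseline = sum(ys) / len(ys)
    preds = [baseline] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return {"baseline": baseline, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = model["baseline"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return total


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Made-up toy data: BPM at a young age versus peak BPM.
young_bpm = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
peak_bpm = [-2.5, -2.0, -1.0, -0.5, 0.5, 1.0, 2.5, 3.0]

model = boost(young_bpm, peak_bpm)
baseline_error = mse([model["baseline"]] * len(peak_bpm), peak_bpm)
boosted_error = mse([predict(model, x) for x in young_bpm], peak_bpm)
print(baseline_error, boosted_error)
```

The small learning rate is a deliberate choice: each weak model only nudges the prediction, so no single split dominates the ensemble.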
The gradient boosting method had fairly similar results overall, but there were some differences. For one, it gave even less weight to defensive BPM, and it was influenced differently by a number of the variables. You can see a full summary below ranked by which variables were more important. A few things stick out: statistical plus-minus standards, like TRB%*AST%, MPG, and steal rate, do very well. True shooting percentage, which holds a lot of influence over BPM, is not very predictive, and the same is true of 3-point percentage. Interestingly, how many 3-pointers you take was also not useful — perhaps this suggests that it is indeed a learned skill.
Table: gradient boosting summary
Variable | Relative Influence |
OBPM | 31.74 |
TRB*AST | 18.17 |
DBPM | 12.69 |
MPG | 9.88 |
STL% | 5.37 |
FT% | 4.17 |
FT rate | 2.87 |
Age | 2.68 |
USG*AST | 2.64 |
BLK per 100 poss | 2.36 |
USG% | 2.30 |
G/season games | 1.77 |
DRB% adj – lg avg | 1.19 |
TOV% | 0.81 |
TS% – lg avg | 0.58 |
3PA rate | 0.42 |
3PA per 100 – lg avg | 0.36 |
3P% | 0 |
ORB% – lg avg | 0 |
The purpose here is finding a way to predict future performance for young players, so I’ve posted the end results in a table below. As you can see, the players who project as the best are basically the ones who had the highest BPM scores. Due to Nikola Jokic’s historically high BPM as a rookie and as a sophomore, he’s the highest rated by far. The list is mostly one of young stars, but there are a few duds. Michael Carter-Williams never panned out, even though his assist and rebounding numbers suggested differently. If you’re wondering why some players have a projected BPM under their current BPM, it’s due to the conservative nature of the modeling and the fact that every season was included (e.g., Giannis Antetokounmpo was weighed down by his inferior early years).
Table: prediction results on current young players
Player | Glmnet | GBM | Average |
Nikola Jokic | 5.4 | 4.9 | 5.2 |
Blake Griffin | 3.7 | 4.8 | 4.3 |
Anthony Davis | 3.7 | 3.8 | 3.8 |
Karl-Anthony Towns | 3.7 | 3.7 | 3.7 |
Kyrie Irving | 3.5 | 3.2 | 3.3 |
Giannis Antetokounmpo | 3.2 | 3.2 | 3.2 |
Paul George | 3.0 | 3.3 | 3.1 |
Greg Monroe | 3.2 | 3.0 | 3.1 |
Rudy Gobert | 3.5 | 2.5 | 3.0 |
Kawhi Leonard | 3.0 | 2.9 | 2.9 |
Ricky Rubio | 3.2 | 2.6 | 2.9 |
John Wall | 3.0 | 2.5 | 2.7 |
Michael Carter-Williams | 2.3 | 2.8 | 2.5 |
James Harden | 2.5 | 1.8 | 2.1 |
Brandon Jennings | 2.4 | 1.8 | 2.1 |
DeMarcus Cousins | 2.3 | 1.7 | 2.0 |
Jrue Holiday | 2.1 | 1.8 | 1.9 |
Tyreke Evans | 2.2 | 1.5 | 1.8 |
Kenneth Faried | 2.2 | 1.2 | 1.7 |
Kemba Walker | 2.2 | 1.1 | 1.6 |
Nerlens Noel | 2.2 | 0.7 | 1.4 |
DeJuan Blair | 1.7 | 1.1 | 1.4 |
D’Angelo Russell | 1.8 | 0.8 | 1.3 |
Marcus Smart | 1.5 | 1.2 | 1.3 |
Elfrid Payton | 1.9 | 0.6 | 1.2 |
If you’re looking for Andrew Wiggins, he’s further down the list with a projected peak BPM of 0.4. Yes, he’s young and he scores a ton of points for someone his age, but that doesn’t correlate with future success. Besides his disappointing BPM, the best predictors were things like steal rate and assist rate multiplied by rebound rate, where he doesn’t do well either. Maybe he could improve his assist rate, sure, but that’s not a stat where players suddenly improve when they’ve already greatly improved their usage rate — this is why Mason Plumlee and Jusuf Nurkic are so interesting. If he doesn’t have a high BPM now when he’s a primary scorer, it’s unlikely he’ll improve in the future.
Andrew’s future isn’t set in stone, of course, and I hope this is something people understand when they hear analysts discuss him. There’s a lot of unpredictability in how young players develop. The models I built are definitely not great: I saw a root-mean-squared error of over 2 when looking at out-of-sample data. There’s also the issue of using BPM as both an independent variable and the dependent variable; this is not ideal. Obviously the best way to predict future BPM is with current BPM. Unfortunately, we don’t have something like RAPM going back for decades, and for a study where you look at how players age it’s best to have decades of data, not just one generation.
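For reference, root-mean-squared error is just the square root of the average squared miss, as in this short sketch. The projected and actual values below are invented for illustration and are not from the models.

```python
import math


def rmse(predicted, actual):
    """Square root of the mean squared difference between predictions and outcomes."""
    squared_misses = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_misses) / len(squared_misses))


# Hypothetical projected versus actual peak BPM for three players.
projected = [3.0, 1.0, -1.0]
actual = [5.0, -1.0, -1.0]
print(rmse(projected, actual))
```

An error of over 2 BPM points is large next to the spread of the projections in the table above, where the listed players range from about 1.2 to 5.2.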
Nevertheless, I hope we can gain a bit of wisdom from this exploration. Yes, a young player’s BPM is important, but we should put more stock in stats like assist rate and steal rate than in a shooting percentage, which is music to the ears of fans of Dario Saric and Jamal Murray. Murray has the highest projected BPM of any rookie this season, for what it’s worth. Joel Embiid was not included due to his minutes; he would have easily won. We’ll never be certain how a player will develop, but we have a few patterns to track.