Lost and Found Production: Revisiting Preseason Predictions
Last year, my preseason predictions for the number of wins for each NBA team, using a blend of my own Player Tracking Plus-Minus (PT-PM) and regularized Adjusted Plus-Minus (RAPM) via Got Buckets[1. And the pseudonymous NBA analyst Talking Practice.], were among the best at the APBRmetrics season prediction contest. Successfully predicting wins and losses over the long NBA season is at best a mix of luck and skill. There are injuries and trades. Players have unexpected breakout years or implosions. Coaches put together inexplicable lineups, which occasionally work beyond all expectation.
The methodology used by most preseason prediction models[2. Including mine.] is to pick a metric or blend of metrics to estimate player quality. This estimate is then combined with a minutes-played projection for each player over the course of the season. This floor-time prediction necessarily requires a combination of modeling and guesswork.
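The basic shape of that calculation can be sketched in a few lines. This is a minimal illustration, not my actual model: the ratings, minutes, and the net-rating-to-wins conversion factor below are all made-up placeholders.

```python
# Minimal sketch of the prediction recipe: weight each player's quality
# rating by his projected share of team minutes, then convert the
# resulting team net rating into a win total. All numbers are invented.

def project_wins(roster, baseline_wins=41.0, wins_per_net_point=2.7):
    """roster is a list of (rating, projected_minutes) pairs; rating is
    points per 100 possessions above league average."""
    total_minutes = sum(m for _, m in roster)  # total team minutes
    net_rating = sum(r * m for r, m in roster) / total_minutes
    return baseline_wins + wins_per_net_point * net_rating

# A made-up eight-man rotation: (rating, projected minutes)
roster = [(4.0, 2800), (2.0, 2600), (1.0, 2400), (0.0, 2300),
          (-0.5, 2200), (-1.0, 2000), (-2.0, 1800), (-3.0, 3580)]
print(round(project_wins(roster), 1))  # slightly above .500: 41.2
```

The two sources of error the rest of this post discusses map directly onto the two inputs: the rating (player quality) and the projected minutes (playing-time allocation).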
Error can creep into the projection from three main sources:
- Minute allocations: Kevin Durant’s injuries could not have reliably been predicted, for example.
- Player quality: few would have expected Hassan Whiteside’s explosion in productivity, or he would not have been sitting on the street at midseason and would already be making a lot more money.
- Positive or negative fit issues: think Rondo in Dallas or the alchemy that was the Golden State Warriors.
Now that the season has been played, I can go back and re-test my metric blend[3. Ed. Going back and revisiting your predictions and the performance of various models and metrics is an absolutely vital analytics exercise, one which cuts against the natural human instinct to bury past errors. A cold-eyed assessment of what an approach does well or poorly is a simple necessity, both for improving the metric and for determining its usefulness in actual decision making.] against the actual minutes distributions to generate what is generally called a retrodiction. To score the retrodiction I am using two common error measurements: the average absolute deviation[4. Simply the average prediction error, whether high or low, without regard to direction.], and the root mean squared error (RMSE)[5. A measure placing disproportionately greater weight on larger misses.].
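For concreteness, both error measures can be computed as below. The win totals are invented, not my actual predictions; only the formulas are the point.

```python
import math

def avg_abs_dev(predicted, actual):
    """Average absolute deviation: mean size of the miss, high or low."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: bigger misses count disproportionately."""
    return math.sqrt(sum((p - a) ** 2
                         for p, a in zip(predicted, actual)) / len(actual))

# Invented predicted vs. actual win totals for three teams
pred = [45, 30, 55]
act = [42, 38, 54]
print(avg_abs_dev(pred, act))     # (3 + 8 + 1) / 3 = 4.0
print(round(rmse(pred, act), 2))  # sqrt((9 + 64 + 1) / 3) ≈ 4.97
```

Note how the single 8-win miss dominates RMSE but is diluted in the average absolute deviation; that is why the two measures can move by different amounts.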
The accuracy of the model improves slightly by going from estimated minutes played to the retrodiction, as shown below.
Note the improvement is fairly modest. In part that speaks to the lack of systematic error in the minutes predictions, along with probably a little luck. The RMSE measure improves slightly more because a couple of the biggest prediction misses were the ones most improved by substituting actual minutes played for the preseason guesstimates.
Lost and Found Production
I can drill down further to the team level, where it gets more interesting. Comparing the prediction under the preseason estimated minutes distribution to the one under actual minutes played provides a measure of “lost production” over the course of the season for teams that distributed playing time to a worse mix of players. On the other hand, I can see which teams gained production by adding pieces or by giving more court time than expected to players better rated by my initial player quality metrics.
The teams that lost production are pretty much the ones you might expect: those that suffered big injuries and/or went into full tank mode. The squads with the most lost production last year were the New York Knicks, Minnesota Timberwolves, and Oklahoma City Thunder[6. Paul George’s injury was already known by the time I made my predictions, so his lack of playing time was already factored into my numbers for the Pacers.].
- Knicks, -7 wins: For the Knicks, almost every player predicted to be productive, like Carmelo Anthony, played fewer minutes than expected and was replaced by players not expected to play well at the NBA level.
- T-Wolves, -7 wins: The Wolves, of course, lost Ricky Rubio for most of the season. But beyond Rubio, almost every expected contributor played fewer minutes than projected due to injury or trade and was replaced by rookies or someone else predictably less good at NBA-level basketball.
- Thunder, -5 wins: It is slightly surprising that the Thunder didn’t lose more production, but Russell Westbrook played more minutes per game than my preseason estimates, closing the gap in total minutes, and Kendrick Perkins playing fewer minutes actually helped a bit.
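Mechanically, the lost-production figures above are just the change in each team’s predicted wins once actual minutes replace the preseason estimates. A minimal sketch, where the preseason and retrodicted win totals are placeholders (only the deltas echo the figures above):

```python
# "Lost" production is a negative change in predicted wins when actual
# minutes replace preseason estimates; "found" production is positive.
# Win totals are placeholders; only the deltas match the article.

preseason = {"NYK": 37.0, "MIN": 26.0, "OKC": 55.0, "BOS": 27.0}
retrodiction = {"NYK": 30.0, "MIN": 19.0, "OKC": 50.0, "BOS": 33.0}

production_change = {team: retrodiction[team] - preseason[team]
                     for team in preseason}

for team, delta in sorted(production_change.items(), key=lambda kv: kv[1]):
    label = "found" if delta > 0 else "lost"
    print(f"{team}: {delta:+.0f} wins ({label} production)")
```

Sorting by the delta surfaces the biggest losers and gainers in one pass, which is all the team-level drill-down really is.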
There are fewer teams on the flip side: teams which would have been predicted more favorably preseason had I better known their playing-time allocations. This “found production”[7. You see what I did there?] existed in part because a few game-changing players who became available during the season improved rosters, while injuries that gave more run to a backup rated better[8. On my player quality metrics.] than the injured starter also improved minutes distributions for some teams. Tops in found production were the Boston Celtics, picking up 6 wins with roster adjustments compared to my preseason estimate. The biggest part of that was flipping Jeff Green for Jae Crowder and Rajon Rondo for Isaiah Thomas. Crowder and Thomas were rated just above average, but slightly above average was still better than either Green or Rondo was expected to be, even before the season. That suggests Danny Ainge may deserve part of the credit[9. Or blame, if you’re from the tank sector of the Celtics fanbase.] usually given to Brad Stevens for the team’s late-season turnaround.
In theory, the retrodiction using actual minutes played should have an advantage in predicting the record for every team. The retrodiction does produce modestly better overall numbers, but not all team predictions are equally advantaged.
The teams above are among the biggest improvements from the preseason predictions to the retrodiction. But some predictions actually get a little worse.[10. Team-by-team prediction change; improved predictions are marked in green, worse in red.]
The Utah Jazz and Philadelphia 76ers were the two teams that saw the biggest decreases in accuracy moving to the retrodiction. For the Jazz, the PT-PM blend simply missed on Rudy Gobert, seeing little reason for optimism in his turnover-laden rookie year. Playing him more had the opposite effect of what the model would have predicted.[11. Oops!] For the Sixers, despite cutting Tony Wroten’s minutes and giving big minutes to the most playable members of the roster, like Robert Covington and Nerlens Noel, the team still underperformed the model, failing to meet even the low, low preseason prediction. Mean regression isn’t for everyone, I guess.
So, minute allocations don’t appear to be the primary source of error in my preseason predictions.[12. I saw similar results looking at RPM-based predictions.] There is the hint of a systematic error, as tanking candidates like the post-Kevin Love Timberwolves and the transitional-front-office Knicks jettisoned quality players and embraced youth movements and castaways. But that is more evident after the fact and needs more preseason evidence to make it truly actionable. Thus player estimates and/or roster fit appear to remain the primary causes of missed predictions. With further study and iteration looking for systematic errors, I hope to improve the model and get even closer on next year’s predictions.