An Easier to Interpret NBA Draft Model

Mar 15, 2015; Nashville, TN, USA; Kentucky Wildcats forward Willie Cauley-Stein (15) works against Arkansas Razorbacks forward Bobby Portis (10) during the second half of the SEC Conference Tournament championship game at Bridgestone Arena. Mandatory Credit: Joshua Lindsey-USA TODAY Sports

One of the harder parts of working with draft models is interpreting them beyond the rankings. That matters because, while a draft naturally invites an ordered list, the gaps between positions as rated by the model are not equidistant. The difference between the eighth-best and tenth-best prospects might be negligible, while the gap between tenth and twelfth could be substantial. The actual ratings calculated by the model help, but they are often still not intuitive. For my P-AWS model that is perhaps a particular problem, since the rating basis is not familiar to many people[1. Alternative Win Score (AWS), a linear-weights metric shown to be one of the better metrics at predicting out of sample].

This is partially solved by converting the model scores to standardized values, which centers the scores around the average and scales them by the spread of the distribution. That sort of standardization is especially helpful in comparing two different metrics, since it puts them on a directly comparable scale.
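As a quick sketch of that conversion (pure Python, with made-up ratings rather than actual model output), standardizing is just subtracting the mean and dividing by the standard deviation:

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw model scores to z-scores: center on the mean,
    scale by the standard deviation of the distribution."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]

# Hypothetical raw ratings for five prospects
raw = [12.0, 9.5, 9.3, 6.0, 3.2]
z = standardize(raw)
```

A prospect rated well above the group average comes out with a positive z-score, one below it a negative score, and the unit is now "standard deviations from the mean" regardless of the original metric's scale.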

But, most people do not work with standardized data often, so while the distance between prospects may be a little clearer [2. Or may not be], operationalizing the model’s output is not that much easier. Nor is it as intuitive to combine with information not included in the model (such as injury risk or character assessment) as we would hope.

For that reason I updated a version of a model I ran last year that uses logistic regression to estimate the odds that a prospect will become a quality player. "Quality player" is defined as scoring in the top quartile of the prospects in the training set, which roughly corresponds to a quality rotation player or borderline starter. The logistic regression is trained by running the model inputs[3. Age, pace-adjusted box score stats, high school recruiting rank, and competition level] against a yes/no variable for whether the prospect's NBA statistics qualified them as a Quality Player by the end of their rookie contract.
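A minimal sketch of how a logistic regression turns inputs like those into a probability (the feature values and coefficients below are invented for illustration and are not the model's actual weights):

```python
import math

def quality_player_probability(features, coefficients, intercept):
    """Logistic regression: a weighted sum of the inputs pushed
    through the logistic function, yielding a probability in (0, 1)."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical inputs: [age, pace-adjusted production, HS recruiting score, competition level]
# with made-up coefficients (age gets a negative weight: younger prospects project better).
features = [19.0, 0.105, 0.9, 1.0]
coefs = [-0.35, 18.0, 1.2, 0.8]
p = quality_player_probability(features, coefs, intercept=3.0)
```

Fitting the coefficients to the historical yes/no outcomes is what the training step does; once fit, scoring a new prospect is just this one weighted sum and squash.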

One analytic downside of using the yes/no variable is a loss of detail on the player's actual production, especially for star players, who are not differentiated from players who just clear the Quality Player minimum[4. Layne Vashro has a very interesting and more complex similar model, with four possible category breakdowns, on our stats page]. Luckily, those traits show up fairly linearly: the players who go on to be stars are rated with a very high likelihood of being at least Quality Players. The coefficients that make up the model are also in line with the P-AWS model; the main difference is a slightly lower relative weight on age and high school rank, as those are factors that tend to differentiate stars. All in all this model is closely related to the P-AWS model, with a linear R^2 of .937, and an even closer fit once you account for the 0% lower bound on a player's odds.[5. For example, I technically have a 0% chance of being an NBA player, though I feel that should probably be lower somehow.]

The table below has the odds of each prospect in the top 50 prospects of becoming a quality player, along with their rank in that model:

Odds Model

The whole list is here.

A couple of highlights from the table: Jahlil Okafor and Karl-Anthony Towns are the best bets according to the model, with Tyus Jones and D’Angelo Russell close behind. The estimated odds fall off quickly, with players ranked in the twenties given less than 50/50 odds of becoming quality players by the end of their rookie contracts.

As stated above, framing the model output as odds of success makes it easier and more intuitive to incorporate information that sits outside the model but affects a player’s ultimate chances of success. If a player’s physiology or injury history leads the medical staff to estimate a higher risk of a career-altering injury, the odds can easily be recalculated with that additional information. Similarly, if a drafting organization has good empirical evidence that a prospect’s character type or work habits make them more likely to succeed at the next level, it can apply that to its estimate as well.

As a hypothetical example, if an organization estimated that prospects with Stanley Johnson’s character traits transition to the next level 55% of the time, it could apply Bayes’ theorem to Johnson’s 68% odds from the model to get an adjusted 72.2% odds. That is enough to move him a few spaces up the board if the prospects ahead of him did not show similar characteristics, but not enough to put him at the top of it.
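That adjustment can be computed directly. Treating the model's estimate and the character estimate as independent signals, the combined probability is p1·p2 / (p1·p2 + (1−p1)(1−p2)). A sketch using the Stanley Johnson figures from the example above:

```python
def combine_odds(p_model, p_outside):
    """Combine two independent probability estimates of the same
    event (Bayes' theorem with a uniform prior)."""
    hit = p_model * p_outside
    miss = (1.0 - p_model) * (1.0 - p_outside)
    return hit / (hit + miss)

adjusted = combine_odds(0.68, 0.55)  # ≈ 0.722
```

Note that an outside estimate of exactly 50% leaves the model's odds unchanged; estimates above or below 50% nudge the combined odds up or down accordingly.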

In the end, the Quality Player Odds model has benefits and weaknesses similar to those of most analytic draft models. But for some readers it may be more understandable, and easier to combine with outside information, including qualitative information.