Evaluating Draft Prospects: A First Pass

Mar 5, 2016; Lexington, KY, USA; LSU Tigers forward Ben Simmons (25) dribbles the ball against the Kentucky Wildcats in the second half at Rupp Arena. Mandatory Credit: Mark Zerof-USA TODAY Sports

With March upon us, conference tournaments ramping up, the Big Dance quickly approaching, and other projections gradually being released, I thought it would be a good time to introduce my projection model for the upcoming draft and provide some insight into how it functions.

My draft rating is the combination of two systems of ensemble modeling that attempt to address two separate factors.

  • How good are a draft prospect’s chances of playing in the NBA?
  • How well do NBA players with similar pre-draft measurements and production perform once in the league?

These are very different questions. Players at low levels of play with underwhelming physical tools are unlikely to play in the NBA, no matter how strong their production. While their theoretical success is up for debate, it is relatively safe to assume that the players who are given a chance to play in the NBA (you know, the 230 lb, 6’10 AAU products with shooting range out to the 3-point line) would have had more success in the NBA on average than the ones who never got a chance to play in the league. Keep this assumption in mind as I discuss the first component of my model.

The first component of my draft projection is an ensemble of a gradient boosted logistic regression model, a vanilla logistic regression model, and a neural network model, all predicting whether or not a player will play in the NBA. I tried other modeling techniques as well, but the results were not as strong. For each player, I use the mean of the top two of the three predicted probabilities of NBA play[1. A subjective choice I made to give each prospect “more of a chance” at NBA play, and to allow the second component of my draft model more importance]. Since these models are each predicting NBA play, they weight factors like strength of schedule, height, wingspan, combine results, and high school rank significantly more heavily than the other parts of my system do. Other factors, such as that vitally important steal rate, weigh heavily as well.
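As a rough illustration of the "mean of the top two probabilities" step, here is a minimal Python sketch. The function name `top_two_mean` and the example probabilities are made up for illustration; they are not taken from the actual model.

```python
def top_two_mean(probabilities):
    """Average the two highest values in a list of model probabilities."""
    if len(probabilities) < 2:
        raise ValueError("need at least two model probabilities")
    # Sort descending and keep the two most optimistic estimates.
    best_two = sorted(probabilities, reverse=True)[:2]
    return sum(best_two) / 2.0


# Hypothetical outputs for one prospect from the three classifiers
# (gradient boosted, plain logistic, neural network). Values are invented.
model_probs = [0.62, 0.48, 0.70]
p_nba = top_two_mean(model_probs)  # averages 0.70 and 0.62, drops 0.48
```

Dropping the lowest estimate nudges every prospect's chance of NBA play upward, which is the "more of a chance" effect the footnote describes.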

The next component of my draft model is an ensemble of a gradient boosted regression model, a generalized linear regression model, a neural network model, and a random forest model, each predicting success in the NBA among all NBA players for whom I have some degree of pre-draft production information. Since the main source of my data is the always fantastic DraftExpress, that pretty much limits the analysis to players drafted since 2002 who did not come straight from high school. The target variable I use here is the player’s two-year peak (in some cases one-year) of a scaled blend of NPI RAPM, WS, and BPM. Predicting WS alone actually results in the most accurate predictions from my pre-draft production data, but since the ability to predict a number and the value of that prediction are two separate things, I opt to use the blend, combining the predictability of WS with the often more telling value of RAPM and BPM. With four predictions in place, I take their weighted average, with each model weighted by the inverse of its out-of-sample mean squared error. This ensemble values similar factors to the previous component, but items such as age, steal rate, assists, and scoring efficiency carry the most weight.
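The inverse-MSE weighting can be sketched in a few lines of Python. This is only an illustration under assumptions: the model labels, error values, and predictions below are invented, and normalizing the weights to sum to one is my reading of "weighted average."

```python
def inverse_mse_weights(mse_by_model):
    """Turn out-of-sample MSEs into weights proportional to 1 / MSE."""
    if any(mse <= 0 for mse in mse_by_model.values()):
        raise ValueError("MSE values must be positive")
    inverse = {name: 1.0 / mse for name, mse in mse_by_model.items()}
    total = sum(inverse.values())
    # Normalize so the weights sum to 1.
    return {name: value / total for name, value in inverse.items()}


def weighted_prediction(predictions, weights):
    """Weighted average of per-model predictions for one player."""
    return sum(predictions[name] * weights[name] for name in predictions)


# Hypothetical out-of-sample errors and predictions (made-up numbers).
mse = {"gbm": 1.0, "glm": 2.0, "nn": 4.0, "rf": 4.0}
preds = {"gbm": 4.0, "glm": 2.0, "nn": 0.0, "rf": 8.0}

weights = inverse_mse_weights(mse)
blended = weighted_prediction(preds, weights)
```

Models with a lower out-of-sample error pull the blended prediction toward their own estimate, while a weaker model still contributes something rather than being discarded.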

To reach my overall rating, I simply multiply a player’s projected success, should he play in the NBA, by his chances of NBA play. While this approach may have flaws, it does have flexibility. It can be applied to any prospect anywhere, and is indeed applied to the thousands of potential prospects on which I have information. Further, it is not reliant on subjective filters for training or evaluation[2. These are obviously important aspects in prospect evaluation, but factoring them into this model could result in “double counting” this sort of subjective data.]. The chances of playing in the NBA for the top prospects are usually between 99% and 100%.
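Put as code, the final step is a single multiplication. The sketch below assumes hypothetical prospect names and numbers purely for illustration; `overall_rating` is not a function from the actual system.

```python
def overall_rating(projected_success, p_nba):
    """Projected success if in the NBA, scaled by the chance of NBA play."""
    if not 0.0 <= p_nba <= 1.0:
        raise ValueError("p_nba must be a probability between 0 and 1")
    return projected_success * p_nba


# Hypothetical prospects: (projected success, probability of NBA play).
prospects = {
    "Prospect A": (60.0, 0.99),  # near-certain NBA player
    "Prospect B": (70.0, 0.50),  # higher projection, but a coin flip to play
}

# Rank prospects from highest to lowest overall rating.
ranked = sorted(
    prospects,
    key=lambda name: overall_rating(*prospects[name]),
    reverse=True,
)
```

In this made-up case, Prospect A ranks ahead of Prospect B despite the lower projection, because the uncertain path to the league cuts B's rating in half.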

Below is the first run of my results, combined with DraftExpress’s own rankings. The table keeps scores and ranks for anyone in the top 60 by my own system’s rankings and anyone in DraftExpress’s top 100. For a little context, Ben Simmons’ rating of 61.56 would be good for 15th in my dataset, right after Derrick Rose and in front of Brook Lopez. Keep in mind my model is evaluating 3664 potential prospects for this incoming draft class. My system considers all but three of the evaluated DraftExpress top 100 to be in the top 25% of all prospects considered, and all but one to be in the top 50%.

Sheet 1

I only intend this post to serve as a quick introduction to the methodology, and as a very loose guide for focusing one’s NBA draft exploration. I plan to both tinker with the methodology (because it needs tinkering) and provide much deeper insight into the validity and process of the system multiple times before the NBA draft. Until then, I hope the current results can be a flexible baseline[3. Rankings? *shudder*.] to all the fellow tourney-heads and draft-nits out there.