Nylon Calculus: How accurately can we predict NBA Playoff berths and similar scenarios?
Sometimes teams that are expected to be strong playoff contenders wind up being pretty unlucky, whether due to injury, strength of schedule, conference affiliation or something else, and fall out of contention for the NBA Playoffs. It can be really frustrating for fans to witness, front offices quibble, and players tend to move on if underwhelming seasons persist.
NBA Playoffs PCA model creation
I was curious about underachievers and wanted to re-use the data from a multinomial logistic regression model I made with an interactive web application.
Using principal component analysis, I identified regular season traits that strongly predict postseason appearances and successes. I chose to look into the following events:
- If an NBA team is playoff-bound: Yes or No
- If an NBA team is a championship contender (represents the conference): Yes or No
- If an NBA team is projected to make a deep run (specified as 11+ games in the playoffs): Yes or No
- If an NBA team that qualifies for the playoffs is projected to make an early, unimpressive exit (specified as five games or fewer): Yes or No
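To make the four outcomes concrete, here is a minimal Python sketch of how one team-season could be turned into those Yes/No labels. It is an illustration rather than the code behind the article (which was written in R), and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    # All field names are hypothetical, chosen for this illustration.
    made_playoffs: bool
    made_finals: bool
    playoff_games: int  # 0 for teams that missed the playoffs


def label_outcomes(ts: TeamSeason) -> dict:
    """Derive the four Yes/No targets described in the list above."""
    return {
        "playoff_bound": ts.made_playoffs,
        "contender": ts.made_finals,
        # Deep run: 11 or more playoff games.
        "deep_run": ts.made_playoffs and ts.playoff_games >= 11,
        # Early exit: made the playoffs but played five games or fewer.
        "early_exit": ts.made_playoffs and ts.playoff_games <= 5,
    }


swept = label_outcomes(TeamSeason(True, False, 4))
lottery = label_outcomes(TeamSeason(False, False, 0))
```

A team swept in four games is flagged as an early exit, while a team that missed the playoffs is not, because the early-exit question only applies to qualifiers.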
Predictive factors included:
- Offensive Rating (adjusted by league average)
- Adj. Defensive Rating
- Off. & Def. TS%
- Off eFG% & Def eFG%
- If this was a consecutive playoff appearance for the team
- ORB%, DRB%
- Turnover Rates
I split the data 80/20 between the training and testing sets, put the principal components into a multinomial formula in RStudio and rendered the responses. [If you’d like to see the detailed explanation of the process, check below.]
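For anyone who wants to picture the 80/20 split, here is a small Python sketch. It assumes a simple random shuffle; the article doesn't say exactly how the rows were assigned, and the function name, seed and placeholder rows are all made up for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for team-season records.
team_seasons = [f"team_season_{i}" for i in range(100)]
train, test = train_test_split(team_seasons)
```

Every row lands in exactly one of the two sets, so the testing accuracy is measured on teams the model never saw during fitting.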
Models and accuracies
Model 1: Accuracy of Playoff Entries
Does this team make the playoffs? Yes or No?
Training Set: 92.9 percent accurate
Testing Set: 90.7 percent accurate
This is quite encouraging, as this PCA model is right more than 90 percent of the time when predicting whether a team will make the playoffs. For further validation, I tested the model on 2017 teams and it returned 27/30 correct answers.
Model 2: Accuracy of championship contender
Overall: 92.6 percent accuracy when determining whether a team will/won’t make the Finals given their regular season statistical profile and knowledge of their recent postseason success/lack thereof.
However, there were too many false negatives: among teams who made the Finals in real life, only 45 percent were correctly assessed.
Overall: 94 percent accurate. For teams who made the Finals in real life, 66 percent were assessed correctly, which is a welcome sign considering that we’ve used largely regular season statistics to predict this.
The second model is best at affirming that certain teams don’t have the statistical profiles that would lead us to believe they’ll make the NBA Finals, but of course, most teams don’t make it. So, this PCA model isn’t quite as insightful and would provide more reasonable results if formatted in a probabilistic manner (instead of a Yes/No question).
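The "most teams don’t make it" point can be put in numbers. The Python sketch below computes the accuracy of a do-nothing rule that always answers "No". The 30-team league size is an assumption for illustration (the league was smaller in the earlier seasons covered here), and the function name is hypothetical.

```python
def baseline_accuracy(n_teams: int, n_positive: int) -> float:
    """Accuracy of always predicting 'No' when n_positive of n_teams are 'Yes'."""
    return (n_teams - n_positive) / n_teams


# Two teams reach the Finals each season; 30 teams is an illustrative
# league size, not the count for every season in the data.
finals_baseline = baseline_accuracy(n_teams=30, n_positive=2)
print(f"{finals_baseline:.1%}")  # 93.3%
```

With a baseline that high, overall accuracy says little on its own, which is why the share of actual Finals teams caught is the more telling number.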
Model 3: Whether a team will make a deep run
(Where deep run = 11+ playoff games in a postseason)
Truly heartwarming results: overall, we correctly projected whether a team would play 11+ postseason games, given its regular season data, 87.8 percent of the time. And among teams that actually made deep runs in the training set (103), 71 of the predicted values were correct, which is pretty affirming.
Testing set: 85.3 percent accurate, 22/31 accurate among deep runs, 9 false negatives
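The testing-set counts above translate directly into a hit rate among actual deep runs (often called recall). Here is a short Python sketch using only the quoted numbers; the function name is just for illustration.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual positives that the model labelled correctly."""
    return true_positives / (true_positives + false_negatives)


# Testing-set counts quoted above: 22 of 31 deep runs caught, 9 missed.
deep_run_recall = recall(true_positives=22, false_negatives=9)
print(f"{deep_run_recall:.1%}")  # 71.0%
```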
I definitely would like to keep this model around. It’s certainly more insightful than the last one.
Model 4: Whether a team makes NBA playoffs and has an early exit
(Multinomial Model — three possible categories here: ‘Didn’t Make NBA Playoffs’, ‘Early Exit [five games or fewer]’, ‘Made Playoffs – Non-Early Exit’)
Training Set: 77.7 percent accurate, quite nice. Also of note, the 2016 Memphis Grizzlies were accurately predicted as a team to lose in the playoffs in fewer than five games. The 2016 Portland Trail Blazers were given the same prediction; however, they got quite fortunate with injuries to Chris Paul and Blake Griffin.
Testing Set: 73.1 percent, which is definitely acceptable given that a model with three possible categories has more ways to be wrong than a Yes/No one. If you want to check the specifics of each testing/training set, check this spreadsheet here!
Following this, I hope to use these algorithms to keep track of the expectations we have for 2017-18 teams and potentially create an interactive R Shiny application that will render probabilities of the outcomes we’ve discussed.
Postscript with PCA methodology:
I gathered Basketball-Reference statistics for each NBA team from 1980-81 to 2015-16 and decided to analyze whether we could accurately predict those four scenarios. The database has a plethora of independent variables that I later standardized by adjusting them to the league average from each season. Therefore, an offensive rating of 110 in a season with a league-average offensive rating of 111 becomes an adjusted offensive rating of about 99.1 (110 divided by 111, times 100), and we avoid inflated offensive and defensive stats. Then, I included a few categorical independent variables, like whether a team was making a consecutive appearance, to add a bit more color and potential explained variance to the model.
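As a quick illustration of the league-average adjustment, here is a Python sketch that assumes the adjustment is a simple ratio scaled to 100. The function name and the sample numbers are hypothetical.

```python
def adjust_to_league(value: float, league_average: float) -> float:
    """Express a stat as a percentage of that season's league average."""
    return 100.0 * value / league_average


# Hypothetical team: 112 offensive rating in a season averaging 108.
adjusted = adjust_to_league(112.0, 108.0)
print(round(adjusted, 1))  # 103.7
```

A value above 100 means the team beat that season's league average, so ratings from high-scoring and low-scoring eras land on the same scale.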
The initial fear when creating a probabilistic model with these variables is multicollinearity. So, to combat this, I ran a model in RStudio with principal component analysis, which reduces the dimensionality of the analysis by replacing the correlated independent variables with components that are uncorrelated with one another, while maximizing the variance of a low-dimensional representation of the data.
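To show why PCA helps with multicollinearity, here is a small standard-library Python sketch for just two correlated variables. It uses the closed-form eigen-decomposition of a 2x2 covariance matrix; that is a simplification of the many-variable analysis run in RStudio, and the team values are invented for illustration.

```python
import math
from statistics import mean


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pca_2d(xs, ys):
    """PCA for two variables via the eigen-decomposition of their
    2x2 covariance matrix. Returns (eigenvalues, scores)."""
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Unit eigenvector for lam1 (valid when b != 0); the second is orthogonal.
    norm = math.hypot(b, lam1 - a)
    v1 = (b / norm, (lam1 - a) / norm)
    v2 = (-v1[1], v1[0])
    mx, my = mean(xs), mean(ys)
    pc1 = [(x - mx) * v1[0] + (y - my) * v1[1] for x, y in zip(xs, ys)]
    pc2 = [(x - mx) * v2[0] + (y - my) * v2[1] for x, y in zip(xs, ys)]
    return (lam1, lam2), (pc1, pc2)


# Made-up adjusted eFG% and TS% values for six teams (strongly correlated).
efg = [97.0, 99.0, 100.0, 101.0, 103.0, 105.0]
ts = [96.5, 99.5, 100.0, 101.5, 102.5, 105.5]

(lam1, lam2), (pc1, pc2) = pca_2d(efg, ts)
raw_cov = covariance(efg, ts)      # large: the raw inputs move together
score_cov = covariance(pc1, pc2)   # effectively zero: scores are uncorrelated
```

The two raw shooting stats rise and fall together, but the two component scores have essentially zero covariance, so they can sit in the same regression without competing to explain the same signal.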
I picked a few of the produced principal components by analyzing scree plots and choosing components with eigenvalues of about one or higher. Each of the four models explains about 74-76 percent of the variation, as they use five or six components.
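The eigenvalue rule can be sketched in a few lines of Python. The eigenvalues below are invented so the arithmetic is easy to follow; they are not the values from these models. The sketch assumes twelve standardized variables, in which case the eigenvalues sum to twelve.

```python
def select_components(eigenvalues, threshold=1.0):
    """Keep components whose eigenvalue meets the threshold and report
    the share of total variance they explain."""
    kept = [ev for ev in eigenvalues if ev >= threshold]
    explained = sum(kept) / sum(eigenvalues)
    return kept, explained


# Hypothetical eigenvalues for twelve standardized variables (sum to 12).
eigenvalues = [3.6, 2.0, 1.4, 1.1, 1.0, 0.8, 0.7, 0.5, 0.4, 0.3, 0.15, 0.05]
kept, explained = select_components(eigenvalues)
print(len(kept), f"{explained:.1%}")
```

In this made-up case five components clear the cutoff. Because each standardized variable contributes one unit of variance, a component with an eigenvalue below one carries less information than a single original variable, which is the usual argument for the cutoff.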
Interpreting the dry component results is likely the trickiest, most challenging part of this type of data analysis. I configured a biplot of the principal component analysis playoff-bound model to hopefully give some more insight into how the process works.
This biplot maps my principal component analysis on a two-dimensional diagram. We are given the scores of the first two principal components and the proximity of a few variables in the model.
In the playoff-bound model, teams that are proficient in offensive effective field goal percentage are also great in true shooting percentage, offensive efficiency (both of which are intuitive) and 3-point rate (which is closely related). Similarly, in the bottom left of the graph, defensive effective field goal percentage and defensive rating sit close together. Furthermore, teams with similar overall component scores will be close to one another on this diagram; you can also see a pretty distinct separation between teams that qualify and teams that don’t qualify for the NBA Playoffs.
The diagram succeeds in showing us which variables are significant to which components, but because these two components explain only about 45 percent of the variance, this particular biplot doesn’t give us the full picture.