Who Gets Into the NCAA Tournament?
By Nick Restifo
Jan 13, 2015; Lexington, KY, USA; The Kentucky Wildcats bench celebrates during the game against the Missouri Tigers in the second half at Rupp Arena. The Kentucky Wildcats defeated the Missouri Tigers 86-37. Mandatory Credit: Mark Zerof-USA TODAY Sports
The NCAA Championship tournament will be here before we know it, and with that comes the hustle and bustle of 350-some D1 teams fighting for a chance to dance. Over the next few weeks, teams will round out their schedules and look to secure better seeds in their conference tournaments for a better chance at winning the conference’s automatic bid, the only type of tournament admission that many mid-major teams dare to hope for.
Lately I have been wondering what the exact chances of each team getting into the tournament were, and how much the strong teams of smaller conferences boxed out the good-not-great teams of the bigger conferences. Behind the ESPN Insider curtain I have seen percentage chances of teams advancing to certain levels of the NCAA tournament, but I had never really seen anyone put an exact percentage chance on a team actually making the tournament. So I did it live.
Drawing on publicly available data from kenpom.com and ESPN.com, I used two models to craft each team’s percentage chance.
The first model is a basic college game model that predicts the outcome of college games and, more importantly, provides a probability of each team winning. I had made this model prior to this analysis, but I made heavy use of it here. For the nerds, the model is a mean propensity model of logistic regression, neural network, and support vector machine inputs, trained on principal component factors that are dependent on variables like offensive efficiency, defensive efficiency, strength of schedule, and more. Historically, it picks games at about a 70% clip, which isn’t amazing, but the information the model receives is all public, making the model easy to re-train and run.
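For the curious, here is a minimal sketch of what the "mean propensity" part could look like, assuming it means averaging the win probabilities that each of the three model types produces for a game. The function name and the probabilities below are made up for illustration, not taken from the actual model.

```python
def mean_propensity(probabilities):
    """Average the win probabilities produced by several models."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    return sum(probabilities) / len(probabilities)


# Hypothetical outputs for a single game from the three model types.
model_outputs = {
    "logistic_regression": 0.72,
    "neural_network": 0.68,
    "support_vector_machine": 0.70,
}

# Blended probability that the home team wins this game.
p_home_win = mean_propensity(list(model_outputs.values()))
```

Averaging tends to smooth over the quirks of any single model, which is presumably the appeal of blending three of them.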
The model I made specifically for this analysis predicts whether or not a team will be selected by the committee for an at-large bid, and NOT whether or not they should be. This bid-predicting model predicts at-large bids at a 96% clip for teams that didn’t win their conference, which sounds good until you realize that 89% of D1 teams don’t make the tournament, and a good chunk of those who do are pretty safe bets to be in, like the Kentuckys, the Dukes, etc. The decisions of the selection committee are definitely hard to predict.
The good news, however, is that I didn’t really care how accurate the model is (as long as it’s accurate enough), because I’m much more interested in the probability measure the model supplies than the actual prediction. The tournament-bid-predicting model is based on logistic regression, neural network, and support vector machine techniques, uses principal component factors to account for multicollinearity, and in addition to evaluating bid-chances on measures of efficiency and strength of schedule, evaluates bid-chances on RPI, team ranking, and other bad measures that the committee seems to enjoy evaluating teams on.
Using the game-predicting model and the game logs for every game for the remainder of the D1 season, I calculated the expected final regular season winning percentage of each and every team. Using this expected season-ending winning percentage and a variety of criteria to account for ties in rank, I assigned each team in the conference to their expected conference tournament seed. With each team assigned a seed, I evaluated each team’s chances of winning their conference tournament by feeding my game-predicting model every possible tournament game (the number of possibilities isn’t that big, since conference tournaments are only 8-15 teams in size) and accounting for the possibility of each game occurring. This yields a percentage chance of each team winning their conference tournament that is in no way dependent on simulation, but rather on a team’s actual mathematical chance. This part took me a while.
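The exact (no simulation) bracket math can be sketched roughly like this. It is an illustration under stated assumptions, not my actual code: the bracket is written as nested pairs so that byes fall out naturally, and `win_prob` is a hypothetical stand-in for the game-predicting model, here a simple strength ratio with made-up strengths.

```python
def advance_probs(bracket, win_prob):
    """Return {team: probability of winning this (sub-)bracket}.

    A bracket is either a team name or a pair (left, right) of brackets.
    """
    if isinstance(bracket, str):
        return {bracket: 1.0}  # a lone team "wins" its slot for certain
    left, right = bracket
    left_p = advance_probs(left, win_prob)
    right_p = advance_probs(right, win_prob)
    result = {}
    for side, other in ((left_p, right_p), (right_p, left_p)):
        for team, p_reach in side.items():
            # Weight each possible opponent by the chance that game occurs.
            p_win_game = sum(
                p_opp * win_prob(team, opp) for opp, p_opp in other.items()
            )
            result[team] = p_reach * p_win_game
    return result


# Hypothetical team strengths; a stand-in for a real game model.
strength = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 1.0}


def win_prob(team, opponent):
    """Hypothetical chance that `team` beats `opponent`."""
    return strength[team] / (strength[team] + strength[opponent])


# Five-team bracket: D and E play in, the winner meets A, and B meets C.
bracket = (("A", ("D", "E")), ("B", "C"))
title_odds = advance_probs(bracket, win_prob)
```

Because every possible matchup is weighted by the chance it actually happens, the title odds across the bracket add up to exactly 1.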
For the oh-so-special Ivy League, which doesn’t have a conference tournament (except in the case of a tie), I was forced to do a simulation of the remaining regular season (because this number of possibilities IS too big), rather than the actual probability number crunch that I did with the conference tournaments. The remaining Ivy League games were simulated 10,000 times based on the probabilities supplied by my game-predicting model, to provide each Ivy League team with a percentage chance of placing #1 in their conference standings.
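A simulation like that can be sketched in a few lines. Everything here is illustrative: the team names, records, and game probabilities are made up, and splitting credit evenly among teams tied for first is a simplifying assumption standing in for the real tiebreaker.

```python
import random
from collections import Counter


def simulate_title_odds(current_wins, remaining_games, n_sims=10_000, seed=0):
    """Estimate each team's chance of finishing first in the standings.

    remaining_games is a list of (home, away, p_home_wins) tuples.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    firsts = Counter()
    for _ in range(n_sims):
        wins = dict(current_wins)
        for home, away, p_home in remaining_games:
            winner = home if rng.random() < p_home else away
            wins[winner] += 1
        best = max(wins.values())
        leaders = [team for team, w in wins.items() if w == best]
        for team in leaders:
            # Assumption: teams tied for first share the credit equally.
            firsts[team] += 1.0 / len(leaders)
    return {team: firsts[team] / n_sims for team in current_wins}


# Hypothetical three-team league with three games left to play.
current_wins = {"Team A": 5, "Team B": 4, "Team C": 3}
remaining_games = [
    ("Team A", "Team B", 0.60),
    ("Team B", "Team C", 0.55),
    ("Team C", "Team A", 0.40),
]
ivy_odds = simulate_title_odds(current_wins, remaining_games)
```

With a real schedule the list of remaining games is much longer, which is exactly why brute-force enumeration stops being practical and sampling takes over.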
Almost no assumptions were made in this process. But the ones that were made include:
- 10,000 iterations is enough to adequately evaluate the remainder of the Ivy League season.
- For conferences that do not field all of their teams into their conference tournament, those teams who are not currently projected to make their tournament have a 0% chance of winning their conference. (Notice the teams from the bottom of the Southland Conference. Those teams’ chances would likely be less than 0.02% anyway.)
- For ease of updating, I used each team’s OVERALL record to seed them in their conference tournament, rather than their conference record. This only affects their seed; their probabilities of winning each game are not influenced if the opponent is the same. I imagine the rank of records will be very similar anyway, especially as we move into the conference schedules.
Each team’s percentage chance of making the tournament reflects both a) how likely they are to win their conference and receive an automatic bid, based on my game-predicting model, or in the case of the Ivy League, simulations; and b) how likely they are to receive an at-large bid from the NCAA, based on my tournament-bid-predicting model.
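One way to combine the two pieces is sketched below, assuming the at-large chance only matters when a team fails to win its conference and that the two chances can be treated as independent. The function name is hypothetical. Plugging in the Hofstra figures quoted in this article (41.82% to win the conference, 6.10% at-large) reproduces the 45.37% overall number.

```python
def tournament_probability(p_auto_bid, p_at_large):
    """Chance of making the field via an automatic bid OR an at-large bid.

    Assumes p_at_large applies only when the team does not win its
    conference, and that the two events are independent.
    """
    return p_auto_bid + (1.0 - p_auto_bid) * p_at_large


# Hofstra's figures as quoted in the article.
hofstra = tournament_probability(0.4182, 0.0610)
print(f"{100 * hofstra:.2f}%")  # prints 45.37%
```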
So without much further ado, I present each team’s chance of making the 2015 NCAA D1 Men’s Basketball tournament:
The top and the bottom are pretty self-explanatory, and the middle is clearly the most interesting. I sorted the teams by their expected probability of getting into the tournament, except around the 68-team mark, where I made sure to have at least one team from each conference in the top 68, as the tournament format guarantees.
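That sorting rule can be sketched as follows. This is one possible reading of it, not my exact procedure: take the most likely team from every conference first, then fill the remaining spots with whoever has the highest probability. The teams, conferences, and numbers are invented.

```python
def project_field(teams, field_size=68):
    """Pick a projected field from (name, conference, probability) tuples."""
    ranked = sorted(teams, key=lambda t: t[2], reverse=True)

    # Guarantee one spot per conference: its highest-probability team.
    best_per_conf = {}
    for team in ranked:
        best_per_conf.setdefault(team[1], team)
    field = list(best_per_conf.values())
    if len(field) > field_size:
        raise ValueError("more conferences than spots in the field")

    # Fill what is left with the best remaining teams overall.
    chosen = {team[0] for team in field}
    for team in ranked:
        if len(field) >= field_size:
            break
        if team[0] not in chosen:
            field.append(team)
            chosen.add(team[0])
    return sorted(field, key=lambda t: t[2], reverse=True)


# Hypothetical mini-example: three spots, two conferences.
teams = [
    ("Big 1", "Big Conference", 0.90),
    ("Big 2", "Big Conference", 0.60),
    ("Big 3", "Big Conference", 0.50),
    ("Small 1", "Small Conference", 0.20),
]
field = project_field(teams, field_size=3)
field_names = [name for name, _, _ in field]
```

In this toy case "Big 3" is left out at 50% while "Small 1" gets in at 20%, which is the same boxing-out effect that shows up around the real 68-team cutoff.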
Note how teams with higher probability marks, like Providence, Saint Mary’s, Green Bay, Connecticut and more, are getting boxed out by small-conference automatic bids like Lafayette, even though Providence has almost three times the chance of getting into the tournament that Lafayette has. As expected, the model’s expectation of a team getting an at-large bid plummets around the fiftieth team or so, and some teams are much more reliant on their conference tournament than others. Hofstra, for example, is given a 45.37% chance of getting into the NCAA tournament, but they have only a 6.10% chance of getting an at-large bid. Most of that percentage comes from the 41.82% chance that they win their conference. If they don’t win their conference, they’re probably not getting in. My hope with this analysis is that you give your favorite team a look and find out where they stand on making the tournament right now, and whether more of that chance comes from their chance of winning their conference tournament or from their overall performance.
The chart here helps to visualize the tournament probability landscape, and demonstrates the quick, expected falloff after the fiftieth team. It also clearly shows the probability spike at the divide of the teams being shafted by the format.
There’s still plenty of season to be had and teams still have considerable influence over their percentages. Anything can happen in each conference tournament too, and that’s why they call it March Madness.