Diamonds Dancing: Predicting The Madness

Apr 4, 2015; Indianapolis, IN, USA; A general view of the NCAA Tournament bracket on display on the J.W. Marriott before the 2015 NCAA Men

Before the NCAA tournament begins, many media outlets like to speculate on their favorite potential “Giant Killers.” These are lower-seeded teams that have a good chance of pulling off an upset in the first round. Oftentimes, however, these teams are one-and-done, flaming out in the second round after their game of glory in the first. There is a big difference between Giant Killers and teams that can become a Cinderella and advance into the second weekend of the tournament. Using a decade’s worth of data, I’ve created a model that attempts to find the teams who might just fit the glass slipper.

Recent evidence suggests that non-conference strength of schedule could be a solid predictor of tournament success. The following model does not include strength of schedule, but it does include a whole lot of other things. It is built with team-level KenPom advanced statistics from 2004-2015. Because success in the NCAA tournament depends heavily on how well each team matches up with its opponent, this is one way of trying to quantify that success based on match-ups at the team level. That means no strength of schedule, no seeding information, no lottery player bias, etc.

Finding Diamonds in the Rough

Starting with 26 advanced statistics for both favorites and underdogs, I estimated a stepwise logistic regression that produces a win probability for each team. When trying to predict first-round success, a few indicators stood out. First, both an underdog’s and a favorite’s Adjusted Offensive and Defensive Efficiencies are great predictors of a team’s tournament success[1. Duh!]. Other important factors in determining a first-round upset seem to hinge on turnovers and three-point shooting. According to the model, the most important traits that an underdog can have are the ability to force turnovers and to shoot a high rate of threes.
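For readers who want to see the mechanics, here is a minimal sketch of forward stepwise selection wrapped around a logistic regression. It is not the model used in this piece: the data is synthetic, the feature names `eff_margin` and `noise` are made up, the fitting routine is plain gradient ascent, and the stopping rule (`min_gain`, a log-likelihood improvement of at least 1, roughly an AIC-style cutoff) is an assumption, since the selection criterion is not spelled out above.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Fit weights by plain gradient ascent; return (weights, log-likelihood)."""
    n = len(y)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += math.log(p if yi == 1 else 1.0 - p)
    return w, ll

def forward_stepwise(features, y, min_gain=1.0):
    """Greedy forward selection; min_gain is an assumed AIC-style cutoff."""
    n = len(y)

    def design(cols):
        # Column of ones for the intercept, then the chosen stats.
        return [[1.0] + [features[c][i] for c in cols] for i in range(n)]

    selected = []
    _, best_ll = fit_logistic(design(selected), y)
    while len(selected) < len(features):
        trials = []
        for name in features:
            if name not in selected:
                _, ll = fit_logistic(design(selected + [name]), y)
                trials.append((ll, name))
        ll, name = max(trials)
        if ll - best_ll < min_gain:
            break  # no remaining stat improves the fit enough
        selected.append(name)
        best_ll = ll
    return selected, best_ll

# Synthetic, made-up data: 40 games, one informative stat and one pure-noise stat.
n_games = 40
features = {
    "eff_margin": [(i - 19.5) / 10 for i in range(n_games)],
    "noise": [[1.0, -1.0, -1.0, 1.0][i % 4] for i in range(n_games)],
}
favorite_won = [1 if i >= 20 else 0 for i in range(n_games)]
favorite_won[17], favorite_won[22] = 1, 0  # two surprises so the fit stays finite

selected, log_lik = forward_stepwise(features, favorite_won)
print(selected, round(log_lik, 2))
```

On this toy data the procedure keeps the informative stat and drops the noise column. With 26 real candidate statistics per side, the same loop would simply run over more columns, and a statistics package would normally replace the hand-rolled fitter.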

The most important traits for favorites hoping to avoid a Round 1 loss are a suffocating defense and the ability to make free throws. This makes some intuitive sense, as a high-scoring underdog may be able to get hot and run-and-gun its way to an upset; a good defense makes that less likely. Free throw shooting becomes important if an overmatched underdog defense becomes reliant on fouling in order to slow down its more talented opponent.

After the first round, the model moves on to selecting important factors for upsets in Round 2. Many of the same factors that predict Round 1 success also seem to carry over to Round 2, such as the favorite’s stifling defense. However, as teams get deeper into the tournament, overall athleticism seems to mean a bit more. For Rounds 2-4, the most important indicator for an underdog is not a traditional stat, but rather the percentage of its own shots that get blocked. This may capture some amount of athleticism[2. Yes, future NBA draft picks do matter, even if we don’t account for them in the model.] (or lack thereof) for the underdog, as a smaller, less athletic team may be prone to having its shots blocked by longer, quicker defenses. For the favorite, it’s more of the same. In Round 2 and on, the rates at which favored defenses get steals and defend the three-point line are often indicative of success. Both of these stats could reflect athleticism, as longer, quicker teams can get their hands into passing lanes and up to contest the outside jump shot.

Giant Killers and Cinderellas

Below are the results of the model, with each team’s percentage chance of winning in Round 1, Round 2, and their chances of making the Sweet 16.
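How the Sweet 16 figure relates to the two round-level figures is not stated here. One natural reading, and it is only an assumption, is that a team's chance of reaching the Sweet 16 is its Round 1 win probability multiplied by its Round 2 win probability, with the Round 2 number blended across possible opponents. The sketch below uses the Stephen F. Austin figures quoted later in this piece (12.1% and 35.84%); the opponent split in the second call is purely hypothetical.

```python
def reach_sweet_16(p_round1, round2_paths):
    """Chain two rounds under an independence assumption.

    round2_paths is a list of (chance of facing this opponent,
    win probability against that opponent) pairs.
    """
    total = sum(p_opp for p_opp, _ in round2_paths)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("opponent probabilities must sum to 1")
    # Blend the Round 2 win probability across the possible opponents.
    p_round2 = sum(p_opp * p_win for p_opp, p_win in round2_paths)
    return p_round1 * p_round2

# Stephen F. Austin, treating 35.84% as an already-blended Round 2 figure.
sfa = reach_sweet_16(0.121, [(1.0, 0.3584)])
print(f"{sfa:.1%}")

# Hypothetical split: 60% to face opponent A (30% to win), 40% to face B (45%).
blended = reach_sweet_16(0.121, [(0.6, 0.30), (0.4, 0.45)])
print(f"{blended:.1%}")
```

Under that reading, a 35.84% Round 2 number is far less dramatic than it sounds, because it only matters about one time in eight.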

As we look at the results of the model, a few patterns emerge:

  • The model is incredibly confident in a lot of 8/9 and 7/10 picks. At times, it is too confident. As these match-ups are the hardest to predict, this is where the model is the least accurate, though not by much. When tested on tournament data from 2002 and 2003, the model still managed to pick about 75% of these match-ups correctly. In terms of total accuracy, the model correctly predicted 83% of all match-ups.
  • In terms of Giant Killers, or teams that may be in line to pull a big upset in Round 1, the model likes the First Four participants, 11-seeds Wichita State and Michigan. Interestingly enough, a team from the First Four has made the Round of 32 each year since the First Four’s inception in 2011. Also look out for Little Rock, Northern Iowa, and Gonzaga as double-digit seeds.
  • Stephen F. Austin is an interesting case this year. In Round 1, they have a 12.1% chance of beating West Virginia in the East Region. However, if they somehow manage to pull the upset, their chances of winning in Round 2 sit at 35.84%, against either Notre Dame (more on them in a bit) or Michigan[4. Gonzaga and Arizona are other teams that have significantly higher win percentages in Round 2 compared to Round 1]. According to the model, S.F. Austin’s Round 2 winning percentage would be the highest for a 13-seed or worse facing an 11-seed or better since KenPom started keeping advanced stats.
  • If we’re talking Cinderellas, the model particularly likes Iowa, giving them a 73% chance of making the second weekend, likely taking down Villanova on the way. Along with Iowa come Wichita State and VCU as the most likely double-digit seeds to make the second weekend.
  • According to historic model estimates, UNC Asheville sits as the 6th most likely 15-seed to win a first-round match-up since 2002. Two of those five 15-seeds[5. 2013 Florida Gulf Coast and 2012 Lehigh.] with at least a 5% chance to win in Round 1 won their match-ups. If you HAVE to pick a 15-seed, you might consider UNC Asheville.
  • The model doesn’t seem to like Notre Dame[6. Notre Dame is the only Top 6 seed that has a winning percentage lower than 30% in both Round 1 and Round 2] in either the first or the second round. This could be because Notre Dame doesn’t really excel in many categories that the model values. They are an average team on defense, allowing 1.037 Points Per Possession[7. Equal to the NCAA D1 Average]. They don’t turn the ball over, though they don’t force turnovers either, giving the ball away on 14.8% of their possessions and forcing the same percentage on defense. Finally, they don’t guard the 3-point line particularly well, allowing opponents to shoot 37.6% from behind the arc on nearly 20 attempts per game.
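To make the accuracy and overconfidence points above concrete, here is a small sketch of how such numbers can be computed. It assumes a "pick" means siding with whichever team the model gives more than a 50% chance, which is not spelled out above. The Brier score is my addition, not something reported for this model; it is one common way to check whether confident predictions are earning their confidence. All probabilities and results below are invented for illustration.

```python
def accuracy(win_probs, outcomes, threshold=0.5):
    """Share of games where the side picked at the threshold actually won."""
    correct = 0
    for p, won in zip(win_probs, outcomes):
        picked_favorite = p >= threshold
        if picked_favorite == bool(won):
            correct += 1
    return correct / len(outcomes)

def brier_score(win_probs, outcomes):
    """Mean squared gap between forecast and result (0 is perfect)."""
    return sum((p - won) ** 2 for p, won in zip(win_probs, outcomes)) / len(outcomes)

# Hypothetical favorite win probabilities and results (1 = favorite won).
favorite_probs = [0.95, 0.88, 0.81, 0.74, 0.66, 0.58]
favorite_won = [1, 1, 1, 0, 1, 0]

acc = accuracy(favorite_probs, favorite_won)
brier = brier_score(favorite_probs, favorite_won)
print(f"accuracy={acc:.3f}  brier={brier:.3f}")
```

Two models can post the same pick accuracy while one is far more overconfident than the other; a score like this penalizes the model that said 74% on a game it lost more than the one that said 58%.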

Disclaimers

A point to remember when looking at this model: It is a logistic regression, meaning that it expresses a team’s chances of winning a game as a probability, not as a guaranteed pick. That seems fitting for March Madness, considering that anything can happen. Even teams with less than a 1% chance of winning have won in the past. Models like these can give you a solid prediction of what may happen, but nothing is certain. That’s why the boys lace ’em up and why we watch them. Good luck picking your brackets, enjoy the Madness, and continue to pray for the return of Gus Johnson to the NCAA Tournament sidelines in the near future.