Simulating the 2015 Three-Point Contest

Jan 28, 2015; Atlanta, GA, USA; Atlanta Hawks guard Kyle Korver (26) celebrates with fans after their win over the Brooklyn Nets at Philips Arena. The Hawks won 113-102. Mandatory Credit: Jason Getz-USA TODAY Sports

A year ago, I experimented with a custom-built simulation for the three-point contest, but I didn’t get it up and running until after All-Star weekend. After some pain completely rewriting it for the new rules, I thankfully have it working in time for what could be the greatest three-point contest of all time. While this is seemingly a needless diversion during a weekend of exhibitions, I can’t help but explore an untapped area, and it’s fascinating to search for patterns in all the madness. Plus, there are usually countless posts on who should win and why, including this comprehensive piece, but someone should break it down, test which factors matter, and come up with some objective odds.

What makes a contest winner?

After grabbing the data from the years 2000 to 2014 — I skipped the years before 2000 partly because of the shortened line for three seasons but also because three-point shooting has changed so much since then — I searched for patterns from a variety of angles. (These two sites had the bulk of the data, but I also cross-checked with other news sources.)

To see what matters in winning a contest, first you need a model. Since this is count data (i.e. whole, positive numbers) with an upper limit, a simple linear regression model will not work. Instead a beta regression model is used where the dependent variable (what you’re trying to predict) is the proportion of total points scored in a round[1. Beta regression is preferred when the scale is limited to 0 to 1, useful for rates and proportions. (Specifically, it’s the logit link model used in the betareg function from R.) For example, the all-time record of 25 by the old rules would mean a proportion of 25/30 or 0.833. The functional form is: y = exp(beta*x)/( 1 + exp(beta*x) ).]
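To make the link function concrete, here is a minimal Python sketch of the score-to-proportion conversion and the inverse-logit transform described above. The linear predictor value used at the end is a placeholder for illustration, not a fitted coefficient from the model.

```python
import math

def score_to_proportion(score, max_points=30):
    """Convert a contest score into the 0-1 proportion used as the dependent variable."""
    return score / max_points

def inverse_logit(x):
    """The logit link: y = exp(x) / (1 + exp(x)), mapping any real number into (0, 1)."""
    return math.exp(x) / (1 + math.exp(x))

# The all-time record of 25 under the old rules is a proportion of 25/30:
print(round(score_to_proportion(25), 3))  # 0.833

# A hypothetical linear predictor of 0.5 maps to a proportion of about 0.62:
print(round(inverse_logit(0.5), 3))  # 0.622
```

The inverse logit is what keeps simulated scores bounded: no matter how extreme the linear predictor gets, the predicted proportion stays strictly between 0 and 1.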

Three-point percentage is the most obvious variable, but are the previous seasons important? I tested a few variations with differing weights and came up with a 3-3-2 weighted system where you take a weighted average of three seasons: before the all-star break, the previous season, and the one before that. So yes, that means three-point percentage the previous season is as important as the percentage in the current season before the contest. This runs counter to how the NBA typically chooses participants: they sort by three-point percentage and choose the guys with the most attempts near the top. And the season two years ago has an effect too, but a smaller one since the weight is 2 and not 3.
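As a sketch of the 3-3-2 system in Python (with made-up percentages, purely for illustration):

```python
def weighted_three_pt(pre_break, last_season, two_seasons_ago):
    """3-3-2 weighted average of three-point percentage:
    this season before the break and last season each get weight 3,
    the season before that gets weight 2."""
    weights = (3, 3, 2)
    values = (pre_break, last_season, two_seasons_ago)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical shooter: 45% before the break, 42% last year, 40% the year before.
print(round(weighted_three_pt(0.45, 0.42, 0.40), 3))  # 0.426
```

Note how the hot pre-break 45% gets pulled back toward the longer track record — which is the point of the weighting.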

Given that players generally take more three-point attempts the previous season as compared to the half-season before the break, this means that three-point percentage in the previous season, generally, is more important. That seems odd, but there’s a selection bias here: if you choose the players with the highest percentages before the break, as a group their percentage will fall the rest of the season due to the regression to the mean effect. For example, from 2006 to 2013, the average difference in three-point percentage between the pre- and post- all-star break for participants was 3.1% (among players with at least 40 attempts after the break.) Translation: the average contest participant is a significantly worse shooter the rest of the season. However, there’s a wrench in the selection bias: champions are invited back unless they’re injured. This is probably the best comparison test since they are not selected because of their pre-break percentage. And among returning champions, the average difference is … 0.4%. Clearly, there’s selection bias when only looking at a half season of stats. Thus, a multi-season average is the preferred method, and we should be wary of inviting guys with unusually high percentages before the break.

Three-point percentage is the obvious one, but what else? I tested a few other variables: height, usage rate, three-pointers attempted per minute, percentage of three-pointers that are unassisted, returning contest participant, returning champion, and a dummy variable for the final round, on the theory that players perform differently in the last round once they’ve warmed up. I even tried a few interaction terms, like percentage multiplied by volume or usage. But no matter the specification, the only other variable with any significance was the dummy variable for the final round.

This wasn’t surprising because, as I was collecting the data and going from my own memory, I noted that winners came in all types and sizes. You had tall, legendary shooters like Bird and Dirk, but you also had shooting specialists like Craig Hodges and Voshon Lenard. Both Kevin Love and Kyrie Irving have won, so maybe star players have an advantage, but Curry and Durant were disappointing. Maybe you want to say smaller guards like Nash and Curry struggle, but Hornacek is roughly the same height and he won twice, along with the aforementioned Irving. Super-high-volume guys like Curry and Ryan Anderson have come up short against players who take roughly half as many three-pointers per minute — there are no patterns. For a visual representation, I included the three plots below. (Note: I prorated the 2014 contest scores to a 30-point contest, instead of the 34 possible with a moneyball rack.)
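For reference, the prorating mentioned in that note can be done as a simple linear rescale from the 34-point maximum back to the old 30-point maximum — this one-liner is my reconstruction of the adjustment, not the article's exact code:

```python
def prorate_to_30(score_34):
    """Rescale a 2014-format score (34 possible with an all-moneyball rack)
    onto the older 30-point scale, assuming a simple linear proration."""
    return score_34 * 30 / 34

# A 24 under the 2014 rules is roughly a 21.2 on the old scale:
print(round(prorate_to_30(24), 1))  # 21.2
```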

In fact, the contest itself is fairly unpredictable. The pseudo-R^2 is 12.0%, meaning only about 12% of the variation is explained by the model. For those unfamiliar with regression, that’s a low number for what should be a straightforward contest. A good rule of thumb is to go with the field, because there’s more noise in the contest than people think. There’s also the problem that we only get a handful of participants each year, so the sample size is small.

I’m still considering other variables I missed, but I’m afraid there’s not much to be done. Shooting style is the one I think has potential, but I wouldn’t bet everything on it. Set shooters don’t have a monopoly: Beal nearly won last year and he jumps higher than the vast majority of players, and Ray Allen has won before too. Then there are players with smaller verticals, like Peja, Pierce, and Bird, who have excelled as well. Perhaps with another year of SportVU data and another contest I can create a better composite measure — but not now.

Simulation model

With a formula to estimate contest scores completed, there’s still some work to be done. There’s no closed-form way to calculate the odds of who will win, because each player’s chances depend on the other players’ scores and ties are possible, so you need to simulate the contest.

Considering those facts, the model was built by varying the coefficient for shooting percentage based on the standard deviation from the regression results. Essentially, this gives a “real world” set of varying results where Klay Thompson can shoot 20 one round and 12 the next. With the logit link (beta) form, there are also realistic limits where a score of 25 or 26 is rare, and so are scores of 4 and 5. Out of the 98 first-round scores from 2000 to 2014, there was only one score of 23 or higher (Arenas), or 1.0%. Last year, running the simulation with 32,000 games played (4,000 simulation seeds with 8 players), there were only 326 such games, which also translates to 1.0%. (Korver’s an outlier, so I used the players from last year for a better representation.) One case doesn’t prove the model reflects real-world conditions, so I plotted a histogram below showing how the first round compares in the real world and in the simulation.
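A stripped-down Python sketch of that simulation loop looks like the following. The intercept, coefficient, and standard deviation here are placeholders rather than the fitted values, and this version only simulates a single round, so treat it as an illustration of the mechanics, not the real model.

```python
import math
import random

def simulate_round(weighted_pct, intercept=-1.2, coef=2.5, coef_sd=2.0,
                   max_points=30, rng=random):
    """One simulated round: draw the shooting-percentage coefficient from a
    normal distribution (sd taken from the regression results), push the
    linear predictor through the logit link, and scale to contest points.
    All numeric parameters here are placeholders, not the fitted model."""
    b = rng.gauss(coef, coef_sd)
    linpred = intercept + b * weighted_pct
    proportion = math.exp(linpred) / (1 + math.exp(linpred))
    return round(proportion * max_points)

def win_odds(players, n_sims=10000, seed=42):
    """Estimate each player's chance of posting the top score in a round,
    splitting credit evenly on ties. (The full model also simulates a
    final round; this sketch stops after one round.)"""
    rng = random.Random(seed)
    wins = dict.fromkeys(players, 0.0)
    for _ in range(n_sims):
        scores = {name: simulate_round(pct, rng=rng)
                  for name, pct in players.items()}
        best = max(scores.values())
        leaders = [n for n, s in scores.items() if s == best]
        for n in leaders:
            wins[n] += 1 / len(leaders)
    return {n: w / n_sims for n, w in wins.items()}

# Hypothetical three-year weighted percentages, for illustration only:
print(win_odds({"Korver": 0.484, "Curry": 0.449, "Thompson": 0.439}))
```

Scaling this up to a full contest is just a matter of simulating a final round among the top finishers and raising `n_sims` until the odds stabilize.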

Basically, I just wanted to show that the results and the spread in scores were reasonable. The histogram with the “real world” scores is only using 90 scores from 2000 to 2013, so it’s not a perfect normal curve but it is starting to form that familiar bell shape.

(Technical note: I’m still not sure how to handle the moneyball rack in the model. I didn’t want to try anything too complex because I really only have one year with the extra rack to validate the results. It probably increases the variation of the results a little, but further investigation is needed here since players choose where the rack goes. Perhaps after this year’s contest I can start to analyze the variation.)

The results

Based on 400,000 simulation runs to get stable odds through the first round and finals, the most likely winner is, surprise, Kyle Korver at 33.5%. (Yes, there were 400,000 different contests simulated; the good thing is that my rewritten code is fast enough to run them all in under a minute.) Curry, naturally, is the second most likely at 14.9%, with Thompson at 13.4%. However, even a supposed longshot like Harden has nearly 5% odds, which isn’t terrible for a contest with eight guys.

Odds:

Wes Matthews: 8.4%

J.J. Redick: 9.1%

James Harden: 4.7%

Kyrie Irving: 6.5%

Stephen Curry: 14.9%

Klay Thompson: 13.4%

Kyle Korver: 33.5%

Marco Belinelli: 9.4%

Expected first round scores:

Wes Matthews: 16.7

J.J. Redick: 16.8

James Harden: 15.9

Kyrie Irving: 16.3

Stephen Curry: 17.7

Klay Thompson: 17.5

Kyle Korver: 19.6

Marco Belinelli: 16.7

(One fun thing about a simulation with 400,000 different outcomes is looking at the extremes. For instance, Kyle Korver had a score of 33 twice in the final round. The maximum possible score is 34.)

If you’re wondering whether Korver is the best contest shooter ever by my model, he’s actually been beaten: Jason Kapono had the highest three-year weighted average since 2000, back in 2008. His average was a remarkable 50.1%, while Korver this year (remember, this is a three-year average) is at a more mortal 48.4%. If you’re wondering how Kapono did, he put up a 20 in the first round and beat Dirk in the finals with a 25, which ties the all-time record for the highest score in a round.

Based on the Vegas odds for the contest, Korver is actually being underrated despite all the hype he’s had this season. Curry’s biggest shooting advantage, after all, is that he can hit shots off the dribble, but that won’t matter here. The field is also generally underrated in this contest, with the possible exception of Harden, whose odds get a bump because he’s a star. If you’re wondering how a guy like Matthews can beat Korver in a shooting contest, I’ll just point to history, where plenty of underdogs like James Jones have triumphed.

As a final note, let me reiterate what I said last year about the strategy of where to place the moneyball rack. Players are afraid of putting it in the last corner because they fear they won’t be able to finish the rack and will waste the extra moneyballs. This is flawed thinking for a few reasons. Besides the corner line being closer and many players shooting better once they’ve warmed up with a few shots, players either finish their rounds or have the clock expire just as they reach for and shoot the last ball. It’s rare that a player leaves two or more balls unused, Joe Johnson notwithstanding. No matter where you place the rack, the last ball will always be a moneyball, and if you’re afraid of time expiring with two or more balls left, you won’t have a good chance at advancing anyway. (Although I’d prefer the old rules be reinstated, because we have so few era-neutral basketball measures to judge players with. Adding moneyballs only increases luck and variability, and this contest already has that in spades.)

With all that said and after a complicated simulation model built and tested, there’s still a lot of mystery in the contest. Kyle Korver is having arguably the best outside shooting season ever and Stephen Curry is already arguably the best shooter ever, but it’s very possible neither competitor wins. And with everything I’ve learned and discarded, the most important thing I picked up is how unpredictable the contest is. Which is partly why I still care about the contest.