Freelance Friday: The Altitude Effect


Nov 19, 2014; Denver, CO, USA; Denver Nuggets forward Wilson Chandler (21) and Denver Nuggets guard Arron Afflalo (10) celebrate during the second half against the Oklahoma City Thunder at Pepsi Center. The Nuggets won 107-100. Mandatory Credit: Chris Humphreys-USA TODAY Sports

Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series we’re proud to present this guest post from Nick Restifo. Nick does statistical analysis (with an emphasis on lineups) for a few college basketball teams and is finishing his Master’s in Data Mining at Central Connecticut State. You can follow him on Twitter at @itsastat.

Somewhat recently, notable NBA personalities Stan Van Gundy and (to a lesser extent) Kyrie Irving have come out as “altitude truthers[1. Term credited to Matt Moore of CBS Sports and Hardwood Paroxysm]”, deniers of the now reasonably well-held theory that the high altitude of the arenas in Denver and Salt Lake City provides the Nuggets and Jazz with a slightly better home court advantage. The idea behind the theory is that players who live and play in Denver and Utah are more accustomed to the thinner air of high altitude than their visiting opponents are, and that this advantage accumulates into more wins over the course of a season than the team’s skill alone would suggest.

Let me be one in a long line of sports analysts to show you they’re wrong (famous stathead Arturo Galletti did a piece on altitude here, and there have even been studies [potentially] linking altitude to free throw percentage). Altitude does in fact provide Denver and Utah with an extraordinary home court advantage. The altitude effect is real.

For data, I compiled the stats.nba.com game logs of every NBA game from the 2001-2002 season up to and including the 2013-2014 season. I often cut off at the 2001-2002 season because it was the first season played without the illegal defense rules, and this stretch of NBA history best represents (in my humble opinion) a more modern NBA while still being a large chunk of data. For the altitude measurements, I scraped the Wikipedia page of every NBA city for its “elevation” and took the value in feet. When there were two values (high and low), I simply averaged them.

There are a lot of ways to begin looking at the relationship between altitude and home court advantage, but the best place to start is usually the simplest. The table below lists every team’s altitude, home winning percentage, and away winning percentage since 2001-2002, sorted by home win percentage.

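| Team | Home Win% | Away Win% | Overall Win% | Altitude (ft) |
|---|---|---|---|---|
| SAS | 80.56% | 61.50% | 71.03% | 650 |
| DAL | 75.07% | 57.02% | 66.04% | 430 |
| LAL | 70.83% | 49.92% | 60.37% | 305 |
| OKC | 69.67% | 53.70% | 61.68% | 1201 |
| DEN | 67.65% | 39.56% | 53.60% | 5410 |
| UTA | 66.06% | 38.50% | 52.28% | 4226 |
| IND | 65.89% | 40.60% | 53.24% | 715 |
| BRK | 65.85% | 47.56% | 56.71% | 33 |
| MIA | 65.45% | 46.25% | 55.85% | 24 |
| PHO | 64.98% | 46.87% | 55.93% | 1086 |
| HOU | 64.37% | 45.94% | 55.16% | 43 |
| DET | 62.51% | 44.85% | 53.68% | 961 |
| BOS | 61.99% | 46.67% | 54.33% | 141 |
| POR | 61.72% | 38.91% | 50.32% | 50 |
| CHI | 60.69% | 40.26% | 50.48% | 615 |
| League Average | 60.16% | 39.84% | 50.00% | 660 |
| ORL | 59.22% | 39.97% | 49.59% | 82 |
| NOK | 58.54% | 35.37% | 46.95% | 1201 |
| CLE | 58.00% | 35.83% | 46.91% | 653 |
| MEM | 57.82% | 36.54% | 47.18% | 337 |
| SAC | 57.37% | 34.92% | 46.14% | 30 |
| ATL | 56.71% | 32.31% | 44.51% | 894 |
| LAC | 56.32% | 33.85% | 45.08% | 305 |
| GSW | 56.27% | 32.57% | 44.42% | 43 |
| NOH | 56.08% | 38.83% | 47.46% | 13 |
| NJN | 55.89% | 35.88% | 45.88% | 6 |
| MIL | 53.72% | 31.39% | 42.55% | 617 |
| NOP | 53.66% | 29.27% | 41.46% | 13 |
| SEA | 53.31% | 37.28% | 45.30% | 260 |
| PHI | 51.98% | 38.82% | 45.40% | 39 |
| WAS | 51.88% | 30.30% | 41.09% | 205 |
| NYK | 51.37% | 33.87% | 42.62% | 33 |
| TOR | 51.09% | 34.11% | 42.60% | 249 |
| MIN | 50.71% | 33.69% | 42.20% | 830 |
| CHA | 48.02% | 28.30% | 38.16% | 751 |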

This table, while it oversimplifies the problem, is interesting in and of itself. The teams we would expect to top the chart (the best teams since 2002) are there: the Spurs, the Mavericks, the Lakers, and the Thunder (since they moved from Seattle). These teams have had the best home records since 2002 simply because they were really good, and good teams win games, altitude or not. The point of this study is that altitude is a factor, but clearly not the final determinant.

Denver and Utah (the two highest-altitude cities in the NBA by far) show up fifth and sixth overall in home win percentage. Have these teams also simply been better than average since 2002? In a manner that is almost too convenient, no, they have not. In fact, the Denver and Utah teams have been hilariously, perfectly average NBA teams over this timespan. If Denver and Utah were in fact better than average overall, we would expect them to have higher away win percentages as well. The Spurs, Mavericks, Lakers, and Thunder all have away win percentages AT LEAST ten percentage points higher than the league average. Despite enjoying an above-average record at home over this timespan, the Nuggets and Jazz won only 39.56% and 38.50% of their away games, almost exactly the league average of 39.84%, which suggests that something other than overall team skill allows them to win so much at home.

The correlation coefficient between home win percentage and altitude in this table is 0.2842. Squaring it (r² ≈ 0.081) means altitude explains roughly 8% of the variance in team home win percentage in the NBA since 2002. This suggests that there is a relationship between altitude and home winning percentage, but boy do I hate correlation as a substitute for actual statistical analysis. Let’s go deeper.
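To retrace the correlation figure, here is a minimal Python sketch that computes the Pearson coefficient straight from the table. It assumes the figure was computed over the 34 team rows only, leaving out the League Average row; that exclusion is my assumption, though for these rows the result comes out at roughly 0.284, in line with the number quoted above.

```python
import math

# (altitude in feet, home win %) for each franchise code in the table,
# excluding the "League Average" row (an assumption about the original calculation).
TEAMS = {
    "SAS": (650, 80.56), "DAL": (430, 75.07), "LAL": (305, 70.83),
    "OKC": (1201, 69.67), "DEN": (5410, 67.65), "UTA": (4226, 66.06),
    "IND": (715, 65.89), "BRK": (33, 65.85), "MIA": (24, 65.45),
    "PHO": (1086, 64.98), "HOU": (43, 64.37), "DET": (961, 62.51),
    "BOS": (141, 61.99), "POR": (50, 61.72), "CHI": (615, 60.69),
    "ORL": (82, 59.22), "NOK": (1201, 58.54), "CLE": (653, 58.00),
    "MEM": (337, 57.82), "SAC": (30, 57.37), "ATL": (894, 56.71),
    "LAC": (305, 56.32), "GSW": (43, 56.27), "NOH": (13, 56.08),
    "NJN": (6, 55.89), "MIL": (617, 53.72), "NOP": (13, 53.66),
    "SEA": (260, 53.31), "PHI": (39, 51.98), "WAS": (205, 51.88),
    "NYK": (33, 51.37), "TOR": (249, 51.09), "MIN": (830, 50.71),
    "CHA": (751, 48.02),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

altitudes = [alt for alt, _ in TEAMS.values()]
home_pcts = [pct for _, pct in TEAMS.values()]
r = pearson_r(altitudes, home_pcts)
# r squared is the share of variance in home win % that a straight line in altitude explains.
print(f"r = {r:.4f}, r^2 = {r * r:.3f}")
```

Note that each team counts equally here regardless of how many seasons it existed (NOK, for instance, covers only two), which is one more reason the game-level models that follow are more trustworthy than this number.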

There are many reasons why NBA teams win and lose games, and while chief among them is how good the teams are (Surprise!), there are other factors that tilt the scales either way. Another notable and deservedly talked about factor in deciding the outcome of an NBA team’s game is rest, and this must be considered when analyzing altitude as well.

First, procedure: to better look at rest and altitude, I binned the home rest, visitor rest, and altitude variables into three bins each. Binning is a process where you take a numerical variable, like days of rest or altitude in feet, and make a new variable that records which range the original value falls into. For example, if I had a variable holding random numbers between 0 and 100 and wanted to split it into three bins of equal width, my cutoffs would be 33.33 and 66.67, and a value of 75 would get a bin value of 3 if I number my bins 1, 2, and 3. Relatively simple.
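The equal-width example above can be sketched with Python’s standard `bisect` module. The function names are hypothetical, and values that land exactly on a cutoff go to the higher bin in this sketch, a convention the post doesn’t specify.

```python
import bisect

def equal_width_cutoffs(lo, hi, n_bins):
    """Interior cutoffs that split [lo, hi] into n_bins bins of equal width."""
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def bin_value(x, cutoffs):
    """1-based bin number: 1 below cutoffs[0], 2 below cutoffs[1], and so on."""
    # bisect_right counts how many cutoffs are <= x, so a value equal to a cutoff moves up a bin.
    return bisect.bisect_right(cutoffs, x) + 1

cutoffs = equal_width_cutoffs(0, 100, 3)  # roughly [33.33, 66.67]
print(bin_value(75, cutoffs))  # 3
```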

At first I analyzed the actual number of rest days (truncated at a maximum of 10 days). But after running several models it became very clear that extra rest beyond one day, for either the home or the visiting team, is NOT statistically significant in predicting home wins. So for the purposes of this study, I binned both home and visitor rest into variables where games take the following values: 0, the second game of a back-to-back; 1, a game where a team had 1 day of rest; and 2, a game where a team had 2 or more days of rest, including season openers.
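The three-level rest bin can be written as a small function. This sketch assumes rest is counted as days off between games (so a back-to-back is 0) and uses `None` to mark a season opener; both encodings, and the name `rest_bin`, are assumptions for illustration.

```python
from typing import Optional

def rest_bin(days_of_rest: Optional[int]) -> int:
    """Three-level rest bin: 0 = back-to-back, 1 = one day off, 2 = two or more days off.

    None stands in for a season opener (no previous game), which is grouped with bin 2.
    """
    if days_of_rest is None:
        return 2
    if days_of_rest <= 0:
        return 0
    return 1 if days_of_rest == 1 else 2

print([rest_bin(d) for d in (0, 1, 2, 6, None)])  # [0, 1, 2, 2, 2]
```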

For altitude, I also used three bins. Why bin altitude? Because Utah and Denver are extreme outliers. Also, are we really concerned with each incremental foot of difference between New Orleans at 13 feet and New York at 33 feet? Probably not. (And from a statistical standpoint, it’s also irrelevant.) The cutoffs for the altitude bin are 660.31 and 1773.82 feet. If this seems arbitrary, it’s not: the cutoff values are based on the mean and standard deviation of this skewed altitude population. The altitude bins are also far from equal in size, and to me they adequately portray different classes of altitude. The first bin, value 0, represents home games for the 24 teams that play/played at an altitude below 660.31 feet. The second, value 1, represents the home games of the eight teams that play/played at an altitude between 660.31 and 1773.82 feet. And the third, value 2, represents the home games of Utah and Denver, in a class of their own at 4,226 feet and 5,410 feet respectively.
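The post says only that the cutoffs come from the mean and standard deviation. One reading that reproduces both numbers to within rounding is: the mean of the 34 team altitudes in the table, and that mean plus one population standard deviation. The sketch below assumes that reading; `altitude_bin` is a hypothetical name.

```python
import bisect
import statistics

# Altitude (feet) for each franchise code in the table above.
ALTITUDES = {
    "SAS": 650, "DAL": 430, "LAL": 305, "OKC": 1201, "DEN": 5410, "UTA": 4226,
    "IND": 715, "BRK": 33, "MIA": 24, "PHO": 1086, "HOU": 43, "DET": 961,
    "BOS": 141, "POR": 50, "CHI": 615, "ORL": 82, "NOK": 1201, "CLE": 653,
    "MEM": 337, "SAC": 30, "ATL": 894, "LAC": 305, "GSW": 43, "NOH": 13,
    "NJN": 6, "MIL": 617, "NOP": 13, "SEA": 260, "PHI": 39, "WAS": 205,
    "NYK": 33, "TOR": 249, "MIN": 830, "CHA": 751,
}

values = list(ALTITUDES.values())
mean = statistics.mean(values)
sd = statistics.pstdev(values)  # population SD (divides by n), an assumption
cutoffs = [mean, mean + sd]

def altitude_bin(feet):
    """0 = below the mean, 1 = between mean and mean + 1 SD, 2 = above mean + 1 SD."""
    return bisect.bisect_right(cutoffs, feet)

counts = [0, 0, 0]
for feet in values:
    counts[altitude_bin(feet)] += 1
print([round(c, 2) for c in cutoffs], counts)
```

Under this assumption the team split is 24, 8, and 2, matching the bin sizes described above.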

Now what to do with all these binned variables and game log data? Run a logistic regression, of course! Logistic regression is the less-used brother of standard least-squares regression. While ordinary least-squares regression uses one or more predictors to predict a numeric target variable with a straight-line fit, logistic regression predicts a binary outcome, like whether or not the home team wins a game, from both categorical (like binned variables) and numerical predictors. It does this by modeling the log-odds of the outcome as a linear combination of the predictors, so the predicted win probability follows an S-shaped curve that always stays between 0 and 1, something a straight least-squares line can’t guarantee. That makes it a very powerful tool for a win/loss question like this one.
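To make the mechanics concrete, here is a from-scratch sketch of maximum-likelihood logistic regression fitted with Newton-Raphson in plain Python. This is a generic textbook fitting method, not necessarily what the software behind this study used; the helper names (`fit_logistic`, `solve`) and the toy game counts are hypothetical. With a single yes/no predictor, the fitted exp(coefficient) equals the ratio of the two groups’ raw win odds, which is what an Exp(B) column reports.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson.

    Each row of X starts with 1.0 so beta[0] is the intercept;
    exp(beta[j]) is the odds ratio for column j.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                    # X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # X'WX, W = p(1 - p)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical toy data: 100 "high altitude" home games (70 wins)
# and 900 "low altitude" home games (540 wins).
X, y = [], []
for high, games, wins in [(1.0, 100, 70), (0.0, 900, 540)]:
    for g in range(games):
        X.append([1.0, high])
        y.append(1 if g < wins else 0)

beta = fit_logistic(X, y)
print(round(math.exp(beta[1]), 4))  # 1.5556
```

The printed odds ratio matches the raw calculation (70/30) / (540/360). With more predictors, the same fit adjusts each coefficient for the others, which is why the later models can separate altitude from rest and team strength.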

By predicting the home team winning based on altitude, rest, and other factors, we can observe how much of a role altitude plays in a home team’s home court advantage, if at all.

Now for the fun. The first model here is a logistic regression model that predicts a home team win based on altitude alone.

The most important parts of this image are the Exp(B) and Sig. values. The Exp(B) values represent the ratio of the odds of a home win in the given category to the odds in the reference category, which by default is the highest value, 2, representing home games at Denver and Utah. This model finds that, without considering other factors, games in the highest altitude class have about 1.39 times (1/0.721) the odds of a home win as games in the middle altitude class, and 1.35 times (1/0.741) the odds of games in the lowest altitude class. Both of these relationships are statistically significant by any threshold. Combining the two and accounting for the number of games played in each bin, games played in Denver and Utah have just under 1.36 times the odds of a home win. 1.36! That’s 36% higher odds of a home win than the other teams enjoy, which is absolutely massive.
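The arithmetic in the paragraph above can be retraced in a few lines. Flipping each Exp(B) turns “this bin vs. Denver/Utah” into “Denver/Utah vs. this bin.” The post weights by games played in each bin, which isn’t reported, so this sketch assumes team counts (8 middle-altitude teams, 24 low-altitude teams) as a stand-in; a weighted average of odds ratios is also only a rough summary, not a formal model estimate.

```python
# Exp(B) values quoted in the text for the two lower altitude bins,
# each measured against the reference bin (Denver and Utah).
exp_b = {"middle": 0.721, "low": 0.741}

# Flip each ratio so it reads "reference bin vs. this bin".
flipped = {k: 1 / v for k, v in exp_b.items()}

# Hypothetical weights: team counts stand in for the unreported game counts.
weights = {"middle": 8, "low": 24}
combined = sum(flipped[k] * weights[k] for k in flipped) / sum(weights.values())
print({k: round(v, 3) for k, v in flipped.items()}, round(combined, 3))
```

With those assumed weights the flipped ratios are about 1.387 and 1.35, and the combined figure is about 1.359, consistent with the “just under 1.36” above.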

This approach is flawed, however, because no factors other than altitude were considered. If the Nuggets and Jazz had advantages other than altitude, had advantages that were masked by altitude, or were just really good, this first model would be none the wiser. Building a logistic regression model that accounts for several of these factors at once will let us determine how many home wins are due to altitude, and how many are due to other factors like rest.

In this next model I included altitude, home rest, visitor rest, the efficiency differentials of the home and visiting teams, and the paces of the home and visiting teams. Since pace and efficiency differential are both numerical variables, they are normalized using min/max normalization so that the large values of pace (like 94) don’t overwhelm the small values of efficiency differential (like -3.6 and 4.9) in certain models. I used efficiency differential as a stand-in for the true skill of each team. Using the season-ending efficiencies of the teams to predict their game results is kind of like letting the model know the result of the race before it has to bet on it, but the purpose of running the model is to determine the effect of altitude, not to pick games, so I’m okay with it.
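Min/max normalization maps each variable’s smallest value to 0 and its largest to 100 (the 0-100 scale the post uses), so every numeric predictor spans the same range. A minimal sketch; the pace and efficiency numbers are made up for illustration.

```python
def min_max_scale(values, new_max=100.0):
    """Rescale values linearly so the minimum maps to 0 and the maximum to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) * new_max / (hi - lo) for v in values]

# Hypothetical season-level values, for illustration only.
pace = [90.0, 94.0, 100.0]
eff_diff = [-3.6, 4.9, 10.0]
print(min_max_scale(pace), min_max_scale(eff_diff))
```

After scaling, a pace of 94 and an efficiency differential of 4.9 both sit somewhere between 0 and 100, so neither dominates simply because of its units. The trade-off is that one normalized unit now means a different raw amount for each variable, which the efficiency-differential translation below has to undo.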

This model incorporates the contribution of many factors in addition to altitude. This table may be a lot to handle, but I’ll attempt to explain it all. As in the previous model, altitude, home rest, and visitor rest are all categorical variables, and the Exp(B) value for each bin is an odds ratio comparing the odds of a home win in that bin to the odds in the reference bin (by default the bin with the largest value). An Exp(B) value above 1 means better odds of a home win than in the reference bin, while a value below 1 means worse odds. For numerical variables like efficiency differential and pace, where there are no reference categories, the Exp(B) value is the multiplicative change in the odds of a home win per one-unit increase in the variable. It is important to remember that each of the numerical variables has been normalized to a 0-100 scale.

Using all these factors, a few things jump out. The first is that, as I mentioned before, the model does not find the difference between one day of rest and two or more days of rest significant at all in determining a home win, with significance levels of 0.267 for home rest and 0.428 for visitor rest. This isn’t to say that rest isn’t important, because it almost certainly is. The model estimates that home teams with two or more days of rest have 1.22 times the odds of winning that they have on the second night of a back-to-back. The home team’s odds of winning are also 1.289 times higher when the visiting team is on a back-to-back.

As expected, the efficiency differential of each team is a massive determinant of whether or not the home team wins. The model estimates that each unit increase in the normalized home efficiency differential multiplies the home team’s odds of winning by 1.04. Translated back to raw numbers, every 1 point per 100 possessions added to the home team’s efficiency differential gives the home team 1.16 times the odds of winning, and every 5 points per 100 possessions gives it 2.13 times the odds. The result is similar for the visiting team’s efficiency differential, but slightly smaller and obviously in the other direction: every 1 point per 100 possessions added to the visitor’s efficiency differential multiplies the home team’s odds of losing by 1.15, and every 5 points multiplies them by 2.03.
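Because odds ratios multiply, a per-point odds ratio compounds: a k-point change gives OR^k. This sketch shows both directions, plus how a coefficient on the 0-100 normalized scale would translate to raw points given the variable’s min-to-max range. That range isn’t reported in the post, so `raw_range` in the last function is a hypothetical input, and all three function names are illustrative.

```python
import math

def odds_ratio_for(or_per_unit, units):
    """Odds ratios compound multiplicatively: a change of `units` gives or_per_unit ** units."""
    return or_per_unit ** units

def per_unit_from(or_total, units):
    """Per-unit odds ratio implied by an odds ratio measured over `units`."""
    return math.exp(math.log(or_total) / units)

def raw_from_normalized(or_norm, raw_range, scale=100.0):
    """Per-raw-point odds ratio, when one raw point equals scale / raw_range normalized units."""
    return or_norm ** (scale / raw_range)

print(round(odds_ratio_for(1.16, 5), 3))  # 2.1
print(round(per_unit_from(2.13, 5), 4))
```

The gap between about 2.10 (from the rounded 1.16) and the quoted 2.13 is just rounding: the 5-point figure implies roughly 1.163 per point.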

Neither pace value was found to be significant, but what’s fascinating is that the model almost considered visitor pace significant. The effect the model picked up on was rather small, but the relationship seems logical for a variety of sneaky reasons. Visiting teams play with less rest on average, and it’s possible that fast-paced teams are both more tired and more hurt by high altitude than slower-paced teams. The relationship fell just short of significance, but more research could discover something interesting here. Or it could be noise. Such is stats.

As in the first model, the altitude effect was found to be very important, even when considering other factors. Home games in Denver and Utah had about 1.31 times the odds of a home win as home games at lower altitudes, even after accounting for rest and team strength. While 1.31 is smaller than the 1.36 found when altitude is the only predictor, this is still a massive boost on top of the typical home court advantage.

With this information it is now possible to construct the “cleanest model”, a model with only statistically significant predictors and binning thresholds. For this model I included only home and visitor efficiency differential, home and visitor rest, and altitude. For the home and visitor rest variables, I binned all games into only two categories, based on the significance findings of the previous model. It seems it doesn’t really matter how much rest a team has past one day, as long as it’s at least one day. These are, after all, world-class athletes. A value of 1 now represents games where a team had 1 day or more of rest, and 0 represents games where a team is playing a back to back.

All of the variables now pass the test of statistical significance, and what’s left is a simple model that can predict the result of an NBA game, one that heavily values altitude as an input. The relationships are all similar to those of the previous model, but they have been fine-tuned. I’ll highlight them in bullet points because they represent the final takeaways of this study.

  • For every 1 point per 100 possessions added to the home team’s efficiency differential, the home team’s odds of winning are 1.16 times greater. For every 4.79 points (the efficiency differential’s standard deviation), the odds of a home win are 2.06 times greater.
  • For every 1 point per 100 possessions added to the visiting team’s efficiency differential, the home team’s odds of losing are 1.15 times greater. For every 4.79 points, the odds of a home loss are 1.97 times greater.
  • A home team playing the second night of a back-to-back has 1.26 times the odds of losing of a home team with 1 or more days of rest.
  • A home team facing a visitor on a back-to-back has 1.33 times the odds of winning of a home team facing a visitor with 1 or more days of rest.
  • A home team playing at high altitude (Denver or Utah) has 1.31 times the odds of winning of home teams that play at lower altitudes.
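An odds ratio is not a percentage-point change in win probability. To see what the 1.31 altitude odds ratio means in probability terms, convert a baseline probability to odds, multiply, and convert back. Using the 60.16% league-average home win rate from the table as the baseline is only an illustrative assumption about a “typical” home team.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Convert a baseline win probability to odds, scale by an odds ratio, convert back."""
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

# Illustration only: league-average home win rate as the "typical" baseline.
p = apply_odds_ratio(0.6016, 1.31)
print(f"{p:.1%}")  # 66.4%
```

Under that assumption the altitude bump moves a home team from about 60.2% to about 66.4%, roughly six percentage points.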

This simple model is able to retroactively predict NBA games from 2002 to 2014 with up to 69.18% accuracy. According to this model, altitude matters more than the home team’s rest, and almost as much as a 2 point per 100 possessions efficiency advantage between one team and another (1.16² ≈ 1.35, versus 1.31 for altitude). Denver and Utah enjoy 31% higher odds of a home win than a normal home team, even after accounting for team strength and rest coming into the game. What a massive edge! If the altitude effect were not real, we would expect a model trained on over 15,000 games to treat it as randomness and be unable to reject the null hypothesis that there is no relationship. But each model finds not only that altitude is significant, but that it has a very noticeable and consistent effect.
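Accuracy here presumably means the share of games where the side the model favored actually won. A minimal sketch, assuming a 0.5 cutoff on the predicted home-win probability (the post doesn’t state its threshold, and “up to” may mean it was tuned); the probabilities and results below are hypothetical.

```python
def accuracy(probs, outcomes, threshold=0.5):
    """Share of games where (prob >= threshold) matches the actual home-win outcome."""
    hits = sum((p >= threshold) == bool(won) for p, won in zip(probs, outcomes))
    return hits / len(outcomes)

# Hypothetical predicted home-win probabilities and results (1 = home win).
probs = [0.72, 0.55, 0.41, 0.66, 0.30]
results = [1, 0, 0, 1, 1]
print(accuracy(probs, results))  # 0.6
```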

As this model demonstrates, many factors play a role in an NBA team winning games during the regular season. There is always the chance in any statistical study that a discovered trend is just random noise or that the analysis over- or under-valued the tested effect, but the fact that altitude passes significance tests with flying colors and shows a consistent, strong relationship with winning makes it very unlikely that altitude plays no role. In fact, teams like the Oklahoma City Thunder and Phoenix Suns, playing at altitudes above 1,000 feet, may also enjoy a smaller but distinct altitude benefit, and this is an area where additional research could help. Altitude has a real effect on winning in the NBA, and may actually be more important than some aspects of rest, making it easily the most underrated of contributing factors. So the next time you hear someone bash altitude as unimportant, or hear Stan Van Gundy label altitude as “b.s.”, just know that altitude may be just as important to a team’s winning chances as a day of rest.