Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series we’re proud to present this guest post from Nick Restifo. Nick does statistical analysis (with an emphasis on lineups) for a few college basketball teams and is working on finishing his Master’s in Data Mining at Central Connecticut State. You can follow him on Twitter at @itsastat.
By now, many basketball fans are aware of The Four Factors and their importance to winning in the NBA. Most know that a team should seek to maximize its effective field goal percentage, offensive rebound percentage, free throw rate, and defensive turnover rate, while minimizing those numbers for its opponent.
This is all well and good. Obviously we want our favorite team to make shots, not commit turnovers, get rebounds, get to the line, and not allow the same from their opponents. But where is the breaking point? What is the exact value for which teams should aim? How good do teams have to be in each category before their chances of winning significantly improve?
With the help of a statistical technique called classification and regression trees, we can determine so-called “winning thresholds”: for each of these important statistics, a value where the historical odds of winning dramatically increase. Classification and regression trees are usually used to classify records of data, but the method can also be used to find “splits” in the data that maximize the difference in winning percentage between the teams on either side of the split. The number at which the decision tree places a split is our “winning threshold”, the value of the statistic where the chances of winning significantly improve.
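As a rough illustration of how a tree picks a single split, the sketch below scans every candidate cut point and keeps the one that minimizes the total squared error on the two sides, the usual regression-tree criterion, which tends to pull the two sides' average winning percentages apart. This is not the code behind this analysis: the function name `best_split` and the six data points are made up purely for illustration.

```python
def best_split(values, outcomes):
    """Return (threshold, mean_below, mean_above) for the single split
    that minimizes the summed squared error of the two sides."""
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)

    best = None
    left_sum = 0.0
    left_sq = 0.0
    for i in range(1, n):
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev
        left_sq += y_prev * y_prev
        x_here = pairs[i][0]
        if x_here == x_prev:
            continue  # cannot split between two identical values

        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Squared error of a group = sum(y^2) - (sum y)^2 / count
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or sse < best[0]:
            threshold = (x_prev + x_here) / 2  # midpoint between neighbors
            best = (sse, threshold, left_sum / i, right_sum / (n - i))

    if best is None:
        raise ValueError("need at least two distinct values to split")
    return best[1:]


# Hypothetical toy data (NOT from the article): a shooting stat and win pct.
toy_stat = [46.0, 47.0, 48.0, 50.0, 51.0, 52.0]
toy_win_pct = [0.30, 0.40, 0.45, 0.55, 0.60, 0.70]

threshold, mean_below, mean_above = best_split(toy_stat, toy_win_pct)
print(threshold, round(mean_below, 3), round(mean_above, 3))
```

A full tree repeats the same search inside each side of the split, which is how a single statistic can end up with several thresholds.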
Using data going back to the 2002 season, when the illegal defense rules were changed, I looked at the values of every Four-Factor statistic and the winning percentage for every team in every season, 387 team-seasons in all, to determine where these winning thresholds lie. For the statistics that are more significant in predicting wins (like eFG%, illustrated below), the decision tree can find a whole tree’s worth of splits.
The decision tree image above can be interpreted as follows: The “n” in each box represents the number of teams in that branch of the decision tree. The number above the box represents the value of the split, or our “winning threshold” — the value at which the decision tree decides to best divide the teams by winning percentage. The “predicted” value in each box represents the actual combined winning percentage of the teams in that decision tree branch. Looking through the decision tree, you can see how significantly eFG% affects winning percentage.
Not every statistic is as illuminating as eFG%, but the decision tree is able to find at least one split for all four factors as well as three-point rate. The results are tabulated below:
Statistic | Initial Split/ “Winning Threshold” | Winning Percentage Below Threshold | Winning Percentage Above Threshold |
EFG% | 49.2% | 43.0% | 58.3% |
TO% | 13.65% | 54.6% | 45.3% |
OREB% | 22.85% | 55.1% | 49.7% |
FTA/FGA | 21.8% | 45.4% | 52.7% |
Def. EFG% | 50.2% | 54.9% | 37.9% |
Def. TO% | 14.75% | 49.0% | 55.4% |
DREB% | 74.05% | 47.1% | 57.8% |
Def. FTA/FGA | 22.4% | 54.5% | 46.3% |
3 Point Rate | 23.8% | 47.5% | 56.0% |
The astute reader may notice that the winning percentages on the two sides of a threshold don’t always add up to 100%. That’s OK! The splits do not divide the teams evenly, so there is a different number of teams on each side of the line. A really astute reader may notice something else interesting: teams with a lower OREB% have done better since 2002. This is no typo; it is indeed true. This may be because offensive rebounding is not as important as teams think it is, it may be because winning teams are prioritizing other areas, or it may be because offensive rebounding is affected by how often you get to the line, another positive offensive trait. (The more you get to the line, the lower your OREB% will be.) This is a subject for additional research.
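The point about uneven splits can be made concrete with a little arithmetic. The sketch below rests on two assumptions that are not stated in the analysis itself: that each side's combined winning percentage is weighted by games played, and that all teams in the sample together win exactly half their games (every game has one winner and one loser). Under those assumptions, the share of games on the "below" side of a threshold can be backed out from the two published percentages. The function name `implied_share_below` is hypothetical, and the result is only approximate because the table values are rounded.

```python
def implied_share_below(win_below, win_above, overall=0.5):
    """Solve share * win_below + (1 - share) * win_above = overall for share."""
    if win_below == win_above:
        raise ValueError("the two sides must have different winning percentages")
    return (win_above - overall) / (win_above - win_below)


# eFG% row from the table: 43.0% below the threshold, 58.3% above it.
share_below = implied_share_below(0.430, 0.583)

# Recombining the two sides with that weight gives back the overall 50%.
recombined = share_below * 0.430 + (1 - share_below) * 0.583
print(round(share_below, 3), round(recombined, 3))
```

The two percentages only need to average to 50% once each side is weighted by its size, which is why their plain sum can land above or below 100%.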
But the story the thresholds do tell is interesting. For example, it’s REALLY hard to be a winning team that has a Defensive eFG% above 50.2%! Since 2002, the teams that have allowed such a high eFG% against them have had a combined winning percentage of only 37.9%. If your team is allowing an eFG% above 50.2%, it had better be really good at everything else if it wants to make the playoffs. The three teams above this threshold last season? The Philadelphia 76ers, the Detroit Pistons, and the Milwaukee Bucks.
As mentioned before, for the more significant variables (like eFG%), a classification and regression tree can find multiple splits. For us, this means not just single thresholds, but multiple ranges of winning percentage.
EFG% | Less than 45.9% | Between 45.9% and 49.2% | Between 49.2% and 51.6% | Above 51.6% |
Win% | 30.2% | 44.3% | 55.2% | 67.3% |
TO% | Less than 12.45% | Between 12.45% and 13.65% | Between 13.65% and 14.75% | Above 14.75% |
Win% | 61.8% | 52.9% | 47.0% | 40.7% |
3 Pt Rate | Less than 19.6% | Between 19.6% and 23.8% | Between 23.8% and 24.0% | Above 24.0% |
Win% | 45.2% | 47.5% | 66.2% | 55.6% |
Def. EFG% | Less than 47.8% | Between 47.8% and 50.2% | Between 50.2% and 51.2% | Above 51.2% |
Win% | 59.7% | 52.0% | 43.3% | 32.7% |
Def. TO% | Less than 12.95% | Between 12.95% and 13.45% | Between 13.45% and 14.75% | Above 14.75% |
Win% | 50.0% | 44.1% | 50.5% | 55.4% |
DREB% | Less than 71.45% | Between 71.45% and 74.05% | Between 74.05% and 75.55% | Above 75.55% |
Win% | 43.6% | 49.1% | 55.7% | 63.1% |
Def. FTA/FGA | Less than 19.4% | Between 19.4% and 22.4% | Between 22.4% and 26.8% | Above 26.8% |
Win% | 60.2% | 52.9% | 47.0% | 41.8% |
Many of these variables split even further along the decision tree (like eFG% before), but the ranges above give a nice, simple idea of what’s going on. While this research doesn’t illuminate anything that basketball junkies don’t already know (excelling at The Four Factors helps teams win games), it is nice to have an exact number to reference when looking at team stats online. If a team is on the right side of these thresholds, the recent historical odds are in their favor.
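For anyone who wants to use the ranges as a quick reference while browsing team stats, here is a minimal lookup sketch using the eFG% ranges from the table above. The names `EFG_CUTS`, `EFG_WIN_PCT`, and `historical_win_pct` are hypothetical, and one detail is an assumption: the table does not say which side a value exactly on a cut point belongs to, so this sketch places it in the higher range.

```python
from bisect import bisect_right

# eFG% cut points and the historical win pct of each range, from the table.
EFG_CUTS = [45.9, 49.2, 51.6]
EFG_WIN_PCT = [30.2, 44.3, 55.2, 67.3]


def historical_win_pct(value, cuts, win_pcts):
    """Return the win pct of the range that `value` falls into.

    Assumption: a value exactly equal to a cut point counts as the
    higher range, since the source does not specify the boundary rule.
    """
    if len(win_pcts) != len(cuts) + 1:
        raise ValueError("need exactly one more range than cut points")
    return win_pcts[bisect_right(cuts, value)]


# A team shooting 50.0% eFG lands in the 49.2%-51.6% range.
print(historical_win_pct(50.0, EFG_CUTS, EFG_WIN_PCT))
```

The same function works for any other row of the table by swapping in that statistic's cut points and winning percentages.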