Confidence Intervals: is it too early to say that your team sucks?

Nov 3, 2015; Dallas, TX, USA; Dallas Mavericks forward Dirk Nowitzki (41) yells at the referees during the second half of the game against the Toronto Raptors at the American Airlines Center. The Raptors defeat the Mavericks 102-91. Mandatory Credit: Jerome Miron-USA TODAY Sports

Last year, the Dallas Mavericks started blazing hot, culminating in a 140 to 106 win in their 13th game. It was certainly no coincidence that this game was against the Los Angeles Lakers, who had a year-long battle with the Timberwolves over which team could sport the worst defense. Speaking of the Wolves, the Mavs scored 131 points against Minnesota in Dallas’ 10th game. Up to game 13, the Mavericks scored more than 120 points per 100 possessions[1. The definition of a possession can vary slightly between sources. Here, I have used Darryl Blackport’s data for my calculations.]. 69 games and a Rajon Rondo trade later, the Mavericks were down to 110 points per 100 possessions. Still a healthy overall figure, but several teams had passed them.
Why am I telling you all this? Because every year during the early days of the season, people go bonkers over team ratings. In reality, a lot of early statistics are a mix of skill, opponent strength[2. It helped a team’s early-season defensive rating A LOT to play against the 76ers last year, and most likely this year as well.] and sheer luck[3. Or bad luck, if you look at the Rockets’ three-point percentage during the first games.]. Still, the games count, and at some point we can start to learn things about which teams are good and which are not so good.

To examine the question of “how soon is too soon?” I used game data from the last two years to get a better idea[1. I am pretty sure that there are previous studies about this (99% chance somewhere in the APBR forum), so these are hardly original thoughts on my part.].

How long until offensive or defensive rating stabilizes?

Stat stabilization is in some regards a question of semantics and definitions. In this instance, I assumed that a team’s actual quality stays relatively stable. I used Games 42 to 82 as a proxy for a sort of “actual rating” and then compared it to the offensive and defensive ratings for each team from Game 1, Games 1 to 2, Games 1 to 3, and so on up to Games 1 to 41. Essentially, I compared the second half of the season with the first half in varying slices. As shown in the charts below, the correlation between the first-half “slice” and the second-half whole slowly but surely becomes more robust:

[Figure: After7_41]
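For anyone who wants to try this at home, here is a minimal Python sketch of the slicing idea described above. It is not my actual analysis code: the function names (`pearson`, `rating`, `stabilization_curve`, `fake_league`) are made up for illustration, and the league is synthetic noise around an assumed fixed team skill, not real NBA data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rating(games):
    """Points per 100 possessions over a list of (points, possessions) games."""
    points = sum(p for p, _ in games)
    possessions = sum(q for _, q in games)
    return 100.0 * points / possessions


def stabilization_curve(seasons, split=41):
    """Correlate each team's rating over games 1..n with its rating over
    the games after `split`, for n = 1..split.

    `seasons` maps a team name to a list of 82 (points, possessions) tuples.
    Entry n-1 of the result is the correlation for the first n games.
    """
    teams = sorted(seasons)
    # Games 42 to 82 serve as the "actual rating" proxy.
    target = [rating(seasons[t][split:]) for t in teams]
    return [
        pearson([rating(seasons[t][:n]) for t in teams], target)
        for n in range(1, split + 1)
    ]


def fake_league(n_teams=30, n_games=82, seed=0):
    """Synthetic league (NOT real NBA data): fixed team skill plus game noise."""
    rng = random.Random(seed)
    league = {}
    for i in range(n_teams):
        skill = rng.gauss(105.0, 4.0)  # assumed true points per 100 possessions
        games = []
        for _ in range(n_games):
            possessions = rng.randint(88, 100)
            points = round(possessions * rng.gauss(skill, 12.0) / 100.0)
            games.append((points, possessions))
        league[f"team_{i:02d}"] = games
    return league


curve = stabilization_curve(fake_league())
```

Here `curve[0]` is the correlation after one game and `curve[40]` the correlation after 41. The same function works for defense if you feed it opponent points instead of team points.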

The reason I took only the second part of the season as “actual net rating” is that I wanted to have a kind of “out of sample” comparison. The following figure shows the result for the last two years:

[Figure: stabilization_OffDefInk]

The dotted line marks a correlation of 0.7, a threshold that sports statisticians use to say “I am halfway certain that it’s skill and not luck” (0.7 squared is roughly 0.49, so about half of the variance is shared). The whole thing is more of a rule of thumb anyway, but the indication is that you should probably wait until Game 15 to completely discard your preseason expectations about a team’s offense. Evaluating defense takes a bit longer, mostly because teams differ less on defense than they do on offense[4. Last season the worst offense was around 17 points per 100 possessions worse than the best offense, but the worst defense was only around 11 points worse than the best defense.].
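Reading the "stable from game N" point off such a curve can be sketched in a few lines of Python. The helper name `first_stable_game` and the example numbers are hypothetical, and requiring the correlation to stay above the line (rather than just touch it once) is my own assumption about what "stabilized" should mean.

```python
def first_stable_game(correlations, threshold=0.7):
    """Return the 1-based game number from which every later correlation
    is at or above `threshold`, or None if the curve ends below it.

    `correlations[i]` is the correlation using the first i+1 games.
    """
    stable_from = None
    for game, r in enumerate(correlations, start=1):
        if r >= threshold:
            if stable_from is None:
                stable_from = game  # start of a new run above the line
        else:
            stable_from = None  # dipped below the line, start over
    return stable_from


# Made-up curve: it crosses 0.7 at game 2, dips, then stays above from game 4.
example = [0.20, 0.75, 0.60, 0.71, 0.80]
print(first_stable_game(example))  # prints 4
```

A single early crossing can easily be noise with only 30 teams, which is why the sketch resets whenever the curve falls back under the threshold.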

Is recency an important factor in team ratings?

There have been studies on this as well, and my answer is also a clear “probably not”. What I did was to additionally look at the correlation between my “actual offensive net rating” and the net rating for Game 41, Games 40 to 41, Games 39 to 41 and so on. Basically, I inverted the first half of the season and started the comparison with the games closest to the second half. The result is that, compared with the original game order, the correlation is higher for the inverted 13-14 season and lower for 14-15:

[Figure: stabilization_recency_Ink]

So, if you want to say something about the quality of your team, it is usually better to look at as many games as possible and not only at the most recent short trend[1. With the enormous caveat that if a team recently had a major makeover, such as the Cavaliers last January, you might be better off starting your evaluation over from a blank slate.].
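The only difference between the recency check and the stabilization check is which games go into each slice. A small Python sketch makes the two windowing schemes explicit. The names `leading_slices` and `trailing_slices` are hypothetical, and the list of game numbers simply stands in for real game logs.

```python
def leading_slices(split=41):
    """Slices for Game 1, Games 1 to 2, ..., Games 1 to `split`."""
    return [slice(0, n) for n in range(1, split + 1)]


def trailing_slices(split=41):
    """Slices for Game `split`, Games `split`-1 to `split`, ..., Games 1 to `split`."""
    return [slice(split - n, split) for n in range(1, split + 1)]


games = list(range(1, 83))  # game numbers 1..82 as a stand-in for game logs

print(games[leading_slices()[2]])   # prints [1, 2, 3]
print(games[trailing_slices()[2]])  # prints [39, 40, 41]
```

Both schemes end on the same full first half (Games 1 to 41), so the two correlation curves have to meet at the right edge of the chart. Any recency effect would show up as a gap between them at small slice sizes.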
