# Deep Dives: Measuring Level of Competition Around the World

I am interested in projecting players into the NBA. I have done a lot of work projecting players from college, and in the past couple of years expanded my projections to some of the bigger international professional leagues. I am currently finalizing my efforts to include every league for which I can find publicly available data. I will make these projections available here at Nylon Calculus soon, but in the meantime I want to share one piece of this puzzle that serves as an interesting stand-alone resource.

One key part of projecting across different leagues is quantifying the relative strength of those leagues, or more specifically, identifying how heavily to weight performances in different venues in order to arrive at the most accurate player projections. Even if you do not ultimately put much stock in my projections, knowing how the many different competitions around the world stack up might make it easier to balance your subjective assessments of player performances.

The Method:

[Warning… the following gets a bit hairy. If you just want to see the rankings and commentary, skip past this section. If you want to be able to understand how I got there, read on.]

The process starts with a regression model that uses a collection of variables including age, per-possession box-score statistics, and height to project NBA performance. The goal of this type of model is to find the weights for these variables that do the best job of accurately recovering the observed value of players in the NBA based on their pre-NBA production. Unfortunately, because this regression includes player-seasons from competitive venues as dramatically different as the Olympics and domestic European B leagues, it is going to lead to some wacky results. Not only will the resulting model under- or overrate players depending on their competitive venue, but it will have a tough time understanding the true relationship between statistical production and ultimate NBA performance because it is clouded by strength-of-schedule effects. To get around this problem, I include a term that finds the optimal value to debit players in each competition in order to arrive at accurate predictions. As a result, I get “fixed effects” that give a reliable weighting of the importance of each variable, and “random effects” that take a stab at explaining the differences between all of the possible competitive venues.
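The shape of that first step can be sketched in miniature. The sketch below uses entirely synthetic data, and it substitutes plain least squares with per-competition dummy intercepts as a simplified stand-in for a true mixed-effects fit; the venue count, variable names, and all numbers are illustrative assumptions, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic player-seasons: standardized stats (age, per-possession
# production, height) plus a competition label with a hidden
# difficulty offset. All values are made up for illustration.
n = 600
comps = rng.integers(0, 4, n)              # four hypothetical venues
true_offsets = np.array([0.0, -0.5, -1.2, 0.8])
stats = rng.normal(size=(n, 3))
true_weights = np.array([0.3, 1.0, 0.4])
nba_value = stats @ true_weights + true_offsets[comps] + rng.normal(0, 0.3, n)

# Design matrix: the stat columns plus one dummy intercept per
# competition, so a single fit recovers both the variable weights
# (the "fixed effects") and per-venue adjustments (a simplified
# stand-in for the "random effects").
dummies = np.eye(4)[comps]
X = np.hstack([stats, dummies])
coef, *_ = np.linalg.lstsq(X, nba_value, rcond=None)

weights, offsets = coef[:3], coef[3:]
```

With enough player-seasons, the recovered weights and venue offsets land close to the values that generated the data, which is the property the real model relies on.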

However, this does not get us far enough. Not only is the sample of players who competed in the NBA and many of the international leagues extremely small, but there are even more leagues I am interested in projecting that have never produced an NBA player. To build more trustworthy competition rankings, and include the entire set of leagues, I need to take an additional step. At this stage, I project every player-season in the data into the NBA using the fixed effects I found above. This gives competition-agnostic projections for a player based on his production in a given venue. To this point, players in the Irish Superleague are held to the same standard as those in the Spanish ACB. Obviously this needs to be corrected. I accomplish this by looking at how the projections for players in each competition differ on average from those same players’ scores in other venues. This results in a collection of relative scores that are ultimately anchored in the random effects I found above. For example, I may not know how a player from the Georgian Super Liga does in the NBA, but I know how he performs in Eurobasket, and from the random effects above I have an estimate for how Eurobasket performance translates to the NBA. The average increase or decrease in ‘value’ between the Georgian Super Liga and Eurobasket gives me one estimate of competition value by simply adding that difference to the original estimated value for Eurobasket. I do the same to arrive at unique estimated values for the Georgian Super Liga relative to each of the other leagues that it shared players with. I then calculate a single estimated competition value using the mean across those estimates (weighted by number of cases). This gives a first approximation for the relative value of all the competitions not initially captured above. It also improves the estimates for the random effects by taking information from many additional between-league transformations.
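As a toy illustration of this pairwise step, the sketch below estimates a value for a hypothetical unanchored league from players it shares with two already-anchored venues. Every projection, anchor value, and league pairing here is invented for the example, and the sign convention (subtracting the average projection inflation in the weaker venue) is an assumption about how the bookkeeping works, not the article’s exact code.

```python
import numpy as np

# Hypothetical competition-agnostic projections for players who logged
# time in both the unanchored league and an anchored venue.
# Each pair is (projection in unanchored league, projection in anchor);
# anchor_value is that venue's previously estimated competition value.
overlaps = {
    "Eurobasket": {
        "anchor_value": 0.9,
        "pairs": [(1.0, 0.4), (0.9, 0.3), (1.1, 0.5)],
    },
    "VTB United League": {
        "anchor_value": 0.6,
        "pairs": [(0.8, 0.5), (0.7, 0.4)],
    },
}

def estimate_league_value(overlaps):
    """One estimate per anchor: the anchor's value minus the average
    projection gap for shared players, then a mean across anchors
    weighted by the number of shared cases."""
    estimates, weights = [], []
    for info in overlaps.values():
        inflation = np.mean([u - a for u, a in info["pairs"]])
        estimates.append(info["anchor_value"] - inflation)
        weights.append(len(info["pairs"]))
    return np.average(estimates, weights=weights)

value = estimate_league_value(overlaps)  # single pooled estimate
```

The case-weighted mean is what keeps a league with two shared players from swinging the estimate as hard as one with twenty.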
After this step it gets a little more complicated, because in improving the estimates, all of the values that were used to find relative values are themselves moving around. In order to find a stable set of relative values, I need to iterate this process many times until the estimates settle into a final, stable set. After all of this is finished, I translate the values into standardized scores such that the most average competitive venue in the data scores a 0, a venue that is one standard deviation more difficult than average scores a 1, and a venue one standard deviation less difficult scores a -1.
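The iteration and final standardization might look something like the following sketch. It uses a toy overlap graph of three venues with made-up average projection gaps and shared-player counts, repeatedly re-estimates each venue's value from its peers until the values stop moving, and then converts them to standardized scores; none of this is the article's actual data or code.

```python
import numpy as np

venues = ["A", "B", "C"]  # hypothetical competitions
# gap[i, j]: average (projection in i) - (projection in j) for players
# shared between venues i and j; counts[i, j]: number of shared cases.
gap = np.array([[0.0, 0.5, 1.0],
                [-0.5, 0.0, 0.5],
                [-1.0, -0.5, 0.0]])
counts = np.array([[0, 20, 5],
                   [20, 0, 10],
                   [5, 10, 0]])

values = np.zeros(3)  # initial competition values
for _ in range(200):
    new = np.empty(3)
    for i in range(3):
        # One estimate per peer venue: the peer's current value minus
        # the observed projection gap, weighted by shared cases.
        ests = [values[j] - gap[i, j] for j in range(3) if counts[i, j]]
        wts = [counts[i, j] for j in range(3) if counts[i, j]]
        new[i] = np.average(ests, weights=wts)
    done = np.allclose(new, values, atol=1e-10)
    values = new
    if done:  # estimates have settled into a stable set
        break

# Standardize: the average venue scores 0, one standard deviation
# harder scores 1, one standard deviation easier scores -1.
z = (values - values.mean()) / values.std()
```

Because every update feeds off the other venues' current values, a single pass is not enough; the loop keeps going until successive passes agree.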

Relative Competition Scores:

One caveat before we move on to the numbers. These are not exactly rankings for strength of competition in different venues. Instead, these are a measure of what statistical production in each league says about NBA potential. A player in a competition with a lower relative score needs to put up more impressive numbers to be considered on the same level as a player in a higher-scoring competition. Obviously this concept is highly correlated with “strength of competition”, but it is not exactly the same thing.

Here are the results: