Projecting International Prospects, Part 1

Jul 18, 2014; Chicago, IL, USA; Chicago Bulls head coach Tom Thibodeau (left), new player Nikola Mirotic (middle) and general manager Gar Forman pose for a photo after a press conference at the United Center. Mandatory Credit: David Banks-USA TODAY Sports

I am not an expert on international basketball. I get excited about the Olympic and World Cup competitions, I watched a number of Barcelona games waiting for Ricky Rubio to cross the Atlantic, and I even caught a Besiktas game while in Istanbul (fellow Minnesotan Khalid El-Amin was playing). However, that is the extent of my basketball experience outside of the United States. I do not have the devotion to regularly watch second-tier Bundesliga games or grainy Chinese-League footage. I love basketball, but keeping up with the NBA and NCAA consumes enough of my time as it is. This is a problem for someone interested in the NBA Draft. The current CBA makes euro-stashes a very attractive investment, and that means I spend most of the second round listening to exotic sesquipedalian names I have, at best, glanced at on the DraftExpress prospect rankings.

I am trying to fix this.

I started researching the history of the different European basketball leagues and followed as many international basketball experts on Twitter as I could find. However, it will be a long time before I can pretend I know what I am talking about. Thankfully, I do most of my basketball analysis with statistics. While specialized knowledge is essential for developing strong theories, easily identifying data errors, and laugh-testing outputs, sometimes you can get pretty far simply letting the numbers do the thinking for you.

International basketball poses unique challenges to statistical projections, but the huge potential payoff makes them worth pursuing. International prospects are much more difficult than college players for NBA fans and professionals to evaluate. International scouts log obscene hours traveling from one far-flung destination to the next, and obsessive fans are often forced to develop opinions based on 30-second YouTube clips. Objective systems of evaluating statistical production offer quick (possibly wrong) answers that set priors and can guide where to invest time in more detailed scouting.

I am writing a short series of articles outlining my process of building an international draft model. In doing so, I hope to bring fellow newcomers a step towards understanding international basketball, provide novel insight into the relationship between different leagues and which skills translate to the NBA, and make people more comfortable judging when to believe or not believe the results my models spit out.

The Data:

Thanks to DraftExpress, I have individual and team data for the top-tier clubs in Italy, Greece, France, Spain and the Adriatic league, as well as the tier-1 (Euroleague) and tier-2 (Eurocup) interleague club competitions. Thanks to FIBA’s archives, I also have the same basic individual and team box-score data for the Olympics, FIBA World Cup, FIBA Europe, FIBA Americas, and three junior competitions: the Junior World Cup and both the U18 and U20 European Championships. All of these data are complete for most of the 21st century, and many of the global competitions extend deep into the 90s, with full Olympic data into the 80s. Putting this all together results in ~30,000 observations of ~10,000 unique players. However, only a small subset of these players ever logged an NBA minute. These are the key observations, since my goal is to use past examples of players in both international competition and the NBA to project future players. Limiting the sample to only players with some NBA experience, I am left with 890 players.

Preprocessing:

There is nothing wrong with a sample of 890, but compared to the 2,700 players used to fit my NCAA models it is not particularly impressive either. Exacerbating any sample-size concerns are a couple of additional issues that NCAA-to-NBA projections do not face. Rather than assessing strictly top young athletes who were recruited to the NBA, many observations come from marginal American players who had a cup of coffee in the NBA before being cut, forcing them overseas. I am banking on these players being useful for projecting up-and-coming international prospects, but there is a reasonable argument that the relationship between their performance abroad and in the NBA is different than that of the focal prospects.

The other major issue is that the competitive venues included in these data are quite different, particularly in terms of strength of competition and pace-of-play. Most European basketball hovers right around 1.8 possessions per minute with the French and Italian leagues playing a bit faster at 1.85 and 1.86 (for comparison, modern NCAA pace is ~1.75 and NBA is ~2). Meanwhile, Olympic and FIBA Americas basketball hits an average pace between 1.9 and 2 possessions per minute. Interestingly, that Olympic rate has been surprisingly steady since 1988. The Junior World Cup is the fastest competition with an average pace of 2.05 possessions per minute, and the European junior competitions fall in line with their senior counterparts. Thankfully, I have team data, which makes pace simple to adjust not just across leagues, but also across teams. I use the formula ((FGA + TOV + FTA*0.44 – ORB)/MP) to calculate the number of possessions players on each team are expected to see per minute of play, and then convert each player’s production to per-possession rather than per-minute. This ensures that players are not helped or hindered by their team’s relative speed of play.
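
To make the pace adjustment concrete, here is a minimal Python sketch of the formula above. The function names and the sample team and player numbers are made up for illustration. It also assumes that MP means game minutes (total player-minutes divided by five), since that reading reproduces paces in the 1.8 range quoted above; the article does not spell this out.

```python
def possessions_per_minute(fga, tov, fta, orb, game_minutes):
    """Team pace from the formula in the text: (FGA + TOV + 0.44*FTA - ORB) / MP.

    Assumption: MP is game minutes (team player-minutes / 5).
    """
    return (fga + tov + 0.44 * fta - orb) / game_minutes


def per_possession(stat_total, player_minutes, team_pace):
    """Convert a player's raw total into a per-possession rate.

    The player is credited with the possessions his team is expected
    to see while he is on the floor: minutes played * team pace.
    """
    expected_possessions = player_minutes * team_pace
    return stat_total / expected_possessions


# Hypothetical single-game team line (illustrative numbers only).
team_pace = possessions_per_minute(fga=60, tov=14, fta=20, orb=10, game_minutes=40)

# Hypothetical player on that team: 15 points in 30 minutes.
points_per_100 = 100 * per_possession(15, player_minutes=30, team_pace=team_pace)

print(round(team_pace, 2))       # possessions per minute
print(round(points_per_100, 1))  # points per 100 expected possessions
```

Two players with identical per-minute scoring on a fast and a slow team will come out with different per-possession rates under this conversion, which is the point of the adjustment.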

Adjusting for strength of schedule is a bit more complicated, but because many players compete in different leagues within the same season it can be managed. I only need to look at the historical difference in production for individual players in multiple leagues within the same year to set expected increases or decreases in performance. The Euroleague is the highest-level professional competition and it pits top clubs from different pro-leagues against one another. This makes it the perfect baseline to adjust performance across leagues. In order to capture all of the above-mentioned competitions, I first translate box-score statistics for players in different European top-tier club leagues (and FIBA Europe) into expected Euroleague production. I then transform production in international tournaments to those transformed European box-scores. Finally, I transform junior tournament production to the complete set of transformed senior competitions.
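
One way to picture the first step is the sketch below, which estimates a single league-to-Euroleague multiplier for one statistic from players who appeared in both competitions in the same season. This is an illustration under stated assumptions, not the actual method: the article does not say how the paired observations are weighted or aggregated, so the weighting by the smaller of the two possession counts, the function name, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def translation_factors(rows, baseline="Euroleague"):
    """Estimate, per league, the expected ratio baseline_rate / league_rate.

    rows: iterable of dicts with keys
        player, season, league, rate (per-possession stat), poss (possessions).
    Only player-seasons that appear in BOTH the league and the baseline
    contribute. Each pair is weighted by the smaller of its two possession
    counts (an assumption made for this sketch).
    """
    by_player_season = defaultdict(dict)
    for r in rows:
        by_player_season[(r["player"], r["season"])][r["league"]] = r

    num = defaultdict(float)  # weighted baseline production
    den = defaultdict(float)  # weighted home-league production
    for leagues in by_player_season.values():
        if baseline not in leagues:
            continue  # no same-season baseline sample for this player
        base = leagues[baseline]
        for league, r in leagues.items():
            if league == baseline:
                continue
            weight = min(r["poss"], base["poss"])
            num[league] += weight * base["rate"]
            den[league] += weight * r["rate"]

    return {lg: num[lg] / den[lg] for lg in num if den[lg] > 0}


# Made-up assist rates for illustration only.
rows = [
    {"player": "A", "season": 2010, "league": "France", "rate": 0.10, "poss": 1000},
    {"player": "A", "season": 2010, "league": "Euroleague", "rate": 0.08, "poss": 500},
    {"player": "B", "season": 2010, "league": "France", "rate": 0.05, "poss": 800},
    {"player": "B", "season": 2010, "league": "Euroleague", "rate": 0.04, "poss": 400},
    {"player": "C", "season": 2010, "league": "France", "rate": 0.07, "poss": 900},
]

factors = translation_factors(rows)
print(factors)  # player C is ignored: no Euroleague minutes that season
```

Multiplying a domestic-league rate by its factor gives the expected Euroleague rate. The later steps described above would repeat the same idea with the already-transformed European numbers serving as the baseline for international tournaments, and then for junior tournaments.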

It is important for any projection model to adjust for strength of competition. However, I prefer not to stop there. Strength of competition does not necessarily impact all skill-sets in the same way, and different leagues may inflate one statistic while deflating another. These factors mean that information is lost in any simple SOS adjustment. Instead, I use an entire transformed box-score in my projection model. These box-score transforms not only help accurately fit the projection model, but they also give a nice picture of how different international leagues vary:

The above chart shows the expected percentage change in a particular statistic for a player moving from the league identified on the left into the Euroleague. For example, it is widely known that the French league is uniquely pass-happy (apparently this is why the Spurs love French players) and this comes out in the data. Historically, French league players have collected only 78% as many assists in Euroleague competition as they do at home. Unsurprisingly, the biggest transformations are for junior competitions. Juniors playing with the big boys can expect their minutes, usage, and scoring rates to plummet while their foul rates skyrocket. Interestingly, some numbers are not impacted nearly as much. It looks like if you can make shots within the offense, collect steals, and hit the offensive glass even in junior competition, you may be able to do the same against tougher competition as well.

NBA data:

International data covers one side of the ledger, but historical NBA data is also necessary in order to teach the projection model how international production translates into NBA performance. In this case, whatever indicator of NBA performance I choose will act as the “dependent variable” in the model. This is the thing that the model will eventually try to predict for upcoming prospects. Choosing a strong dependent variable is essential, because any outputs are going to be worthless if you cannot even convince people that the thing you are predicting is important.

The goal is to choose a measure that does a good job of accurately describing “good”/“bad” in a way that corresponds with reality (or at least something most people will agree with). Modelers typically use one of the popular “one stat to rule them all” metrics (WS, PER, WP, RAPM…), either as a rate or as an accumulated win metric, or some other measure like total minutes played. Here I use the same standard as my NCAA projection models. I take the average of NBA players’ Win Shares and “RAPM-wins” (RAPM converted to an accumulated wins metric like Win Shares) in each season. Then, I use those scores to calculate a rolling two-year average throughout each player’s career. The highest two-year average is that player’s “Win Peak”, which is the value I use for the dependent variable. The result is a largely uncontroversial description of player performance. That said, I do not expect you to take my word for it, so I make historical scores on the dependent variable available for anyone to judge for themselves (link).
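
The Win Peak calculation can be sketched in a few lines of Python. The function name and sample career are hypothetical, and two details are assumptions the article does not address: seasons are treated as consecutive once sorted (no handling of gap years), and a player with only one season is given that season's score as his peak.

```python
def win_peak(win_shares, rapm_wins):
    """Highest rolling two-year average of the per-season blended score.

    win_shares, rapm_wins: dicts mapping season -> value.
    Per-season score = average of Win Shares and RAPM-wins.
    Assumptions: sorted seasons are treated as adjacent, and a
    one-season career uses that single score as the peak.
    """
    seasons = sorted(set(win_shares) & set(rapm_wins))
    if not seasons:
        raise ValueError("no seasons with both Win Shares and RAPM-wins")

    scores = [(win_shares[s] + rapm_wins[s]) / 2 for s in seasons]
    if len(scores) == 1:
        return scores[0]

    # Average each pair of adjacent seasons and keep the best one.
    return max((a + b) / 2 for a, b in zip(scores, scores[1:]))


# Made-up four-season career for illustration.
ws = {2010: 2.0, 2011: 6.0, 2012: 8.0, 2013: 4.0}
rw = {2010: 4.0, 2011: 4.0, 2012: 6.0, 2013: 6.0}

peak = win_peak(ws, rw)
print(peak)  # 6.0: blended scores are 3, 5, 7, 5, so the best two-year average is 6
```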

Conclusion:

Projecting young international prospects into the NBA is an important but difficult and costly venture. It looks like there is a role for statistical models in this process, but there are definitely challenges to overcome. The sample set simply is not as clean as it is for NCAA prospects. Players do not funnel between international competition and the NBA in the same consistent pattern, and rather than sitting in one nicely controlled league, they often collect relatively few minutes of action across several different leagues and teams. I have now walked you through the process of trying to address some of these concerns. I have pace-adjusted and league-transformed international data and a valid dependent variable to go with it. The data is ready, so the next step is to put it to use. There are many paths to take in putting together a projection model, and mine may not necessarily be the best one. That said, I am happy with the end product and want to take you through the process of putting it together. That is going to be the topic of my next article, Part 2 in this series, coming soon.