Based on readily available data and tools, most analytical projects estimate general performance standards against which players are measured. Think of shot quality. We establish an “expected” level of shooting success (usually, the league average or the baseline for similar positions), then determine the factors that can help individual players either reach or surpass these targets.
But what if we can move beyond such benchmarks? What if we have enough granular data and technical resources to identify the specific sets of actions — under certain conditions, within a particular timeframe — that enable players to make shots? “Personalized” analytics can provide a clearer, more customized picture of an athlete’s unique path to success.
These questions underpin the work of Patrick Lucey, the Director of Data Science at STATS. His findings often make their way to the Sloan Sports Analytics Conference and other research outlets. In the following interview (presented in full, with slight edits for readability), we discuss his latest work on 3-point shooting and player body movements. We also touch on the technological breakthroughs that are facilitating a new wave of sports analytics, the issue of reproducibility and the steps to entering the industry.
Nylon Calculus: You recently had a research paper at the Sloan Sports Analytics Conference that examined shooting styles using a player’s body pose. What drew you and your co-author Panna Felsen to conduct this study? What did you hope to accomplish with your project?
Patrick Lucey: That’s a great question, and by the way, Panna was amazing. She did amazing work; it was a pleasure to work with her. It was a lot of fun.
Basically, here at STATS, we just have an amazing amount of data. That’s why I came to work here. It’s just a dream to work with all the data that we have, and there are so many things that we can do. So my big thing is, I think the data that we have tells a story. It basically reconstructs the story and how the game’s being played. Then we have a different lens on how we can actually see what’s important about the game — whether it correlates with match prediction or player performance or whatnot.
As you know — and you guys have done some great stuff reporting on the SportVU data and some of the analytics that occur using that data source — we’re kind of pioneers [in this area]. But it’s limiting in terms of the stories that you can tell…. Not everything’s contained there. And a big part of [addressing] that is using body pose.
So Panna interned with us. We’re building this really strong data science group here; we’re investing a lot of money; and we have internships. Then we thought, “What would be really cool?…And wouldn’t it be cool if we could get body-pose information? And, if we could, what types of things could we do with it?”
That kind of led us to exploring what we could do. We captured a lot of data, and the paper was the result based on that. Basically, it’s asking these kinds of fundamental questions. What can we do with this new added information? How can it improve our analysis? How valuable is it? What types of things do we have to look at? And that just opened up a lot of interesting questions that we just wanted to answer and share with the community because, ultimately, we’d like this stuff to be used….
As you know, basketball’s at the forefront of analytics, mostly due to SportVU having that tracking data. But we can still do a lot more. We’re just scratching the surface.
NC: In the paper, you and Felsen observed that, although STATS SportVU tracking data have contributed to the “basketball analytics revolution,” they’ve been limited in helping us understand how players execute specific skills. Meanwhile, recent advancements in computer vision have made it easier to capture body pose information. Can you please elaborate on what these technological advancements are and how widely available they might be to those who work on sports analytics?
Lucey: When SportVU came out, it was a massive innovation. Computer vision kind of fuels that. Deep neural networks have been around a very long time; [they] came out in the ’50s. But it’s only with the matching of the amount of data that you have, coupled with computational resources like GPUs and the architecture with neural nets, that you can just do amazing things in computer vision. It’s really a pervasive technology….
Sports is a nice microcosm of all these environments. Since we have the data, we can start answering or looking at the [aforementioned] questions.
NC: Using this technology, you and Felsen identified 17 attributes that describe player movement during a 3-point shot, then pinpointed the ones that correlated most to shooting success. Which attributes seemed to matter most?
Lucey: I want to take a step back there. So attributes are just one way of representing pose. And this is a thing that we’re really good at as a group — you can think of the data that we have as a sort of unstructured data. We just need the computer to get it into form, so we can understand it and do comparisons.
One representation, which correlated well with some findings that we had in the paper, was using these 17 attributes. A big part of that was looking at balance. That was a strong attribute. But that was really relevant depending on the type of shot that a player took — whether someone’s open or it was a tough shot. So that correlated.
Another big part of the paper was actually looking at the skeleton. So a cool part of that is: given the 2-D input from the skeleton, how can we estimate that in the 3-D, and how do we normalize that (both in space and time) to do some cluster analysis? Then you can do comparisons when someone made a shot or missed a shot. And that also can be used to detect any types of anomalies, as well.
The nice thing I like about this paper is that we look fundamentally at the idea of how to represent basketball shots via the skeleton. You can just do it via the raw skeleton, or we can map it to these attributes. We can just correlate it with various things based on that.
Does that make sense?
NC: Yeah, that makes perfect sense. Basically, that’s one dimension or one way to represent a basketball shot, right?
Lucey: Yeah, the thing is, though, you have these representations and context is so important. First of all, you can map it down to this context, and these representations, when we normalize or align them, allow us to linearize the data so we can make comparisons. Because otherwise — and I won’t go into too much detail — there’s a lot of variation there. What you want to do is get into the same frame of reference so comparisons can be made… Basketball, especially at the granular level, is very complex. We first have to model the nonlinearities… The big thing is just getting that right initial representation…
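To make the idea of getting shots "into the same frame of reference" concrete, here is a minimal sketch of normalizing a 2-D skeleton sequence in space and time before comparing two shots. It is an illustration under stated assumptions, not the method from the paper: the joint indices (`ROOT`, `NECK`), the root-to-neck scaling, the linear resampling, and the toy data are all hypothetical, and the 2-D-to-3-D estimation step Lucey mentions is left out.

```python
import math

ROOT, NECK = 0, 1  # hypothetical joint indices, not taken from the paper


def normalize_space(frame):
    """Center a frame on the root joint and scale by the root-to-neck length."""
    rx, ry = frame[ROOT]
    nx, ny = frame[NECK]
    scale = math.hypot(nx - rx, ny - ry) or 1.0  # avoid division by zero
    return [((x - rx) / scale, (y - ry) / scale) for x, y in frame]


def resample_time(frames, n_out):
    """Linearly interpolate a sequence of frames to exactly n_out frames."""
    last = len(frames) - 1
    out = []
    for i in range(n_out):
        t = i * last / (n_out - 1) if n_out > 1 else 0.0
        lo = int(math.floor(t))
        hi = min(lo + 1, last)
        w = t - lo
        out.append([
            ((1 - w) * ax + w * bx, (1 - w) * ay + w * by)
            for (ax, ay), (bx, by) in zip(frames[lo], frames[hi])
        ])
    return out


def normalize_shot(frames, n_out=5):
    """Apply spatial normalization per frame, then resample to n_out frames."""
    return resample_time([normalize_space(f) for f in frames], n_out)


def shot_distance(a, b):
    """Mean joint-to-joint Euclidean distance between two normalized shots."""
    dists = [
        math.hypot(ax - bx, ay - by)
        for fa, fb in zip(a, b)
        for (ax, ay), (bx, by) in zip(fa, fb)
    ]
    return sum(dists) / len(dists)


# Invented toy data: three joints (root, neck, wrist), wrist rising over time.
shot_a = [[(0, 0), (0, 1), (1, 1 + k)] for k in range(3)]
# Same motion, but twice as large, shifted on the court, and sampled in 5 frames.
shot_b = [[(10, 5), (10, 7), (12, 7 + k)] for k in range(5)]

d = shot_distance(normalize_shot(shot_a), normalize_shot(shot_b))
print(f"distance after normalization: {d:.6f}")
```

Once every shot has the same length and scale, the sequences can be compared directly or passed to a clustering routine; without the normalization, differences in player size, court position and shot duration would dominate the distance.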
NC: You and Felsen focused on Steph Curry, observing that “he moves more than other players in every phase of his shot.” Were there other players who perhaps might have been on the opposite end of the spectrum, but still had some success? What takeaways did you glean from these other cases?
Lucey: The big takeaway is that each player is different. It’s not a shocking finding, but to do that, you have to model each player differently. So the idea of personalization is key.
Like I said, as a group [at STATS], we do certain things very well. One is understanding and getting that right representation. But another really good thing that we do is have the ability to adapt models and personalize them. So, if we’re doing prediction or analysis, we’re going to do it best when we’re just comparing [players] to themselves because everyone is different, especially at that granular level…. To do meaningful analysis, you really have to have a per-player model per context.
Of course, you need a lot of data to do that, which we have, but that’s the main takeaway there.
You asked about another interesting player, and we looked at Klay Thompson. He’s different. He’s very balanced. He’s more of a pure shooter. But again, it’s about having the ability to personalize and making sure that we can just map each player to his own model, instead of having this general model, which isn’t going to pick up these very subtle things that we have with this granular data.
NC: In general, were there any other findings that surprised you, piqued your interest, or seemed promising for future study?
Lucey: Just the idea of personalization… you need to be able to personalize per context and per player. Then how do you adapt that over time? Someone can have a certain style. How does it change from season to season?
You look at [Kawhi] Leonard. He wasn’t a great 3-point shooter when he first came into the league. How did that change over time? Now that’s the key when you’re dealing with big data. You can aggregate all this data, but sometimes the data that you collect doesn’t represent what the player is now. So how do you adapt that? I think that’s really fascinating. If you can select that segment of time which best represents them, that’s interesting. And then it can lead to other analyses in terms of form and valuation and how they perform within the team…. It’s more of a research thing, but I can see some real applications there.
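One simple way to picture "selecting the segment of time which best represents" a player is to weight recent attempts more heavily than old ones. The sketch below is only an illustration of that idea, assuming an exponential decay by season; Lucey does not describe a specific method, and the function name, the `half_life` parameter, the context labels and the data are all invented.

```python
from collections import defaultdict


def recency_weighted_pct(attempts, current_season, half_life=1.0):
    """Estimate a make rate per (player, context), favoring recent seasons.

    attempts: iterable of (player, context, season, made) tuples.
    Each attempt is weighted by 0.5 ** (age / half_life), where age is the
    number of seasons before current_season. This weighting is an assumption.
    """
    made_w = defaultdict(float)
    total_w = defaultdict(float)
    for player, context, season, made in attempts:
        w = 0.5 ** ((current_season - season) / half_life)
        key = (player, context)
        total_w[key] += w
        if made:
            made_w[key] += w
    return {key: made_w[key] / total_w[key] for key in total_w}


# Invented data: a shooter who went 1-for-4 two seasons ago and 3-for-4 now.
attempts = (
    [("player_a", "open", 2015, m) for m in (True, False, False, False)]
    + [("player_a", "open", 2017, m) for m in (True, True, True, False)]
)

rates = recency_weighted_pct(attempts, current_season=2017, half_life=1.0)
print(rates[("player_a", "open")])
```

A plain average over all eight attempts would give 50 percent; with the decay, the estimate moves toward the recent 3-for-4 form. Keying the result by both player and context mirrors the "per-player model per context" idea discussed earlier in the interview.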
NC: Are you aware of any teams or players who have put the “Body Shots” research findings to use? If so, how have they used them? If not, how do you envision the potential application of the study?
You obviously mentioned personalization and maybe being able to predict how a player might evolve in terms of his shot. Are there other applications or thoughts that you might have here?
Lucey: Well, this is one of the holy grails in sports analytics. A lot of work’s been done over the last ten years in actually capturing body pose, but it’s just not been done within the game. So you have motion capture. You have Vicon. You look at NBA 2K games. They kind of have these mocap suits and they pick it up. But no one’s been able to do this in games. So that’s what really excites us — to be able to do in-game measurements.
You look at the combine and all these things. It doesn’t really reflect how [players] are going to perform in the game. If we’re going to predict how they’re going to perform in the game, we need measurements from within the game. This type of approach allows us to do that…
I think there’s so much to do in that space. Doing things in the wild — to compare to having just a lab setting… Being able to collect this information within the game will allow us to do the best in-game predictions because there’s little mismatch. You just want to minimize mismatch.
I don’t want to oversimplify, but that’s basically it, right? If you have infinite data within the game, you should be able to do the best analysis. A lot of shortcomings come from the fact that you just have mismatch or just not enough data.
NC: Last year, you and a group of fellow researchers won the SSAC research paper competition for your work on predicting tennis shot outcomes through style and context priors. Is there any substantive connection between this and the “Body Shots” studies? Is there a common thread that links all of your sports-analytics projects?
Lucey: The big thing with that [tennis] paper is personalization — how to get a model for a player against a certain opponent in a certain situation. So how can we do that in a smart data-driven way?
If you could see a common thread, we do two things extremely well. One thing is getting the right representation. What I mean by that is we don’t want to do any harm to the data. We get the right representation where we can do unsupervised learning. And, the second thing, that enables us to do this personalization.
People talk about deep learning a lot. Basically, deep learning is unsupervised feature learning. We’re learning the representation. But, to enable good representation learning, you need to have a good initial one. So that’s what we do really well, and that allows us to personalize.
In our paper last year, we showed that not only can we do better prediction, but having these models using our approach, we can actually make this interpretable as well. So that’s the nice thing with machine learning: we can learn these high-level descriptions of play, and then we can correlate them with certain interpretable things…
NC: In general, some scientists believe there’s a “replication crisis,” both in sports analytics and in other fields. For example, Michael Lopez of Skidmore College recently noted that, while the SSAC research paper competition is based on “novelty of research, academic rigor, and reproducibility,” many entries do not use publicly accessible data, do not make code available, or cannot otherwise be reproduced. Do you share this observation? What do you think of the idea of a “replication crisis” in sports analytics, and what might be done to address it?
Lucey: That’s a good question. My background is that I’m from academia. I do share Michael’s concerns.
But this isn’t just within sports analytics. This is a problem that faces most areas in artificial intelligence, machine learning and computer vision. It happens across every domain. It just so happens that we’re in the sports analytics field, and we’re very passionate about it. In sports, it’s fun.
Now, there are a couple of things here. First of all, we shouldn’t censor good ideas. If it’s a good idea, even though we don’t have the data or code available, we shouldn’t censor that. So we should just promote ideas and discussions… Let people talk about this…
Also, it’s kind of like planting seeds along the way. If we can have these ideas, we can have these descriptions, then it could promote better research or it could promote people releasing the dataset.
And we actually do that; I don’t think people actually know that. At STATS, we’re building this [data science] group, and we actually do share data. We had a NIPS paper at the end of last year with our great collaborators at Caltech. We released a basketball dataset. People can go to our STATS website and they can actually request that dataset.
But another thing, too, is that we are collaborating a lot. I think Panna’s a great example. You can come work at places like this, and you can write about what you’re doing. You can publish it. You can share it with the community…
On the flip side, there has to be a balance, because you have to consider that data collection and validation are very time-consuming. You know, we spend 99 percent of our time just collecting and cleaning up data. There has to be some type of benefit for collecting that and sharing that idea and having that first-mover advantage. I think people would realize that. So we have to be thought leaders; we have to share data. We’re starting to do that, and it’s something we’re very mindful of…
I share the concerns. I’ve met Michael before, and he’s a very smart guy, written some great stuff. You know, there is a frustration out there that people publish these things and you can’t validate. I get that. There has to be a tradeoff.
NC: From your vantage point as Director of Data Science at STATS, what advice would you have for aspiring sports-analytics professionals who want to break into the industry and make an impact?
Lucey: My biggest advice is just to start somewhere, just to be proactive… Come up with a question and just start. The key differentiator is just having a body of work. You know, I think what you’re doing is amazing, just writing on the side. That’s awesome.
I kind of did that myself… I’d make notes and I’d actually start charting stats. A good friend of mine, Dean Oliver, did the same. He wrote a book. So you just have to be proactive and set yourself apart. Don’t be a fan — don’t just say, “I love sports,” but actually show that you know sports.
Another big thing: Know how to work with unstructured data. Know machine learning. Know how to program… Because if it can be done in an Excel spreadsheet, it’s solved.