“Basketball Words in the Data” – A Q&A With Second Spectrum’s Rajiv Maheswaran
By Seth Partnow
Communicating the insights of analytics in an actionable manner is a topic that never seems to go out of style. Especially as big data comes to the NBA, how does one parse that data in a way that is understandable and actionable for a great basketball thinker who has never had cause to develop any sort of knowledge basis in data science? Rajiv Maheswaran, CEO of pioneering analytics house Second Spectrum, perhaps unwittingly gave the best practical bit of advice I can remember hearing when he told me what Second Spectrum does is “mostly identifying basketball words in the data.”
Maheswaran is one of the biggest techno-optimists around, as demonstrated in this recent TED Talk:
But that optimism doesn’t blind him to the importance of speaking the language of basketball to convince often skeptical coaches and GMs he is worth hearing out. Maheswaran spoke to me last week about his respect for the knowledge base of NBA coaches, the career prospects of MIT basketball players and a fair amount of detail about what Second Spectrum has learned in two seasons of consulting with multiple NBA teams on how to best make use of the SportVU dataset.
Seth Partnow: How have you managed to avoid some of the pitfalls of not “speaking the right language” to basketball people?
Rajiv Maheswaran: I think there are two things. First of all, NBA coaches in general – we never go in and tell coaches and front offices that we have something they don’t know, and they should listen to us. What we always do is say “we have the ability to get stuff out of this data that nobody else does. What would you like to know? What would you dream of having that you don’t right now?” And basically we’ll go in and try and extract that information for them. It turns out that we’ve been able to get lots and lots of things that are in the language of coaches or front offices that they find useful. The analogy we like to use is: we are going to build you the most wicked Iron Man suit you can possibly want. You tell us what gadgets you want on it, and we will go make it. You’re going to be wearing the suit, and you’re going to be driving it. You’re going to be able to do more with the suit on than without it. What information do you want that you don’t have the answer to, and how can we help you become a better version of what you want to be [as a team]. That’s really the philosophy of what we try and do. We don’t ever try to think we know more than them. What we’ve found from interacting with professional coaches and front offices is that they know WAY more than one would think they did if you just read about them in the standard press. They are ridiculously smart about the game. We’re just trying to make them even more powerful than they already are.
SP: You’ve talked a lot about the knowledge of Doug Collins, haven’t you?
RM: We (at Second Spectrum) are a lot of people who love basketball and are basketball fans. Doug is a guy who has been in the game for a long time. I may be wrong, but was he the number one overall pick in the draft?[1. In 1973.]
He’s played in the NBA. He’s coached for many many years. We never want to pretend we know more about basketball than Doug Collins. At one point, he came into our office and we were showing him one of our DataFX products and he froze it on a frame. And at that point, he spent 10 or 15 minutes explaining the purpose and intent of every single player on offense and defense and how the play was evolving the way it was. He was just amazing to watch with all the knowledge he had. And it got us thinking, with DataFX, what we have to do is make it easier for the world to understand what Doug Collins knows because Doug Collins knows a lot. We all felt so lucky – one more minute with Doug Collins made us all feel so much smarter because he knows so much. We showed him the interactive shot chart technology ESPN is using now. We showed him one shot chart and there were some tendencies of a particular player to shoot on one side versus the other side and he said ‘I know exactly why that is’ and he showed us exactly why that was happening. We were able to dive back into the data, and say, “yup, Doug is exactly right.”
SP: I imagine working with people like Doug has increased your “technical basketball” knowledge, in terms of X’s and O’s rather than the data level?
RM: Yeah, I think that’s mostly because we interact with coaching staffs on a relatively frequent basis. Because when they tell you what gadgets they want on that Iron Man suit, you have to step up your game, because if they say I want this, you have to understand what “this” is. It makes us all step up our games and become far more involved in the details of the game. A good example of that is we’ve learned how to identify 12 different kinds of off-ball screens – of which I knew zero types beforehand. And after we classified them, we showed them to some teams and they said “oh, we only use three or four of these, but it’s interesting that you have all 12.” That’s something we wouldn’t have been able to come up with all by ourselves, at least not all 12. There are people in the company, we have, I think five former MIT basketball players, including three MIT basketball captains. There are people who know the game, but still coming up with all 12 types? That’s not something we could have done in-house.
SP: As an aside, have you ever run into a situation where different teams use different verbiage for the same action – “oh, we call that this instead” – but it’s the same basic action?
RM: That happens all the time! But the beautiful thing is with computers that’s easily solved and every team can have their own language.
SP: And that can be different “rules” not just different names or labels for the same action for different teams?
RM: Let me think about an example: I might call 10 percent of these plays a ‘show’ instead of a ‘soft.’ We can then feed that information back into our machine and change all those plays instantly across the entire league. You’re not going to have someone go back and watch hundreds of thousands of pick-and-rolls over several years, are you? But we can do that relatively simply through our machine learning technology.
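As a rough illustration of the idea in the last two answers, the sketch below keeps one canonical label per play and gives each team its own vocabulary on top of it, then re-applies a revised rule to every stored play. Everything here is hypothetical: the team names, the alternate terms, the `big_depth_ft` feature and the thresholds are invented for the example and are not drawn from Second Spectrum's system.

```python
# Canonical labels are what the classifier emits; each team maps them to its own terms.
# All names and values below are hypothetical.
TEAM_VOCAB = {
    "team_a": {"show": "hedge", "soft": "drop"},
    "team_b": {"soft": "sag"},
}

def label_for(team, canonical):
    """Translate a canonical label into a team's preferred term, if it has one."""
    return TEAM_VOCAB.get(team, {}).get(canonical, canonical)

def make_rule(threshold_ft):
    """Build a labeling rule from one made-up feature: the big man's depth in feet."""
    def rule(play):
        return "show" if play["big_depth_ft"] <= threshold_ft else "soft"
    return rule

def relabel(plays, rule):
    """Re-apply a rule to every stored play, returning new records."""
    return [{**play, "coverage": rule(play)} for play in plays]

plays = [
    {"id": 1, "big_depth_ft": 2.0},
    {"id": 2, "big_depth_ft": 5.0},
    {"id": 3, "big_depth_ft": 8.0},
]

before = relabel(plays, make_rule(3.0))  # original definition of a "show"
after = relabel(plays, make_rule(6.0))   # a team's revised definition

for play in after:
    print(play["id"], label_for("team_a", play["coverage"]))
```

Changing the threshold moves play 2 from "soft" to "show" without anyone rewatching it, which is the point Maheswaran makes about relabeling at scale.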
SP: So it sounds like the best way to get into NBA data science right now is to play basketball for MIT?
RM: (Laughs). The best way to join Second Spectrum is to do that! No, no, we have people from all sorts of schools. People who have played all sorts of sports – football, baseball, soccer, tennis. We have people who are ridiculously good at computer science and ridiculously in love with sports. That’s generally a good combination.
SP: The first time we talked, I think you described Second Spectrum as the place for people who love data and like sports?
RM: Yeah, I’d agree. There’s a slight difference between the kind of people who work for Second Spectrum and the kind of people who work for NBA teams – and there are tremendous people working for teams. I’d put it this way, if you love data science and design first and you like sports a lot, you work for us. If you love sports and like data analysis and visualization second, you work for a sports team. If somebody from Second Spectrum were to leave [the company], they’d leave for another tech company, whereas if somebody at a sports team were to leave, they would probably leave for another team.
SP: Have you had experiences of teams poaching your guys or anything like that?
RM: We’ve never lost anyone.
SP: Period? Once you’re in, you’re in?
RM: Well, we’re only two years in! We’re up over forty employees now and no one’s left.
SP: I know you can’t be specific about this[1. Many of Second Spectrum’s clients have asked not to be specifically identified as such.], but would you say about a third of the NBA is currently working with you?
RM: What I’ll say is that almost all of the top contenders work with us. We made the decision we are not just trying to work with “analytically-heavy” organizations. Everyone we work with is a top-notch organization, but not necessarily the ones where fans would think they are obviously into this stuff.
SP: But you’re working with teams that feel like they can get some use out of your services?
RM: We wouldn’t want to work with a client who doesn’t think they can get value out of our products. If it doesn’t do them any good, it doesn’t do us any good. Everyone we work with, it’s a two-way partnership. We know there are things they want to get out of the data, and we want to help them get those things. We always want to work with people to help them get an advantage, and if they aren’t going to, that’s not good for either of us.
SP: Again, knowing you can’t be too specific, judging from this conversation and your TED talk, I imagine “screens” and identifying the different types of them, whether on-ball or off-ball, are kind of a big thing you’ve helped teams work through?
RM: That was one of the interesting things we noticed when we started talking to teams. We were asking them what they wanted to learn from the data, and almost universally, every team wanted to understand every nuance of the pick-and-roll before they wanted to learn anything else. And that was interesting to me and indicated how important that seemed to be to the league. I certainly wouldn’t have thought that beforehand, you know there are pick-and-rolls, drives, post ups, isolations. But we’ve found the pick-and-roll to be of significant – significant – importance. We probably spent most of our first year working on 20 different dimensions of the pick-and-roll. The great thing to come out of that is we built a really good machine understanding of the pick-and-roll, and then from that we can look at a whole bunch of stuff relatively easily with the same technology.
SP: A common topic at Sloan the last few years is the accuracy in terms of the number or proportion of ball screens you can identify with machine learning. Is there a percentage you would put on the accuracy you can identify?
RM: That is the absolute most important thing, you have to be really accurate in terms of both precision and recall. NBA coaches know their stuff and you can’t be wrong in front of them. That is where we make our reputation. It’s relatively easy to get to 80 percent – one of our undergrad engineers got to 80 percent in about a week – and right now, I won’t go too into detail but I’ll say we’re in the high 90s in both precision and recall. And that’s a number that’s very very hard to arrive at because every percentage point above 80 is harder and harder than the percentage before it because you are dealing with harder cases to identify.
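For readers unfamiliar with the two terms: precision is the share of flagged plays that really are pick-and-rolls, and recall is the share of real pick-and-rolls that get flagged. A minimal sketch, using made-up possession ids rather than any real tracking data:

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from two sets of event ids."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical example: 10 real pick-and-rolls (ids 0-9); the detector flags
# 10 plays (ids 2-11), 8 of which are real.
actual = set(range(10))
predicted = set(range(2, 12))

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A detector can score well on one and badly on the other, which is why Maheswaran quotes both.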
SP: It’s getting harder because you’re looking at some cases that look sort of like what you want to capture but end up being something else and you don’t want the false positive?
RM: Yes. So the simple example, and I may be getting a little bit technical, but say you get 80% of the pick-and-rolls, but there’s a biased error. Say you’re missing all the rejects or slips[2. The screener rolls to the basket without actually setting a screen, perhaps anticipating his defender moving to double team the ball-handler.], then large sample size doesn’t help you there. Even if you’re going to look at people with hundreds and hundreds of pick-and-rolls, if your algorithm doesn’t capture and classify them, then your stuff is inaccurate. That’s the most important thing.
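The point about biased error can be shown with a toy calculation. In the sketch below, one play in five is a slip and slips are assumed to score more than taken screens; the point values are invented purely for illustration. A detector that misses every slip has 80 percent recall, and its estimate is off by the same amount whether it sees 100 plays or 10,000.

```python
def build_events(n):
    """Hypothetical plays: every fifth is a slip worth 2 points, the rest 1 point."""
    return [("slip", 2) if i % 5 == 0 else ("taken", 1) for i in range(n)]

def average_points(events, detector=lambda kind: True):
    """Average points over the plays the detector manages to identify."""
    kept = [points for kind, points in events if detector(kind)]
    return sum(kept) / len(kept)

def misses_slips(kind):
    """A detector with a systematic blind spot: it never flags a slip."""
    return kind != "slip"

results = {}
for n in (100, 10_000):
    events = build_events(n)
    results[n] = (average_points(events), average_points(events, misses_slips))
    print(n, results[n])
```

The gap between the true and observed averages does not shrink as `n` grows, because the missing plays are not a random 20 percent.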
SP: So what are some of the things you can tell that weren’t really able to be discerned on a broad scale before?
RM: Even just identifying the pick-and-roll itself, I don’t think anyone out there has an algorithm as good as ours in terms of precision and recall. And then on top of that, we have further classifications – was the pick taken, rejected, did the screener roll or slip? What did the ballhandler defender do, what did the screener defender do? What were the outcomes? I think the biggest thing we’ve gotten out of all of this information is equipping teams with information about defense. The automatic identification of pick-and-roll defense has had the biggest impact.
SP: In terms of practical application, you’re a team looking for a point guard, okay how does this guy do against blitzes?
RM: Yes.
SP: And I imagine all of that is cued to video as well?
RM: That’s the beautiful thing, you can search all this information – how does he do against blitzes, against “up-to-touch,” against a “contain trap.” And then if you don’t believe the numbers you can hit one button and watch all the video so you can see what proves the identification.
SP: Changing topics, you guys have had enough success and enough good press that I imagine you’re starting to look into branching into other sports as well?
RM: We have received inquiries from 10 to 15 sports, and many many things which are not sports, but do involve moving dots. I think American football, baseball and soccer are our next priorities.[1. When I spoke to Maheswaran at Summer League, he showed me a tweet as an example of the projects Second Spectrum is considering or has been approached about.]
SP: You’ve talked a little about the subject matter expertise needed for basketball, there will be a similar learning curve for other sports as well?
RM: There’s two things that usually help us. The first is we know sports well enough and we know the math well enough that we can do something unique that someone in the sport might not think of because they don’t know the math. As an example, in basketball we were able to do things with rebounding using Voronoi tessellations and spatial probabilities. We prove ourselves by showing we have a unique capability in an area and then very immediately we go to people who are subject-matter experts and ask “what sorts of gadgets do you want on your Iron Man suit for this sport?” It’s an attitude of “what can we do to help you win?” We do have a number of subject-matter experts in house. We have an ex-NFL wide receiver who is our director of business development, David Anderson. Mike D’Auria and Noel Hollingsworth are two of our former MIT basketball captains.[1. From personal experience, Noel is an absolute beast on the court. To quote Shaq, jump hook ‘em to death, Noel.] We have a Sports Advisory Group. Becky Hammon is on it, Shane Battier, Eric Winston from the NFL, Ryan Johnson from hockey. We try to surround ourselves with people who know what they are talking about.
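A Voronoi tessellation splits the court into cells, one per player, where each cell holds every spot that is closer to that player than to anyone else. The sketch below is only a guess at the general flavor of the technique, not Second Spectrum's rebounding model, which the interview does not describe. The player positions and court dimensions are made up, and the cells are approximated by sampling a grid rather than computed exactly.

```python
import math

# Hypothetical player positions in feet on a 50 x 47 half court.
players = {"A": (25.0, 4.0), "B": (10.0, 20.0), "C": (40.0, 20.0)}

def nearest_player(point, positions):
    """Return the player whose Voronoi cell contains the point (the closest player)."""
    return min(positions, key=lambda name: math.dist(point, positions[name]))

def cell_shares(positions, width=50, length=47, step=1.0):
    """Approximate each player's share of court area by sampling grid cell centers."""
    counts = {name: 0 for name in positions}
    cols, rows = int(width / step), int(length / step)
    for i in range(cols):
        for j in range(rows):
            point = ((i + 0.5) * step, (j + 0.5) * step)
            counts[nearest_player(point, positions)] += 1
    total = cols * rows
    return {name: count / total for name, count in counts.items()}

shares = cell_shares(players)
likely_rebounder = nearest_player((24.0, 6.0), players)  # made-up landing spot
print(shares, likely_rebounder)
```

A fuller treatment would presumably weight each cell by how often rebounds actually land there, which is where the "spatial probabilities" Maheswaran mentions would come in.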
SP: Back to basketball, quantifying defense is sort of the great white whale of basketball analytics. There still aren’t many good metrics, at least publicly, to this point. Do you feel you’ve made progress at closing the gap between our understanding of offense and defense?
RM: Everything we do, which is mostly identifying basketball words in the data, we do both sides, offense and defense. If we give information on offense we’ll give information on defense. I wouldn’t say we’ve solved defense, but we’ve made progress. There is significantly more information about defense now than there was before this data existed.
SP: By “the data”, you don’t necessarily mean the stuff you’re doing, but the underlying SportVU tracking?
RM: Absolutely. Because you have the positions of everyone on the court. I think I mentioned this in the TED talk, but raw data is not useful unless you turn it into something that is consumable, or actionable. For example, the raw data without identifying the events in the data is not that valuable. But identifying the events in the data requires a certain amount of sophistication in machine learning and the big data architecture that Second Spectrum brings.
(This piece has been slightly edited from its original to correct minor transcription errors.)