Nylon Calculus 101: Creating SportVU Game Logs in Python
It’s been two years since NBA.com first rolled out their SportVU player tracking stats. As great as these stats are, many people who have spent a lot of time working with the data felt one thing was missing: the ability to filter the stats by date or drill down to the individual game level. Fortunately, an update a few days ago added this feature. Until now we could only see per-game averages for these stats, but with the update we can get a day-by-day breakdown. In this post I will walk through how to use the date filter to get game logs for one of the SportVU stat categories for a single day using Python 2.7. If you want to dig into a full season’s worth of these stats, you can build on this to get game logs for every stat category across a full season.
Let’s start by importing the packages we will use.
In [1]:
    import json
    import requests
    import pandas as pd
    from IPython.display import display
Since we are going to be pulling a few different stats from stats.nba.com we should have a function that can get what we need into a format we can work with. The function below gets the data we want for a given base URL and set of parameters from the NBA stats API and returns a list of dictionaries where each dictionary contains the stats for a player.
In [2]:
    def get_data_from_url(base_url, parameters, index):
        response = requests.get(base_url, params=parameters)
        data = response.json()
        headers = data['resultSets'][index]['headers']
        rows = data['resultSets'][index]['rowSet']
        return [dict(zip(headers, row)) for row in rows]
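To see what the function returns without hitting the network, here is a minimal sketch of the header/row pairing step run on a made-up payload. The `sample_payload` dictionary and the `rows_to_dicts` helper are hypothetical; they only imitate the `resultSets` shape that the function above reads, using a few values that appear later in this post.

```python
# Hypothetical payload imitating the shape that get_data_from_url reads.
sample_payload = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "PLAYER_NAME", "TOUCHES"],
            "rowSet": [
                [201166, "Aaron Brooks", 45],
                [201143, "Al Horford", 70],
            ],
        }
    ]
}


def rows_to_dicts(payload, index):
    """Pair each row's values with the column headers of one result set."""
    result_set = payload["resultSets"][index]
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


players = rows_to_dicts(sample_payload, 0)
print(players[0]["PLAYER_NAME"])  # Aaron Brooks
```

Each row becomes one dictionary keyed by column name, which is why later steps can look up fields such as `PLAYER_ID` directly.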
To get game logs for a single day, we need to pick a date.
In [3]:
date = "10/29/2014"
Now we need to get the player tracking data for that date. Since we can now filter these stats by date, this is pretty simple to do using the NBA stats API. Below is the base URL for the player tracking stats along with the needed parameters. The key parameters we need to set are DateFrom, DateTo, PlayerOrTeam, PtMeasureType, Season and SeasonType. Most of these should be pretty self-explanatory. The PtMeasureType parameter is the player tracking stat category we want to get; for this example we will get the possessions/touches stats. Many of the other parameters are left blank. You can play around with them as you see fit, but they aren’t needed for making the game logs.
In [4]:
    player_tracking_base_url = "http://stats.nba.com/stats/leaguedashptstats?"
    player_tracking_parameters = {
        "DateFrom": date,
        "DateTo": date,
        "LastNGames": 0,
        "LeagueID": "00",
        "Month": 0,
        "OpponentTeamID": 0,
        "PORound": 0,
        "PerMode": "Totals",
        "PlayerOrTeam": "Player",  # use Team for team stats
        "PtMeasureType": "Possessions",  # change this for different player tracking stat
        "Season": "2014-15",
        "SeasonType": "Regular Season",  # use Playoffs for playoff stats
        "TeamID": 0,
        "Outcome": "",
        "Location": "",
        "SeasonSegment": "",
        "VsConference": "",
        "VsDivision": "",
        "GameScope": "",
        "PlayerExperience": "",
        "PlayerPosition": "",
        "StarterBench": ""
    }
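If you are curious what the parameters look like once they are attached to the base URL, the sketch below builds a query string by hand with the standard library. This is only an approximation of what requests does when it is handed a `params` dictionary, and the `sample_parameters` list is a hypothetical subset chosen to keep the output short.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

base_url = "http://stats.nba.com/stats/leaguedashptstats?"

# Hypothetical subset of the parameters, as ordered pairs so the
# resulting query string is the same on every run.
sample_parameters = [
    ("DateFrom", "10/29/2014"),
    ("DateTo", "10/29/2014"),
    ("PlayerOrTeam", "Player"),
    ("PtMeasureType", "Possessions"),
]

# urlencode percent-encodes special characters, so "/" becomes "%2F".
query_string = urlencode(sample_parameters)
print(base_url + query_string)
```

Note that the slashes in the date are percent-encoded in the query string, so the date can be written in plain MM/DD/YYYY form in the dictionary.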
We can use the get_data_from_url function we made to get the player tracking data for the above parameters.
In [5]:
player_tracking_data = get_data_from_url(player_tracking_base_url, player_tracking_parameters, 0)
Let’s take a look at what this data looks like for a single player.
In [6]:
player_tracking_data[0]
Out[6]:
{u'AVG_DRIB_PER_TOUCH': 7.4, u'AVG_SEC_PER_TOUCH': 6.5, u'ELBOW_TOUCHES': 0, u'FRONT_CT_TOUCHES': 42, u'GP': 1, u'L': 0, u'MIN': 24.0, u'PAINT_TOUCHES': 0, u'PLAYER_ID': 201166, u'PLAYER_NAME': u'Aaron Brooks', u'POINTS': 13, u'POST_TOUCHES': 0, u'PTS_PER_ELBOW_TOUCH': 0.0, u'PTS_PER_PAINT_TOUCH': 0.0, u'PTS_PER_POST_TOUCH': 0.0, u'PTS_PER_TOUCH': 0.289, u'TEAM_ABBREVIATION': u'CHI', u'TEAM_ID': 1610612741, u'TIME_OF_POSS': 4.9, u'TOUCHES': 45, u'W': 1}
So that looks good. All the data is there, but it is missing one key element that is useful for further analysis: the game id. Fortunately it is pretty simple to find the game ids for all games played on a specific date. We can do this by getting the scores for the day and extracting the game ids.
In [7]:
    game_ids = []
    date_base_url = "http://stats.nba.com/stats/scoreboardV2?"
    date_parameters = {
        "DayOffset": "0",
        "LeagueID": "00",
        "gameDate": date
    }
    games = get_data_from_url(date_base_url, date_parameters, 1)
    for game in games:
        game_ids.append(game['GAME_ID'])
Now that we have a list of games, we can get all the boxscores for those games and create a dictionary of key, value pairs that maps each player who played in those games to the id of the game in which they played.
In [8]:
    player_game_map = {}
    boxscore_base_url = "http://stats.nba.com/stats/boxscoretraditionalv2?"
    for game_id in game_ids:
        boxscore_parameters = {
            "GameId": game_id,
            "StartPeriod": 0,
            "EndPeriod": 10,
            "RangeType": 2,
            "StartRange": 0,
            "EndRange": 55800
        }
        player_boxscore_data = get_data_from_url(boxscore_base_url, boxscore_parameters, 0)
        for player_data in player_boxscore_data:
            player_game_map[player_data["PLAYER_ID"]] = player_data["GAME_ID"]
This player game map is the key to adding the game id to the game logs. We can loop through the player tracking data and add in the correct game id for each player.
In [9]:
    for player in player_tracking_data:
        player["GAME_ID"] = player_game_map[player["PLAYER_ID"]]
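One thing to keep in mind is that a plain dictionary lookup raises a `KeyError` if a player in the tracking data is not in the map. Whether the API ever returns such a row is not something this post establishes, so treat the following as an optional, defensive variant. It runs on made-up stand-ins (`sample_tracking_rows`, `sample_player_game_map`) and the second player is purely hypothetical.

```python
# Made-up rows standing in for player_tracking_data and player_game_map.
sample_tracking_rows = [
    {"PLAYER_ID": 201166, "PLAYER_NAME": "Aaron Brooks"},
    {"PLAYER_ID": 999999, "PLAYER_NAME": "Hypothetical Player"},
]
sample_player_game_map = {201166: "0021400010"}

unmatched = []
for row in sample_tracking_rows:
    # dict.get returns None instead of raising KeyError for a missing player.
    row["GAME_ID"] = sample_player_game_map.get(row["PLAYER_ID"])
    if row["GAME_ID"] is None:
        unmatched.append(row["PLAYER_NAME"])

print(unmatched)  # ['Hypothetical Player']
```

Collecting the unmatched names makes it easy to check whether anything went wrong for a given day before moving on.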
To examine the data, let’s put the results into a pandas data frame and take a look at the first five rows.
In [10]:
    player_tracking_df = pd.DataFrame(player_tracking_data)
    with pd.option_context('display.max_columns', None):
        display(player_tracking_df.head())
| | AVG_DRIB_PER_TOUCH | AVG_SEC_PER_TOUCH | ELBOW_TOUCHES | FRONT_CT_TOUCHES | GAME_ID | GP | L | MIN | PAINT_TOUCHES | PLAYER_ID | PLAYER_NAME | POINTS | POST_TOUCHES | PTS_PER_ELBOW_TOUCH | PTS_PER_PAINT_TOUCH | PTS_PER_POST_TOUCH | PTS_PER_TOUCH | TEAM_ABBREVIATION | TEAM_ID | TIME_OF_POSS | TOUCHES | W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 6.5 | 0 | 42 | 0021400010 | 1 | 0 | 24 | 0 | 201166 | Aaron Brooks | 13 | 0 | 0.000 | 0 | 0.000 | 0.289 | CHI | 1610612741 | 4.9 | 45 | 1 |
| 1 | 0.5 | 1.5 | 7 | 46 | 0021400008 | 1 | 1 | 31 | 5 | 201143 | Al Horford | 12 | 5 | 0.286 | 0 | 0.000 | 0.171 | ATL | 1610612737 | 1.8 | 70 | 0 |
| 2 | 0.4 | 2.1 | 3 | 45 | 0021400004 | 1 | 0 | 40 | 2 | 2744 | Al Jefferson | 14 | 8 | 0.667 | 0 | 0.375 | 0.230 | CHA | 1610612766 | 2.1 | 61 | 1 |
| 3 | 0.6 | 1.3 | 0 | 15 | 0021400006 | 1 | 1 | 25 | 0 | 101187 | Alan Anderson | 4 | 1 | 0.000 | 0 | 0.000 | 0.148 | BKN | 1610612751 | 0.6 | 27 | 0 |
| 4 | 2.5 | 3.1 | 0 | 50 | 0021400012 | 1 | 1 | 36 | 0 | 202692 | Alec Burks | 18 | 0 | 0.000 | 0 | 0.000 | 0.305 | UTA | 1610612762 | 3.0 | 59 | 0 |
5 rows × 22 columns
Now we have the game logs for the day’s games. Most of these column names should be pretty self-explanatory, but if you ever need to check what a heading means you can mouse over the header on the stat page to get a more detailed description. To get the game logs for a different stat, change the PtMeasureType parameter when getting the player tracking data. To get the full season’s game logs, loop through all the days of the season and run this for each day. If you have any questions feel free to ask me on Twitter.
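As a starting point for that loop, here is a minimal sketch of one way to produce the date strings in the same MM/DD/YYYY format used above. The `dates_between` helper is hypothetical, and the three-day window is only for illustration; you would substitute the first and last days of the season you care about.

```python
import datetime


def dates_between(start, end):
    """Yield MM/DD/YYYY strings for every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%m/%d/%Y")
        current += datetime.timedelta(days=1)


# Illustrative three-day window, not the real season boundaries.
first_day = datetime.date(2014, 10, 29)
last_day = datetime.date(2014, 10, 31)
for day in dates_between(first_day, last_day):
    print(day)
```

Each string it yields can be used as the `date` value in the steps above. The helper uses `import datetime` rather than `from datetime import date` so that it doesn't clash with the `date` variable defined earlier in the post.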