Using the NHL API to analyze pro ice hockey data - Part 1

sgenzer, Community Manager
edited December 2018 in Knowledge Base


Hello RapidMiners - 


Well my "Pellets and Pi" project is still collecting data so I decided to change gears and get into more sports data. The "Fantasy Football" challenge was fun but up here in Vermont, we are all hockey.  Plus the Winter Olympics are around the corner in Pyeongchang so perhaps more of you will be interested...


Part 1 of this is just getting the data. The "National Hockey League" (yes, it covers both Canada and the USA but it's still the "National" Hockey League - don't get political on me) here in North America has a new API available to the public where you can download virtually everything about every game - including each play! It's pretty crazy. What's even crazier is that it's the only API I have seen with zero, yes zero, official documentation from the NHL. Strange but true. So a small group of dedicated data scientists has been trying to reverse-engineer the API to find the endpoints, and they have been reasonably successful. Most of the work is from Kevin Sidwar (see his excellent website) plus some random R and Python code on GitHub.


So first I'd like to expand on Kevin's work by listing the root URL, all known endpoints, and some sample calls/responses.


Root URL: https://statsapi.web.nhl.com/api/v1/

/teams/[team id]
/teams/[team id]/roster
/people/[people id]
/divisions/[division id]
/conferences/[conference id]
/franchises/[franchise id]
/venues/[venue id]

And the most important ones:


/game/[game id]/content

/game/[game id]/feed/live


Yes that's right - LIVE FEED. And not only play-by-play, but the X and Y coordinates of where on the ice each play happened. It's really cool. For example, take the game on Oct 4, 2017 between the Toronto Maple Leafs and the Winnipeg Jets. The first goal of the game was scored by Nazem Kadri in the 1st period. OK fine - that's a box score. But he took that wrist shot 84 feet down the ice from center (at coordinates 84, -6 to be exact), which is right next to the goal. Pretty cool. Here's the JSON snippet:


"result" : {
  "event" : "Goal",
  "eventCode" : "WPG212",
  "eventTypeId" : "GOAL",
  "description" : "Nazem Kadri (1) Wrist Shot, assists: James van Riemsdyk (1), Tyler Bozak (1)",
  "secondaryType" : "Wrist Shot",
  "strength" : {
    "code" : "PPG",
    "name" : "Power Play"
  },
  "gameWinningGoal" : false,
  "emptyNet" : false
},
"about" : {
  "eventIdx" : 93,
  "eventId" : 212,
  "period" : 1,
  "periodType" : "REGULAR",
  "ordinalNum" : "1st",
  "periodTime" : "15:45",
  "periodTimeRemaining" : "04:15",
  "dateTime" : "2017-10-04T23:47:47Z",
  "goals" : {
    "away" : 1,
    "home" : 0
  }
},
"coordinates" : {
  "x" : 84.0,
  "y" : -6.0
},
"team" : {
  "id" : 10,
  "name" : "Toronto Maple Leafs",
  "link" : "/api/v1/teams/10",
  "triCode" : "TOR"
}

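If you would rather poke at a play like this in code, here is a minimal Python sketch. It assumes the fragment above is wrapped in braces so it parses as one JSON object, and it only keeps a handful of the fields shown. The helper name summarize_play is my own invention for illustration, not something from the API.

```python
import json

# One play, trimmed to a few of the fields shown in the snippet above and
# wrapped in braces so that it is a complete JSON object.
PLAY_JSON = """
{
  "result": {
    "eventTypeId": "GOAL",
    "description": "Nazem Kadri (1) Wrist Shot, assists: James van Riemsdyk (1), Tyler Bozak (1)",
    "secondaryType": "Wrist Shot"
  },
  "about": {
    "period": 1,
    "periodTime": "15:45",
    "goals": {"away": 1, "home": 0}
  },
  "coordinates": {"x": 84.0, "y": -6.0},
  "team": {"id": 10, "triCode": "TOR"}
}
"""


def summarize_play(play):
    """Flatten one play into a single row (hypothetical helper).

    Coordinates are read defensively with .get() on the assumption that
    some plays may not carry a location.
    """
    coords = play.get("coordinates", {})
    return {
        "event": play["result"]["eventTypeId"],
        "shot_type": play["result"].get("secondaryType"),
        "period": play["about"]["period"],
        "period_time": play["about"]["periodTime"],
        "team": play["team"]["triCode"],
        "x": coords.get("x"),
        "y": coords.get("y"),
    }


summary = summarize_play(json.loads(PLAY_JSON))
print(summary)
```

One flat row per play is the shape you would want before loading the data into a table for analysis.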

[Just in case you want to figure this out, the coordinates (0,0) represent the dead center of the ice. The x-axis runs lengthwise, goal to goal; the y-axis runs widthwise, perpendicular to the goal-to-goal axis. I think Xmax = 100, Xmin = -100, Ymax = 42.5, Ymin = -42.5. Note that a hockey rink is a rounded rectangle, so the corner coordinates (100, 42.5) do not exist. By my research the goal lines should be 11 feet from each end of the ice, which would center them at (89,0) and (-89,0), but when I look at coordinates of goals scored right at the goal line, they appear to be more like (84,0).]
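To make the geometry concrete, here is a rough Python sketch. It takes x as the lengthwise axis on a 200 x 85 foot rink (so x runs from -100 to 100 and y from -42.5 to 42.5) and puts the goal lines at x = 89 and x = -89, which is the "by my research" value and not what the data seems to show. The 28-foot corner radius is my assumption about the rounded corners, and all function names are made up for illustration.

```python
import math

HALF_LENGTH = 100.0    # feet from center ice to the end boards (x-axis)
HALF_WIDTH = 42.5      # feet from center ice to the side boards (y-axis)
GOAL_LINE_X = 89.0     # assumed: goal line 11 feet in from the end boards
CORNER_RADIUS = 28.0   # assumed radius of the rounded corners, in feet


def distance_from_center(x, y):
    """Straight-line distance in feet from center ice (0, 0)."""
    return math.hypot(x, y)


def distance_to_nearest_goal(x, y):
    """Distance in feet to the middle of the goal line on the same side as x."""
    goal_x = math.copysign(GOAL_LINE_X, x)
    return math.hypot(x - goal_x, y)


def is_inside_rink(x, y):
    """True if (x, y) lies inside the rounded-rectangle rink outline."""
    ax, ay = abs(x), abs(y)
    if ax > HALF_LENGTH or ay > HALF_WIDTH:
        return False
    # Center of the corner arc, measured in the positive quadrant.
    cx = HALF_LENGTH - CORNER_RADIUS
    cy = HALF_WIDTH - CORNER_RADIUS
    if ax > cx and ay > cy:
        # In the corner region the point must lie within the arc.
        return math.hypot(ax - cx, ay - cy) <= CORNER_RADIUS
    return True


print(distance_from_center(84.0, -6.0))
print(distance_to_nearest_goal(84.0, -6.0))
print(is_inside_rink(100.0, 42.5))
```

Under these assumptions the Kadri goal at (84, -6) comes out at about 84.2 feet from center ice and about 7.8 feet from the middle of the goal line, and the corner point (100, 42.5) is reported as off the ice.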



 I am attaching a sample JSON file for each endpoint to this post, and of course a few RapidMiner processes to get the data in a nice format.
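If you want to pull the same JSON without RapidMiner, here is a minimal Python sketch using only the standard library. It builds URLs from the root URL and the endpoint patterns listed above. The function names build_url and fetch_json are my own, and the sketch assumes the root URL is reachable and returns JSON, which I can't promise for an undocumented API.

```python
import json
from urllib.request import urlopen

ROOT_URL = "https://statsapi.web.nhl.com/api/v1/"


def build_url(resource, item_id=None, *suffix):
    """Build an endpoint URL such as .../teams/10/roster (hypothetical helper).

    resource is the first path segment (for example "teams" or "game"),
    item_id is the optional numeric id, and suffix holds any trailing
    path segments (for example "feed", "live").
    """
    parts = [resource.strip("/")]
    if item_id is not None:
        parts.append(str(item_id))
    parts.extend(segment.strip("/") for segment in suffix)
    return ROOT_URL + "/".join(parts)


def fetch_json(url, timeout=10):
    """Download a URL and parse the response body as JSON.

    Assumes the server is reachable and answers with JSON; network and
    HTTP errors are left to propagate to the caller.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (needs network access, so it is left commented out):
# roster = fetch_json(build_url("teams", 10, "roster"))
print(build_url("teams", 10, "roster"))
```

For the live feed, the call would look like build_url("game", game_id, "feed", "live"), where game_id is whatever id you are interested in.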


Until next time...







