When Deep Learning Met Vantage Data

Published 12/27/2015 by Philip Maymin
  • The New Data Sports algorithm for pick forecasting applies deep learning methodology to Vantage data.

  • The algorithm made in-the-money picks for all of last season as well as this season so far.

  • Vantage Sports Chief Analytics Officer Philip Maymin explains how it works.

Here at Vantage Sports, we know we are sitting on the most revolutionary and complete basketball data set there is. The hardest part isn’t figuring out what to do with the data; it’s figuring out what to do with it next. While we are helping basketball fans and daily fantasy users this season with our V+ Matchup Analyzer (as well as a couple of new tools, Nearest Neighbors Search and DFS Roster Assistant, launching January 2016), we have also been working with New Data Sports to develop an algorithm for pick forecasting that was in the money all last season and has remained in the money so far this season.

A consistently in-the-money algorithm essentially turns your book into your personal stock market with a positive expected rate of return. Here’s how it works.


If you’ve ever played or watched basketball, you know how important it is to contest the shot—to get your hand up in the shooter’s face. But check any NBA box score on any site, or any play-by-play, and you won’t find any indication anywhere, for any shot, whether the defender had his hand up or not.

Vantage Sports has this data. For every shot. By every player. On every team. In every game.

And that’s just one of hundreds of unique metrics that only Vantage Sports tracks. Others include whether a pass led to an open shot, regardless of whether the shooter made it, because the passer should be rewarded for making the correct pass no matter how the ball bounces. Vantage also tracks active pressure, not just proximity, on the perimeter, on sidelines, and on inbounds passes. Rebounding efforts and opportunities. Screen offense and defense. Did they hedge? Did they do a hard show? Did the ball handler split the screen? Closeout opportunities. And dozens more.

Vantage Sports has data that no one else on earth has. And New Data Sports has the exclusive right to use that data to make and resell predictions. If anyone else claims to base their predictions on Vantage data, they aren’t telling the truth.


Data by itself doesn’t make a decision or recommend a direction. It needs analysis and backtesting to be useful.

There’s a subtle issue here. How do you know if your analytics is really using your data? Maybe your strategy is just to always bet on the home team, or the favorite, or the one with prettier colors.

Looking at performance is important, but it is not enough. Why? Because there are so many companies out there offering picks. For every hundred or so that try, maybe only a handful have historical and statistical success, so those are the ones who put up a website. But their results could still just be due to random chance, and have nothing to do with any real data.
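To see how easily chance alone can produce an impressive-looking record, here is a minimal sketch in Python. The numbers are illustrative assumptions, not figures from Vantage Sports or New Data Sports: 100 pickers each make 200 independent picks, every pick is a fair coin flip, and a record of 54% or better counts as a success.

```python
from math import comb


def prob_at_least(wins: int, n_bets: int, p: float = 0.5) -> float:
    """Exact binomial probability of at least `wins` successes in `n_bets` tries."""
    return sum(
        comb(n_bets, k) * p**k * (1 - p) ** (n_bets - k)
        for k in range(wins, n_bets + 1)
    )


# Hypothetical setup: 100 pickers, 200 coin-flip picks each.
N_PICKERS = 100
N_BETS = 200
WINS_NEEDED = 108  # 108 / 200 = 54%

p_lucky = prob_at_least(WINS_NEEDED, N_BETS)
expected_lucky_pickers = N_PICKERS * p_lucky

print(f"Chance one coin-flipper hits 54% or better: {p_lucky:.3f}")
print(f"Expected lucky pickers out of {N_PICKERS}: {expected_lucky_pickers:.1f}")
```

Under these assumptions, more than one picker in ten clears the bar with no skill at all, which is why a good track record by itself does not show that the data is doing the work.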

How can we address this problem?

Ideally, you’d like to take one group of smart people and give them access to all publicly available data. Then give another group of equally smart people access to all that data, plus the Vantage Sports data. And see which group comes up with a better strategy.

The problem with that ideal is that different people can have different insights. You’d really like to clone one smart person and give him and his twin the two tasks. And keep them in similar rooms. With the same food and drinks and all other aspects of their environment. It’s just not feasible.


Instead of using smart humans, we can use smart machines. Deep learning is a remarkable and quite recent method of machine learning. Deep learning has been used to identify pictures, recognize human speech, and perform many other feats that previously only humans excelled at.

So what if we use deep learning instead of smart humans? That’s exactly what we did.

Run one deep learning algorithm with the publicly available data. Run it again with the Vantage data. Then, see if there is a difference.

Each deep learning attempt will try to do the best it can with the data it has. So there would be three possibilities:

1. Neither attempt can make any money at all. Coin flips are unpredictable no matter what data you have, so if the wagering markets are perfectly efficient and already somehow incorporate all this information, and betting is no different than flipping a coin, then both attempts should fail, regardless of what data they use.

2. We can make money from just the publicly available data. No need for Vantage data.

3. We can’t make money from just the publicly available data, but we can if we also use the Vantage data.
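The logic of this comparison can be sketched in a few lines of Python. This is not the New Data Sports implementation: `compare_feature_sets`, `toy_evaluate`, and the feature names are hypothetical, and the stand-in evaluator returns made-up profits so the example runs. The point is only that the same procedure is run twice, once on public features and once on public plus extra features, and the two results are mapped onto the three possibilities above.

```python
from typing import Callable, Sequence


def compare_feature_sets(
    evaluate: Callable[[Sequence[str]], float],
    public_features: Sequence[str],
    extra_features: Sequence[str],
) -> str:
    """Run the same evaluation twice and report which of the three cases holds.

    `evaluate` takes a list of feature names and returns season profit in dollars.
    """
    public_profit = evaluate(list(public_features))
    combined_profit = evaluate(list(public_features) + list(extra_features))

    if public_profit > 0:
        return "case 2: public data alone is profitable"
    if combined_profit > 0:
        return "case 3: profitable only with the extra data"
    return "case 1: neither data set is profitable"


# Stand-in evaluator with made-up behavior, only to make the sketch runnable.
# A real evaluator would train a model on the features and backtest its picks.
def toy_evaluate(features: Sequence[str]) -> float:
    return 100.0 if "shot_contested" in features else -100.0


result = compare_feature_sets(
    toy_evaluate,
    public_features=["home_team", "point_spread"],  # hypothetical names
    extra_features=["shot_contested"],              # hypothetical name
)
print(result)
```

Because the learning procedure is identical in both runs, any difference in outcome can be attributed to the extra features and not to the analyst.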

The third possibility turned out to be the truth. Starting with an initial bankroll of $5,000, deep learning was unable to make money using just publicly available information. About the best it could do was predict accurately 49% of the time, essentially a coin toss. And because typical wagering involves winning $10 per winning bet but losing $11 per losing bet, a coin-toss hit rate loses money: the bankroll dwindled to just $1,700 by the end of the season.

What if we also use Vantage Sports data? Then starting with the same $5,000 initial bankroll, the deep learning algorithm was correct about 54% of the time, and you ended up with $6,500 at the end of the season.
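The gap between 49% and 54% matters more than it looks, because of the $10-to-win, $11-to-lose payout described above. The short Python sketch below works out the break-even win rate and the expected profit per bet at each hit rate. It is arithmetic only: the function names are illustrative, and it does not try to reproduce the reported bankrolls, since the number and size of the bets are not stated here.

```python
def breakeven_win_rate(win_amount: float = 10.0, loss_amount: float = 11.0) -> float:
    """Win rate p at which expected profit is zero: p * win = (1 - p) * loss."""
    return loss_amount / (win_amount + loss_amount)


def expected_profit_per_bet(
    win_rate: float, win_amount: float = 10.0, loss_amount: float = 11.0
) -> float:
    """Average dollars won (or lost, if negative) per bet at a given win rate."""
    return win_rate * win_amount - (1.0 - win_rate) * loss_amount


# Break-even is 11 / 21, a little over 52%.
print(f"Break-even win rate: {breakeven_win_rate():.4f}")

# The two hit rates quoted in the article.
for rate in (0.49, 0.54):
    print(f"Win rate {rate:.0%}: {expected_profit_per_bet(rate):+.2f} dollars per bet")
```

A 49% hit rate sits below the break-even point and loses about 71 cents per bet on average, while 54% sits above it and gains about 34 cents per bet.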

In other words, we learn two things: (1) the strategy works and (2) the data matters.


The wagering market is different for different people. If one company tried to place all the wagers itself, the market would react, and the edge would be difficult to monetize. But if a few people are able to profit here and there in their local communities from the picks, then the market won’t move as much, and every subscriber could, in principle, win.

Because of these concerns, New Data Sports has reserved the right to restrict the number of subscribers per geographical region. If your region is already fully subscribed and blocked, we apologize.