World Series Special: Hitting it Out of the Park with Deep Learning

The AI technique may soon predict some baseball plays and reveal new details about player behavior
by Jamie Beckett

If there were a World Series of big data baseball analytics, Claudio Silva would be in the starting lineup.

Silva, a professor of computer science, engineering and data science at New York University, co-developed the game-changing metrics engine in pro baseball’s Statcast tracking and statistics system. By tracking every movement of every player and the ball throughout the game, Statcast has changed how coaches evaluate and train players and how fans watch the game.

But Silva is swinging for the fences. He’s now using GPU-accelerated deep learning to reveal minute details of player behavior and game patterns, which has the potential to revolutionize how coaches manage players and plan strategy. It could even give them the ability to make predictions about some aspects of the game.

“For a given pitcher and batter, we can figure out the most probable locations the ball will go in the field,” Silva said. That would allow players to position themselves in the best place to field the ball.
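
To make the idea concrete, here is a minimal sketch of ranking where the ball tends to land for one pitcher-batter matchup. It is not Silva's deep learning approach: it swaps in a plain frequency count over past plays. The records, the player names, the zone labels and the `zone_probabilities` helper are all hypothetical, invented for illustration, and none of them reflect the real Statcast schema.

```python
from collections import Counter

# Hypothetical play records: (pitcher, batter, field zone where the ball landed).
# These values are made up for illustration; they are not Statcast data.
PLAYS = [
    ("pitcher_a", "batter_x", "left field"),
    ("pitcher_a", "batter_x", "left field"),
    ("pitcher_a", "batter_x", "center field"),
    ("pitcher_a", "batter_x", "shortstop"),
    ("pitcher_b", "batter_x", "right field"),
]


def zone_probabilities(plays, pitcher, batter):
    """Return (zone, empirical probability) pairs for one matchup, most likely first."""
    counts = Counter(
        zone for p, b, zone in plays if p == pitcher and b == batter
    )
    total = sum(counts.values())
    if total == 0:
        # No history for this matchup, so there is nothing to rank.
        return []
    return [(zone, n / total) for zone, n in counts.most_common()]


ranked = zone_probabilities(PLAYS, "pitcher_a", "batter_x")
for zone, prob in ranked:
    print(f"{zone}: {prob:.2f}")
```

A count like this only reflects what has already happened in a given matchup. The appeal of a learned model, as the article describes it, is the prospect of drawing on much richer detail about player behavior than a simple tally can hold.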

Deep Learning Steps up to the Plate

Coaches could use the deep learning tool to detect when players are reacting more slowly or playing less effectively, and use that information to avoid player injuries, said Silva.

The same type of analysis could help coaches determine the best matchups, decide which of a pitcher’s pitches are most likely to result in a hit, or compare two players with similar styles but very different pay packages, he added.

“The more coaches know about player behavior, the better they’re able to make informed decisions about when and how to play them and in what capacity to use them,” he said.

Statcast: Big Data Baseball Analytics

Ever since 2002, when Major League Baseball’s Oakland A’s adopted Sabermetrics — the focus of the book and movie “Moneyball” — advanced statistical analysis has played a large role in managing and selecting players. Statcast added big data and machine learning, making it possible to track things that weren’t measurable before, especially the performance of outfielders.

Silva was able to make the leap into deep learning because he’s armed with GPUs and Statcast data from nearly 1.5 million plays collected over two seasons.

The Statcast database, created and maintained by MLB Advanced Media, is likely the world’s most detailed sports database. Each play includes a detailed textual description, video clips, outcomes, and data on player positions and movements.

Deep Learning Heads for the Big Leagues

Silva and co-developer Carlos Dietrich, a consultant to MLB Advanced Media, want to go beyond current measurements, which average samples taken from a player’s movements into a single measurement.

Their goal is to capture and analyze detailed player movement. The most common way to do this would be to install more cameras and more complex computer vision technology, but that is a costly option.

Instead, the NYU research team aims to use deep learning technology by coupling the Statcast data with detailed human movements acquired with motion-capture systems. Training deep learning networks with such complex, visually rich information is not feasible on CPUs, said Silva.

The team is using our DGX-1 AI supercomputer — recently acquired by NYU’s Center for Data Science — which provides the deep learning computing performance equivalent to 250 conventional servers.

Silva hopes to have the DGX-1-powered deep learning tool ready in time for next season.

“This is a game that’s been played a long time,” he said. “What surprises me is that we’re still able to make it better by using technology.”