Last week, when I landed in New Orleans for the SC14 supercomputing conference, my phone’s Google Now application told me about the local weather, and I used voice commands to search for a good restaurant on the taxi ride in.
Given where I was headed, it felt appropriate that these services were running on some of the world’s largest supercomputers. At the show itself, our booth’s mini-theater featured three days of talks by world-leading experts on supercomputers – and their use for weather and climate modeling, object recognition in images, and drug discovery.
Few topics generated as much interest, though, as machine learning applied to big data. Nearly a fifth of our booth talks focused on how computers train themselves to identify objects, images, signals and data patterns. And given the field’s enormous potential – with virtually every major web-services company hiring leading researchers in droves – that interest is almost certain to rise in the years ahead.
“I honestly believe that for the next several decades, there will be more and more people applying machine learning to ‘x,’ which will generate huge economic value,” said Bryan Catanzaro, a senior researcher at Baidu, the Chinese search giant, in his packed half-hour talk at the booth.
Supercomputing – specifically GPU-accelerated supercomputing – could prove central to machine learning, because the field depends on rapidly processing massive data sets.
These sets can easily exceed an exabyte of data – equivalent to 50,000 years’ worth of DVD-quality video. Consider the work of Shoou-I Yu, a researcher at Carnegie Mellon University and one of the speakers at our booth. His team has spent four years working to enable ultra-fast video search, so that one day you could find scenes of a specific individual on YouTube with just a few clicks.
The task is staggering in scope: Yu noted that a hundred hours of video are uploaded to YouTube every minute of the day. But he and his team have developed tools that can now scan 8,000 hours of video for specific objects and produce results in near real time.
The applicability of machine learning extends far beyond the digital realm.
Jonathon Cohen, a machine-learning specialist at NVIDIA, described work by researchers at the University of California, San Diego, who are beginning to use GPU-accelerated machine learning to help map the distribution of coral species across the ocean floor.
In the past, the only way to approach this exceptionally complex task was for marine biologists to pore over images of reefs, sort out what’s coral and what may be sand or algae, and then identify specific types of coral by appearance. Working by hand, scientists could label only one or two percent of the images from coral reefs. Today, computers can label 60 percent of the images, with only 5 percent less accuracy than the scientists. And the UCSD team is on its way to pushing that figure to 90 percent.
Another real-world example discussed at the NVIDIA booth this week was work focused on training computers to detect breast cancer by measuring the rate of mitosis – cell division – within tissue samples.
Dan Ciresan, a researcher at IDSIA, a Swiss artificial-intelligence lab, described how his team’s deep neural nets – which have won a half-dozen international competitions – are already more accurate and much faster than humans on many types of difficult medical image assessments.
Other experts on the subject – from Microsoft Labs, among other organizations – described their own approaches to using machine learning. But despite the range of topics, all agreed that machine learning is just beginning to take hold, and that it will surely loom larger still at next November’s SC conference in Austin.