Four Amazing Things Carnegie Mellon Is Doing With CUDA

by Chandra Cheij

Carnegie Mellon University (CMU) has been doing some amazing things with CUDA. So we’ve named CMU a CUDA Center of Excellence.

The recognition is a result of CMU’s ongoing work in parallel computing research and education using NVIDIA GPUs and the CUDA parallel computing platform (see “What Is CUDA?”).

The CUDA Center of Excellence (CCOE) at CMU will establish research collaborations with NVIDIA to realize new, high-impact applications in field robotics, high-throughput gene sequencing, spoken-language processing, signal processing and computer graphics.

The collaboration will also seek to advance the design of throughput-focused hardware and software systems by promoting and targeting GPU technologies in heterogeneous systems research at CMU and across the nation.

Located in Pittsburgh and Silicon Valley, CMU prides itself on creativity and leadership in computing education. Parallel computing is no exception. At CMU, undergraduate and master’s-level students in Computer Science and Electrical and Computer Engineering learn abstract parallel-thinking concepts alongside hands-on parallel computing experience using GPUs. As a CUDA Center of Excellence, CMU will use equipment and grants provided by NVIDIA to support research and academic programs across both campus locations.

These projects include:

CMU “Icebreaker” Lunar Expedition

Carnegie Mellon University will land an unmanned spacecraft on the Moon in 2015. As part of the Icebreaker Mission led by Professor Red Whittaker of the CMU Robotics Institute, CMU’s Polaris rover will search for ice at the Moon’s pole and transmit high-definition video back to Earth. In collaboration with NVIDIA, CMU researchers are exploring how high-performance GPU computing can support the mission by accelerating systems for computer vision, rover trajectory planning, and pre-mission simulation.

NSF PRObE: Enabling Accelerated Computing Research at Scale

The central challenge for the coming era of Exascale and Data Intensive Scientific Discovery will be the scale at which large computers operate. Unfortunately, students and researchers today rarely have access to clusters of more than a handful of machines, or to the latest, highest-core-count computers.

To address this challenge, the National Science Foundation (NSF) Parallel Reconfigurable Observational Environment (PRObE) for Data Intensive Super-Computing and High-End Computing will provide a platform enabling researchers across the country to perform computer systems research at scale. In cooperation with the CCOE at Carnegie Mellon, NVIDIA is outfitting a PRObE high-core-count cluster with 36 Kepler GPUs. This collaboration creates a modern, large-scale environment for future research in GPU-accelerated computing.

Accelerating Next-Generation Genome Sequence Analysis using GPUs

Massively parallel sequencing, or next-generation sequencing (NGS), technologies promise an era of preventive and personalized medicine through low-cost, high-throughput genome sequencing. Realizing this potential depends on computational technologies that can process and analyze enormous amounts of sequence data quickly, economically, and energy-efficiently.

Research within the CCOE at Carnegie Mellon led by Electrical and Computer Engineering Professor Onur Mutlu will develop technologies for high-performance NGS-based genome sequence alignment and sequence assembly by combining the benefits of new GPU-friendly software algorithms with the large processing capabilities of GPUs.

Accelerating Speech and Language Technologies with CUDA

Modern machine learning techniques, such as deep neural networks and graph-based semi-supervised learning, are powerful but generally too computationally intensive to apply to large-scale speech and language processing tasks. Within the CCOE at Carnegie Mellon University, Assistant Research Professor Ian Lane and his group are developing methods to accelerate these and related technologies and are exploring their effectiveness for tasks such as large-vocabulary speech recognition and context-aware spoken language processing.

Using CUDA-accelerated GPUs, the group has obtained 1,000x speedups for signal processing tasks, 100x speedups for well-structured tasks such as Viterbi training, and 10x speedups for complex tasks such as speech recognition. Within this work the group has developed a novel automatic speech recognition engine, HYDRA, that leverages both many-core GPUs and multicore CPUs to perform speech recognition with very large vocabularies 3 to 5x faster than real time.
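The “well-structured” speedups quoted above come from computations like Viterbi-style dynamic programming, whose per-timestep inner loop is independent across states and therefore maps naturally onto GPU threads. A minimal CPU sketch of that structure, using a hypothetical toy HMM (not CMU’s HYDRA code), illustrates the loop a GPU would parallelize:

```python
# Toy Viterbi decoder: at each time step, every state's best score is
# computed independently of the other states' scores at that step --
# this inner loop is what a GPU runs with one thread per state.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for an observation sequence."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:  # independent across states: parallelizable
            prev, p = max(
                ((ps, V[t - 1][ps] * trans_p[ps][s]) for ps in states),
                key=lambda x: x[1],
            )
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back the best path from the most probable final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Classic two-state weather example (illustrative numbers only)
states = ("Rainy", "Sunny")
obs = ("walk", "shop", "clean")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(obs, states, start_p, trans_p, emit_p))
# → ['Sunny', 'Rainy', 'Rainy']
```

In a CUDA implementation the same recurrence would assign one thread per state per time step, which is why these dynamic programs scale so well on many-core hardware.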