CUDA Comes to the Cloud

by Steve Wildstrom

Two of the hottest trends in information processing today are cloud computing and general purpose computing on GPUs. Both provide users with vast amounts of computing power, great flexibility, and very low costs compared to traditional solutions. Now they are coming together in what we can call CUDA in the cloud.

Cloud computing has become a somewhat vague term because of a tendency to use it to refer to any sort of network-based service, but I am using it in its original sense: the supply of on-demand computing through the provision of virtual servers. Leaders in this business include Amazon Elastic Compute Cloud (EC2), Rackspace, and Microsoft Azure. If you need an extra server for a Web site or database, just contact your cloud computing provider and you can be up and running in minutes.

For the most part, cloud services focus on plain-vanilla Linux or Windows servers. Since these are generally run as headless servers accessed only through a remote desktop, they feature little if anything in the way of GPU capabilities. But add a high-end GPU (or several) and drive that hardware with general-purpose GPU programming techniques such as CUDA, and you can quickly get a low-cost on-demand supercomputer for big, computationally intense jobs without the capital expense and administrative complexity of running your own high-performance system.
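To give a flavor of what "driving the hardware with CUDA" looks like in practice, here is a minimal sketch of a CUDA C program: a SAXPY kernel (y = a*x + y) that runs one GPU thread per array element. This is a generic illustration of the programming model, not code from any of the services described in this article.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each GPU thread handles one element: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;               // one million elements
    const size_t bytes = n * sizeof(float);

    // Initialize input arrays on the host (CPU).
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Allocate device (GPU) memory and copy the inputs over.
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    // Copy the result back and spot-check one value: 2*1 + 2 = 4.
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

The same source compiles and runs unchanged whether the Tesla card sits under your desk or in a rented cloud node, which is precisely what makes the on-demand offerings below practical.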

Penguin Computing, a Fremont, CA, company that sells systems such as its NVIDIA Tesla-powered 8 TFLOP compute cluster, is adding GPU computing capability to its Penguin on Demand (POD) cloud service. Its GPU compute nodes consist of a system with two quad-core Intel Xeon processors, 24 GB of memory, and three Tesla C1060 Computing Processors. That's 720 GPU processor cores and 12 GB of on-card memory, available by the hour.

Penguin has also started offering high-performance processing based on the new RealityServer, a software platform developed by NVIDIA and its subsidiary, mental images, designed to create high-quality 3D images for the Web. “RealityServer is a powerful web platform that enables the development of interactive, incredibly photorealistic 3D web applications, which can be compute-intensive,” said Tom Coull, general manager of Software and Services at Penguin. “With POD – which is now optimized specifically for GPU compute workloads – mental images’s RealityServer users have access to a cost-effective, extremely powerful resource on demand, to help them innovate faster and without concern for compute infrastructure limitations.”

We usually think of cloud computing as a pure online experience, but bandwidth constraints can make it impractical to move the vast input and output data sets sometimes used in high-performance computing over the internet. For moving more than 250 GB of data, POD offers a disk caddy service that moves information in 2 TB chunks via overnight air shipping.

Peer1 is a long-established hosting company with data centers across the U.S., Canada, and Europe. It now offers two levels of cloud-based CUDA computing, based on the NVIDIA Tesla S1070 GPU computing server or the Tesla M2050 GPU compute module. The Peer1 CUDA cloud also supports RealityServer software.

Hoopoe is an Israel-based project designed to build cloud-based GPU computing systems based on Tesla GPUs. It is still in preliminary (alpha) testing, but is running a variety of services, including the execution of CUDA programs and video transcoding on demand. One goal of Hoopoe is to provide a big performance boost to distributed applications based on Microsoft .NET technology through extensions called CUDA.NET.

Sabalcore Computing, an Orlando-based provider of on-demand high-performance computing on Linux servers, also offers servers with Tesla GPUs and support for CUDA.

These services are likely to become much more widespread. In the past, when GPUs were thought of purely as display adapters, no one thought very much about the graphics adapters in data center servers, since they were almost never called upon to do more than serve up a console screen. But with the realization that GPUs are a tremendously cost-effective way to boost performance, that is beginning to change rapidly. For example, IBM recently announced a version of its System x iDataPlex server that couples dual Xeon CPUs with a pair of Tesla 20-series GPUs. “NVIDIA provides an innovative solution for customers who push the envelope in high-performance computing,” said Dave Turek, IBM vice president for deep computing. “GPU acceleration provides performance boosts for many applications in energy exploration, science and financial services. It is among the significant emerging supercomputer technologies to watch in the years ahead.”