Supercomputing at the Edge Takes Wing at SC14 Show in New Orleans

by George Millington

There’s nothing remotely like it at the SC14 supercomputing show in New Orleans.

A supercomputer the size of a window air-conditioning unit, it glows an otherworldly orange. And it points the way to the future of supercomputing at the edge of the network, that is, the point where data gets generated.

That’s a future where every mobile-phone cell tower instantly processes data as it’s being uploaded or downloaded. A future where a compact cluster on a drone or airplane instantly analyzes data on weather, troop movements or the impact of a natural disaster – without the plane needing to return to base to download its findings.

Orange Silicon Valley has built a compact supercomputer using NVIDIA Tegra K1 mobile processors.

“Think of this as landscape computing,” said Soumik Sinharoy, of Orange Silicon Valley, the innovation arm of telecoms giant Orange, which has developed the system together with Silicon Valley startup Reneo, and Echostreams, a systems platform provider. “This is computing on the very edge of the network, where the data’s being collected, even if it’s being collected far away.”

At the system’s heart are 96 NVIDIA Jetson TK1 embedded-computing development kits, built on NVIDIA Tegra K1, the world’s fastest mobile processor. The processor’s GPU is based on NVIDIA’s Kepler architecture, which is used in many of the world’s fastest supercomputing clusters that are the focus of the annual supercomputing conference.

This system can offer one petaflop of computing power per standard rack outfitted with 4,000 TK1s per rack unit. And it can provide twice the efficiency of the world’s top supercomputer, China’s Tianhe-2, when upgraded with ultra-low-latency RapidIO fabric. RapidIO joins multiple nodes at up to 16 gigabytes a second, with far greater speed and efficiency than PCI Express or Ethernet technologies allow.

Orange, along with Reneo, demonstrated image classification using deep learning, running in near real time on the GPU cores in the Tegra K1 processors. This kind of high-performance, compact image-classification system could one day be a standard for mobile supercomputing.

The system’s ability to process one terabyte of data in under six minutes is enabled by a robust parallel runtime environment, plus a hardware stack comprising 96 terabytes of POSIX-compliant parallel storage, 30 teraflops of peak computing capability and a cross-section bandwidth of 96 gigabytes a second. And it consumes less than 1,400 watts of power, about that of a window air-conditioning unit.
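For readers who want to see what those headline numbers imply, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (96 modules, 30 teraflops peak, 1,400 watts, one terabyte in six minutes). It assumes decimal units (1 TB = 1,000 GB), and the function and key names are made up for illustration. Because the power and time figures are quoted as upper limits, the efficiency and throughput results are lower bounds on peak figures, not measured sustained performance.

```python
# Figures quoted in the article (peak and upper-limit values, not measurements).
MODULES = 96            # Jetson TK1 kits in the system
PEAK_TFLOPS = 30.0      # peak computing capability, teraflops
POWER_WATTS = 1400.0    # "less than 1,400 watts"
DATA_TB = 1.0           # one terabyte processed...
TIME_MINUTES = 6.0      # ..."in under six minutes"


def derived_figures():
    """Return simple ratios implied by the quoted figures.

    Assumes decimal units (1 TB = 1,000 GB, 1 teraflop = 1,000 gigaflops).
    """
    peak_gflops = PEAK_TFLOPS * 1000.0
    return {
        # Peak gigaflops contributed by each module.
        "gflops_per_module": peak_gflops / MODULES,
        # Peak gigaflops per watt; a lower bound, since power is "less than".
        "gflops_per_watt": peak_gflops / POWER_WATTS,
        # Data rate in GB/s; a lower bound, since time is "under".
        "throughput_gb_per_s": DATA_TB * 1000.0 / (TIME_MINUTES * 60.0),
        # Whole-system power budget divided across modules (upper bound).
        "watts_per_module": POWER_WATTS / MODULES,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

On those assumptions, the figures work out to roughly 312 gigaflops of peak compute per module, at least 21 gigaflops per watt, and a data rate of at least about 2.8 gigabytes a second.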

Jag Bolaria, an analyst at the Linley Group, sees huge potential for the system.

“By integrating a large volume of low-power GPUs in a server rack at scale, this industry first creates a clear path to massive cloud-based clusters for analytics and gaming. This achievement means developing large clusters with low latency and massive scalability is finally possible. This architecture delivers—in an energy- and latency-efficient manner—remarkable computing horsepower ….” he said.