by Kevin Krewell

Yesterday was the first day of the GPU Technology
Conference, or GTC. There was a lot to cover and digest and the NVIDIA blogging
team did a great job reporting the action on the GTC Blog. The top story was
that NVIDIA introduced and demonstrated, for the first time, its next
generation of GPUs, code-named Fermi. To accompany the chip, NVIDIA also
delivered the Nexus integrated CPU and GPU development environment – a key
developer tool for GPU Compute.

The conference itself is bustling with activity. The morning
introduction sessions were booked to capacity. The keynote by Jen-Hsun Huang
filled the main room and spilled well into the overflow areas. And while other
conferences are suffering from significantly reduced attendance, this one was
bursting at the seams. There is no question that GPU Compute is a hot topic and
a strong audience draw. And Fermi was the proof that we’re on the verge of a
major leap forward in computing.

Michael Diamond has listed the Fermi highlights:
three billion transistors, 512 cores, ECC support throughout, C++ programming
model support, and 8x faster double-precision floating point (with full IEEE
754-2008 compliance and fused multiply-add). Think about it: no GPU before has
had system-level ECC. Or an L2 cache. There has never before been a single chip
with three billion transistors – Intel’s latest Nehalem-EX processor is “only”
2.3 billion transistors. The first Fermi chip seriously pushes the limits of
chip design and production in order to provide the most computational resources
possible on one die. It is a monster project, designed to produce monster
results.

Fermi and GPU Compute: What does it mean and why does it matter?

During the keynote, Jen-Hsun used a number of demos to
show the potential of GPU Compute. Some were fun, such as creating realistic
physical reactions in games (throwing rag dolls at destructible walls); another
was 3D Stereoscopic videos (including a live 3D video of Jen-Hsun
himself).  A third showed how GPUs
can enhance the processing of ultrasound recordings for breast cancer
detection. And the national research lab at Oak Ridge endorsed GPU Compute for
a future multi-PetaFLOPS supercomputer to solve massive problems such as global
carbon emissions modeling. The bottom line is that GPU Compute is finding
real-world problems that can take advantage of the massively parallel computing
provided by GPUs, and we’ve only just started. The field of GPU Compute only
began to gain mainstream support about two years ago. It’s still a very new
field, but interest in it is growing even faster than the GPUs themselves.
Fermi is the next step in that evolution.

There were white papers available on Fermi from three respected
technical analysts and two top professors of parallel computing that, though
commissioned by NVIDIA, still represent the top independent thinkers on
GPU Compute. Tom Halfhill, of Microprocessor Report, says “they are taking the
largest step yet toward becoming equal-partner coprocessors with CPUs.”
Halfhill highlights in one chart that NVIDIA GPUs have grown from 128 cores in
2006, to 512 cores in 2009. Nathan Brookwood, of Insight64, says with Fermi
“NVIDIA will finally have assembled all the pieces it needs to solve its GPU
Computing puzzle.” Peter Glaskowsky, former editor in chief of Microprocessor
Report and now an independent analyst, calls Fermi the “world’s first complete
GPU computing architecture.”

Also, as part of GPU Compute’s ties with university
research, NVIDIA displayed poster boards from engineers, researchers, and
scientists all over the world describing projects like this one: “Optimized
CUDA Implementation of a Navier-Stokes Based Flow Solver for the 2D Lid Driven
Cavity.” These particular researchers achieved up to a 13x improvement over a
quad-core Xeon processor, and they have more performance optimizations planned.

Frankly, calling Fermi a GPU is a disservice to all the
extra capabilities of the chip architecture. But there is no industry-wide
consensus on what to call a massively parallel, server-class compute engine
derived from GPU technology. While there have been specialized-function designs
from eager start-ups promising supercomputer performance, chips like Fermi still
have what Jen-Hsun would call “a day job.”

NVIDIA can leverage this high-volume graphics day job to
drive down costs and drive up yields and volume, making these
supercomputers-on-a-chip affordable and attainable. In addition, NVIDIA offers
the business stability of a company that has been around for 16 years. It’s
very similar to how Intel migrated x86 CPUs from PCs into the server market.
The x86 microprocessor was not originally designed to compete with server
processors, but through improvements in system and chip design it eventually
came to dominate the volume server business at a much lower cost than dedicated
server processors. It’s the
volume PC markets that are driving innovations and cost-effectiveness that
enterprise systems can leverage – the enhanced GPU is the next big thing in
enterprise and personal computation. There are also aspects of Fermi that will
significantly enhance its usefulness in cloud computing and virtualization –
two other essential enterprise technology vectors.

I talked with one analyst who said he was attending the
GTC because he wanted to see the future of computing systems so that he could
properly advise his clients on where to invest their resources. He recognized
that developer support is the key to extending the reach of GPU Compute to as
broad an audience of programmers as possible. That is why NVIDIA has invested
in CUDA C, OpenCL, DirectX Compute, and the Nexus integrated development
environment for Microsoft’s Visual Studio. Any program that can benefit from
parallelization, and from running faster, should be running with GPU acceleration.
That analyst was very impressed with what he saw on Wednesday and he realized
that Fermi is leading a new wave of high performance parallel computing.

So remember the date: Sept. 30, 2009. It is the day
NVIDIA showed a GPU architecture that is fully enterprise-ready, with enhanced
software support that makes Fermi a true processing peer to the CPU.