Live: Jen-Hsun Huang Kicks Off NVIDIA’s 2014 GPU Technology Conference

by Bob Sherbin

NVIDIA CEO Jen-Hsun Huang kicks off our annual GPU Technology Conference in San Jose, California this morning.

Stay tuned; we’ll be live-blogging throughout the event, starting at 9 a.m. Pacific.

Hit refresh on your browser for updates.


10:50 a.m. – One of the greatest games is Valve’s Portal, a fantastic puzzle game. Jen-Hsun describes the game. We’re so excited that Valve has ported it to SHIELD. I’m so excited about it that I’m giving every GTC attendee a free SHIELD today. This draws lots of applause. Jen-Hsun thanks the crowd.


10:48 a.m. – Jen-Hsun thanks Audi and he’s back on stage, along with the car. “Tegra K1 is the brain of your future self-driven car.”

He now recaps the speech. There were six announcements: Pascal, the next-gen architecture for NVIDIA processors; TITAN Z, NVIDIA’s fastest-ever GPU; VMware using NVIDIA GRID to make virtualized enterprise computing widely available; Iray VCA, the world’s first Iray turbocharger, which can do away with the creation of physical prototypes; and Jetson TK1, the world’s first mobile supercomputer for embedded applications like robotics.

These announcements are all based on one unified architecture called CUDA. We have CUDA in PCs, in clouds, in mobile. The leverage that our developers can get is really, really wonderful. The same program works everywhere.

I have one last announcement, this one is just for fun…One of the greatest games of all time.

10:44 a.m. – Jen-Hsun cues up a self-driven car, an Audi A7 that drives across the stage and turns slightly toward the audience. “It’s ghostly actually, there’s no one inside. Is there a short person inside? No. Is there a short person in the trunk?” Jen-Hsun asks. It turns out the trunk is empty. A couple years ago, a self-driven car required a trunk full of computing equipment. Today, there’s a tiny computer module, the size of a thick Android tablet, that’s powering the car forward.

10:42 a.m. – At CES, Audi execs said that they needed supercomputers in their cars so they could be self-piloted. The next day, they announced that Audi and NVIDIA were partnering to develop Audi’s first self-driven car. Jen-Hsun welcomes Audi’s Andreas Reich, head of Audi Pre-Development, who comes to the stage.

Jen-Hsun: Before we talk, let’s show people what’s required for self-driven cars. It’s a video of a car humming along a parking lot in what appears to be Germany – the cars, at least, have very long license plates. The car has a camera that’s looking sideways as it rolls forward. The camera converts images into 3D space so it can tell which cars are closer to it and which are further from it.

10:38 a.m. – Jen-Hsun now talks about VisionWorks, a library of tools for Computer Vision, which enables driver assistance, computational photography, augmented reality and robotics. Next-generation Tegra is codenamed Erista; it will be based on Maxwell, even more energy efficient, even more capable. The Erista name continues our long tradition of superhero codenames. One of the things Erista will be used for is cars….

10:36 a.m. – Jen-Hsun says that from lots and lots of 2D images, we can reconstruct the 3D world. Researchers have been using PCs with CUDA to do computer vision. Now it can be mobile – with robots, cars that can run CUDA programs directly on the move. Jen-Hsun unveils Jetson TK1, the world’s first mobile supercomputer for embedded systems. It has 192 CUDA cores, delivers 326 GFLOPS, and includes the VisionWorks SDK. It’s being sold for $192, or a dollar for every core. “I’m just happy that the marketing department loves the symmetry of numbers more than it loves money,” he says.

10:32 a.m. – Next up: Mobile CUDA

Jen-Hsun: This is easy to say, but really, really hard to do. Kepler will help us. Kepler architecture with CUDA now powers 10 of the world’s 10 greenest supercomputers. We’ve brought this to mobile with Tegra K1, the world’s first mobile superchip, which we introduced at CES. This has made it possible for us to unify Kepler architecture across everything we do. It’s not a cliché; it’s a supercomputer on a chip. It has the DNA of the U.S.’s fastest supercomputer, TITAN, at Oak Ridge National Laboratory.

One of the things that this will open the door to is Computer Vision on CUDA for mobile. This enables cars to detect features on the road, to track pedestrians so they can avoid them, and to interpret 3D scenes.

10:29 a.m. – “Partnering with VMware makes it possible for us to virtualize the enterprise end to end,” Jen-Hsun says.

10:29 a.m. – Jen-Hsun and Ben are chatting on stage about how VMware’s Horizon desktop-as-a-service platform will be powered by GRID.

Ben says that while virtualization has come a long way, graphics-rich applications are very hard to virtualize. But we want to virtualize these for millions of desktops.

The Horizon DaaS solution is available today. Navisite will be the first service provider to deliver it. The GRID vGPU integration we’re working on will be available in Q3 2014, with general availability in 2015.


10:25 a.m. – Next up: GRID, which is putting the GPU in the cloud, so it gets virtualized and can be shared. That required solving many, many problems. But there are now hundreds of tests taking place around the world where organizations are using graphics-rich applications from the cloud, streaming them as if they were Netflix. Today, we’re announcing that the company that brought enterprise virtualization to the world, VMware, is here to join us. We have a common vision of not just virtualizing from within the data center but all the way out to the client and application. Because of GRID, we can virtualize the enterprise end to end. He introduces Ben Fathi, chief technology officer of VMware.

10:21 a.m. – Jen-Hsun now shows a rendered image of the Honda on the stage itself. It looks like a photograph but the car itself can rotate, which would be helpful for a designer. One application for this would be to sell cars – especially expensive ones that can’t be delivered to a dealership.

10:19 a.m. – The 19 Iray VCAs that delivered this pack 40X the power of a Quadro K5000 workstation, and 60-70X that of a CPU-only workstation. An Iray VCA will cost $50K, or one-sixth what an equivalent number of workstations would go for.


10:17 a.m. – Even though the car’s sliced in half, the shadows are realistic. The view now goes inside the Honda and you see the car in stunning photorealistic detail.

10:16 a.m. – A Honda executive, Ide-san, introduces an image of a real production car. It’s a rendered image, which he spins around, and it resolves without any lag. Every frame of the image simulates photons as they bounce off the real world – as they bounce into small nooks and crannies of the car. Everything in the scene lights everything else in the scene. You see shadows in the seams of the door; the crystalline nature of the headlights comes through. It simulates billions of light beams. This demo is being run on one petaflop, which was equivalent to the world’s fastest supercomputer six years ago. Now, he shows what looks like a CAT scan of the car – it’s getting sliced lengthwise, as if by a bread slicer. Six years ago, this would have cost $500 million to carry out.

10:12 a.m. – Now, Jen-Hsun focuses on photo-realistic graphics. He shows two images of a hip urban loft. One is a photograph, the other is computer generated. It’s virtually impossible to tell the difference between the two. This level of capability has amazing applications, whether for design, advertising or architecture. But until now, it’s been way too slow. Jen-Hsun now introduces Iray VCA – the world’s first scalable Iray appliance. This wildly accelerates Iray, and you can stack as many VCAs as you want. You can put together amazing scaling capabilities. Our goal is to hook up design applications that now use Iray so they can use Iray VCA.

10:09 a.m. – Now, he shows a still fight scene with the robots sprawled on the floor, without the lighting put in. It’s flat, without character. It’s clear that lighting makes all the difference.

10:07 a.m. – To put it all together, Jen-Hsun shows a game scene from Unreal Engine 4 which integrates our GameWorks libraries. He shows a scene generated in real time of robots fighting violently. It’s terrifyingly real.  It looks like a scene from a 1970s subway station, as if it were filmed, but with robots fighting. Wouldn’t be surprising to see the young Bruce Willis walk out. That’s how real it is.

10:05 a.m. – Next up is the world’s first real-time unified physics solver. Fluids, cloth and rigid bodies all interact together. Every single physics detail is coupled; one physics simulation affects another. The demo shows fluid interacting with a copper-colored lozenge, and you can see how the flow of fluid is interrupted. It leaves a beautiful wake.

Next up is a demo that shows a swinging rigid body interacting with green smoke. There’s a phenomenal amount of data being simulated in this real-time demo. It’s not about artistic design; it’s all simulated on the GPU.

A third demo shows smoke interacting with a wooden floor, in a way that’s grid-free. No artist manipulation is necessary, which radically cuts back on the cost of creating cartoons.


A fourth demo shows 32 million voxels, or 3D pixels. Each contains information on fuel, density and temperature. If the temperature reaches a certain threshold, the voxel ignites and emits light. Once engineers describe the scene and write the algorithm, it just appears. The demo shows fire enwrapping a sphere, which gets gorgeously entangled in the flames.
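The per-voxel recipe described above (fuel, density, temperature, ignite past a threshold) can be sketched in a few lines of NumPy. This is only a toy illustration of the idea; the grid size, threshold and field names are invented here, not taken from NVIDIA's actual solver.

```python
import numpy as np

N = 16                                   # 16^3 voxels for this toy sketch
rng = np.random.default_rng(0)
fuel = rng.random((N, N, N))             # fuel remaining in each voxel
density = rng.random((N, N, N))
temperature = 300 + 1200 * rng.random((N, N, N))   # Kelvin, illustrative

IGNITION_K = 800.0                       # invented ignition threshold

# A voxel ignites if it is hot enough and still has fuel to burn.
ignited = (temperature > IGNITION_K) & (fuel > 0.1)

# Emitted light sketched as proportional to fuel and density in the voxel.
emission = np.where(ignited, fuel * density, 0.0)
fuel = np.where(ignited, fuel * 0.9, fuel)         # consume some fuel

print(f"{int(ignited.sum())} of {N**3} voxels ignited this step")
```

A real solver would run a step like this per frame on the GPU, with heat and smoke advected between neighboring voxels; the point here is only the ignite-and-emit rule per voxel.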


9:59 a.m. – What can TITAN Z do? Thirty years ago, Pixar launched the world into CGI film. It took an hour and a half to render each frame in its early film, Luxo Jr. It looked natural and beautiful. But now, 30 years later, the work by a studio called Rhythm & Hues is far more amazing. They’ve done state-of-the-art water simulation. Computer graphics wasn’t enough. We needed simulation. Computers have advanced a million times since Pixar started, but rendering still takes a very, very long time to simulate things like the ocean, foam, splashes. Just as we launched the programmable shader and demoed Luxo Jr in real time, now we’re showing simulation and computer graphics at the same time.


They show a scene, like “Life of Pi,” with a magical simulated ocean, with jellyfish glowing and a massive whale that jumps out, splashes and placidly swims away. It’s sufficiently real that you feel like you’re getting wet.

9:53 a.m. – Jen-Hsun now shifts to graphics. You can’t come to an NVIDIA event and not hear about graphics. He recounts the success of the TITAN GeForce GPU, which is basically a supercomputer you can buy at retail. It’s being used not just by gamers but by designers, artists and scientists. But where do we go from here? He shows an awesome, bone-shaking video of a new GPU that seems to assemble itself. He introduces a new GPU, the GeForce GTX TITAN Z, which has 5,760 CUDA cores and 12GB of memory, and is being sold for $2,999. The entire Google Brain could fit in just three TITAN Z cards.


9:50 a.m. – The amount of machine learning work being done worldwide is hard to keep up with. It’s predicted that this will be the most important advancement in computers in many years. The new capabilities it brings will be shocking. At GTC there will be companies, researchers worldwide presenting their work – IBM, Flickr, Facebook, DARPA, Russia’s Yandex, Japan’s Denso, China’s Baidu. This will move us toward a universal translator that will let you speak into a phone and have it translate into another language in YOUR OWN VOICE.


9:47 a.m. – Now, Brian shows a neural network that’s much deeper, which was trained on seven GPUs running for two weeks; it took 25 exaflops to train the computer on many natural data sets. One of the things it’s good at is differentiating breeds of dogs. NVIDIA had asked GTC attendees to tweet in pictures of their dogs before the show started. So, now Brian is going to see what the classifier thinks is the breed of each dog that attendees sent in. First, it classifies a Dalmatian correctly, then it recognizes a vizsla correctly. Then it recognizes a German Shepherd.

9:44 a.m. – But a team at Stanford worked on a new way to simulate deep neural nets on GPUs. The work that was done before at a cost of $5M has recently been done by a team of Stanford researchers for $30,000. And it used 100X less energy, or 4 kilowatts. Any company, big or small, can now do machine learning the way Google did. So Brian, an NVIDIA research analyst, is going to show us some examples of how a neural network learns.

He teaches the computer to recognize objects associated with NVIDIA and with Ferrari. One is largely green, the other largely red. So he starts feeding in images, but just a dozen or so. If it were random chance, the computer would get it right half the time. He now feeds in far more images, closer to 72, and now there’s a better classifier that gets it right almost all the time.

Now, he loads up objects that could confuse the computer. Brian puts in a green Ferrari, the computer thinks that it’s an NVIDIA product. Then he feeds in a green, white, red Italian flag, which it thinks is a Ferrari.
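The confusions Brian shows fall out naturally when a classifier leans hard on color. As a hedged illustration only, here is a deliberately naive nearest-centroid toy on synthetic green/red images; the real demo used a trained neural net, and everything here (data, labels, sizes) is invented:

```python
import numpy as np

def mean_color(img):
    """Average RGB of an image with shape (H, W, 3)."""
    return img.reshape(-1, 3).mean(axis=0)

# Synthetic training sets: green-ish "NVIDIA" images, red-ish "Ferrari" images.
rng = np.random.default_rng(1)
greens = [np.clip(rng.normal([40, 180, 60], 30, (8, 8, 3)), 0, 255) for _ in range(72)]
reds   = [np.clip(rng.normal([200, 30, 40], 30, (8, 8, 3)), 0, 255) for _ in range(72)]

centroids = {
    "NVIDIA":  np.mean([mean_color(i) for i in greens], axis=0),
    "Ferrari": np.mean([mean_color(i) for i in reds], axis=0),
}

def classify(img):
    """Assign the label whose average color is nearest."""
    c = mean_color(img)
    return min(centroids, key=lambda k: np.linalg.norm(centroids[k] - c))

# A "green Ferrari": the color cue dominates, so it lands on the NVIDIA side,
# mirroring the confusion in the demo.
green_ferrari = np.full((8, 8, 3), [50.0, 170.0, 70.0])
print(classify(green_ferrari))
```

A deep net trained on enough varied examples learns shape as well as color, which is why more training images improve it; this toy shows only why the color shortcut misleads.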

9:37 a.m. – Turns out that it saw a lot of faces. And a lot of cats. Jen-Hsun shows what images of a face and a cat look like to this three-day-old computer. They’re vague, fuzzy, slightly discernible. But it’s a start. So the Google brain took 1,000 CPU servers, 2K CPUs, 16K cores, and it consumed 600 kWatts and cost $5M. But a billion connections is the number of synapses of a honeybee. So, it takes three days to train a honeybee to recognize two things. Now extrapolate this to the human brain: we have 100 billion neurons, with 1,000 connections each, and we’d train it with 500 million images, all of this compounded. If we modeled something as big as the human brain on the Google brain, it would take 40,000 years to train it to do similar work.
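The 40,000-year figure roughly checks out as back-of-envelope arithmetic, if you assume training time scales linearly with both connection count and image count (our assumption for this sketch, not necessarily the one used on stage):

```python
# Numbers from the talk: the Google brain's ~1B connections trained on
# 10M images in 3 days; a human brain is ~100B neurons x 1,000 connections,
# trained on 500M images.
google_brain_connections = 1e9
human_brain_connections = 100e9 * 1000

connection_scale = human_brain_connections / google_brain_connections  # 100,000x
image_scale = 500e6 / 10e6                                             # 50x

days = 3 * connection_scale * image_scale   # linear-scaling assumption
years = days / 365
print(f"~{years:,.0f} years")               # ~41,000 years, i.e. the ~40,000 quoted
```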

9:32 a.m. – Machine learning is how computers teach themselves with data. And we’re surrounded by data now, torrents of it. There are many fields of ML. One is deep neural nets, which are inspired by biology. Neurons recognize edges. When we look at things, the appropriate edge neuron lights up, so several edges turn into features to recognize ears, eyes, noses, subway cars. Before you know it, you have a neuron that recognizes a face. Edges become features, features become objects. Machine learning scientists call this object recognition. Teaching a neural net to recognize images is a huge challenge. Recently, a team from Google built a computer with 1,000 servers simulating a brain with 1 billion synapses. It was trained on 10M 200×200-pixel images. It took three days. This was unsupervised, which is amazing. It trained itself basically by watching YouTube for three days.
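The "edges become features" idea above can be illustrated with a single hand-coded edge filter. A real deep net learns whole banks of such filters from data; this NumPy sketch (invented image, one Sobel-style kernel) captures only the first-layer intuition:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid correlation-style 2D convolution (as deep-learning layers use)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# A 6x6 image: dark on the left, bright on the right -> one vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# Sobel-style vertical-edge detector, playing the role of one "edge neuron".
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

response = np.abs(conv2d(img, kernel))
# The edge neuron "lights up" only along the brightness boundary;
# flat regions produce zero response.
print(f"peak edge response: {response.max()}")
```

Stacking layers of learned filters like this one, then filters over those responses, is how edges become ears and eyes, and eventually whole faces.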


9:27 a.m. – So, what will we call this chip that incorporates 3D memory and NVLink? We’re going to name it after a scientist we recognize – Pascal – who invented the mechanical calculator in the 17th century. Jen-Hsun quickly runs through a series of insights and theories that Pascal came up with before dying at the age of 39. There’s applause as he describes our next-gen Pascal module, which will be the heart of next-gen supercomputers, workstations, gaming PCs and cloud supercomputers. It’s one-third the size of a PCIe card. Pascal will allow NVIDIA to scale with Moore’s law. What will we do with all these flops, with a processor that’s a supercomputer the size of two credit cards? One thing is machine learning.

9:23 a.m. – The GPU has a lot of pins; it’s the biggest chip in the world, and its interface is extremely wide. Can we go wider? It would make the package enormous. Can we go faster? That uses too much energy. So, our next enabling technology is 3D packaging. We’re going to build chips on other chips. It starts with a base wafer where interconnects are done on the wafer – thousands of bumps on these chips are flipped and bumped onto the base wafer. Memory interfaces went from hundreds to thousands of bits. We stack all the memory chips on top of each other and punch holes through them. The stacked DRAM sits on a wafer that sits on a substrate, which has wires that connect to the GPU, and together they form an interface that delivers an unbelievable amount of bandwidth, which will grow 5X over the next two years. And it will operate at 4X the energy efficiency.

9:20 a.m. – First technology we’ll announce today is an important invention called NVLink. It’s a chip-to-chip communication channel. The programming model is PCI Express but enables unified memory and moves 5-12 times faster than PCIe. “This is a big leap in solving this bottleneck,” Jen-Hsun says.

9:15 a.m. – Jen-Hsun describes the importance of tomorrow’s Emerging Companies Summit, which is a breakout session for startups using GPUs. There will be more than three dozen companies at it talking about how they want to change the world.

9:13 a.m. – The span of work at GTC is astounding, Jen-Hsun says. In 2010, the focus was high performance computing. In 2012, the focus was on energy exploration, life science and molecular dynamics. This year, the fastest-growing topics are big data analytics, machine learning and computer vision. We now encompass quantum levels, atomic levels, molecular levels, up to planetary levels.

9:08 a.m. – Some very cool scenes from the movie Gravity, which was made with GPUs that helped show Sandra and George swirling through space.


9:05 a.m. – Video starts. Black room. Lots of volume, scenes of self-driven cars, anti-missile technology, swooping triangles, medical research.

8:55 am – It’s standing room only. We’re getting close to 9am Pacific. Jen-Hsun’s sharply prompt, so it shouldn’t be long.

8:50 am – This is the biggest GTC yet. Folks are here from 50 countries. There are 170 mind-aching research posters outside. And 550 breakout sessions. They’re by companies like Pixar, Google, Facebook. By university researchers. By folks from national laboratories. By startups working with GPUs.

8:45 am – A peek at the preparations, backstage, below.



For more details on Pascal and Stacked Memory see this Parallel Forall post by NVIDIA Senior Director of Architecture Denis Foley.