Live: Jensen Huang Keynotes NVIDIA’s 2017 GPU Technology Conference

by Bob Sherbin

11:19 – We created a simulator that will enable robots to learn in a parallel universe before joining ours.

Jensen lets out a small breath and says, “That’s it.” Then he recaps all of his announcements. He talks about the rise of new era of computing on AI, he recounts Volta and Volta-based AI supercomputing system. He talks about TensorRT, and the Tensor Core. He talks about the NVIDIA GPU Cloud platform, and he talk about the Xavier DLA open source DLA – deep learning accelerator. He cites Toyota as a new partner of NVIDIA. And he talks about the Isaac robot simulator.

“That’s all I have for you today,” Jensen smiles, lets out a breath, and leaves the stage after more than two hours of steady news.

11:15 – Without breaking stride, Jensen segues into robots.

Robots, unfortunately, are hard to do. It has to sense the world, interact with the world and plan. With a self driving car, a specific objective is collision detection. With a robot collision detection is essential – your goal is to collide in the right way – how a kinetic object comes into contact with another object.

Jensen show how a robot, called Ada, is learning to shoot an orange plastic puck into a small net. It’s easy to do that. But what about teaching a robot to do surgery.


We need to create an alternative universe. It needs to obey the laws of physics and gravity if you chose. It needs to be visually photorealistic. It needs to have the ability to learn inside this universe. The one gap with the real world is it needs to operate at warp speed – it needs to move faster.

We created a new simulator called Isaac. It’s named after two Isaacs – named for physics pioneer Isaac Newton and Isaac Asimov, the science-fiction author.  It’s connected to the OpenAI Gym.

11:07 – Jensen begins to talk about DLA. This stands for a Deep Learning Accelerator.

You an create a custom ASIC and improve efficiency by a third. In the case of Xavier, we’ve created a processor out of a CPU, GPU with CUDA and DLA. This lets you combine general purpose architecture, and domain specific accelerators.  What’s great is we have an architecture that is both programmable, robust, super energy efficient and can run the entire software stack of self driving cars. We understand the entire pipeline and we have the software stack across the board.

It’s so incredibly hard to put it together, why don’t we accelerate the adoption, democratize deep learning for the trillions of devices that will use deep learning, Jensen says.

So, we’re open sourcing the Xavier DLA. The best engineers in the world are working on this. We’re going to take this, which some people call the TPU and we’re going to open source it. Early access in July. Full access in September, “Our goal is proliferation,” Jensen says.


11:03 – Jensen said he’s got a big announcement to make….

He reels off the way companies are using NVIDIA technology for self-driving cars, truck, self-flying planes.

Toyota has selected NVIDIA for its self-driving vehicles. As you know, they are one of the largest companies in the world. This is a company that is legend in so many ways…so many modern management systems were invented by this company. They’re working with us. Our two engineering teams are working to create their autonomous car and put it on the road in the next few years.


“This will be the architecture of their future production cars,” Jensen says.

He notes Toyota will use NVIDIA’s Xavier technology. It has 30 Tflops in a 30 watt package.

10:59 – Next Up: We’re going to talk about AI at the Edge.

He quickly segues into transportation, noting that AI is revolutionizing transportation. We need to find a way to augment truck deliveries, and automate so we can keep up with the Amazon Effect. Domino’s alone delivers one million pizzas a day,

Our environment would be better without parked cars, Jensen says. There are 800 million parking spots for a third that number of cars.

That’s why we’ve created NVIDIA Drive, a full stack that will deliver full autonomy for cars, Jensen says. We now have more than 200 developers using DRIVE PX. This is a 50 percent increase in the past quarter alone.

He now shows how NVIDIA Drive can be used for mapping, as a co-pilot and as your guardian angel (so even when the car isn’t driving you, it should be watching out for you).

For mapping and driving he shows a video that shows how a car makes an HD map and localizes itself within it. This shows how a car scans, detects road features, constructs an HD road map and then locates itself.

Jensen then shows NVIDIA’s Janine acting as a test pilot, though she’s actually an events planner in real life.

Jensen then shows guardian angel, which tells driver not to proceed because there’s a car she didn’t seen coming across the intersecting road.

10:55 – The NVIDIA GPU Cloud Platform will be available in beta in July. “If you’re anxious to burst into the cloud, this is the way to do it,” Jensen says.


10:52 – Next in the docket: the NVIDIA Deep Learning Stack.

We decided to containerize a huge stack with every framework and version of software we know, and then create a cloud registry for it. so, if you have a titan X, you can go to a web site, type in your address and download a containerized stack. No configuration necessary. Once you start using the platform, you start realizing you can with a click create an instance, download the container and your workload bursts into the cloud. It’s the first hybrid deep learning cloud platform.

An NVIDIA engineer named Phil talks the crowd through it. He launches into the NVIDIA GPU Cloud, or NGC, and shows the three steps needed to create a deep learning job. First, you select which environment – cloud, your own DGX-1, your own DGX Station or Titan PC. If you choose cloud, you can choose from a whole series of options.  Next up is choosing the data set – which he clicks in in one step. Finally, you select which containerized  framework you want – there are choices like Pytorch, Caffe2 and others (they will get updated every month). You’re then good to go. You see a list of jobs that you’ve run and the ones that you’re currently running.

For the latest updates, refresh your browser, and tune in to the live video stream of our event below. 

10:46 – Jensen now introduces another flavor of V100 for hyperscale scaleout. It’s small, the size of a large CD case, it provides a 15-25X inference speedup against Intel Skylake. It draw s150 watts. He ridiculed its dimensions, playfully describing it as FHHL – full height, half length.

JHH now makes the case for accelerators. JHH shows a row of 500 servers with 1K cpus – a big honking wall — which can support 300K inferences a second. If we added this at $3K per node, with interconnect and power and cooling that goes with it. this would be $1.5 Million with 250K watts. That price doesn’t count power and cooling or interconnect.


This is equivalent to 33 nodes of V100, which is a 15x reduction

When we get asked about FPGAs, we decided:  why not make Volta the best inferencing machine can possibly be made.

10:41 – The subject now turns to inferencing, which Jensen had warned us earlier was coming.

Jensen announces TensorRT – RT for Run Time – for TensorFlow. It accelerates training by 12x and inferencing by 6x. (A tensor, Jensen notes, is a mathematical object containing vectors.)

What we can do with this is combine mathematics that had to be done in bias into a single block. We can also recognize when different mathematical blocks share the same inputs, this is achieved through graph analytics.

JHH shows a chart describing inferencing performance in terms of throughput and latency on ResNet-50, measured in terms of images per second. P100 can do 600 images a second, whereas Intel’s Broadwell CPU can do 100 images a second and K80 can do 200 per second. But Volta can do more than 5,000 images a second.

“Volta is groundbreaking work. Its’ incredibly good at training and incredibly good at inferencing,” Jensen says. “Volta and TensorRT are ideal for inferencing.”

10:36 – There’ also another flavor of Tesla V100 that comes with HGX-1, which is specifically for GPU cloud computing. This is intended for the public cloud, whether for Deep Learning, graphics, CUDA computation.

Joining Jensen on stage now is Jason Zander, corporate VP of Microsoft Azure, who skipped part of Microsoft’s BUILD conference to join him on stage. Microsoft has made amazing breakthroughs with DL – particularly in natural language processing and translation and with ResNet. Jensen asks him about the Cognitive Toolkit.

We want to infuse AI across our platforms, Zander says. We want users to be able to take advantage of it. We’ve been doing AI for nearly two decades. I can talk to someone in Chinese in real-time, even though I don’t know the project. Jensen asks him about Hololens, which recognizes and augments 3D imaging. With this, you can encounter someone from a different country in VR and speak to them and understand them, even though you aren’t using the same language.

“We’re on our second generation of GPUs in the cloud,” Zander says. “We just announced P40s and P100s, but we really love Volta. My job is to ensure people use the Azure Cloud, and people want to use what’s available immediately, without waiting. We want data scientists and developers to focus on models and less on the plumbing.”


10:28 – And now, Jensen announces NVIDIA DGX-1 with eight Telsa v100.  It’s labeled on the slide as the “essential instrument of AI research. What used to take a week now takes a shift. It replaces 400 servers. It offers 960 tensor TFLOPS. It will ship in Q3. It will cost $149,000. He notes that if you get one now powered by Pascal, you’ll get a free upgrade to Volta

Turns out, there’s also a small version of DGX-1,  DGX Station. Think of it as a personal sized one. It’s liquid-cooled and whisper-quiet. Every one of our deep learning engineers has one.

It has four Tesla V100s. It’s $69K. Order it now and we’ll deliver it in Q3. “So place your order now,” he suggests.

10:24 – Jensen now announces new framework releases for Volta.

Using Caff2, training training a convolutional neural network goes from more than 40 hours on eight K80s or about 20 hours on 8 Pascals, to about five hours on 8 Voltas.

Similarly on MxNet, which is growing like crazy, using LSTM – a network for time sequencing, we can train MXnET in just several hours compared with a day and a half for Kepler.


Jensen introduces his first live guest, Matt Wood, GM of deep learning and AI for Amazon Web Services. He says Amazon has been working on AI for nearly two decades, and it’s helped define new experiences like Echo, Alexa and Amazon Go. Earlier uses included the application of search and discovery. Matt says first business case of AWS was to take applicability within the reach of very few into the hands of many. We’re doing that, and we’re busy optimizing MXNet for Volta. We look forward to making Volta available in the cloud from its launch.

Jensen asks what’s the funniest question Alexa has ever gotten, but notes that it needs to be at least PG-rated. Matt laughs and talks about the direct personal relationship that people quickly obtain with the service.

Jensen asks how did you know customers wanted GPU in the cloud. Matt responses that it came in responses to customer demand. Our most recent instances is growing “like wildfire.”

10:15 – What had been possible in several minutes on TITAN X takes just seconds on Volta. This required learning art, style, shading very quickly.

10:14 – Next up: Deep learning for style transfer.

Jensen shows how deep learning allows you to do photographic style transfer – you learn the style form one photo and apply it to another. Julie, a deep learning scientist who works for NVIDIA, shows a photo of a dock a glorious, red and purple drenched sunset. She also shows a second photo of a dreamy blue Caribbean-looking beach, with a spit of sand pushing toward the horizon. Within a matter of 10 seconds or so, the style of the first image is applied to the second.

10:11 – Now, a new subject.

Jensen talks about the introduction of Kepler several years ago, which now runs the Titan supercomputer at the Oak Ridge National Laboratory, the fastest in the U.S. Since then GPUs have improved 7-8 times, while the CPU has only improved 50 percent a year.

He notes that NVIDIA has tons of computational scientists, and one of them is talking about what the university will look like roughly four billion years. He follow Andromeda making a close pass of the Milky Way, and the stars that get thrown off. It’s a live simulation that lets you looks out and see galaxies collide in waves. In 5.3 billion years there’s a confusion of all the galaxies into a single body, flung out in the universe.

Jensen calls it galaxies making love. But It’s a pretty grim vision , though at least it’s way, way after lunch.

For the latest updates, refresh your browser, and tune in to the live video stream of our event below. 


10:06 – This has 16MB of Cache, with memory from Samsung, we’ve achieved 900 megabytes of memory. It’s new NVLink interconnect is 10X faster than the fastest PCI Express interconnect.

The New Tensor Core is a 4×4 matrix array. It’s fully optimized for deep learning. We felt Pascal wasn’t fast enough. This is one year later than Pascal but 12x its tensor operations, and 6x its capability for inferencing.


He then goes through some demos with Volta. While it’s for AI, it also does graphics, so he starts demonstrating the computer-generated Japanese sci-fi fantasty Kingsglaive: Final Fantasy XV, by Square Enix.

We see one of its key characters Nyx, on the world of Eos, with visual fidelity that looks photorealistic. Jensen praises the character’s black leather jacket – which he is something of an expert on.


9:59 – Jensen now moves to the issue of how model complexity is exploding.

Microsoft’s ResNet, launched in 2015, had 7 exaFLOPS with 60 million parameters. In 2016, Baidu introduced its Deep Speech 2 model 20 ExFLOPs larges, with 300 million parameters. This year, Google NMT has 105 ExaFLOPS, with 8.7 bilion parameters. This is the ultimate high performance computing problem.

Jensen now says I’d like to introduce you to the next level of computer projects. He now introduces the Tesla Volta V100, which draws huge applause.

It’s made on TSMC’s 12nm finfet process, at the limits of photo lithography. It has 5,120 CUDA cores. And has 120 TeraFLOPS of performance, delivered by a new processors called a Tensor Core. The fact that this can even be manufacturing is extraordinary, he says.

The R&D behind this was about $3 billion, Jensen says.

9:55 – Next slide up, is about a new product that SAP has called Brand Impact. Advertisers spend $60B a year advertising their brand but they don’t know how successful the are.

SAP’s product, which runs on NVIDIA GPUs, lets you know in real-time the exposure of logos in video in a moving scene.  It times the length of exposure, the size the logo takes on the screen and the brand value that this delivers.

9:52 – Jensen talks now about the democratization of AI. He notes that the most popular course at Stanford is CS229, Intro to Machine Learning. It’s being taken by majors from across the campus. Everyone wants to be able to use their data to teach a computer to automate their work.

NVIDIA’s strategy, he says, is to build the most productive platform for deep learning. We’ve created the NVIDIA Deep Learning SDK. We  work with all of the major AI frameworks – Caffe2, MXNet Pytorch, TensorFlow and Theano. We work with the major OEMs, like Cisco, Dell, HPE, IBM, Lenovo. We work with every single cloud company in the world, which have provisioned GPUs.


We haven’t stopped there. because we realize what an important revolution this is, our NVIDIA Research group is doing advanced work in AI, and our Applied Research is working in other areas. We have dedicated teams working with vertical industries like ISPs and Healthcare. We also have a program for startups called NVIDIA Inception, which is working with 1,300 AI startups – we provide them with advice, technology, marketing help, and sometimes funding. In fact, NVIDIA will be handing out $1.5M later today to six of these 1,300 companies.

One especially cool company MapD is the first to create a database engine on top of GPUs, he just open-sourced MapD, enabling you to access massive databases in memory and query it in real time.


For the latest updates, refresh your browser, and tune in to the live video stream of our event below. 

9:44 –  Jensen now shows a slide titled Deep Learning for Ray Tracing. He describes how Deep Learning can be applied to the incredibly computationally demanding task of following photons around a simulated scene. It allows us to use AI to fill in information.

He shows a pair of simulated Mercedes SLK 350s, one being rendered without deep learning take some time to fully resolve in photo -realistic detail. With deep learning it can do this much faster. Deep learning based auto encoders work incredibly fast. In a matter of seconds, the full scene is rendered, including the reflection of trees, clouds and other objects in the larger scene.


9:41 – The Big Bang Of Modern AI lets a computer learn features of an image hierarchically, to turn noses and eyes and ears into a face. It’s not only robust but diverse – it could recognize me but also recognize me in different settings, in different clothes, partially concealed.

For first time, a deep learning network trained by data, not scientists, has been approved by the FDA for medical diagnosis. Superhuman speech recognition is arising. Captioning is becoming possible.

Reinforcement learning has brought steady improvement and unleashed breakthroughs like AlphaGo, which beat the world’s top Go player.

We now have unsupervised learning, which lets us fill in missing parts of data. More recently is the rise of generative adversarial networks, in which two networks go back and forth and train each other, based on work by Ian Goodfellow. This has allowed things like ability to generate voice, natural language translation and transfer learning where you could learn to translate from one language into another, and then apply that to a third language.

The Big Bang of Deep Learning has laid the groundwork for the era of AI where computers automate programs – it’s the automation of automation.

9:35 – The first trend that’s emerged is the rise of GPU Computing. The next is what we can call the second era of computing – the Era of Machine Learning. This, borne on AI, has transformed many consumer services which anticipate your needs.

Whereas computer scientists used to specify every instruction, now software writes software, algorithms write algorithms.

Deep learning is the culmination of research labs in Switzerland, Canada, the U.S.. All of this work has come together into the Big Bang of Deep Learning. Three things have made this possible, according to Fei-Fei Li, formerly of Stanford: 1) the culmination of deep learning approach, 2) enormous amounts of data; 3) the use of GPUs.


9:31 – Okay, enough of the context. Jensen is starting with his first demo.

It’s something called Holodeck. He’s calling up as a first guest someone from Sweden, Christian Koenigsegg. He appears as one of four avatars dressed as lightly clad space/time travelers, against a white field.

They’re looking at Christian’s company’s new car. It carries a $1.9 million price tag with 1,500 horsepower, and a top speed of 250 miles an hour. It’s a hybrid with no gears, made of carbon fiber.

This particular car is pretty amazing. It’s a color the maker calls Orange Amber. Though it kind of looks like Orange Fanta soda spiked with Dom Perignon champagne.

So, the avatars now go into the car. One grabs the steering wheel. The whole scene is in photorealistic 3D VR, with extraordinary visual fidelity. Jensen asks to see all the parts of the car and many thousands of them appear to explode and then quickly reassemble

9:23 – Jensen now shows a chart showing the rise of GPU Computing:

  • number of GTC attendees is up 3x in five years, to 7,000
  • there’s been an 11x increase in GPU developers over the same time to more than a half million
  • and downloads of our CUDA drivers and SDK’s exceeded one million last year.

And GTC’s attendance reflects this. All of the world’s top 15 tech companies are here. all of the top 10 auto makers here. there are 80 AI startups and 20 VR start up.

GTC, he said, is where the future gets invented.

9:19 – 
Jensen  shows a chart with a blue line that shows progressive improvement of processor performance, which has increased 50 percent a year for years. But it’s begun to slow.

So, we realized that there’s some workload inside applications, algorithms of artists, scientists, engineers – the Einsteins of our time – their software includes parallel processing aspects. So, we created a domain-specific accelerator that’s a companion to the CPU

The second thing we did was create a platform called CUDA. This was extended for computing starting 10 years ago. It’s our computing architecture, which you dedicate your lives to. Eventually, others can benefit from it. It has to be something available everywhere. It has to be thought about top to bottom with middleware. What’s really special about GPU Accelerated Computing is it took huge effort to port applications you’ve developed for CPUs onto this new platform. We dedicated ourselves to having a team dedicated to overcoming every possible layer of computing to find inefficiencies and get rid of waste.

This kind of top-to-bottom and bottom-to-top has yielded phenomenal results. Some people have described our progress as Moore’s law squared.

9:15 – For the last 30 years, we’ve benefited from a powerful impact. Moore’s law has enabled us to advance micro-processing architecture year in and year out. In addition, the law of Dennard scaling lets us put more transistors into an area as long as we can reduce its voltage – more and more transistors and more energy efficiency has let us improve microprocessor performance by a million times. Nothing else in society has advanced like that. But the laws of physics are catching up with us. We’re now at the end of two roads.

And this is the reason of our existence: To find a new life after Moore’s law.

9:10 – Big reveal: she, the voice, that is, the I, AI, is also the composer of the sound track of the video itself.

Big crowd reaction.

Here’s a teaser: we have a great blog post coming out later today that describes how this was made.

The so-called Voice of God comes up, introducing Jensen. The theme of the talk is behind him, “Powering the AI Revolution.”

Jensen himself is in trademark black – a familiar black leather jacket, black pants and shirt and shoes, with white trim around the sole.

We have a lot to talk to you about – laws of physics, laws of math, and some guests, he says.

9:07 – The room begins to darken and the music is shifting into a quitter, more soulful vibe.

And the video comes on. This one is different. For one, there’s a woman narrating. Over a montage of scenes of scientific advancement, she intones “I am a….”

“Visionary” comes first – referring to work don to explore distant galaxies.

“Healer” next, with ability to ID diseases from a single drop of blood.

Then, “Protector” of crops and oceans. Then “Helper,” of the disabled.

Folks beginning to get the message. Clearly she’s AI, speaking about herself in the first person. And the orchestral soundtrack is pretty powerful.

Next referring to being a “Creator,” “Teacher,” and”Learner.”

8:58 – True to NVIDIA’s roots in graphics, the company puts a lot shoulder into its opening videos. Recent ones at GTC, as well as our pretty cool CES opening keynote, were aspirational spine-tinglers about the promise of the future, narrated by the mellifluous Peter Coyote.

This year’s is different, though. Stay tuned for that. ….”

8:57 – The room’s filling up pretty fast. Light R&B is playing on speakers. But the screen’s the thing. It’s a yawning LED that’s just shy of 5K. I saw it tested yesterday and the colors are indelibly rich, deeply saturated, with blacks that look like something out of Dante.

On the screen are a series of crenulated triangles filling in with a jillion shades of green, undulating and reforming as you watch them

8:53 – Those here for Jensen’s keynote are of that cast of mind. Here’s an indication – of the 600 talks being given at GTC this year, some 60 percent are on AI-related topics.

There’s been tons of activity at the show in its first couple days. There’s a VR showcase with 10 virtual-reality companies competing for cash. There’s an Inception Awards event later today where six companies will receive $1.5 million in funding, contributed in part by Goldman Sachs, Fidelity, SoftBank and others. We also have a soundproof glass booth that some have mistaken for a human-sized aquarium where we’re recording episodes of NVIDIA’s AI Podcast with some of the luminaries that are here.

8:51 – There are those here who may remember GTC way, way back. Okay, eight years ago, at our first. There were a thousand fellow travelers on the second floor of the San Jose Fairmont Hotel, just up the street from here. You could have almost fitted them into the Presidential Suite there.

But the event’s grown close to 30 percent a year, every year since. There are well over 7,000 here today.

The nature of the event has changed, too.

Used to be focused pretty much on professional graphics. One of the first speakers was the CTO of LucasFilm. He showed how they use computer generated graphics to create realistic fire. It was very cool. But in more recent years, keynotes have come to focus on AI, deep learning and tools like IBM’s Watson computer.

8:50 – Our CEO Jensen Huang’s keynote should start in the next 10-15 minutes or so. We’re going to be trying to keep pace with today’s live blog post.

Thanks for tuning it with us here. Buckle up. It’s going to be quite a ride.