Correcting Intel’s Deep Learning Benchmark Mistakes

Benchmarks are an important tool for measuring performance, but in a rapidly evolving field it can be difficult to keep up with the state of the art. Recently, Intel published some incorrect “facts” about its long-promised Xeon Phi processors.

Few fields are moving faster right now than deep learning. Today’s neural networks are 6x deeper and more powerful than they were just a few years ago. New multi-GPU scaling techniques offer even faster training performance.

In addition, our architecture and software have improved neural network training time by over 10x in a year by moving from Kepler to Maxwell to today’s latest Pascal-based systems, like the DGX-1 with eight Tesla P100 GPUs.

So it’s understandable that newcomers to the field may not be aware of all the developments that have been taking place in both hardware and software.

For example, Intel recently published some out-of-date benchmarks to make three claims about deep learning performance with Knights Landing Xeon Phi processors:

  • Xeon Phi is 2.3x faster in training than GPUs(1)
  • Xeon Phi offers 38% better scaling than GPUs across nodes(2)
  • Xeon Phi delivers strong scaling to 128 nodes while GPUs do not(3)

We’d like to address these claims and correct some misperceptions that may arise.

Fresh vs Stale Caffe

Intel used Caffe AlexNet data that is 18 months old, comparing a system with four Maxwell GPUs to four Xeon Phi servers. Had Intel used the more recent, publicly available implementation of Caffe AlexNet, it would have found that the same system with four Maxwell GPUs trains 30% faster than four Xeon Phi servers.

In fact, a system with four Pascal-based NVIDIA TITAN X GPUs trains 90% faster than four Xeon Phi servers, and a single NVIDIA DGX-1 is over 5x faster.

Figure: Deep learning training performance comparison.
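
For readers who want to check this kind of comparison themselves, the key is to time training iterations with the current public Caffe rather than an 18-month-old build. Below is a minimal sketch using pycaffe and the stock BVLC AlexNet solver prototxt; the paths and iteration counts are illustrative assumptions, not the exact configuration either company benchmarked.

    # Minimal sketch: time AlexNet training iterations with pycaffe.
    # Assumes Caffe is built with GPU/cuDNN support and that the stock BVLC
    # AlexNet solver prototxt and its ImageNet LMDBs are set up locally.
    import time
    import caffe

    caffe.set_mode_gpu()
    caffe.set_device(0)  # benchmark a single GPU

    solver = caffe.SGDSolver("models/bvlc_alexnet/solver.prototxt")

    solver.step(10)  # warm-up: memory allocation and cuDNN autotuning

    iters = 50
    start = time.time()
    solver.step(iters)  # full training iterations: forward, backward, update
    elapsed = time.time() - start

    batch = solver.net.blobs["data"].data.shape[0]
    print("%.1f ms/iteration, %.0f images/sec"
          % (1000.0 * elapsed / iters, iters * batch / elapsed))

Running the same measurement against an old Caffe build and against the current one is the quickest way to see how much the software alone has moved in that time.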

38% Better Scaling

Intel is comparing Caffe GoogleNet training performance on 32 Xeon Phi servers to 32 servers from Oak Ridge National Laboratory’s Titan supercomputer. Titan uses four-year-old GPUs (Tesla K20X) and an interconnect technology inherited from the prior Jaguar supercomputer. Xeon Phi results were based on recent interconnect technology.

Using more recent Maxwell GPUs and interconnect, Baidu has shown that their speech training workload scales almost linearly up to 128 GPUs.

Source: Persistent RNNs: Stashing Recurrent Weights On-Chip, G. Diamos

Scalability relies on the interconnect and architectural optimizations in the code as much as the underlying processor. GPUs are delivering great scaling for customers like Baidu.
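
Scaling claims are easier to compare when normalized to a single metric: scaling efficiency, i.e. measured speedup divided by ideal linear speedup. The sketch below illustrates the calculation with hypothetical throughput numbers; these are not figures from Baidu’s or Intel’s runs.

    # Scaling efficiency: fraction of ideal linear speedup achieved on n nodes.
    def scaling_efficiency(throughput_1, throughput_n, n):
        speedup = throughput_n / throughput_1
        return speedup / n

    # Hypothetical example: 100 images/sec on 1 node and 2,560 images/sec on
    # 32 nodes is a 25.6x speedup, i.e. 80% of the ideal 32x.
    print(scaling_efficiency(100.0, 2560.0, 32))  # -> 0.8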

Strong-Scaling to 128 Nodes

Intel claims that 128 Xeon Phi servers deliver 50x faster performance than a single Xeon Phi server, and that no comparable scaling data exists for GPUs. As noted above, Baidu has already published results showing near-linear scaling up to 128 GPUs.

For strong-scaling, we believe strong nodes are better than weak nodes. A single strong server with numerous powerful GPUs delivers better performance than many weak nodes, each with one or two sockets of less-capable processors like Xeon Phi. For example, a single DGX-1 system offers better strong-scaling performance than at least 21 Xeon Phi servers: DGX-1 is 5.3x faster than four Xeon Phi servers, which works out to roughly 21 servers even if Xeon Phi scaled linearly.
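
The “21 servers” figure above is simple arithmetic under a linear-scaling assumption that is generous to Xeon Phi, as the short calculation below shows (the 5.3x number is the one cited in the text).

    # Back-of-the-envelope check of the strong-node claim, assuming
    # (optimistically for Xeon Phi) perfectly linear scaling across servers.
    dgx1_vs_four_phi = 5.3                    # DGX-1 speedup over 4 Xeon Phi servers
    phi_servers_to_match = dgx1_vs_four_phi * 4
    print(phi_servers_to_match)               # ~21 Xeon Phi servers per DGX-1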

Era of AI

Deep learning has the potential to revolutionize computing, improve our lives, improve the efficiency and intelligence of our business systems, and deliver advancements that will help humanity in profound ways. That’s why we’ve been enhancing the design of our parallel processors and creating software and technologies to accelerate deep learning for many years.

Our dedication to deep learning is deep and broad. Every framework has NVIDIA-optimized support, and every major deep learning researcher, laboratory and company is using NVIDIA GPUs.

While we can correct each of these claims, testing against old Kepler GPUs and outdated software versions is a mistake that is easily fixed, and fixing it would keep the industry up to date.

It’s great that Intel is now working on deep learning. With the era of AI upon us, this is the most important computing revolution, and deep learning is too big to ignore. But they should get their facts straight.

Comments

  • XYZ

    So many lies. How do we know who to believe?

  • CorpSmackDownBegins

    pew pew pew

  • Kaizer

    Is the speed up only applicable to convolutions? Which functions are accelerated by GPUs? I believe Intel Phi can accelerate diverse machine learning pipelines, not just convolutions.

  • Jmiah Diabetical Williamson

    So what’s up with the CUDA drivers 7.5.30 not working on Adobe After Effects 15.3 on my MAC?!? I can’t even render anything 3D with 4Gb VRAM AMD Radeon R9 M295X OPEN GL!!!

  • nqxla

    Run the benchmarks yourselves... or in this case, wait for OpenAI to do the same, since they have a DGX-1 now.

  • trajan2448

    Intel has a history of expensive failures in mobile and graphics (Larrabee), both of which were heavily hyped until their eventual demise. The fact they released non-current data as proof of their excellence is not reassuring.

  • ProphetC2 .

    Are you for real? How do you expect NVIDIA drivers to install where there is no NVIDIA GPU present, and even worse: CUDA to work where there are no CUDA cores available?!! I hope you’re just trolling.

  • Gary_Rainville

    For technical assistance, check our support site where we have a knowledge base, live chat and email support: http://www.nvidia.com/page/support.html

  • poohbear300

    totally agree. i’m not sure if they learned their lesson but from the looks of these slides it’s not promising.

  • Franpa

    Now all Nvidia needs to do is stop using gray fonts on a white background so that their website content is legible on an LCD display without their customers having to injure themselves from excessive eye strain.

  • jipe4153

    The speedup is applicable to all of the training; the GPU is doing all of it. GPUs are highly programmable and are being used in applications in virtually every field of computing.

  • Najeeb Shah

    It’s black, adjust your monitor settings.

  • Anne-Lise Pasch

    No, it’s grey. #464646 to be precise. Adjust your monitor settings.

  • http://hubpages.com/living/Review-of-the-Rinna-RL75iN-Natural-Gas-Tankless-Water-Heater Bert

    Of course, corrections lead to improvement.