SAT, ACT, MCAT, LSAT — there are many standardized ways to measure human aptitude in various fields. But when it comes to deep learning and artificial intelligence, how do we capture the tangible standards that need to be met for an AI-based service to be successful?
The answer is PLASTER. This acronym, introduced by NVIDIA CEO Jensen Huang at the GPU Technology Conference earlier this year, describes a framework that addresses the seven major challenges of delivering AI-based services: Programmability, Latency, Accuracy, Size of Model, Throughput, Energy Efficiency and Rate of Learning.
NVIDIA introduced this framework to help customers overcome these important challenges, given the complexity of deploying deep learning solutions and the rapidly moving pace of the industry.
“Hyperscale data centers are the most complicated computers the world has ever made,” Huang said. They serve hundreds of millions of people making billions of queries, and represent billions of dollars of investment.
How to Measure Deep Learning Performance
PLASTER describes the key elements for measuring deep learning performance. Each letter identifies a factor that must be considered to arrive at the right set of tradeoffs and to produce a successful deep learning implementation. They are:
- Programmability: The types and complexity of neural networks are growing exponentially in order to handle different use cases. Yet deep learning models must be coded, trained and then optimized for a specific runtime inference environment. The computing model needs to provide great performance on all kinds of neural networks.
- Latency: The time between requesting something and receiving a response is critical to the quality of a service. For most human-facing software systems, not just those based on AI, acceptable response times are often measured in milliseconds.
- Accuracy: Deep learning models need to make the right predictions. Historically, the options for handling large data volumes have been either to transmit the full information with long delays, or to sample the data and reconstruct it using techniques that can lead to inaccurate reconstructions and diagnoses.
- Size of model: To improve prediction accuracy, the size of neural networks is also growing exponentially. The computing approach needs to support and efficiently process large deep learning models.
- Throughput: Hyperscale data centers require massive capital investment. Justifying a return on that investment requires understanding how many inferences can be delivered within the latency targets, or, put another way, how many applications and users the data center can support.
- Energy efficiency: Power consumption can quickly increase the costs of delivering a service, driving a need to focus on energy efficiency in devices and systems.
- Rate of learning: Deep learning models are dynamic, involving training, deployment and retraining. Understanding how and how fast models can be trained, re-trained and deployed as new data arrives helps define success.
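Several of these factors can be quantified directly. As an illustration of how latency and throughput trade off against each other, here is a minimal benchmarking sketch in Python. The `fake_infer` function is a hypothetical stand-in for a real model's forward pass; the names and numbers are illustrative, not part of NVIDIA's framework.

```python
import time
import statistics

def fake_infer(batch):
    # Hypothetical stand-in for a model forward pass;
    # pretend each call takes roughly 2 ms.
    time.sleep(0.002)
    return [0] * len(batch)

def benchmark(infer, batch_size=8, iterations=200):
    """Measure per-request latency (ms) and throughput (inferences/sec)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        infer(list(range(batch_size)))
        latencies.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        # Median and tail latency: users experience the tail, not the average.
        "p50_ms": statistics.median(latencies),
        "p99_ms": latencies[int(0.99 * len(latencies)) - 1],
        # Total inferences completed per second of wall-clock time.
        "throughput_per_s": batch_size * iterations / elapsed,
    }

stats = benchmark(fake_infer)
print(stats)
```

Increasing the batch size typically raises throughput (more inferences per second) but also raises per-request latency, which is exactly the tradeoff PLASTER asks service operators to balance against their latency targets.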
It’s important to note that none of these components is independent of the others — PLASTER is greater than the sum of its parts. Getting them all right holds the potential to elevate organizations to the next level in their work with deep learning.
Applying deep learning can be complex and is in the early stages of its life cycle. With a clear framework like PLASTER, organizations can take advantage of its vast potential.
In a recently published whitepaper, researchers explored PLASTER at greater depth within the context of NVIDIA’s deep learning work. They found that the acronym has superpowers — namely, the ability to help organizations better manage performance, make more efficient use of developer time and create a better developer operations environment to support the products and services that customers want.
Listen to Huang as he introduces and explains PLASTER for the first time, at GTC in San Jose.