SLMming Down Latency: How NVIDIA’s First On-Device Small Language Model Makes Digital Humans More Lifelike

Announced at Gamescom, ‘Mecha BREAK’ from Amazing Seasun Games is the first game to showcase ACE technology, including NVIDIA Nemotron-4 4B, for quicker, more relevant responses.
by Ike Nnoli

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC and workstation users.

At Gamescom this week, NVIDIA announced that NVIDIA ACE — a suite of technologies for bringing digital humans to life with generative AI — now includes the company’s first on-device small language model (SLM), powered locally by RTX AI.

The model, called Nemotron-4 4B Instruct, provides better role-play, retrieval-augmented generation and function-calling capabilities, so game characters can more intuitively comprehend player instructions, respond to gamers, and perform more accurate and relevant actions.

Available as an NVIDIA NIM microservice for cloud and on-device deployment by game developers, the model is optimized for low memory usage, offering faster response times and providing developers a way to take advantage of over 100 million GeForce RTX-powered PCs and laptops and NVIDIA RTX-powered workstations.

The SLM Advantage

An AI model’s accuracy and performance depend on the size and quality of the dataset used for training. Large language models are trained on vast amounts of data, but are typically general-purpose and contain excess information for most uses.

SLMs, on the other hand, focus on specific use cases. So even with less data, they’re capable of delivering more accurate responses, more quickly — critical elements for conversing naturally with digital humans.

Nemotron-4 4B was first distilled from the larger Nemotron-4 15B LLM. This process requires the smaller model, called a “student,” to mimic the outputs of the larger model, appropriately called a “teacher.” During this process, noncritical outputs of the student model are pruned or removed to reduce the parameter size of the model. Then, the SLM is quantized, which reduces the precision of the model’s weights.
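
The student-mimics-teacher step can be sketched as a distillation loss: the student is penalized for diverging from the teacher’s softened output distribution. The sketch below is a minimal, framework-free illustration of the general technique, not NVIDIA’s actual training code.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence between the temperature-softened teacher and student
    distributions -- the objective the 'student' minimizes while learning
    to mimic the 'teacher'."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T

# A student that matches the teacher incurs zero loss; a mismatched one doesn't:
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # -> 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # -> True
```

The temperature term matters: softening both distributions exposes the teacher’s relative preferences among wrong answers, which carries more signal than the hard label alone.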

With fewer parameters and less precision, Nemotron-4 4B has a lower memory footprint and faster time to first token — how quickly a response begins — than the larger Nemotron-4 LLM while still maintaining a high level of accuracy due to distillation. Its smaller memory footprint also means games and apps that integrate the NIM microservice can run locally on more of the GeForce RTX AI PCs and laptops and NVIDIA RTX AI workstations that consumers own today.
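
The quantization step can likewise be sketched in a few lines. This toy example uses symmetric int8 quantization, one common scheme, on made-up weights; the exact scheme applied to Nemotron-4 4B may differ.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: each float32 weight (4 bytes)
    is mapped to an integer in [-127, 127] (1 byte) plus one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.8, -1.2, 0.05, 2.54, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Memory drops roughly 4x (1 byte per weight instead of 4), and each
# recovered value is off by no more than about half a quantization step:
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(max_err < scale)  # -> True
```

This is the trade the article describes: a small, bounded loss of precision in exchange for a much smaller memory footprint and faster loads.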

This new, optimized SLM is also purpose-built with instruction tuning, a technique for fine-tuning models on instructional prompts to better perform specific tasks. This can be seen in Mecha BREAK, a video game in which players can converse with a mechanic game character and instruct it to switch and customize mechs.

ACEs Up

ACE NIM microservices allow developers to deploy state-of-the-art generative AI models through the cloud or on RTX AI PCs and workstations to bring AI to their games and applications. With ACE NIM microservices, non-playable characters (NPCs) can dynamically interact and converse with players in the game in real time.

ACE consists of key AI models for speech-to-text, language, text-to-speech and facial animation. It’s also modular, allowing developers to choose the NIM microservice needed for each element in their particular process.

NVIDIA Riva automatic speech recognition (ASR) processes a user’s spoken language and uses AI to deliver a highly accurate transcription in real time. The technology builds fully customizable conversational AI pipelines using GPU-accelerated multilingual speech and translation microservices. Other supported ASRs include OpenAI’s Whisper, an open-source neural net that approaches human-level robustness and accuracy on English speech recognition.

Once the speech is transcribed, the text goes into an LLM — such as Google’s Gemma, Meta’s Llama 3 or now NVIDIA Nemotron-4 4B — to start generating a response to the user’s original voice input.

Next, another piece of Riva technology — text-to-speech — generates an audio response. ElevenLabs’ proprietary AI speech and voice technology is also supported and has been demoed as part of ACE.

Finally, NVIDIA Audio2Face (A2F) generates facial expressions that can be synced to dialogue in many languages. With the microservice, digital avatars can display dynamic, realistic emotions streamed live or baked in during post-processing.

The AI network automatically animates face, eyes, mouth, tongue and head motions to match the selected emotional range and level of intensity. And A2F can automatically infer emotion directly from an audio clip.

Finally, the full character or digital human is animated in a renderer, like Unreal Engine or the NVIDIA Omniverse platform.
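
The four-stage flow above — speech-to-text, language, text-to-speech, facial animation — can be sketched as a pipeline in which each stage is a swappable component, mirroring ACE’s modular NIM design. All class, function and return-value names here are illustrative stand-ins, not real ACE APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DigitalHumanPipeline:
    asr: Callable[[bytes], str]        # e.g. Riva ASR or Whisper
    llm: Callable[[str], str]          # e.g. Nemotron-4 4B Instruct
    tts: Callable[[str], bytes]        # e.g. Riva TTS or ElevenLabs
    animate: Callable[[bytes], dict]   # e.g. Audio2Face

    def respond(self, mic_audio: bytes) -> dict:
        text = self.asr(mic_audio)     # speech -> text
        reply = self.llm(text)         # text -> response
        speech = self.tts(reply)       # response -> audio
        return self.animate(speech)    # audio -> facial animation data

# Each stage is independently replaceable -- the point of the modular design.
# Stub implementations stand in for the real microservices:
demo = DigitalHumanPipeline(
    asr=lambda audio: "switch to the heavy mech",
    llm=lambda text: f"Roger, {text}.",
    tts=lambda reply: reply.encode(),
    animate=lambda speech: {"visemes": len(speech)},
)
print(demo.respond(b"..."))  # -> {'visemes': 32}
```

Swapping Whisper for Riva ASR, or ElevenLabs for Riva TTS, changes only one field of the pipeline; the rest of the flow is untouched.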

AI That’s NIMble

In addition to its modular support for various NVIDIA-powered and third-party AI models, ACE allows developers to run inference for each model in the cloud or locally on RTX AI PCs and workstations.

The NVIDIA AI Inference Manager software development kit allows for hybrid inference based on various needs such as experience, workload and costs. It streamlines AI model deployment and integration for PC application developers by preconfiguring the PC with the necessary AI models, engines and dependencies. Apps and games can then seamlessly orchestrate inference across the local PC or workstation and the cloud.
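
The hybrid decision can be illustrated with a toy router that keeps a workload on-device when the model fits in local VRAM and falls back to the cloud otherwise. The real AI Inference Manager SDK’s API differs; every name below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    model_vram_gb: float      # VRAM the model needs to run
    allow_cloud: bool = True  # some data may need to stay on-device

def choose_backend(w: Workload, local_vram_free_gb: float) -> str:
    """Prefer on-device inference when the model fits (lowest latency,
    no network hop); otherwise fall back to the cloud if policy allows."""
    if w.model_vram_gb <= local_vram_free_gb:
        return "local"
    if w.allow_cloud:
        return "cloud"
    raise RuntimeError("model does not fit locally and cloud is disallowed")

# A 4B-class SLM fits on a typical RTX AI PC; a large LLM gets offloaded:
print(choose_backend(Workload(3.5), local_vram_free_gb=8.0))   # -> local
print(choose_backend(Workload(40.0), local_vram_free_gb=8.0))  # -> cloud
```

This is also why a small model matters: the lower its memory footprint, the more installed PCs clear the "fits locally" bar.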

ACE NIM microservices run locally on RTX AI PCs and workstations, as well as in the cloud. Current microservices running locally include Audio2Face, in the Covert Protocol tech demo, and the new Nemotron-4 4B Instruct and Whisper ASR in Mecha BREAK.

To Infinity and Beyond

Digital humans go far beyond NPCs in games. At last month’s SIGGRAPH conference, NVIDIA previewed “James,” an interactive digital human that can connect with people using emotions, humor and more. James is based on a customer-service workflow using ACE.

Changes in communication methods between humans and technology over the decades eventually led to the creation of digital humans. The future of the human-computer interface will have a friendly face and require no physical inputs.

Digital humans drive more engaging and natural interactions. According to Gartner, 80% of conversational offerings will embed generative AI by 2025, and 75% of customer-facing applications will have conversational AI with emotion. Digital humans will transform multiple industries and use cases beyond gaming, including customer service, healthcare, retail, telepresence and robotics.

Users can get a glimpse of this future now by interacting with James in real time at ai.nvidia.com.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Into the Omniverse: How Industrial AI and Digital Twins Accelerate Design, Engineering and Manufacturing Across Industries

by James McKenna

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows using the latest advancements in OpenUSD and NVIDIA Omniverse.

Industrial AI, digital twins, AI physics and accelerated AI infrastructure are empowering companies across industries to accelerate and scale the design, simulation and optimization of products, processes and facilities before building in the real world.

Earlier this month, NVIDIA and Dassault Systèmes announced a partnership that brings together Dassault Systèmes’ Virtual Twin platforms, NVIDIA accelerated computing, AI physics open models and NVIDIA CUDA-X and Omniverse libraries. This allows designers and engineers to use virtual twins and companions — trained on physics-based world models — to innovate faster, boost efficiency and deliver sustainable products.

Dassault Systèmes’ SIMULIA software now uses NVIDIA CUDA-X and AI physics libraries for AI-based virtual twin physics behavior — empowering designers and engineers to accurately and instantly predict outcomes in simulation.

NVIDIA is adopting Dassault Systèmes’ model-based systems engineering technologies to accelerate the design and global deployment of gigawatt-scale AI factories that are powering industrial and physical AI across industries. Dassault Systèmes will in turn deploy NVIDIA-powered AI factories on three continents through its OUTSCALE sovereign cloud, enabling its customers to run AI workloads while maintaining data residency and security requirements.

These efforts are already making a splash across industries, accelerating industrial development and production processes.

Industrial AI Simulations, From Car Parts to Cheese Proteins 

Digital twins, also known as virtual twins, and physics-based world models are already being deployed to advance industries.

In automotive, Lucid Motors is combining cutting-edge simulation, AI physics open models, Dassault Systèmes’ tools for vehicle and powertrain engineering and digital twin technology to accelerate innovation in electric vehicles. 

In life sciences, scientists and researchers are using virtual twins, Dassault Systèmes’ science-validated world models and the NVIDIA BioNeMo platform to speed molecule and materials discovery, therapeutics design and sustainable food development.

The Bel Group is using technologies from Dassault Systèmes, supported by NVIDIA, to accelerate the development and production of healthier, more sustainable foods for millions of consumers.

The company is using Dassault Systèmes’ industry world models to generate and study food proteins, creating non-dairy protein options that pair with its well-known cheeses, including Babybel. Using accurate, high-resolution virtual twins allows the Bel Group to study and develop validated research outcomes of food proteins more quickly and efficiently.

In industrial automation, Omron is using virtual twins and physical AI to design and deploy automation technology with greater confidence — advancing the shift toward digitally validated production. 

In the aerospace industry, researchers and engineers at Wichita State University’s National Institute for Aviation Research use virtual twins and AI companions powered by Dassault Systèmes’ Industry World Models and NVIDIA Nemotron open models to accelerate the design, testing and certification of aircraft.

Learning From and Simulating the Real World 

Dassault Systèmes’ physics-based Industry World Models are trained to have PhD-level knowledge in fields like biology, physics and materials science. This allows them to accurately simulate real-world environments and scenarios so teams can test industrial operations end to end — from supply chains to store shelves — before deploying changes in the real world.

These virtual models can help researchers and developers with workflows ranging from DNA sequencing to strengthening manufactured materials for vehicles. 

“Knowledge is encoded in the living world,” said Pascal Daloz, CEO of Dassault Systèmes, during his 3DEXPERIENCE World keynote. “With our virtual twins, we are learning from life and are also understanding it in order to replicate it and scale it.”

Get Plugged In to Industrial AI

Learn more about industrial and physical AI by registering for NVIDIA GTC, running March 16-19 in San Jose, kicking off with NVIDIA founder and CEO Jensen Huang’s keynote address on Monday, March 16, at 11 a.m. PT. 

At the conference:

  • Explore an industrial AI agenda packed with hands-on sessions, customer stories and live demos. 
  • Dive into the world of OpenUSD with a special session focused on OpenUSD for physical AI simulation, as well as a full agenda of hands-on OpenUSD learning sessions
  • Find Dassault Systèmes in the industrial AI and robotics pavilion on the show floor and learn from Florence Hu-Aubigny, executive vice president of R&D at Dassault Systèmes, who’ll present on how virtual twins are shaping the next industrial revolution.
  • Get a live look at GTC with our developer community livestream on March 18, where participants can ask questions, request deep dives and talk directly with NVIDIA engineers in the chat.

Learn how to build industrial and physical AI applications by attending these sessions at GTC.

NVIDIA Virtualizes Game Development With RTX PRO Server

NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs centralize compute infrastructure for content creation, AI, engineering and quality assurance, delivering workstation-class performance at data center scale for game studios.
by Paul Logan

Game development teams are working across larger worlds, more complex pipelines and more distributed teams than ever. At the same time, many studios still rely on fixed, desk-bound GPU hardware for critical production work.

At the Game Developers Conference (GDC) this week in San Francisco, NVIDIA is showcasing a new approach to bring together disparate workflows using virtualized game development on NVIDIA RTX PRO Servers, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and NVIDIA vGPU software.

With the RTX PRO Server, studios can centralize and virtualize core workflows across creative, engineering, AI research and quality assurance (QA) — all on shared GPU infrastructure in the data center. 

This enables teams to maintain the responsiveness and visual fidelity they expect from workstation-class systems while improving infrastructure utilization, scalability, data security and operational consistency across teams and locations.

Simplifying Complex Workflows

As game development studios scale, hardware can often sit underutilized in one location while other teams wait to access it for production work. QA capacity is hard to expand quickly. Over time, workstation hardware, drivers and tools diverge, making bugs harder to reproduce. AI workloads are often isolated on separate infrastructure, creating more operational overhead. 

The NVIDIA RTX PRO Server helps studios move from workstation-by-workstation scaling to centralized GPU infrastructure. Studios can pool resources, allocate performance by workload and support parallel development, testing and AI workflows without expanding physical workstation sprawl.

Centralized GPU infrastructure enables studios to run AI training, simulation and game automation workloads overnight, then dynamically reallocate the same resources to interactive development during the day, improving overall utilization and reducing idle capacity.

The NVIDIA RTX PRO Server supports virtualized workflows for 3D graphics and AI across the game development lifecycle for:

  • Artists: Providing virtual RTX workstations for traditional 3D and generative AI content-creation workflows.
  • Developers: Powering consistent, high-performance engineering environments for coding and 3D development.
  • AI researchers: Offering large-memory GPU profiles for fine-tuning, inference and AI agents.
  • QA teams: Enabling scalable game validation and performance testing using the same NVIDIA Blackwell architecture used by GeForce RTX 50 Series GPUs.

This allows studios to support multiple teams — including across sites and contractors — on one common GPU platform, improving collaboration and reducing debugging issues that can arise from disparate hardware.

Supporting AI and Engineering on Shared Infrastructure

AI is becoming a core part of everyday game development, spanning coding, content creation, testing and live operations. As these workflows expand, studios need infrastructure that can support AI alongside traditional graphics workloads without introducing separate, siloed systems.

With the RTX PRO Server, studios can support coding agents, internal model experimentation and AI-assisted production workflows without spinning up a separate AI stack for every team.

The NVIDIA RTX PRO 6000 Blackwell Server Edition GPU features a massive 96GB memory buffer, enabling developers to run multiple demanding applications simultaneously while supporting AI inference on larger models directly alongside real-time graphics workflows.

NVIDIA Multi-Instance GPU (MIG) technology partitions a single GPU into isolated instances with dedicated memory, compute and cache resources. Combined with NVIDIA vGPU software, MIG can help studios securely allocate GPU capacity across users and workloads. In combined MIG and vGPU configurations, a single RTX PRO 6000 Blackwell Server Edition GPU can support up to 48 concurrent users, maximizing utilization while maintaining performance isolation.
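
The arithmetic behind that 48-user figure can be sketched as simple capacity planning. The 96 GB and 48-user numbers come from the text above; the 2 GB-per-user slice below is an illustrative profile size, not an official vGPU configuration.

```python
def max_concurrent_users(gpu_vram_gb: int, vram_per_user_gb: float) -> int:
    """Each virtual workstation gets a fixed, isolated VRAM slice of the
    GPU, so user count is bounded by how many slices fit."""
    return int(gpu_vram_gb // vram_per_user_gb)

# A 96 GB GPU carved into 2 GB per-user slices supports 48 users --
# matching the figure cited for combined MIG + vGPU configurations:
print(max_concurrent_users(96, 2))  # -> 48
```

In practice, administrators would mix profile sizes — large slices for AI researchers, small ones for QA seats — rather than using one uniform slice.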

Enterprise-Ready Deployment for Game Studios

NVIDIA RTX PRO Servers are designed for enterprise-grade data-center operations. Studios can deploy virtual workstations on RTX PRO Servers via NVIDIA vGPU on supported hypervisor and remote workstation platforms.

That means RTX PRO Servers can fit into studios’ existing infrastructure and IT practices, rather than requiring one-off deployments.

Major game publishers already use NVIDIA vGPU technology to scale centralized development infrastructure and improve efficiency at studio scale.

Learn more about the NVIDIA RTX PRO Server.

See these workflows live by joining NVIDIA’s booth 1426 at GDC or attending NVIDIA GTC, running March 16-19 in San Jose, California. 

See notice regarding software product information.

GeForce NOW Raises the Game at the Game Developers Conference

Dive into all the latest announcements for GeForce NOW and catch five new games in the cloud, including the latest entry in ‘Monster Hunter Stories’ and Fortnite’s ‘Save The World’ update.
by GeForce NOW Community

GeForce NOW is bringing the game to the Game Developers Conference (GDC), running this week in San Francisco. While developers build the future of gaming, GeForce NOW is delivering it to gamers. The latest updates bring smoother performance, easier game discovery and a fresh lineup of blockbuster titles to the cloud.

Game discoverability gets a boost with new in-app labels for connected Xbox Game Pass and Ubisoft+ accounts. It’ll be easier than ever to see titles already available through linked subscriptions, so members can seamlessly jump into games they already own.

Virtual reality gets a smooth upgrade — supported devices now stream at 90 frames per second (fps), up from 60 fps, delivering more responsive and immersive virtual reality (VR) experiences.

Account linking is also leveling up. Following Gaijin single sign-on announced at CES in January, GOG account linking and game library syncing are coming soon.

The GeForce NOW library continues to grow with new releases joining the cloud at launch: CONTROL Resonant and Samson: A Tyndalston Story. Plus, select Xbox titles will join the Install-to-Play library.

In addition, there’s a lineup of five new games to catch this week, including Capcom’s Monster Hunter Stories 3: Twisted Reflection, on top of the latest update for Fortnite.

Gaming Is Buzzing

GeForce NOW is rolling into GDC with an easier way to keep track of titles, as well as performance upgrades and a growing lineup of major titles ready to stream at launch.

Keeping track of which game lives on which service can be tricky. In-app labels — coming soon to GeForce NOW for connected subscriptions — will make it simple for members to know exactly which games they can play on GeForce NOW. Once a member connects their Xbox Game Pass or Ubisoft+ account, clear labels will appear directly on the game art inside the GeForce NOW app — eliminating guesswork and making it easy to see exactly what’s available through their game subscription services.

Account linking is expanding too. On top of Gaijin single sign-on, GeForce NOW is adding GOG account linking and game library syncing in the coming months.              

Virtual reality is also getting an upgrade. Starting Thursday, March 19, VR devices that GeForce NOW supports, including Apple Vision Pro, Meta Quest and Pico devices, will stream at 90 fps for Ultimate members, an increase from 60 fps. The higher frame rate enhances smoothness, responsiveness and realism across every session — whether gamers are chasing enemies through neon-lit streets or exploring far‑flung alien worlds.
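
The practical effect of the jump from 60 to 90 fps is a tighter per-frame budget — less time between frames means lower perceived latency. The arithmetic is straightforward:

```python
def frame_budget_ms(fps: int) -> float:
    """Time available to render and deliver each frame at a given rate."""
    return 1000.0 / fps

print(round(frame_budget_ms(60), 1))  # -> 16.7
print(round(frame_budget_ms(90), 1))  # -> 11.1
```

Each frame arrives roughly 5.6 ms sooner at 90 fps, which is what makes fast head and hand motion in VR feel noticeably smoother.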

GeForce NOW’s Install‑to‑Play library is also expanding with select Xbox titles, including Brutal Legend from Double Fine Productions and Contrast from Compulsion Games. These additions bring more flexibility for members to download and install their owned games alongside streaming favorites.

That’s just the start. Highly anticipated games are headed to the cloud at launch:

CONTROL Resonant — Remedy’s upcoming action‑adventure role-playing game (RPG) that blends supernatural powers with a warped Manhattan facing a reality-bending cosmic threat.

Samson: A Tyndalston Story — a gritty action brawler from Liquid Swords, set in the city of Tyndalston and launching on PC.

Free to Save the World

Fortnite’s original adventure is back in the spotlight — and soon, it’ll be free to play. Fortnite first launched in 2017 as a story-driven co-op experience, and on Thursday, April 16, the “Save the World” update will officially be free to play for all players. Pre-registration begins on Thursday, March 12.

Join forces against hordes of husks, solo or with the squad, in a player vs. environment action-packed story, complete with gathering, crafting and collecting. Pick a favored playstyle with four distinct classes to choose from, over 150 heroes and weapons to upgrade, and loadout customization options to hone builds even further. With hundreds of updates since its original launch and over 100 hours of content, squads can build, grind gear and engineer elaborate homebase defenses to keep the Storm King at bay. “Save the World” isn’t available on mobile devices, including tablets.

On GeForce NOW, Fortnite “Save the World” streams straight from the cloud — no waiting around for updates or patches. Low‑latency streaming keeps building, shooting and trap placement feeling snappy across supported devices. Stay in the action with GeForce NOW.

Gear Up for Glory

From chaotic infantry clashes to roaring jet dogfights, every match is an unpredictable explosion of strategy and mayhem in EA’s Battlefield 6.

This week, GeForce NOW Ultimate members can drop into the action with serious style — a new reward, the Advancing Gloom Soldier Skin, gives soldiers a sleek, battle-hardened look fit for the frontlines. Members can claim it in their GeForce NOW account portals, redeem it at EA.com/redeem, then show up ready in true Ultimate fashion. It’s available through Sunday, April 12, or while supplies last.

Being a GeForce NOW member pays off. Whether streaming on the go or maxing out graphics in the cloud, members get exclusive rewards to keep and flaunt.

Start the Games

Twin Rathalos, born in a twist of fate, set the stage for the third entry in the Monster Hunter Stories RPG series, launching on GeForce NOW. Monster Hunter Stories 3: Twisted Reflection is an RPG set in the Monster Hunter world, where players become a Rider and raise and bond with their favorite monsters. Play it instantly on GeForce NOW and take the adventure anywhere, on any device.

In addition, members can look for the following:

  • Warcraft I: Remastered (New release on Ubisoft, March 11)
  • Warcraft II: Remastered (New release on Ubisoft, March 11)
  • 1348 Ex Voto (New release on Steam, March 12, GeForce RTX 5080-ready)
  • John Carpenter’s Toxic Commando (New release on Steam, March 12, GeForce RTX 5080-ready)
  • Monster Hunter Stories 3: Twisted Reflection (New release on Steam, March 12, GeForce RTX 5080-ready)

This week also brings one more GeForce RTX 5080-ready game, joining John Carpenter’s Toxic Commando, 1348 Ex Voto and Monster Hunter Stories 3: Twisted Reflection:

  • Greedfall: The Dying World 1.0 (Steam, GeForce RTX 5080-ready)

What are you planning to play this weekend? Let us know on X or in the comments below.