NVIDIA RTX Accelerates 4K AI Video Generation on PC With LTX-2 and ComfyUI Upgrades

Major RTX accelerations across ComfyUI, LTX-2, Llama.cpp, Ollama, Hyperlink and more unlock video, image and text generation use cases on AI PCs.
by Gerardo Delgado

2025 marked a breakout year for AI development on PC.

PC-class small language models (SLMs) improved accuracy by nearly 2x over 2024, dramatically closing the gap with frontier cloud-based large language models (LLMs). AI PC developer tools, including Ollama, ComfyUI, llama.cpp and Unsloth, have matured; their popularity has doubled year over year, and the number of users downloading PC-class models has grown tenfold since 2024.

These developments are paving the way for generative AI to gain widespread adoption among everyday PC creators, gamers and productivity users this year.

At CES this week, NVIDIA is announcing a wave of AI upgrades for GeForce RTX, NVIDIA RTX PRO and NVIDIA DGX Spark devices that unlock the performance and memory needed for developers to deploy generative AI on PC, including:

  • Up to 3x higher performance and a 60% reduction in VRAM for video and image generative AI via PyTorch-CUDA optimizations and native NVFP4/FP8 precision support in ComfyUI.
  • RTX Video Super Resolution integration in ComfyUI, accelerating 4K video generation.
  • NVIDIA NVFP8 optimizations for the open weights release of Lightricks’ state-of-the-art LTX-2 audio-video generation model.
  • A new video generation pipeline for generating 4K AI video using a 3D scene in Blender to precisely control outputs.
  • Up to 35% faster inference performance for SLMs via Ollama and llama.cpp.
  • RTX acceleration for the new video search capability in Nexa.ai’s Hyperlink.

These advancements will allow users to seamlessly run advanced video, image and language AI workflows with the privacy, security and low latency offered by local RTX AI PCs.

Generate Videos 3x Faster and in 4K on RTX PCs

Generative AI can make amazing videos, but online tools can be difficult to control with prompts alone. And generating 4K video is nearly impossible, as most models are too large to fit in PC VRAM.

Today, NVIDIA is introducing an RTX-powered video generation pipeline that gives artists precise control over their generations while generating videos 3x faster and upscaling them to 4K, using only a fraction of the VRAM.

This video pipeline allows emerging artists to create a storyboard, turn it into photorealistic keyframes and then turn these keyframes into a high-quality, 4K video. The pipeline is split into three blueprints that artists can mix and match or modify to their needs:

  • A 3D object generator that creates assets for scenes.
  • A 3D-guided image generator that allows users to set their scene in Blender and generate photorealistic keyframes from it.
  • A video generator that follows a user’s start and end keyframes to animate the video, and uses NVIDIA RTX Video technology to upscale it to 4K.

This pipeline is made possible by the groundbreaking release of the new LTX-2 model from Lightricks, available for download today.

A major milestone for local AI video creation, LTX-2 delivers results that stand toe-to-toe with leading cloud-based models while generating up to 20 seconds of 4K video with impressive visual fidelity. The model features built-in audio, multi-keyframe support and advanced conditioning capabilities enhanced with controllability low-rank adaptations (LoRAs), giving creators cinematic-level quality and control without relying on the cloud.

Under the hood, the pipeline is powered by ComfyUI. Over the past few months, NVIDIA has worked closely with ComfyUI to improve performance by 40% on NVIDIA GPUs, and the latest update adds support for the NVFP4 and NVFP8 data formats. Combined, these updates make generation 3x faster and cut VRAM use by 60% with the NVFP4 format on GeForce RTX 50 Series GPUs, and 2x faster with a 40% VRAM reduction with NVFP8.
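As a rough, back-of-envelope illustration of where those VRAM savings come from, the short Python sketch below estimates weight memory at different precisions for a hypothetical 14-billion-parameter model (the parameter count is only an example, not a published figure). Measured end-to-end savings are smaller than the weights-only numbers because activations, caches and scaling factors also occupy memory.

    # Back-of-envelope estimate (not an official sizing tool): weight memory
    # scales with bits per parameter, which is why FP8 and NVFP4 checkpoints
    # fit on far smaller GPUs than FP16 ones.
    def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
        return params_billion * 1e9 * bits_per_param / 8 / 1024**3

    for name, bits in [("FP16", 16), ("FP8", 8), ("NVFP4", 4)]:
        # Hypothetical 14B-parameter video model, used purely for illustration.
        print(f"{name}: ~{weight_vram_gb(14, bits):.1f} GB of weights")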

NVFP4 and NVFP8 checkpoints are now available for some of the top models directly in ComfyUI. These models include LTX-2 from Lightricks, FLUX.1 and FLUX.2 from Black Forest Labs, and Qwen-Image and Z-Image from Alibaba. Download them directly in ComfyUI, with additional model support coming soon.

Once a clip is generated, it can be upscaled to 4K in just seconds using the new RTX Video node in ComfyUI. The upscaler works in real time, sharpening edges and cleaning up compression artifacts for a clear final image. RTX Video will be available in ComfyUI next month.

To help users push beyond the limits of GPU memory, NVIDIA has collaborated with ComfyUI to improve its memory offload feature, known as weight streaming. With weight streaming enabled, ComfyUI can use system RAM when it runs out of VRAM, enabling larger models and more complex multistage node graphs on mid-range RTX GPUs.
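For readers curious what weight streaming looks like under the hood, the minimal PyTorch sketch below illustrates the general offload pattern: blocks live in system RAM and each one is copied to VRAM only while it runs. This is a conceptual illustration under our own simplifying assumptions, not ComfyUI’s actual implementation.

    # Conceptual weight-streaming sketch (illustrative only, not ComfyUI's code):
    # keep each block's weights in system RAM and move them to the GPU just
    # before that block runs, then move them back afterward.
    import torch
    import torch.nn as nn

    def enable_weight_streaming(blocks: nn.ModuleList, device: str = "cuda") -> None:
        for block in blocks:
            block.to("cpu")                        # weights start in system RAM

            def pre_hook(module, args):
                module.to(device)                  # upload weights just before use

            def post_hook(module, args, output):
                module.to("cpu")                   # free VRAM once the block has run
                return output

            block.register_forward_pre_hook(pre_hook)
            block.register_forward_hook(post_hook)

    # Toy stack of transformer-style blocks; only one block's weights occupy VRAM at a time.
    blocks = nn.ModuleList(
        [nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)) for _ in range(8)]
    )
    enable_weight_streaming(blocks)
    x = torch.randn(1, 1024, device="cuda")
    for block in blocks:
        x = block(x)

In exchange for the extra transfers over PCIe, this trades some speed for the ability to run models that would otherwise not fit in VRAM.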

The video generation workflow will be available for download next month, with the newly released open weights of the LTX-2 Video Model and ComfyUI RTX updates available now.

A New Way to Search PC Files and Videos

File searching on PCs has been the same for decades. It still mostly relies on file names and spotty metadata, which makes tracking down that one document from last year way harder than it should be.

Hyperlink, Nexa.ai’s local search agent, turns RTX PCs into a searchable knowledge base that can answer questions in natural language with inline citations. It can scan and index documents, slides, PDFs and images, so searches can be driven by ideas and content instead of file name guesswork. All data is processed locally and stays on the user’s PC for privacy and security. Plus, it’s RTX-accelerated, taking 30 seconds per gigabyte to index text and image files and three seconds for a response on an RTX 5090 GPU, compared with an hour per gigabyte to index files and 90 seconds for a response on CPUs.
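Content-aware search of this kind is typically built on embeddings rather than file names. The Python sketch below shows the general idea using the open-source sentence-transformers library: index a few text snippets, then rank them against a natural-language query by cosine similarity. It illustrates the concept only and is not Nexa.ai’s implementation; the model name is just a common lightweight default.

    # Minimal local semantic search sketch (concept only, not Hyperlink's code).
    # Assumes: pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")    # small model that runs locally

    documents = [
        "Q3 budget review slides for the marketing team",
        "Vacation photos from the trip to Norway",
        "Notes on fine-tuning a small language model with LoRA",
    ]
    doc_vecs = model.encode(documents, normalize_embeddings=True)

    query = "where are my slides about the quarterly budget?"
    q_vec = model.encode([query], normalize_embeddings=True)[0]

    scores = doc_vecs @ q_vec          # cosine similarity (vectors are normalized)
    best = int(np.argmax(scores))
    print(f"Best match: {documents[best]} (score {scores[best]:.2f})")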

At CES, Nexa.ai is unveiling a new beta version of Hyperlink that adds support for video content, enabling users to search through their videos for objects, actions and speech. This is ideal for users ranging from video artists looking for B-roll to gamers who want to find that time they won a battle royale match to share with their friends.

For those interested in trying the Hyperlink private beta, sign up for access on this webpage. Access will roll out starting this month.

Small Language Models Get 35% Faster

NVIDIA has collaborated with the open-source community to deliver major performance gains for SLMs on RTX GPUs and the NVIDIA DGX Spark desktop supercomputer using llama.cpp and Ollama. The latest changes are especially beneficial for mixture-of-experts models, including the new NVIDIA Nemotron 3 family of open models.

SLM inference performance has improved by 35% and 30% for llama.cpp and Ollama, respectively, over the past four months. These updates are available now, and a quality-of-life upgrade for llama.cpp also speeds up LLM loading times.
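For developers who want to try these speedups, running a local SLM through Ollama’s Python client takes only a few lines. The snippet below assumes the Ollama app is installed and running and that the example model tag has already been pulled (for instance by running ollama pull llama3.2); any locally available model works the same way.

    # Minimal local chat with an SLM via the Ollama Python client.
    # Assumes: pip install ollama, the Ollama app running, and the model already pulled.
    import ollama

    response = ollama.chat(
        model="llama3.2",   # example tag; substitute any model pulled locally
        messages=[{"role": "user", "content": "Explain NVFP4 quantization in two sentences."}],
    )
    print(response["message"]["content"])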

These speedups will be available in the next update of LM Studio and are coming soon to agentic apps like the new MSI AI Robot app, which uses llama.cpp to let users control their MSI device settings and will incorporate the latest optimizations in an upcoming release.

NVIDIA Broadcast 2.1 Brings Virtual Key Light to More PC Users

The NVIDIA Broadcast app improves the quality of a user’s PC microphone and webcam with AI effects, ideal for livestreaming and video conferencing.

Version 2.1 updates the Virtual Key Light effect to improve performance (making it available on RTX 3060 desktop GPUs and higher), handle more lighting conditions, offer broader color temperature control and use an updated HDRi base map for a two-key-light style often seen in professional streams. Download the NVIDIA Broadcast update today.

Transform an At-Home Creative Studio Into an AI Powerhouse With DGX Spark

As new and increasingly capable AI models arrive on PC each month, developer interest in more powerful and flexible local AI setups continues to grow. DGX Spark — a compact AI supercomputer that fits on users’ desks and pairs seamlessly with a primary desktop or laptop — enables experimenting, prototyping and running advanced AI workloads alongside an existing PC.

Spark is ideal for those interested in testing out LLMs or prototyping agentic workflows, as well as for artists who want to generate assets in parallel so their main PC stays free for editing.

At CES, NVIDIA is unveiling major AI performance updates to Spark, delivering up to 2.6x faster performance since it launched just under three months ago.


New DGX Spark playbooks are also available, including one for speculative decoding and another to fine-tune models with two DGX Spark modules.
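For context, speculative decoding pairs a small draft model with the larger target model: the draft proposes a block of tokens cheaply, and the target verifies them, keeping the longest prefix it agrees with. The Python sketch below illustrates that propose-then-verify loop with toy stand-in functions; it is purely conceptual and not the playbook’s actual pipeline, which runs real models on DGX Spark.

    # Conceptual greedy speculative decoding with toy stand-in "models".
    from typing import Callable, List

    def speculative_decode(target_next: Callable[[List[int]], int],
                           draft_next: Callable[[List[int]], int],
                           prompt: List[int],
                           max_new: int,
                           block: int = 4) -> List[int]:
        tokens = list(prompt)
        while len(tokens) - len(prompt) < max_new:
            # 1) The cheap draft model proposes a block of tokens.
            proposed, ctx = [], list(tokens)
            for _ in range(block):
                nxt = draft_next(ctx)
                proposed.append(nxt)
                ctx.append(nxt)
            # 2) The target model verifies the block (one batched forward pass
            #    in a real system) and keeps the longest agreeing prefix.
            accepted: List[int] = []
            correction = None
            for tok in proposed:
                expected = target_next(tokens + accepted)
                if expected == tok:
                    accepted.append(tok)
                else:
                    correction = expected   # take the target's token at the first mismatch
                    break
            tokens.extend(accepted)
            if correction is not None:
                tokens.append(correction)
        return tokens[: len(prompt) + max_new]

    # Toy demo: the target counts up by one; the draft agrees except every fifth step.
    target = lambda ctx: ctx[-1] + 1
    draft = lambda ctx: ctx[-1] + 1 if len(ctx) % 5 else ctx[-1] + 2
    print(speculative_decode(target, draft, prompt=[0], max_new=10))   # [0, 1, ..., 10]

Every accepted draft token saves a full pass through the large model, which is why the technique speeds up generation whenever the draft model agrees with the target often.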

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X, and stay informed by subscribing to the RTX AI PC newsletter. Follow NVIDIA Workstation on LinkedIn and X.

See notice regarding software product information.