country_code

If I Had a Hammer: Purdue’s Anvil Supercomputer Will See Use All Over the Land

The project director of the new accelerated system at Purdue shares the story of her path to leadership in high performance computing.
by
Purdue Anvil AI supercomputer

Carol Song is opening a door for researchers to advance science on Anvil, Purdue University’s new AI-ready supercomputer, an opportunity she couldn’t have imagined as a teenager in China.

“I grew up in a tumultuous time when, unless you had unusual circumstances, the only option for high school grads was to work alongside farmers or factory workers, then suddenly I was told I could go to college,” said Song, now the project director of Anvil.

And not just any college. Her scores on a national entrance exam opened the door to Tsinghua University, home to China’s most prestigious engineering school.

Along the way, someone told her computers would be big, so she signed up for computer science before she had ever seen a computer. She learned soon enough.

“We were building hardware from the ground up, designing microinstructions and logic circuits, so I got to understand computers from the inside out,” she said.

Easing Access to Supercomputers

Skip forward a few years to grad school at the University of Illinois when another big door opened.

While working in distributed systems, she was hired as one of the first programmers at the National Center for Supercomputing Applications,  one of the first sites in a U.S. program funding supercomputers that researchers shared.

To make the systems more accessible, she helped develop alternatives to the crude editing tools of the day that displayed one line of a program at a time. And she helped pioneering researchers like Michael Norman create visualizations of their work.

GPUs Add AI to HPC

In 2005, she joined Purdue, where she has helped manage nearly three dozen research projects representing more than $60 million in grants as a senior research scientist in the university’s supercomputing center.

“All that helped when we started defining Anvil. I see researchers’ pain points when they are getting on a new system,” said Song.

Anvil links 1,000 Dell EMC PowerEdge C6525 server nodes with 2,000 of the latest AMD x86 CPUs and 64 NVIDIA A100 Tensor Core GPUs on a NVIDIA Quantum InfiniBand network to handle traditional HPC and new AI workloads.

The system, built by Dell Technologies, will deliver 5.3 petaflops and half a million GPU cycles per year to tens of thousands of researchers across the U.S. working on the National Science Foundation’s XSEDE network.

Anvil Forges Desktop, Cloud Links

To harness that power, Anvil supports interactive user interfaces as well as the batch jobs that are traditional in high performance computing.

“Researchers can use their favorite tools like Jupyter notebooks and remote desktop interfaces so the cluster can look just like in their daily work environment,” she said.

Anvil will also support links to Microsoft Azure, so researchers can access its large datasets and commercial cloud-computing muscle. “It’s an innovative part of this system that will let researchers experiment with creating workflows that span research and commercial environments,” Song said.

Fighting COVID, Exploring AI

More than 30 research teams have already signed up to be early users of Anvil.

One team will apply deep learning to medical images to improve diagnosis of respiratory diseases including COVID-19. Another will build causal and logical check points into neural networks to explore why deep learning delivers excellent results.

“We’ll support a lot of GPU-specific tools like NGC containers for accelerated applications, and as with every new system, users can ask for additional toolkits and libraries they want,” she said.

The Anvil team aims to invite industry collaborations to test new ideas using up to 10 percent of the system’s capacity. “It’s a discretionary use we want to apply strategically to enable projects that wouldn’t happen without such resources,” she said.

Opening Doors for Science and Inclusion

Early users are working on Anvil today and the system will be available for all users in about a month.

Anvil’s opening day has a special significance for Song, one of the few women to act as a lead manager for a national supercomputer site.

Carol Song. project director, Purdue Anvil supercomputer
Carol Song and Purdue’s Anvil supercomputer

“I’ve been fortunate to be in environments where I’ve always been encouraged to do my best and given opportunities,” she said.

“Around the industry and the research computing community there still aren’t a lot of women in leadership roles, so it’s an ongoing effort and there’s a lot of room to do better, but I’m also very enthusiastic about mentoring women to help them get into this field,” she added.

Purdue’s research computing group shares Song’s enthusiasm about getting women into supercomputing. It’s home to one of the first chapters of the international Women in High-Performance Computing organization.

Purdue’s Women in HPC chapter sent an all-female team to a student cluster competition at SC18. It also hosts outside speakers, provides travel support to attend conferences and connects students and early career professionals to experienced mentors like Song.

Pictured at top: Carol Song, Anvil’s principal investigator (PI) and project director along with Anvil co-PIs (from left) Rajesh Kalyanam, Xiao Zhu and Preston Smith. 

At ISC, JUPITER Shows What Exascale Science Looks Like

Europe’s first exascale supercomputer — running on NVIDIA Grace Hopper Superchips — is mapping the brain, modeling climate, advancing 6G AI and breaking records in quantum computing simulation.
by

JUPITER, Europe’s first exascale supercomputer at Germany’s Forschungszentrum Jülich, runs on NVIDIA Grace Hopper Superchips and NVIDIA Quantum-X800 InfiniBand networking — and it’s had a busy year.

As the international supercomputing community gathers at ISC in Hamburg this week, four projects running on JUPITER point to what exascale computing can actually do: map the human brain at cellular scale, simulate the entire Earth’s climate at 1-kilometer resolution, build AI systems for the next generation of wireless networks and simulate a universal 50-qubit quantum computer.

“With JUPITER, Europe doesn’t just join the exascale era — it leads it, across the widest range of science and AI of any system worldwide,” said Thomas Lippert, director of the Jülich Supercomputing Centre and professor at Goethe University Frankfurt. 

Four projects, detailed below, share a throughline: scientific problems that were out of reach on previous hardware are now tractable at exascale.

A Foundation Model for Mapping the Brain

The Jülich Brain Atlas project — anchored at Jülich’s Institute of Neuroscience and Medicine with partner Helmholtz AI, partner hospital and other Helmholtz institutions — has produced CytoNet, a foundation model for brain microarchitecture analysis.

The complexity of the human brain is astonishing. With 86 billion neurons and about 100 trillion connections between them, understanding brain function at single neuron resolution has been out of reach, until now.

The research is led by neuroscientist Katrin Amunts and computer scientist Christian Schiffer at INM-1, Jülich’s Institute of Neuroscience and Medicine. The model learns from brain imaging data at cellular scale, building a map that links individual cell structures to broader patterns of brain organization and function.

Training ran on JUPITER in under five days, using 6.5 petabytes of data from 21 post-mortem brains on 4,096 NVIDIA Grace Hopper Superchips. A paper describing the work is available on arXiv.

“For the first time, we’re not just using AI to analyze the brain — we’re building an agent that can think through the experiment itself,” said Katrin Amunts, director of INM-1 at Forschungszentrum Jülich and professor of brain research at Heinrich Heine University Düsseldorf. “That changes what neuroscience will be, and JUPITER is what makes that sentence possible to say today.”

That agent is the team’s next step: building an AI agent for brain researchers — integrating multimodal reasoning, language interfaces and Q&A capabilities using open models, including NVIDIA Nemotron 3 120B, working toward AI assistants that can help scientists interrogate brain data directly.

Climate at Kilometer Resolution

A novel ICON configuration — developed by researchers at the ETH Zurich, German Climate Computing Centre (DKRZ), Jülich Supercomputing Centre (JSC), Max Planck Institute for Meteorology, NVIDIA, Swiss National Supercomputing Centre (CSCS) and the University of Hamburg — won the Gordon Bell Prize for Climate Modelling at SC25 last November.

The breakthrough isn’t resolution alone. ICON is the first model to simulate a coupled Earth system at 1-kilometer resolution, with ocean, atmosphere and land, biogeochemistry and the full carbon cycle, with carbon exchanged, between all components. It can simulate and visualize complete ecosystems, such as phytoplankton blooms and zooplankton grazing. Previous systems could model pieces of this; ICON runs it all. This allows a much more precise and complete simulation of the Earth — observable at that level of detail for the first time.

Running on 20,480 NVIDIA Grace Hopper Superchips on JUPITER, the model simulated roughly 146 days of real climate into 24 hours of compute, setting a world record in global climate simulation. NVIDIA’s involvement in the ICON community spans more than a decade.

“Our simulations resolve the fine-scale winds, ocean eddies and upper-ocean mixing that shape marine ecosystems and regulate the ocean’s uptake of carbon,” said Daniel Klocke, computational infrastructure and model development group leader at the Max Planck Institute for Meteorology. “At a global resolution of just 1 kilometer, many of these interactions emerge directly from the laws of physics rather than being approximated. This gives us an unprecedented view of how the atmosphere, ocean and biosphere work together, helping us understand the processes driving our changing climate.”

6G Gets an Exascale Partner

In March, Ericsson and Forschungszentrum Jülich announced a collaboration to develop AI for the continued evolution of 5G and for 6G networks — with JUPITER as the compute engine for large-scale AI model training and testing.

The collaboration targets brain-inspired architectures designed to handle complex network operations at far lower energy costs. 

Research priorities include AI models for Ericsson’s radio and core networks, energy-efficient AI inference at the radio edge using neuromorphic approaches, and modular supercomputing architecture concepts drawn from JSC’s exascale work. 

Breaking Quantum Records

Researchers at the Jülich Supercomputing Centre (JSC), working with the jointly run NVIDIA Application Lab, also set a world first by fully simulating a universal 50-qubit quantum computer, surpassing the previous 48-qubit record. 

The simulation was made possible by drawing on the coherent, tightly coupled CPU-GPU memory architecture of JUPITER’s NVIDIA GH200 Grace Hopper Superchips, which lets data exceeding GPU limits spill seamlessly into CPU memory with minimal performance loss — allowing JUPITER to hold a far greater quantum state than GPU memory alone, which is what pushed the simulation past the previous 48-qubit record.

For now, that kind of simulation is the most powerful tool quantum research has: today’s quantum hardware can’t yet outperform classical computers on useful problems, so simulating quantum machines at the largest possible scale is how researchers design and stress-test the algorithms that future hardware will run.

This powerful quantum simulator, JUQCS-50, will be accessible to explore the performance of quantum algorithm designs within JUNIQ, the quantum computer user facility at JSC, led by Kristel Michielsen, director of JSC and professor at the University of Cologne. JUQCS-50 turns Europe’s first exascale system into a powerful testbed for tomorrow’s quantum-GPU supercomputers.

Exascale’s Impact

The range of science running on JUPITER — from neurons to atmosphere to wireless infrastructure to quantum — makes a case that exascale computing has moved from a research category into production. 

The results are a proof point for the Grace Hopper platform at the frontier of science.

Learn more about NVIDIA AI for science.

NVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory

Mission, Vision and Veritas supercomputers with Vera CPUs to advance materials simulation, scientific AI agents and molecular design.
by

Mission, Vision and Veritas — new Los Alamos National Laboratory (LANL) supercomputers to be built with HPE and NVIDIA — are tapping NVIDIA Vera CPUs to accelerate scientific discovery, unlocking agentic AI for science.

The supercomputers will use the HPE Cray Supercomputing GX5000 architecture with the NVIDIA Vera Rubin platform, combining NVIDIA Vera CPUs, NVIDIA Rubin GPUs and NVIDIA Quantum-X800 InfiniBand networking.

Under the planned configuration, Mission will include NVIDIA Vera Rubin GPU nodes and 2,300 standalone NVIDIA Vera CPUs using the HPE Cray Supercomputing GX240 blade. Veritas will feature approximately 1,150 standalone NVIDIA Vera CPUs to complement NVIDIA Vera Rubin nodes. 

Veritas will arrive alongside Mission and Vision and serve the Laboratory Directed Research and Development program, helping accelerate agentic AI for science. The system will test these technologies for use in larger systems being built out at LANL. 

Researchers are adding a new tool for science with AI agents that can form hypotheses, choose tools, launch simulations, analyze outputs and refine the next step. LANL’s public work on URSA, the Universal Research and Scientific Agent — running on Venado and soon Mission and Vision — points in this direction: a modular, feedback-driven AI framework designed to help scientists brainstorm hypotheses, plan experiments, run simulations and analyze results. 

LANL demonstrated that the Vera CPU delivered 7x higher performance on URSA workloads than the CPUs in the Crossroads x86 supercomputer.

Vera CPU for Agents and Simulation

In LANL’s early testing of NVIDIA Vera CPUs on Branson — an open source Monte Carlo heat transfer simulation tool — Vera outperforms the CPUs used in the Crossroads x86 supercomputer by over 3x. 

These results were made possible by Vera, including its custom Olympus core, LPDDR5 memory and fast on-chip fabric. 

A single Vera CPU outperforms a single socket x86-based CPU by more than 3x while providing more than 4x the memory per core and 6x the memory per node. Ultimately, this means faster  scientific results for LANL.

All of the lab’s supercomputers were codesigned by hardware architects, system software developers, domain scientists, computer scientists and applied mathematicians — helping ensure systems are shaped by real scientific workloads, not abstract benchmarks alone. 

Building on Generations of LANL Systems

Mission, expected to be operational in 2027, will be the fifth Advanced Technology System in the National Nuclear Security Administration’s Advanced Simulation and Computing program and will replace Crossroads for classified national security workloads. 

Vision, also expected to be operational in 2027, will serve as a resource for fundamental science, including materials and nuclear science, energy modeling, biomedical research and AI — letting more scientists test methods, train models and explore ideas before moving into higher-consequence work.

The work extends more than a decade of LANL and NVIDIA’s deep collaboration on CPUs, from Grace to Vera, using extreme codesign for LANL simulation workloads.

The three new supercomputers build on Venado, the HPE Cray EX supercomputer installed at Los Alamos in 2024 with NVIDIA GH200 Grace Hopper Superchips and NVIDIA Grace CPU Superchips. 

Learn more about the NVIDIA Vera CPU.

Hotter Than a Hot Tub: The 45°C Breakthrough to Cool AI’s Biggest Machines

NVIDIA’s latest AI servers can run on coolant warmer than a hot tub — and that counterintuitive choice is one of the biggest efficiency leaps in data center history.
by

Hot tubs sit at about 38 to 40 degrees Celsius, warm enough that most people can only soak for about 15 minutes. NVIDIA’s newest AI servers can run their cooling liquid even hotter — up to 45 degrees Celsius, or 113 degrees Fahrenheit. That higher temperature limit is precisely what makes them more energy efficient.

The Rubin generation of NVIDIA AI infrastructure is the world’s first to achieve 100% liquid cooling — every chip, every networking component, cooled entirely by liquid in a closed loop with no fans anywhere in the system. This liquid cooling methodology is outlined in the NVIDIA DSX AI factory reference design, a guide that outlines best practices to design, build and operate the entire AI factory infrastructure stack.

Although each generation offers significantly more computing power for each watt, full liquid-cooled AI compute infrastructure enables data centers to dramatically reduce cooling energy consumption — making a meaningful difference to overall data center energy use at hyperscale.

“The NVIDIA DSX reference design for AI factories has zero water consumption — we have eliminated massive amounts of power usage and pretty much all water usage,” said Ali Heydari, director of data center cooling and infrastructure at NVIDIA. “With dry-cooler-based designs, it’s a closed-loop system with no evaporative water cooling — outside of maybe 1% of the year when we might need chillers in some climates.”

Historically, cooling alone has accounted for up to 40% of a data center’s electricity consumption, making it one of the most significant areas where efficiency improvements can drive down both operational expenses and energy demands.

Industry estimates suggest that raising chiller plant temperatures by just one degree can cut cooling energy costs by about 4%. At scale, those savings add up quickly. A 50-megawatt hyperscale facility can save over $4 million annually in cooling-related energy and water costs by moving to liquid-cooled infrastructure. 

In favorable climates, NVIDIA’s 45-degree liquid-cooling architecture can enable chiller-less operation with dry coolers, reducing facility cooling water consumption from roughly 2.6 million gallons per megawatt per year for conventional cooling-tower-based systems to near zero — up to a 100% reduction in water use. 

The reason: traditional air-cooled data centers depend on large volumes of cooled air to remove heat from IT equipment, often requiring energy-intensive cooling infrastructure during hot weather. With NVIDIA’s 45-degree liquid cooling, heat is captured directly at the chip and transported through liquid loops operating at much higher temperatures, allowing outdoor dry coolers to reject heat efficiently for much of the year while significantly reducing mechanical cooling requirements and facility water consumption. 

The data center ambient temperature is flexible — warm summer air is fine — because nothing in the server depends on cool air. The liquid does all the work — and the same liquid can be recirculated in a closed loop so no new water is consumed to cool the chips.

 

A New Standard for the Industry

Because the NVIDIA Rubin platform integrates 100% liquid-cooled infrastructure, every cloud provider and data center operator building for it is making the transition. 

The ecosystem is keeping pace. Motivair, the advanced cooling division of Schneider Electric, has worked alongside NVIDIA’s product roadmap for nearly a decade — and Richard Whitmore, its president and CEO, says the relationship only intensified as power densities crossed the threshold where air cooling was no longer a viable option.

“Once the watts per chip crossed a certain level, liquid cooling became mandatory,” said Whitmore.

Too Hot to Cool AI Infrastructure Is Hotter Than You’d Think

There’s a long-standing misconception in the industry that a cold data center is an efficient one. Decades ago, if a data center didn’t feel like a walk-in freezer, people would assume something was wrong. 

In reality, chips can sustain far warmer environments than that instinct suggests. Silicon processors generate enormous internal heat — the coolant entering a fully liquid-cooled chip at 45 degrees Celsius exits at roughly 55 degrees, having absorbed that heat load across the chip surface. Yet performance doesn’t degrade. 

The processors continue to operate at full performance because liquid-cooled cold plates keep device temperatures within validated operating limits, even with coolant entering the rack at 45 degrees Celsius. 

No Fans, No Cold Aisles — A Fundamentally Different Machine

Walk into a traditional data center and notice two things: the noise — cooling fans contribute to total noise levels at or above 85 decibels, loud enough to require ear protection — and the physical choreography of hot aisles and cold aisles, carefully managed to push cooled air across components. 

The Rubin architecture changes the picture.

Coolant — 75% water and 25% propylene glycol — flows through cold plates that sit directly on processors, pulling heat out at the source. Running that coolant at up to 45 degrees Celsius means that in many climates, the facility loop can reject heat without turning on mechanical chillers and noisy fans. 

In an AI factory, coolant flows from a coolant distribution unit to the servers in a closed-loop cyle.

That unlocks something beyond energy savings: the possibility of eliminating water consumption entirely. 

In the right geography — somewhere with reliably cool outdoor air — a liquid-cooled data center can reject its heat through coolant distribution units that capture heat directly at the source and transport it to outdoor dry coolers, essentially large radiator coils positioned outside the building. 

The loop is filled once and runs closed for the life of the facility. And it takes dramatically less space in the AI factory compared to traditional air-cooling infrastructure.

“In the right geographic location, with the right system design, you don’t need any refrigeration equipment,” Whitmore said. “You can just put big radiator coils outside and use the air temperature for all your cooling. It’s incredibly efficient.”

The geography caveat matters. A data center in the Scottish Highlands and one in Phoenix, Arizona, face very different realities. But even in warmer climates, the shift toward 45-degrees-Celsius coolant moves operators significantly closer to that chiller-less ideal — where chillers may turn on just a few days a year when the outside air temperature demands it.

Another key benefit of this new model for AI factories is the potential for waste heat recovery, where residual heat from AI factory operations can be repurposed to heat commercial or residential buildings nearby. 

The Engineering Problem Nobody Had Solved

Previous liquid-cooled servers were hybrid: GPUs and CPUs got cold plates, but the rest of the system stayed air-cooled, with finned heat sinks designed to shed heat into moving air. In a fully liquid-cooled server, the cooling for these components needed to be completely redesigned to use liquid.

NVIDIA’s thermal engineering team reworked how those components handle heat, designing cooling loops that simplify how liquid is routed to multiple high-power chips on the board using a single inlet and outlet, resulting in a cleaner tray-level cooling architecture.

One visible outcome: Rubin servers have clean, sealed front panels where air-cooled servers have perforated bezels. Another: fully liquid cooled servers enable higher rack density than air-cooled servers, so a system that previously occupied six rack units now fits in two — more compute, less space, less noise.

Liquid cooling infrastructure overhead pipes routes into powerful AI servers.

AI workloads are not getting lighter. The compute demand driving data center construction is growing faster than almost any other category of infrastructure investment. 

Without efficiency improvements in how that compute is cooled, the energy cost of running AI at scale would grow in lockstep with the hardware. Liquid cooling at up to 45 degrees Celsius — hotter than a hot tub, cooler for the planet — is one of the most important tools the industry has to close that gap.

Learn more about liquid cooling, the NVIDIA DSX platform for AI factories and NVIDIA’s approach to energy-efficient AI infrastructure.