Clearing the Air: NASA Scientists Use NVIDIA RAPIDS to Accelerate Pollution Forecasts

by Isha Salian

Air quality is a vastly underestimated problem, said NASA research scientist Christoph Keller in a talk at this week’s GTC DC, the Washington edition of NVIDIA’s GPU Technology Conference.

Nine in 10 people breathe polluted air, and millions of deaths a year are attributed to household or outdoor air pollution. Poor air quality also reduces crop yields, causing billions of dollars in agricultural losses annually.

To better understand and forecast air quality, NASA researchers are developing a machine learning model that tracks global air pollution in real time. The model also provides forecasts up to five days in advance that can help government agencies and individuals make decisions.

Keller’s team is using NVIDIA V100 Tensor Core GPUs and NVIDIA RAPIDS data science software libraries to accelerate its machine learning algorithms. The trained model, which learns how air pollution forms from data provided by the NASA Center for Climate Simulation, can then be plugged into an existing full Earth system model to produce global air quality simulations in half the time.

Algorithms Run Like the Wind on RAPIDS, NVIDIA DGX Systems

Satellite observations by NASA and other space agencies collect massive amounts of data about what’s happening on Earth, including detailed measurements of air quality.

This data is fed into NASA’s global air quality model, but the science involved is too complex to process fast enough for real-time insights. GPU-accelerated machine learning can change that, bringing scientists closer to detailed, live air quality maps.

“NASA’s global models quickly produce terabytes of data, and what we’d like to do is train the machine learning model on these huge datasets,” said Keller, who is part of the agency’s Goddard Space Flight Center, in an interview. “That’s where we quickly reached limitations with normal software and hardware, and where I turned to GPUs and RAPIDS software.”

NVIDIA developers collaborated with Keller to accelerate the training of his machine learning models using the cuDF and XGBoost software libraries. Working on three GPU-powered systems, including an NVIDIA DGX-1, the team cut training time from almost a full working day to seconds, enabling much faster iteration.

“Before, you would hit the button and wait six or seven hours to get the results. Even to make a small tweak, you’d have to resubmit it and wait again,” he said. “Speeding up the training cycle was a total game changer for developing the models.”

The scientists’ air quality forecasts are publicly available through NASA, and the team hopes they will also be used by app developers, nonprofits and cities worldwide. Government groups including the Environmental Protection Agency, the State Department and the U.S. Army Public Health Center are also interested in the data as a way to track air quality and provide timely warnings of dangerous air.

These organizations can use NASA data and forecasts to build tools that explain to the public why the air is worse on a specific day, linking air quality index data to pollution episodes such as wildfires, industrial activities, weather or heavy traffic. Governments can also rely on the forecasts to quantify the impact of specific sources of emissions, like an individual power plant.