Carol Song is opening a door for researchers to advance science on Anvil, Purdue University’s new AI-ready supercomputer, an opportunity she couldn’t have imagined as a teenager in China.
“I grew up in a tumultuous time when, unless you had unusual circumstances, the only option for high school grads was to work alongside farmers or factory workers, then suddenly I was told I could go to college,” said Song, now the project director of Anvil.
And not just any college. Her scores on a national entrance exam opened the door to Tsinghua University, home to China’s most prestigious engineering school.
Along the way, someone told her computers would be big, so she signed up for computer science before she had ever seen a computer. She learned soon enough.
“We were building hardware from the ground up, designing microinstructions and logic circuits, so I got to understand computers from the inside out,” she said.
Easing Access to Supercomputers
Skip forward a few years to grad school at the University of Illinois when another big door opened.
While working in distributed systems, she was hired as one of the first programmers at the National Center for Supercomputing Applications, one of the first sites in a U.S. program funding supercomputers that researchers shared.
To make the systems more accessible, she helped develop alternatives to the crude editing tools of the day that displayed one line of a program at a time. And she helped pioneering researchers like Michael Norman create visualizations of their work.
GPUs Add AI to HPC
In 2005, she joined Purdue, where she has helped manage nearly three dozen research projects representing more than $60 million in grants as a senior research scientist in the university’s supercomputing center.
“All that helped when we started defining Anvil. I see researchers’ pain points when they are getting on a new system,” said Song.
Anvil links 1,000 Dell EMC PowerEdge C6525 server nodes with 2,000 of the latest AMD x86 CPUs and 64 NVIDIA A100 Tensor Core GPUs on a NVIDIA Quantum InfiniBand network to handle traditional HPC and new AI workloads.
The system, built by Dell Technologies, will deliver 5.3 petaflops and half a million GPU cycles per year to tens of thousands of researchers across the U.S. working on the National Science Foundation’s XSEDE network.
Anvil Forges Desktop, Cloud Links
To harness that power, Anvil supports interactive user interfaces as well as the batch jobs that are traditional in high performance computing.
“Researchers can use their favorite tools like Jupyter notebooks and remote desktop interfaces so the cluster can look just like in their daily work environment,” she said.
Anvil will also support links to Microsoft Azure, so researchers can access its large datasets and commercial cloud-computing muscle. “It’s an innovative part of this system that will let researchers experiment with creating workflows that span research and commercial environments,” Song said.
Fighting COVID, Exploring AI
More than 30 research teams have already signed up to be early users of Anvil.
One team will apply deep learning to medical images to improve diagnosis of respiratory diseases including COVID-19. Another will build causal and logical check points into neural networks to explore why deep learning delivers excellent results.
“We’ll support a lot of GPU-specific tools like NGC containers for accelerated applications, and as with every new system, users can ask for additional toolkits and libraries they want,” she said.
The Anvil team aims to invite industry collaborations to test new ideas using up to 10 percent of the system’s capacity. “It’s a discretionary use we want to apply strategically to enable projects that wouldn’t happen without such resources,” she said.
Opening Doors for Science and Inclusion
Early users are working on Anvil today and the system will be available for all users in about a month.
Anvil’s opening day has a special significance for Song, one of the few women to act as a lead manager for a national supercomputer site.
“I’ve been fortunate to be in environments where I’ve always been encouraged to do my best and given opportunities,” she said.
“Around the industry and the research computing community there still aren’t a lot of women in leadership roles, so it’s an ongoing effort and there’s a lot of room to do better, but I’m also very enthusiastic about mentoring women to help them get into this field,” she added.
Purdue’s research computing group shares Song’s enthusiasm about getting women into supercomputing. It’s home to one of the first chapters of the international Women in High-Performance Computing organization.
Purdue’s Women in HPC chapter sent an all-female team to a student cluster competition at SC18. It also hosts outside speakers, provides travel support to attend conferences and connects students and early career professionals to experienced mentors like Song.
Pictured at top: Carol Song, Anvil’s principal investigator (PI) and project director along with Anvil co-PIs (from left) Rajesh Kalyanam, Xiao Zhu and Preston Smith.