AI’s New Onramp: Meet the Data Science PC

by Jesse Clayton

The trip to AI and big-data analytics is now just a click away. Starting today, three NVIDIA partners are selling a new class of computers online that we call data science PCs.

The systems bundle the hardware and software data scientists need to hit an “on” button and start managing datasets and models to make AI predictions. Data science PCs tap NVIDIA TITAN RTX GPUs and NVIDIA RAPIDS software to deliver 3-6x speed-ups compared to CPU-only desktops.

Three experts in building high-end PCs — Digital Storm, Maingear and Puget Systems — are offering the products now. They’re targeting an expanding class of independent data scientists to help them achieve better results faster.

Figure (data science PC benchmark): A data science PC handled extract-transform-load (ETL) and XGBoost training on a dataset derived from New York City taxis, delivering end-to-end predictions in one-sixth the time of a CPU-only desktop.

Some of the world’s largest and most innovative organizations are already using GPU-accelerated servers and workstations to tackle their demanding data-science jobs.

For example, Walmart’s supermarket of the future can compute in real time more than 1.6 terabytes of data generated per second using NVIDIA’s EGX platform. The Summit system at Oak Ridge National Laboratory can tap its 27,648 NVIDIA V100 Tensor Core GPUs to drive 3.3 exaflops of mixed-precision horsepower on AI tasks.

But data science isn’t just for large enterprises. Startups, researchers, students and enthusiasts are jumping into this burgeoning field. They’re contributing to the corporate momentum that is making data scientist one of the fastest-growing jobs in the U.S.

The data science PC aims to fuel this growing class of independent data science practitioners. The combination of powerful, pre-configured systems and a tested software stack can jumpstart their work.

The Speeds and Feeds

Under the hood, a data science PC includes one or two TITAN RTX GPUs, each with up to 24GB of memory. In dual-GPU configurations, NVLink high-speed interconnect technology connects the two GPUs to tackle datasets that demand more GPU memory.

The systems can accommodate 48-128GB of main memory, and storage options include drives that range up to 10TB.

Each data science PC will ship with Linux and NVIDIA RAPIDS, NVIDIA’s data science software stack pre-built with more than 200 libraries for end-to-end data science.

NVIDIA RAPIDS eases the job of porting existing code for GPU acceleration. Its APIs are modeled after popular libraries used in data science. In many cases, it’s only necessary to change a few lines of code in order to tap the potential of GPU acceleration.
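As a rough illustration of the "few lines of code" idea, the sketch below runs the same filter-and-aggregate steps with either cuDF (the RAPIDS data-frame library) or pandas, depending on which import succeeds. The fallback import, the alias xdf, the to_dict helper and the toy taxi-style data are all assumptions made for this example, not code from NVIDIA or its partners.

```python
# Prefer cuDF when it is available; otherwise fall back to pandas.
# Only this import differs between the GPU and CPU versions.
try:
    import cudf as xdf  # GPU data frames (RAPIDS)
except Exception:  # cuDF not installed, or no usable GPU
    import pandas as xdf  # CPU data frames

# Hypothetical toy data standing in for taxi trips.
trips = xdf.DataFrame({
    "passenger_count": [1, 2, 1, 3, 2, 1],
    "fare": [7.5, 12.0, 5.5, 20.0, 9.0, 11.0],
})

# Filter, then aggregate: the same calls work in both libraries.
long_trips = trips[trips["fare"] > 6.0]
mean_fare = long_trips.groupby("passenger_count")["fare"].mean()


def to_dict(series):
    """Copy a result back to host memory as a plain dict."""
    if hasattr(series, "to_pandas"):  # cuDF objects expose to_pandas()
        series = series.to_pandas()
    return series.to_dict()


mean_fare_by_count = to_dict(mean_fare)
print(mean_fare_by_count)
```

On a machine without a GPU the script simply runs on pandas, which makes it easy to develop on a laptop and move the same code to a GPU system later. Real workloads may still need adjustments where the two APIs differ.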

Here are some of the key elements of RAPIDS:

  • cuDF is a Python GPU data-frame library for loading, joining, aggregating, filtering and otherwise manipulating data. The API is designed to be similar to pandas, so existing code easily maps to the GPU.
  • cuML accelerates popular machine learning algorithms, including XGBoost, PCA, K-means, k-Nearest Neighbors and more. It is closely aligned with scikit-learn.
  • cuGraph is a library of graph algorithms, similar to NetworkX, that works with data stored in a GPU data frame.
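
To make the scikit-learn alignment concrete, here is a minimal sketch that fits K-means with cuML when it can be imported and with scikit-learn otherwise. The synthetic two-cluster data, the variable names and the fallback import are assumptions chosen for illustration; the estimator is used only through the constructor arguments and attributes that both libraries provide.

```python
import numpy as np

# Same estimator name and fit() call in both libraries.
try:
    from cuml.cluster import KMeans  # GPU implementation (RAPIDS)
except Exception:  # cuML not installed, or no usable GPU
    from sklearn.cluster import KMeans  # CPU implementation

# Hypothetical data: two tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
left = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
right = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
points = np.vstack([left, right]).astype(np.float32)

# Fit two clusters; random_state fixes the initialization seed.
model = KMeans(n_clusters=2, random_state=0)
model.fit(points)

# One cluster label per input row.
labels = np.asarray(model.labels_)
print(labels[:5], labels[-5:])
```

Which integer each cluster receives is arbitrary, so the useful check is that the first 50 points share one label and the last 50 share the other.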

An ecosystem of startups in Inception, NVIDIA’s virtual accelerator program for startups focused on AI and data science, provides applications and services that run on top of RAPIDS. They include companies, such as Graphistry and OmniSci, that offer big-data visualization tools.

Data scientists can also use NVIDIA’s data science developer forum to ask questions and learn more about data science on GPUs.

The data science PC is here, ready to propel you to an AI future. Learn more from our partners Digital Storm, Maingear and Puget Systems.