NVIDIA Delivers More Than 6,000x Speedup on Key Algorithm for Hedge Funds

NVIDIA DGX-2 and accelerated Python libraries provide unprecedented speedup for STAC-A3 algorithm used to benchmark backtesting of trading strategies.
by John Ashley

NVIDIA’s AI platform is delivering more than 6,000x acceleration for running an algorithm that the hedge fund industry uses to benchmark backtesting of trading strategies.

This enormous GPU-accelerated speedup has big implications across the financial services industry.

Hedge funds — there are more than 10,000 of them — will be able to design more sophisticated models, stress test them harder, and still backtest them in just hours instead of days. And quants, data scientists and traders will be able to build smarter algorithms, get them into production more quickly and save millions on hardware.

Financial trading algorithms account for about 90 percent of public trading, according to the Global Algorithmic Trading Market 2016–2020 report. Trading by quants, specifically, has grown to about a third of all trading on the U.S. stock markets, according to the Wall Street Journal.

The breakthrough results have been validated by the Securities Technology Analysis Center (STAC), whose membership includes more than 390 of the world’s leading banks, hedge funds and financial services technology companies.

STAC Benchmark Infographic

NVIDIA demonstrated its computing platform’s capability using STAC-A3, the financial services industry benchmark suite for backtesting trading algorithms to determine how strategies would have performed on historical data.

Using an NVIDIA DGX-2 system running accelerated Python libraries, NVIDIA shattered several previous STAC-A3 benchmark results, in one case running 20 million simulations on a basket of 50 instruments in the prescribed 60-minute test period versus the previous record of 3,200 simulations, a ratio of 6,250 to 1. This is the STAC-A3.β1.SWEEP.MAX60 benchmark; see the official STAC Report for details.

STAC-A3 parameter-sweep benchmarks use realistic volumes of data and backtest many variants of a simplified trading algorithm to determine profit and loss scores for each simulation. While the underlying algorithm is simple, the benchmark tests many variants in parallel, an approach designed to stress systems in realistic ways.
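To make the idea of a parameter sweep concrete, here is a minimal sketch in plain NumPy. It is not the STAC-A3 algorithm or NVIDIA's implementation: the moving-average crossover rule, the synthetic price series and the function names (moving_average, backtest, sweep) are all assumptions chosen for illustration. Each (fast, slow) window pair is one variant, and each variant gets one profit and loss score.

```python
import numpy as np


def moving_average(prices, window):
    """Trailing mean over `window` prices; the first window-1 entries are NaN."""
    out = np.full(prices.shape, np.nan)
    csum = np.cumsum(np.insert(prices, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def backtest(prices, fast, slow):
    """P&L of a hypothetical long-only rule: hold one unit while fast MA > slow MA."""
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    # The position chosen at step t earns the price change from t to t+1.
    # Comparisons against NaN are False, so there is no position during warm-up.
    position = (fast_ma > slow_ma)[:-1].astype(float)
    return float(np.sum(position * np.diff(prices)))


def sweep(prices, fast_windows, slow_windows):
    """Score every valid (fast, slow) variant on the same historical data."""
    results = {}
    for fast in fast_windows:
        for slow in slow_windows:
            if fast < slow:
                results[(fast, slow)] = backtest(prices, fast, slow)
    return results


# Usage on a synthetic random-walk price series (illustrative data only).
rng = np.random.default_rng(0)
synthetic_prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, 1000))
scores = sweep(synthetic_prices, fast_windows=[5, 10, 20], slow_windows=[20, 50, 100])
best_variant = max(scores, key=scores.get)
print(best_variant, scores[best_variant])
```

Every variant reads the same historical data and is independent of the others, which is what makes this kind of workload a natural fit for running many simulations in parallel.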

According to Michel Debiche, a former Wall Street quant who is now STAC’s director of analytics research, “The ability to run many simulations on a given set of historical data is often important to trading and investment firms. Exploring more combinations of parameters in an algorithm can lead to more optimized models and thus more profitable strategies.”

The benchmark results were achieved by harnessing the parallel processing power of 16 NVIDIA V100 GPUs in a DGX-2 server, running Python code that uses NVIDIA CUDA-X AI software along with the NVIDIA RAPIDS libraries and the Numba compiler.

RAPIDS is an evolving set of libraries that simplifies GPU acceleration of common Python data science tasks. Numba allows data scientists to write Python that is compiled into native CUDA code for the GPU, making it easy to extend the capabilities of RAPIDS.

RAPIDS and Numba software make it possible for data scientists and traders to replicate this performance without needing in-depth knowledge of GPU programming.
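As a rough illustration of what writing Python that Numba compiles for the GPU can look like, the sketch below assigns one GPU thread to each variant of a toy threshold rule. This is not the code behind the benchmark results: the trading rule, the names pnl_kernel, pnl_reference and sweep_pnl, and the synthetic returns are assumptions made for the example. It falls back to a NumPy path when Numba or a CUDA-capable GPU is not available.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:
    cuda = None
    HAVE_GPU = False


def pnl_reference(returns, thresholds):
    """CPU version: earn returns[t] whenever returns[t-1] exceeded the threshold."""
    signal = returns[:-1, None] > thresholds[None, :]
    return (signal * returns[1:, None]).sum(axis=0)


if HAVE_GPU:
    @cuda.jit
    def pnl_kernel(returns, thresholds, pnl):
        # One thread per threshold variant (hypothetical toy rule).
        i = cuda.grid(1)
        if i < thresholds.shape[0]:
            total = 0.0
            for t in range(1, returns.shape[0]):
                if returns[t - 1] > thresholds[i]:
                    total += returns[t]
            pnl[i] = total


def sweep_pnl(returns, thresholds):
    """Score all threshold variants, on the GPU when one is available."""
    if not HAVE_GPU:
        return pnl_reference(returns, thresholds)
    d_returns = cuda.to_device(returns)
    d_thresholds = cuda.to_device(thresholds)
    d_pnl = cuda.device_array(thresholds.shape[0], dtype=np.float64)
    threads_per_block = 128
    blocks = (thresholds.shape[0] + threads_per_block - 1) // threads_per_block
    pnl_kernel[blocks, threads_per_block](d_returns, d_thresholds, d_pnl)
    return d_pnl.copy_to_host()


# Usage with synthetic daily returns (illustrative data only).
rng = np.random.default_rng(0)
synthetic_returns = rng.normal(0.0, 0.01, 2000)
variants = np.linspace(-0.02, 0.02, 1024)
pnl_scores = sweep_pnl(synthetic_returns, variants)
print(pnl_scores.shape)
```

The kernel body is ordinary Python with a loop and an if statement; the decorator and the launch configuration in square brackets are the only GPU-specific parts.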


Feature image credit: Lorenzo Cafaro