It’s a staggering challenge.
The proliferation of malware (malicious software that often targets the mountain of data on computers and mobile devices) poses a huge problem for the information security world, and one that keeps getting harder as malware takes on new forms and techniques.
Increasingly, firms like Avast Software are addressing the problem with GPUs.
The Czech security vendor has built a GPU-accelerated database that lets it process and analyze millions of samples dramatically faster than traditional tools, Peter Kovac, a senior researcher at Avast, told attendees during a session at the GPU Technology Conference.
“It allows us to do nearest-neighbor queries, rule-matching queries and classification of unknown records,” Kovac said of the database, dubbed Medusa.
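Kovac didn't detail Medusa's record format or distance measure. As a rough, hypothetical illustration of what a nearest-neighbor query and the classification of an unknown record can look like, the Python sketch below assumes each file is reduced to a fixed-length vector of 0/1 attributes, measures similarity with Hamming distance, and labels an unknown file by majority vote among its k nearest stored records. The function names, the record layout and the toy data are invented for the example.

```python
from collections import Counter

# Hypothetical record format: (attribute_vector, label), where attribute_vector
# is a tuple of 0/1 features. Medusa's real representation is not described.

def hamming(a, b):
    """Number of attribute positions where two files differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbors(query, records, k):
    """Brute-force k-nearest-neighbor query over all stored records."""
    return sorted(records, key=lambda rec: hamming(query, rec[0]))[:k]

def classify(query, records, k=3):
    """Label an unknown file by majority vote among its k nearest neighbors."""
    votes = Counter(label for _, label in nearest_neighbors(query, records, k))
    return votes.most_common(1)[0][0]

db = [
    ((1, 0, 1, 1, 0), "clean"),
    ((1, 0, 1, 0, 0), "clean"),
    ((0, 1, 0, 1, 1), "malware"),
    ((0, 1, 1, 1, 1), "malware"),
    ((0, 1, 0, 0, 1), "malware"),
]
print(classify((0, 1, 0, 1, 0), db))  # prints: malware
```

The brute-force scan compares the query against every stored record, and each comparison is independent of the others. That independence is what makes this kind of query a natural fit for GPUs at the scale of tens of millions of records, though the article doesn't say how Medusa organizes the work.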
Each day, Avast’s sensor network monitors hundreds of millions of user machines and identifies hundreds of thousands of potential new malware files. When the network collects a file it believes is suspicious, it sends the file to the company’s submit servers, which pass it on to Medusa. There, the GPUs run the queries and classify each file based on the results.
Medusa then stores records of those files to compare against future suspected malware. The database contains nearly 40 million clean files, nearly 3 million recent threats and about 2 million samples for which it hasn’t made a determination.
Kovac said there are “billions of billions” of possible combinations to analyze, far too many to examine exhaustively, so Medusa takes a stochastic approach. For each cluster of samples, it comes up with a typical representative that uses 60 to 80 percent of the cluster’s attributes. The goal is to find a workable subset of conditions so Avast can maximize its success in identifying malware in real time.
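The article gives only the outline of rule generation. One way to picture a stochastic search for a cluster representative is the Python sketch below. It assumes binary attribute vectors, takes the per-attribute majority as the cluster's "typical representative", and repeatedly samples rules that keep a random 60 to 80 percent of those attributes, scoring each by how many cluster members it matches minus a penalty for matches on clean files. The functions `matches`, `representative` and `generate_rule`, the penalty weight and the toy data are all hypothetical; this illustrates randomized search in general, not Medusa's algorithm.

```python
import random

# Hypothetical setup: each file is a tuple of 0/1 attributes, and a rule is a
# dict {attribute_index: required_value}. Not Avast's actual representation.

def matches(rule, record):
    """Rule-matching query: a record matches when every condition holds."""
    return all(record[i] == v for i, v in rule.items())

def representative(cluster):
    """'Typical' member: the strict-majority value of each attribute (ties -> 0)."""
    n = len(cluster)
    return tuple(int(2 * sum(rec[i] for rec in cluster) > n)
                 for i in range(len(cluster[0])))

def generate_rule(cluster, clean, trials=200, low=0.6, high=0.8, seed=0):
    """Randomized search over rules that keep 60-80% of the representative's
    attributes; returns the best-scoring rule found."""
    rng = random.Random(seed)
    rep = representative(cluster)
    n_attrs = len(rep)
    best_rule, best_score = None, float("-inf")
    for _ in range(trials):
        size = rng.randint(round(low * n_attrs), round(high * n_attrs))
        chosen = rng.sample(range(n_attrs), size)
        rule = {i: rep[i] for i in chosen}
        hits = sum(matches(rule, r) for r in cluster)
        false_hits = sum(matches(rule, r) for r in clean)
        # Hypothetical weighting: one match on a clean file costs 10 cluster hits.
        score = hits - 10 * false_hits
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule

cluster = [  # toy malware cluster: members differ from the first in one attribute
    (1, 1, 0, 1, 0, 0, 1, 0, 1, 1),
    (0, 1, 0, 1, 0, 0, 1, 0, 1, 1),
    (1, 1, 0, 1, 1, 0, 1, 0, 1, 1),
    (1, 1, 0, 1, 0, 0, 1, 1, 1, 1),
]
clean = [
    (0, 0, 1, 0, 1, 1, 1, 0, 1, 1),
    (1, 1, 1, 0, 1, 1, 0, 1, 0, 1),
]
rule = generate_rule(cluster, clean)
print(sorted(rule.items()))
```

Each candidate rule is scored independently against every record, which is the kind of data-parallel work that maps naturally onto GPUs; how Medusa actually parallelizes rule generation isn't described in the article.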
Without GPUs, none of this would be possible because of the enormous amount of data in question.
“GPUs do the heavy lifting,” Kovac said.
How heavy? Compared with a CPU, rule-matching queries run 22X faster, rule generation runs 20X faster and nearest-neighbor queries run 13X faster.
The results speak for themselves.
“Hundreds of results from the rule generator are released daily,” said Kovac. In other words, millions of users’ computers are spared.