Pharmaceutical companies have traditionally kept their data close to the vest because collaboration’s side effects may include compromising intellectual property and losing the edge over competitors.
But sharing data has major perks: The more data a pharma company has at its disposal, the better equipped its researchers are to quickly identify and develop promising new drugs. This can ultimately improve drug candidate success rates and reduce treatment costs.
Bringing a drug to market takes 13 years and close to $2 billion on average, said Hugo Ceulemans, project leader of MELLODDY, a new drug-discovery consortium that hopes to eliminate the tradeoff between data sharing and security.
The project will use cloud-based NVIDIA GPUs and a distributed approach known as federated learning to train AI models on data from multiple pharmaceutical companies while preserving IP.
An acronym for Machine Learning Ledger Orchestration for Drug Discovery, MELLODDY brings together 17 partners: 10 leading pharmaceutical companies, such as Amgen, Bayer, GSK, Janssen Pharmaceutica and Novartis; top European universities KU Leuven and the Budapest University of Technology and Economics; four trailblazing startups; and NVIDIA’s AI computing platform.
Each pharmaceutical partner will use its own cluster of NVIDIA V100 Tensor Core GPUs hosted on Amazon Web Services. MELLODDY developers will create a distributed deep learning model that can travel among these distinct cloud clusters, training on annotated data for an unprecedented 10 million chemical compounds.
Individual pharmaceutical companies will be able to fine-tune the AI model, tailoring it to their specific field of inquiry. As part of MELLODDY's data security mission, each organization will keep its research projects confidential.
“We’re looking forward to becoming better at virtualizing drug discovery to bring more efficient, efficacious and safer therapies to patients,” said Ceulemans, scientific director of Discovery Data Sciences at Janssen Pharmaceutica. “When it comes to machine learning and data science, there’s no single industry that can afford to stand on the sidelines.”
Federated Learning: A New Frontier
MELLODDY aims to demonstrate how federated learning techniques could give pharmaceutical partners the best of both worlds: the ability to leverage the world’s largest collaborative drug compound dataset for AI training without sacrificing data privacy.
The $20 million project will run for three years, at which point the consortium will share learnings with the public.
Federated learning is a method of decentralized machine learning in which training data doesn’t have to be pooled into a single aggregating server. Instead, the machine learning model learns from data stored at different geographic locations, ensuring that each pharmaceutical company’s private dataset stays within its own secure infrastructure.
“The data is never put at risk,” said Mathieu Galtier, project coordinator for Owkin, a startup developing MELLODDY’s federated learning system. “The data sits in its own GPU server, while the algorithms travel from one to the other for training.”
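The article does not describe the details of the training protocol Owkin is building for MELLODDY. The toy sketch below only illustrates the idea in Galtier's quote. It rests on several assumptions: a plain linear model trained with stochastic gradient descent, and a model that is passed round-robin from site to site, a scheme often called cyclic or sequential federated training. Each site trains the model on its own data and hands back only updated weights. The names `Site`, `make_private_data` and `cyclic_federated_training` are hypothetical and exist only for this illustration.

```python
import random


class Site:
    """Hypothetical stand-in for one partner's private GPU server.

    The (features, label) pairs live only inside this object; callers
    receive updated model weights, never the raw data.
    """

    def __init__(self, name, data):
        self.name = name
        self._data = data  # private dataset, never returned

    def train(self, weights, lr=0.01, epochs=5):
        """Run a few epochs of SGD for a linear model on local data.

        Returns a new weight list; the incoming list is not modified.
        """
        w = list(weights)
        for _ in range(epochs):
            for x, y in self._data:
                pred = sum(wi * xi for wi, xi in zip(w, x))
                err = pred - y
                # Gradient of the squared error 0.5 * err^2 w.r.t. each weight
                for i in range(len(w)):
                    w[i] -= lr * err * x[i]
        return w


def cyclic_federated_training(sites, n_features, rounds=20):
    """The model 'travels' from site to site; each site trains it in turn."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        for site in sites:
            weights = site.train(weights)  # only weights cross site boundaries
    return weights


def make_private_data(rng, n, true_w):
    """Generate noise-free synthetic data y = true_w . x for one site."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data


rng = random.Random(0)
true_w = [2.0, -1.0]
sites = [Site(f"partner_{i}", make_private_data(rng, 50, true_w)) for i in range(3)]
learned = cyclic_federated_training(sites, n_features=2)
print([round(w, 3) for w in learned])
```

Because the synthetic data is noise-free, the learned weights should approach the generating values `[2.0, -1.0]`, even though no site ever reveals its samples. A common alternative is federated averaging (FedAvg). In that scheme, sites train copies of the model in parallel and a coordinator averages the returned weights. The choice between the two trades wall-clock time against how often the shared model is updated.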
Pharmaceutical datasets consist of historical information about different chemical compounds and their attributes. With the versatile MELLODDY federated learning model, each partner will be able to create anonymized queries about specific drug compounds. The query will be sent to each organization's data repository to identify any potential matches.
MELLODDY will also employ a blockchain ledger system so pharmaceutical partners can maintain visibility and control over the use of their datasets.
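The article does not say how MELLODDY's ledger is built. The sketch below shows only the tamper-evidence property that makes a ledger useful for tracking how datasets are used. It is an assumption-laden simplification: a single, local, hash-chained log using SHA-256, with no distribution across partners and no consensus mechanism. The class `UsageLedger` and its fields (`partner`, `dataset`, `action`) are hypothetical.

```python
import hashlib
import json
import time

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class UsageLedger:
    """Hypothetical append-only log of dataset-usage events.

    Each entry stores the hash of the previous entry, so editing any past
    entry breaks the chain and is detected by verify().
    """

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body):
        # Canonical JSON (sorted keys) so identical content hashes identically.
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def record(self, partner, dataset, action):
        """Append an event that commits to the previous entry's hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "partner": partner,
            "dataset": dataset,
            "action": action,
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._hash(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash and check each link in the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._hash(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


ledger = UsageLedger()
ledger.record("partner_a", "assay_set_1", "model_training_round")
ledger.record("partner_b", "assay_set_2", "model_training_round")
print(ledger.verify())  # True: chain is intact

ledger.entries[0]["action"] = "export_raw_data"  # simulate tampering
print(ledger.verify())  # False: the edited entry no longer matches its hash
```

A production blockchain would also replicate these entries across parties and agree on their order. That agreement is what lets each partner trust the log without trusting any single operator, and this local sketch deliberately leaves it out.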
By enabling pharmaceutical companies to learn from each other’s findings without providing traditional competitors direct access to proprietary datasets, the consortium aims to improve the predictive performance of AI-based drug discovery. With smarter models comes speedier and cheaper drug development.