Researchers advance AI in the hunt for new drugs

New research is bridging the gap between traditional biological research and modern AI technology. (AI-generated image via Adobe Firefly)
By Emily Leighton

Can AI predict the next life-saving drug?

The answer may lie in the labyrinth of ZINC 250K.

A free dataset containing about 250,000 molecules, ZINC 250K is frequently used in machine learning experiments. It teaches AI to decode molecular structures, pinpoint potential drug candidates and predict their success.

For Pingzhao Hu, PhD, and other computational biologists, it is an ideal training ground in the quest to streamline the drug discovery process.

Hu, associate professor with the Department of Biochemistry, and his research team are developing AI models to identify promising new drug candidates for study.

Pingzhao Hu, PhD, is developing AI models to identify promising new drug candidates for study. (Mac Lai/Schulich Medicine & Dentistry)

Their latest model, called GraphBAN, predicts how chemical compounds, like potential drugs, interact with proteins, the complex molecules that control functions in the body. It does this using a graph-based approach, representing molecules as networks of connected parts – similar to a web or a map.
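To make the "network of connected parts" idea concrete, here is a minimal sketch of a molecule stored as a graph, with atoms as nodes and bonds as edges. It is an illustration only, not GraphBAN's code: the function names and the ethanol example are assumptions chosen for this sketch, and hydrogen atoms are left out for brevity.

```python
def build_molecule_graph(atoms, bonds):
    """Return an adjacency list: atom index -> list of bonded atom indices."""
    graph = {index: [] for index in range(len(atoms))}
    for a, b in bonds:
        # Bonds are undirected, so record the connection in both directions.
        graph[a].append(b)
        graph[b].append(a)
    return graph


def neighbour_elements(atoms, graph, index):
    """Return the sorted element symbols of the atoms bonded to one atom."""
    return sorted(atoms[j] for j in graph[index])


# Illustrative example: ethanol's heavy atoms, C-C-O (hydrogens omitted).
ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = [(0, 1), (1, 2)]

ethanol_graph = build_molecule_graph(ETHANOL_ATOMS, ETHANOL_BONDS)
middle_carbon_neighbours = neighbour_elements(ETHANOL_ATOMS, ethanol_graph, 1)
```

Graph-based models generally work by passing information between neighbouring nodes in a structure like this one, which is why the representation matters.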

By analyzing biological properties (how proteins behave) and chemical properties (how molecules are structured and react), GraphBAN can identify patterns and make accurate predictions.

The model was published recently in Nature Communications in collaboration with researchers at the University of Manitoba. What sets it apart is its ability to handle new, never-before-seen compounds.

Typically, AI models learn from existing data and struggle when faced with something new.

GraphBAN can recognize general patterns in how compounds and proteins interact – instead of simply memorizing past interactions, it learns the underlying rules and applies them to new cases.

“GraphBAN’s ability to make these predictions without needing prior interaction data is crucial for discovering new drug candidates and understanding how they might work in the body,” explained Hu, Canada Research Chair in Computational Approaches to Health Research.

This is where ZINC 250K enters the scene. Hu and his team used the dataset to validate GraphBAN’s accuracy by asking which of the dataset's compounds would interact with Pin1, a protein linked to cancer.

The catch? The model had never seen the ZINC 250K compounds or the Pin1 protein before.
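The condition described here, where neither the compound nor the protein appears in the training data, can be checked with a few lines of code. The sketch below is an assumption-laden illustration, not the team's pipeline: the function name and the placeholder identifiers (compound_A, protein_X and so on) are hypothetical.

```python
def is_unseen_pair(compound, protein, training_pairs):
    """Return True only if neither the compound nor the protein was in training."""
    seen_compounds = {c for c, _ in training_pairs}
    seen_proteins = {p for _, p in training_pairs}
    return compound not in seen_compounds and protein not in seen_proteins


# Hypothetical training set of (compound, protein) pairs; the IDs are placeholders.
TRAINING_PAIRS = [
    ("compound_A", "protein_X"),
    ("compound_B", "protein_Y"),
]

# A new compound tested against a new protein counts as fully unseen.
fully_unseen = is_unseen_pair("zinc_compound_1", "Pin1", TRAINING_PAIRS)

# A compound that was in training does not, even if the protein is new.
partly_seen = is_unseen_pair("compound_A", "Pin1", TRAINING_PAIRS)
```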

The researchers validated the predicted interactions using a traditional, non-AI bioinformatics pipeline, demonstrating GraphBAN’s predictive prowess.

“This real-world test confirms that GraphBAN can effectively predict interactions between previously unseen compounds and proteins, highlighting its value as a tool for drug discovery,” said Hu.

The model can be used for large-scale drug screening, repurposing existing drugs for new diseases, or analyzing newly discovered proteins to determine their potential role in disease.

“This versatility makes GraphBAN valuable in many biomedical applications,” said Hu. “From identifying treatments for emerging health threats, like viral outbreaks, to advancing personalized medicine by helping tailor treatments to a patient’s unique genetic makeup.”

GraphBAN’s source code is open access, meaning other researchers can benefit and build on the work.

Pingzhao Hu (front right) and members of his research team. (supplied)

Hu's team is also tackling antibiotic discovery – currently a lengthy and expensive process – with another AI model.

Called CL-MFAP, this model learns by looking at molecules from three perspectives: their chemical formula, 2D structural graph and ‘fingerprint’ patterns. It was trained on a vast collection of molecules, allowing it to recognize key features and relationships.

Using a technique called contrastive learning, the model compares this information to identify key patterns in the molecules that distinguish effective antibiotic compounds from ineffective ones.
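For readers curious about what contrastive learning looks like in practice, the sketch below shows one common, generic form of the idea: two different views of the same molecule should produce similar numerical representations, while views of different molecules should not. This is not CL-MFAP's actual training objective, which the article does not specify. The loss function, the temperature value and the tiny two-number "embeddings" are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss: row i of view_a should best match row i of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine_similarity(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidate matches.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability assigned to the correct match.
        total += log_denominator - logits[i]
    return total / len(view_a)


# Hypothetical embeddings of two molecules, as seen from two perspectives.
graph_view = [[1.0, 0.0], [0.0, 1.0]]
fingerprint_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each row matches its molecule
fingerprint_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # rows paired with the wrong molecule

aligned_loss = contrastive_loss(graph_view, fingerprint_aligned)
shuffled_loss = contrastive_loss(graph_view, fingerprint_shuffled)
```

The loss is lower when the two views of each molecule agree, so training against it pushes a model toward representations that are consistent across perspectives.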

“CL-MFAP aims to identify promising antibiotic candidates faster and more cost-effectively than traditional methods,” said Hu.

The team is currently refining this model to identify compounds that can target harmful bacteria, such as E. coli, that are associated with inflammatory bowel disease.

This research has been accepted as a conference paper at the International Conference on Learning Representations (ICLR), one of the world’s top AI conferences.

From Hu’s perspective, the two advanced AI models complement one another well. CL-MFAP, a foundation model, captures rich molecular patterns that GraphBAN can use to make better predictions, making the combined system more powerful and reliable.

“By bridging the gap between traditional biological research and modern AI technologies, these models offer more efficient, cost-effective alternatives to experimental drug testing,” said Hu. “This will speed up the process of discovering new therapies and improving precision treatments.”