AI-generated genetic guides improve CRISPR’s accuracy to target disease-causing superbugs

Illustration depicting CRISPR-Cas9 technology. CRISPR – an acronym for Clustered Regularly Interspaced Short Palindromic Repeats – is used naturally by bacteria to target and disable specific genes, including the toxic gene sequences in bacteria. (National Institutes of Health/Flickr)

By Cam Buchan

Using machine learning, Schulich Medicine & Dentistry researchers, led by David Edgell, PhD, and Greg Gloor, PhD, can more precisely target the gene-editing tool CRISPR-Cas9 as it seeks and destroys disease-causing genes in bacteria.

“CRISPR is often described as a pair of molecular scissors, which is a beautiful visual graphic. But it’s not as easy to work with as it seems,” said Gloor. “Many times, the scissors don't cut where you want at all, or they'll cut someplace else by mistake.”

To address this, the researchers developed a new method to better guide and predict CRISPR’s abilities, with findings recently published in Nature Communications.

These new techniques will be invaluable in solving the problem of antimicrobial resistance. The work represents more than two years of collaboration between PhD candidates Dalton Ham and Tyler Browne.

The research team pictured together in the lab. From left, David Edgell, PhD, Tyler Browne, PhD candidate, Dalton Ham, PhD candidate, and Greg Gloor, PhD. (Emily Leighton/Schulich Medicine & Dentistry)

Antimicrobial resistance (AMR) happens when antibiotics stop working because bacteria and other microbes become resistant to them.

“Antibiotics are really blunt tools,” said Gloor, chair of the Department of Biochemistry and coauthor of the paper. “They do as much harm as good. They essentially kill everything indiscriminately, but the bacteria that survive are more serious than the ones you started with.”

What is antimicrobial resistance?

Antimicrobial resistance (AMR) happens when antibiotics stop working because bacteria and other microbes become resistant to them. It started mainly because of the overuse and misuse of antibiotics, and has resulted in ineffective treatments, longer hospital stays and higher health costs. Bacterial superbug infections caused by Clostridium difficile (C. diff) and Streptococcus pyogenes (flesh-eating bacteria) are difficult to treat with antibiotics because of AMR. 

Currently, the superbug problem costs the national health-care system $1.4 billion a year, a figure that is projected to grow to $7.6 billion by 2050. By then, superbug infections are projected to have killed nearly 400,000 Canadians.

Source: CBC News

 
What is needed is a very specific solution, said Gloor.

“If you're dealing with E. coli O157:H7, like what happened in Walkerton, you'd really like to target that one bacterium and not affect everything else.”

Winning the war against antimicrobial resistance means overcoming the challenge presented by the environment that bacteria live in, called a biofilm – a slimy, sticky layer made up of lots of bacteria clumped together. Gloor described it as “that slimy film on the bottom of your cat’s water dish.”

The biofilm acts like a shield, protecting the bacteria from harm and making it difficult for antibiotics or our bodies' natural defence mechanisms to reach them and fight infections.

Dalton Ham, PhD candidate, in the lab. Using high-quality data generated by PhD candidate Dalton Ham (pictured), the crisprHAL model showed marked improvements in predicting how effective a gRNA will be. (Emily Leighton/Schulich Medicine & Dentistry)

CRISPR – an acronym for Clustered Regularly Interspaced Short Palindromic Repeats – is used naturally by bacteria to target and disable specific genes, including the toxic gene sequences in bacteria like E. coli O157:H7. The CRISPR region of a bacterium keeps a record of sequences from encounters with viruses or harmful genes, like a mugshot of past invaders.

A molecule called a guide RNA (gRNA) directs the CRISPR cutting system to the matching sequence of the toxic gene, where the system cuts it like a pair of genetic scissors.

Edgell describes the gRNA as a GPS signal to direct CRISPR-Cas9 to a precise location.

“Imagine typing ‘Starbucks’ into your car’s GPS. It would try and take you to 15 different Starbucks in London,” said Edgell, professor in Biochemistry. “But, if you give the GPS a little more information, such as the Starbucks at the corner of Richmond and Fanshawe, you would be taken to a single Starbucks. The more precise the information, the better the targeting. The same is true for a gRNA.”

But it’s not perfect.

Gloor said that, with previous prediction tools, locating the target fails most of the time because not all gRNA sequences are active, for reasons that are not yet understood.

How CRISPR technology works:

  • Bacteria have a system called CRISPR, which acts like molecular scissors to cut a gene at the precise location directed by a guide RNA (gRNA).
  • The resulting cut in the gene’s DNA can disable the gene or disrupt the gene’s function. 
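The two steps above, matching and cutting, can be sketched as a toy string search in Python. This is only an illustration under stated assumptions: the sequences are made up, only one DNA strand is searched, the cut position is an arbitrary illustrative choice, and real CRISPR-Cas9 targeting has further requirements that this sketch ignores. The names find_sites and cut_at are hypothetical. It also shows the GPS idea from Edgell's analogy: a short guide matches several places, while a longer one matches a single site.

```python
def find_sites(dna: str, guide: str) -> list[int]:
    """Return every start index where `guide` matches `dna` exactly (overlaps allowed)."""
    return [i for i in range(len(dna) - len(guide) + 1)
            if dna[i:i + len(guide)] == guide]


def cut_at(dna: str, site: int, guide_len: int) -> tuple[str, str]:
    """Split `dna` in two inside the matched site.

    The cut is placed 3 letters before the end of the match; this position
    is an illustrative assumption, not a model of the real enzyme.
    """
    cut_pos = site + guide_len - 3
    return dna[:cut_pos], dna[cut_pos:]


# Made-up sequences for illustration only.
LONG_GUIDE = "GATTACAGATTACAGATTAC"   # 20 letters: very specific
SHORT_GUIDE = "GATTACA"               # 7 letters: ambiguous
DNA = "TTTT" + LONG_GUIDE + "TGGCCCC" + SHORT_GUIDE + "AGGTT"

short_hits = find_sites(DNA, SHORT_GUIDE)   # several candidate locations
long_hits = find_sites(DNA, LONG_GUIDE)     # a single location
left, right = cut_at(DNA, long_hits[0], len(LONG_GUIDE))

print("short guide matches at:", short_hits)
print("long guide matches at:", long_hits)
print("fragments after the cut:", left, right)
```

The short guide matches at three positions and the long guide at one, which is the "more precise information, better targeting" point in miniature. What the sketch cannot show is the harder problem the article describes: whether a perfectly matching gRNA will actually be active.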


The problem lies in designing the gRNA. Previously, Edgell, Gloor and their trainees used CRISPR to kill Salmonella growing in a biofilm. For this, they used a method called bacterial conjugation – a process that bacteria themselves use to exchange DNA in a biofilm when adapting to their environment, including building resistance to antibiotics. What they found was that conjugation worked extremely well, but the majority of gRNA sequences failed to kill Salmonella, even though the best-in-class prediction tools available at the time had predicted they would. This study was published in Nature Communications and highlighted by The New York Times in 2019.

“You have to design a lot of gRNAs and figure out which ones will work. The wrong gRNA might target the wrong genes, or cause unwanted mutations; delivery can be tricky and there are limits to length, which can limit its targeting ability,” Gloor said. “Finding the right gRNA is like finding a needle in a haystack.”

What is machine learning? 

  • Machine learning is an application of AI which enables a program (software) to learn from experiences and improve itself at a task with minimal human intervention.
  • For example, a machine learning algorithm may be trained on a data set consisting of thousands of images of flowers that are labeled with their names.
  • The algorithm learns to correctly identify a flower in a new photograph based on the differentiating characteristics it learned from other pictures. 
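As a rough sketch of the flower example, the Python below trains a very simple classifier on labeled examples and then labels flowers it has not seen. It makes several assumptions: two made-up measurements per flower stand in for images, the flower names and numbers are invented, and the method (averaging each label's examples and picking the nearest average) is just one simple technique chosen for brevity, not the approach used by the researchers.

```python
from math import dist


def train(examples):
    """Learn one 'average flower' (centroid) per label from labeled examples."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Label a new flower with the name of the closest learned centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Invented training data: (petal length, petal width) with a flower name.
TRAINING = [
    ((1.4, 0.2), "daisy"), ((1.5, 0.3), "daisy"), ((1.3, 0.2), "daisy"),
    ((5.1, 1.9), "tulip"), ((4.9, 1.8), "tulip"), ((5.3, 2.1), "tulip"),
]

model = train(TRAINING)
print(predict(model, (1.6, 0.25)))  # a new, unlabeled flower
print(predict(model, (5.0, 2.0)))   # another new flower
```

The program is never told a rule such as "small petals mean daisy"; it derives the distinguishing characteristics from the labeled examples, which is the point of the definition above.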


That’s where machine learning comes in.

Browne developed a machine learning architecture called crisprHAL, named after the HAL 9000 – the sentient computer that controls the systems of the Discovery One spacecraft and interacts with the crew in the 1968 film 2001: A Space Odyssey. crisprHAL can be trained on existing datasets to predict gRNA accuracy rates. When transfer learning was done using smaller amounts of high-quality data, generated by Ham, crisprHAL showed marked improvements in predicting how effective a gRNA will be.
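The idea of transfer learning can be illustrated with a deliberately tiny Python sketch. It is not crisprHAL's architecture or data: the model here is a straight line fitted by gradient descent, and both datasets are invented. The sketch first trains on a large dataset from a related task, then continues training briefly on a small dataset from the task of interest, and compares that with training on the small dataset alone for the same number of steps.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(data, w=0.0, b=0.0, steps=20, lr=0.1):
    """Gradient descent on mean squared error, starting from the given (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Invented data: a large dataset from a related task, and a small
# dataset from the task we actually care about.
LARGE = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(200)]
SMALL = [(x, 2.2 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]

# Step 1: pretrain on the large dataset.
w_pre, b_pre = fit(LARGE, steps=2000)

# Step 2: fine-tune briefly on the small dataset, starting from the pretrained values.
w_ft, b_ft = fit(SMALL, w_pre, b_pre, steps=20)

# Baseline: the same brief training on the small dataset, starting from scratch.
w_sc, b_sc = fit(SMALL, steps=20)

print("error after transfer learning:", mse(w_ft, b_ft, SMALL))
print("error when starting from scratch:", mse(w_sc, b_sc, SMALL))
```

Because the pretrained model already starts close to the right answer, the same short training run on the small dataset ends with a lower error than the run that starts from scratch.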

Tyler Browne, PhD candidate, in the computer lab. PhD candidate Tyler Browne developed a machine learning architecture called crisprHAL that can be trained to predict gRNA accuracy rates. (Emily Leighton/Schulich Medicine & Dentistry)

Moreover, the team found that crisprHAL can extend its predictions to different types of bacteria, offering a universal solution across multiple bacteria. The previous generation of prediction tools worked only in the exact species and with the exact assay that they were trained on – crisprHAL predictions seem to generalize to closely related species and to different ways of measuring CRISPR activity.

“Basically, we’re turning what was a coin-flip into a success rate that, in 8 out of 10 times, the gRNA will work the way we predict,” said Gloor. “With its ability to predict gRNAs across multiple bacterial species, predictions from crisprHAL are like having a currency that can be spent in any country.”

The breakthrough enables the targeting of harmful microbes and enhances the ability to engineer genomes with unmatched accuracy.

This paper is part of a series the researchers are creating to provide a comprehensive understanding of the gene-editing technology. 

Authors on the paper included Ham, Browne, Edgell, Pooja Banglorewala and Gloor. This paper was the culmination of many years of work funded by the Canadian Institutes of Health Research (CIHR), the MITACS Accelerate program in collaboration with Tesseraqt Optimization Inc., the Weston Family Foundation, and the Ontario Genomics Institute.