Lab Canada

Deep learning finds autism, cancer mutations in unexplored regions of the genome

Toronto, ON – Scientists and engineers have built a computer model that has uncovered disease-causing mutations in large regions of the genome that previously could not be explored. Their method seeks out mutations that cause changes in ‘gene splicing,’ and has revealed unexpected genetic determinants of autism, colon cancer and spinal muscular atrophy.

CIFAR senior fellow Brendan Frey, also a professor at the University of Toronto’s Donnelly Centre for Cellular & Biomolecular Research, is the lead author on a paper describing this work, which appeared in the December 18 edition of Science Express. The paper was co-authored by CIFAR senior fellows Timothy Hughes (University of Toronto) and Stephen Scherer (the Hospital for Sick Children and the University of Toronto) of the Genetic Networks program. Frey is appointed to both the Genetic Networks program and the Neural Computation & Adaptive Perception program. The research combines the latter program’s pioneering work on deep learning with novel techniques in genetics.

Most existing methods examine mutations in segments of DNA that encode protein, which Frey refers to as low-hanging fruit. To find mutations outside of those segments, typical approaches such as genome-wide association studies take disease data and compare the mutations of sick patients to those of healthy patients, seeking out patterns. Frey compares the approach to lining up all the books your child likes to read and looking for whether a particular letter occurs more frequently than in other books.

“It doesn’t work, because it doesn’t tell you why your kid likes the book,” he says. “Similarly, genome-wide association studies can’t tell you why a mutation is problematic.”

But looking at splicing can do that. Splicing is important for the vast majority of genes in the human body. When mutations alter splicing, genes may produce no protein, produce the wrong protein or malfunction in some other way, any of which could lead to disease.

Frey’s team, which includes researchers from engineering, biology and medicine, developed a computer model that mimics how the cell directs splicing by detecting patterns within DNA sequences, known as the ‘splicing code’. The researchers then used their system to examine mutated DNA sequences and determine what effects the mutations would have on splicing, effectively scoring each mutation. Unlike existing methods, their technique provides an explanation for the effect of a mutation and can be used to find mutations outside of segments that code for protein.

To develop the computer model, Frey’s team fed experimental data into machine learning algorithms, in order to teach the computer how to examine a DNA sequence and output the splicing pattern.
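
The idea of scoring a mutation by its predicted effect on splicing can be illustrated with a toy sketch. This is not the team’s model: the motif weights, the function names and the three-letter motif scheme below are all invented for illustration, standing in for whatever a trained model would learn from experimental data. The sketch only shows the general shape of the approach, which is to predict splicing for the original and the mutated sequence and take the difference as the score.

```python
import math

# Hypothetical motif weights standing in for parameters a trained model
# would learn. The values are invented and have no biological meaning.
MOTIF_WEIGHTS = {"GGA": 1.2, "TAG": -1.5, "CAG": 0.8}
BIAS = -0.5


def predict_inclusion(seq: str) -> float:
    """Toy predictor: a value between 0 and 1 for how often an exon is included."""
    total = BIAS
    # Slide a three-letter window along the sequence and add up motif weights.
    for i in range(len(seq) - 2):
        total += MOTIF_WEIGHTS.get(seq[i:i + 3], 0.0)
    # Squash the sum into the range (0, 1) with a logistic function.
    return 1.0 / (1.0 + math.exp(-total))


def apply_mutation(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def score_mutation(seq: str, pos: int, alt: str) -> float:
    """Score a mutation as the change in predicted inclusion it causes."""
    mutated = apply_mutation(seq, pos, alt)
    return predict_inclusion(mutated) - predict_inclusion(seq)


if __name__ == "__main__":
    reference = "ACGGATC"
    # Replacing the G at index 3 removes the made-up "GGA" motif,
    # so the toy model predicts less inclusion and the score is negative.
    print(score_mutation(reference, 3, "T"))
```

A score near zero means the toy model predicts little change in splicing, while a large positive or negative score flags a mutation worth a closer look. Because the score comes from a prediction about splicing rather than from a statistical association, it also suggests why the mutation might matter.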

Their method worked surprisingly well and has led to new discoveries. For example, using DNA sequences that Scherer provided from five patients with autism, the model was able to identify 39 new genes that could be implicated in autism spectrum disorder, a significant addition to the roughly 100 previously known autism genes.

“Brendan’s work is groundbreaking because it represents a first serious attempt to decode the portions of that 98 percent of the human genome outside the genes that are typically studied in genetic disease studies,” says Scherer. “This is particularly exciting since it is thought these segments of DNA may contain much of the missing information that we have been looking for in studies like autism.”

Scherer and Frey began collaborating at CIFAR meetings five years ago and say they intend to use this model to analyze the genomes of 10,000 families with autism as part of the MSSNG study. The paper also sheds light on the genetic mechanisms that lead to spinal muscular atrophy, a leading cause of infant death, and nonpolyposis colorectal cancer.

Frey says his involvement in two CIFAR programs was crucial in making connections and in developing interdisciplinary expertise among his graduate students and postdoctoral fellows, including co-authors Hui Xiong, Babak Alipanahi, Leo Lee and Hannes Bretschneider. Also involved were Ben Blencowe of the University of Toronto and Nebojsa Jojic of Microsoft Research.

“My participation in the Neural Computation & Adaptive Perception program enabled my group to have access to the best techniques in deep learning,” he says. He adds that his interactions with members of the Genetic Networks program challenged him to take on some of the toughest questions in genetics.

“Many of us will soon know our complete human genome sequence, which will be like having an encyclopedic guide to ourselves that is written in an alien language,” says Frederick Roth, CIFAR senior fellow and co-director of the program in Genetic Networks. “This work promises to interpret the impact of mutations in a broader region of our genome than has been previously possible.”