NSF Stories

Algorithm created by deep learning finds potential therapeutic targets throughout the human genome

Researchers identified sites of methylation that could not be found with existing sequencing methods

Researchers at the New Jersey Institute of Technology and the Children's Hospital of Philadelphia have used machine learning to develop an algorithm that helps predict sites of DNA methylation, a process that can change the activity of DNA without changing its underlying sequence. The algorithm can identify disease-causing mechanisms that would otherwise be missed by conventional screening methods.

DNA methylation is involved in many key cellular processes and plays an important role in regulating gene expression. Errors in methylation are linked with a variety of human diseases.

The computationally intensive research was carried out on supercomputers supported by the U.S. National Science Foundation through the XSEDE project, which coordinates nationwide researcher access to advanced computing resources. The results were published in the journal Nature Machine Intelligence.

Conventional genomic sequencing tools are unable to capture the effects of methylation because the DNA sequence of a methylated gene still looks the same.

"Previously, methods developed to identify methylation sites in the genome could only look at certain nucleotide lengths at a given time, so a large number of methylation sites were missed," said Hakon Hakonarson, director of the Center for Applied Genomics at Children's Hospital and a senior co-author of the study. "We needed a better way of identifying and predicting methylation sites with a tool that could identify these motifs throughout the genome that are potentially disease-causing."

Children's Hospital and its partners at the New Jersey Institute of Technology turned to deep learning. Zhi Wei, a computer scientist at NJIT and a senior co-author of the study, worked with Hakonarson and his team to develop a deep learning algorithm that could predict where sites of methylation are located, helping researchers determine possible effects on certain nearby genes.

"We are very pleased that NSF-supported artificial intelligence-focused computational capabilities contributed to advance this important research," said Amy Friedlander, acting director of NSF's Office of Advanced Cyberinfrastructure.