Discovery Files

CRISPR and single-cell sequencing pinpoint causal genetic variants for traits and diseases

New approach provides roadmap to identify variants and genes, enabling deeper understanding of the noncoding genome and targets for therapies

A major challenge in human genetics is understanding which parts of the genome drive specific traits or contribute to disease risk. This challenge is even greater for genetic variants found in the 98% of the genome that does not encode proteins.

A new approach developed by researchers at New York University and the New York Genome Center combines genetic association studies, gene editing and single-cell sequencing to address these challenges and discover causal variants and genetic mechanisms for blood cell traits.

The U.S. National Science Foundation-supported approach, dubbed STING-seq and published in Science, addresses the challenge of directly connecting genetic variants to human traits and health, and can help scientists identify drug targets for diseases with a genetic basis.

Over the past two decades, genome-wide association studies, or GWAS, have become an important tool for studying the human genome. Using GWAS, scientists have identified thousands of genetic mutations or variants associated with many diseases, from schizophrenia to diabetes, as well as traits such as height. These studies are conducted by comparing the genomes of large populations to find variants that occur more often in those with a specific disease or trait.
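The comparison described above can be illustrated for a single variant with a basic allele-count test. This is only a minimal sketch, assuming a simple case/control design and a chi-square test on a 2x2 table; real GWAS pipelines typically use regression models with covariates. The function name and the counts are hypothetical and do not come from the study.

```python
import math

def allele_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are alternate and reference alleles.
    Returns (chi-square statistic, p-value, odds ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

# Hypothetical counts: the alternate allele is more common in cases
chi2, p_value, odds_ratio = allele_association(300, 700, 200, 800)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

A genome-wide study repeats a test of this kind across millions of variants, which is why a much stricter significance threshold than the usual 0.05 is applied.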

GWAS can reveal which regions of the genome, and which potential variants, are implicated in diseases or traits. However, these associations are nearly always found in the 98% of the genome that does not code for proteins, which is less understood than the well-studied 2% of the genome that does.

A complication is that many variants are found near each other within the genome and travel together through generations, a concept known as linkage. That can make it difficult to distinguish the variant that plays a truly causal role from other variants that are merely located nearby. Even when scientists can identify which variant is causing a disease or trait, they do not always know which gene the variant affects.
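How strongly two nearby variants travel together is commonly quantified as linkage disequilibrium, often reported as r². The following is a minimal sketch of that calculation, assuming phased haplotypes encoded as 0 (reference allele) and 1 (alternate allele); the function name and the toy data are hypothetical.

```python
def ld_r_squared(hap_a, hap_b):
    """r^2 between two variants across the same set of haplotypes.

    hap_a[i] and hap_b[i] are the alleles (0 or 1) that haplotype i
    carries at variant A and variant B.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of the alternate allele at A
    p_b = sum(hap_b) / n  # frequency of the alternate allele at B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # deviation from independent inheritance
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic in the sample")
    return d * d / denom

# Two variants always inherited together versus two inherited independently
linked = ld_r_squared([1, 1, 0, 0], [1, 1, 0, 0])
unlinked = ld_r_squared([1, 1, 0, 0], [1, 0, 1, 0])
print(linked, unlinked)
```

When r² is close to 1, an association signal at one variant is statistically almost indistinguishable from a signal at the other, which is the ambiguity that experimental follow-up is meant to resolve.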

"A major goal for the study of human diseases is to identify causal genes and variants, which can clarify biological mechanisms and inform drug targets for these diseases," said Neville Sanjana at the NYU Grossman School of Medicine and the study's co-senior author.

The research team created a workflow called STING-seq, short for Systematic Targeting and Inhibition of Noncoding GWAS loci with single-cell sequencing. STING-seq works by taking biobank-scale GWAS and looking for likely causal variants using a combination of biochemical hallmarks and regulatory elements. The researchers then use CRISPR to target each of the regions of the genome implicated by GWAS and conduct single-cell sequencing to evaluate gene and protein expression.

In their study, the researchers illustrated the use of STING-seq to discover target genes of noncoding variants for blood traits. Blood traits — such as the percentages of platelets, white blood cells, and red blood cells — are easy to measure in routine blood tests and have been well-studied in GWAS. As a result, the researchers were able to use GWAS representing nearly 750,000 people from diverse backgrounds to study blood traits.

Once the researchers identified 543 candidate regions of the genome that may play a role in blood traits, they used a version of CRISPR called CRISPR inhibition that can silence precise regions of the genome.

After CRISPR silencing of regions identified by GWAS, the researchers looked at the expression of nearby genes in individual cells to see if particular genes were turned on or off. If they saw a difference in gene expression between cells where variants were and were not silenced, they could link specific noncoding regions to target genes. By doing this, the researchers could pinpoint which noncoding regions are central to specific traits (and which ones are not) and often also the cellular pathways through which these noncoding regions work.
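The comparison between silenced and unsilenced cells can be sketched as a small differential-expression check for one gene. This is an illustrative sketch only, assuming per-cell expression counts, a log2 fold change with a pseudocount, and a permutation test; it is not the statistical method used in the study, and all names and numbers are hypothetical.

```python
import math
import random

def log2_fold_change(perturbed, control, pseudocount=1.0):
    """log2 ratio of mean expression in perturbed versus control cells."""
    mean_p = sum(perturbed) / len(perturbed)
    mean_c = sum(control) / len(control)
    return math.log2((mean_p + pseudocount) / (mean_c + pseudocount))

def permutation_p_value(perturbed, control, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean expression."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    k = len(perturbed)
    observed = abs(sum(perturbed) / k - sum(control) / len(control))
    pooled = list(perturbed) + list(control)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign cells to the two groups
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_permutations + 1)

# Hypothetical counts for one nearby gene in individual cells
silenced_cells = [4, 5, 3, 6, 4, 5]
control_cells = [10, 12, 11, 13, 9, 12]
lfc = log2_fold_change(silenced_cells, control_cells)
p = permutation_p_value(silenced_cells, control_cells)
print(f"log2FC={lfc:.2f}, p={p:.4f}")
```

A clearly negative fold change with a small p-value for a nearby gene, and no change for other genes, would be the kind of evidence that links a silenced noncoding region to its target gene.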

The scientists envision these new processes being used to identify causal variants for a wide range of diseases that can be treated either with gene editing, as has been done for sickle cell anemia, or with drugs that target specific genes or cellular pathways.