NSF Stories

Tricky grass genome sequenced after two decades of investment and research

Process of discovery led to new genomic tools, 'reference genome' for future work

Imagine if as you read this article, blocks of text kept rearranging themselves, recombining and repeating, seemingly at random. Now imagine those blocks of text are genes. You might have some idea of the challenges overcome by researchers supported by the National Science Foundation (NSF) who have worked for nearly two decades to sequence the genome of an important grass species.

The effort produced genome research techniques applicable to all plants, as well as a genome sequence that can serve as a sort of template for sequencing other large, complex grass genomes.

Aegilops tauschii is a type of wild goatgrass found from western Asia all the way to China and the Indian subcontinent. It's one of three grasses whose genetic material combined over hundreds of thousands of years to become bread wheat, or common wheat, one of the world's most valuable and widely grown crops.

Genes from A. tauschii contributed to bread wheat's tolerance of cold, disease and other stresses, the traits that allow both goatgrass and wheat to grow in so many different places. A. tauschii also supplied common wheat with the gluten suited to making bread.

"It made wheat what today is wheat," said Jan Dvořák, a biologist at the University of California, Davis, and leader of an international team of researchers that published a paper in the journal Nature announcing the genome sequence.

The research team's main obstacle was stretches of DNA called transposable elements, or transposons. Known as "jumping genes," they can change their position in a genome.

Like many other grasses, A. tauschii has a huge genome compared with those of mammals, including humans, and that genome is loaded with transposons. Scientists believe this architecture gives goatgrass an evolutionary advantage. Its DNA turns over at an extraordinarily high rate, producing mutations that can lead to new biological functions and other potentially beneficial changes. This allows the grass to adapt to changing conditions.

Mammalian genomes evolve comparatively slowly, turning over on a timescale of tens of millions of years. In contrast, more than 80 percent of the A. tauschii genome originated within the past three million years.

That massive, transposon-rich genome may be helpful for the plant's survival, but it makes it hard to study. "It's like if someone took a book from you, tore out the pages, and then handed them back to you to reassemble it. But, when you look at the text on the pages, you find that most of the sentences are very similar to each other," Dvořák said.

Two decades of investment

NSF began funding the research that led to the sequencing of A. tauschii in 1999. That was only a few years after scientists completed the first whole-genome sequence of a bacterium, which was also the first complete genome sequence of any free-living organism.

The A. tauschii team knew the challenges ahead of them, Dvořák said. Very large genomes are a basic biological feature of wheat, rye, barley and other grasses. Size wasn't the only problem, though. Conifers, the group of plants that includes firs, pines and cedars, have even bigger genomes but pose fewer sequencing difficulties.

What sets A. tauschii and similar grasses apart is the combination of large genome size and a high concentration of very similar transposons. Those transposons make up 84 percent of the goatgrass genome sequence.

Overcoming such challenges meant developing new methods and technology, creating genome sequencing approaches that benefit the entire biological community.

"Everything depends on technology, and it was NSF funding that allowed us over the years to develop new technology - such as genome fingerprinting - that ultimately allowed us to do the genome sequencing of Aegilops tauschii," Dvořák said.

New template for genome sequencing and comparisons

With the A. tauschii sequencing complete, the goatgrass has become an important "reference genome," opening new possibilities for research. For future grass genome sequencing projects, the A. tauschii sequence serves as a kind of template, giving researchers a better idea of how to study large, transposon-rich genomes and potentially making sequencing quicker and easier.

From an evolutionary biology perspective, the A. tauschii genome sequence will also allow scientists to build a more accurate picture of how bread wheat evolved, and how hybrid plants emerge from their progenitors.

Bread wheat is a hybrid with six sets of chromosomes: two each from the three kinds of grasses it descends from. That hybridization began hundreds of thousands of years ago, but bread wheat acquired its last two chromosome sets from A. tauschii much more recently, perhaps as recently as 8,000 years ago.

"This step led to bread wheat, the most widely grown source of food globally, with only rice as a close competitor," Dvořák said.

The A. tauschii sequence is immediately useful for researchers studying grasses and working on other agricultural projects. For example, a scientist looking for the genetic mechanism that triggers flowering in a related species can consult the A. tauschii sequence to get a good idea of where to find it.

By comparing the sequences, you can say, "Aha, this gene has a similar function and location. It is a good candidate," Dvořák said.

The research team has already found one direct application for its work. Over the past two decades, scientists have identified a wheat disease, first detected in Uganda, that can cause total crop loss when it strikes. Most worryingly, the disease can overcome the resistance of thousands of wheat varieties, and breeders have struggled to develop new resistant varieties. By searching the A. tauschii genome sequence for clues, researchers have found two new genes for resistance to this disease and have bred them into wheat, Dvořák said.

"The applications for this research are just beginning," he said.

Contributors to the research include scientists in the U.S. from USDA-ARS in Albany, California; Johns Hopkins University in Maryland; and the University of Georgia in Athens, as well as researchers from Germany, Canada, China, the U.K., France and Switzerland.