UNIVERSITY PARK, Pa. — Using a new set of tools, an international research collaboration including scientists from Penn State, Rockefeller University and Johns Hopkins University has reconstructed genetic blueprints for 51 species, including cats, dolphins, kangaroos, penguins, sharks and turtles. The development deepens our understanding of evolution and the links between humans and animals, according to the researchers, who developed novel algorithms and computer software that cut genome assembly time from months (or decades, in the case of the human genome) to a matter of days.
The software and the resulting genomes are described in a paper published in the journal Nature Biotechnology. The lead author is Delphine LaRiviere, assistant research professor of biochemistry and molecular biology in the Penn State Eberly College of Science and the Huck Institutes of the Life Sciences.
The open-source software is available online via Galaxy, a web-based platform developed at Penn State that offers scientific software for free to the public and supports tens of thousands of active users every month.
“Galaxy allows users of all skill levels to perform complex analysis of large amounts of data, including genetic data,” said Anton Nekrutenko, professor of biochemistry and molecular biology in the Penn State Eberly College of Science, co-corresponding author of the new paper and co-developer of Galaxy. “Now we have developed a new pipeline that combines information from multiple genetic sequencing techniques and allows users to assemble nearly complete genomes in record time.”
The team, working with the Vertebrate Genomes Project, sequenced the genomes of 51 vertebrate species, prioritizing those that are useful models for understanding human evolution. The newly assembled genomes may also have implications for human health, as early work in drug development begins in mice and other animal model systems.
Mammals, a subset of vertebrates that includes primates, dogs, cats, mice and humans, share 50% to 99% of the same DNA and nearly all the genes from a common ancestor that lived roughly 200 million years ago. By comparing the complete genomes of these species, researchers can start to identify when and where DNA sequences diverged and the implications of those differences for humans. But, researchers said, this work has been limited by the number and quality of available vertebrate genomes, because sequencing efforts have focused on a few key species.
Vertebrate genomes are billions of characters long, too long for any gene sequencing technology to read in one complete pass. Researchers must rely on tools that break down the genome into smaller, easier-to-read segments. Computer programs then take those segments and determine how they fit together, like pieces of a jigsaw puzzle. But traditional technology was not able to finish the puzzle.
“Have you ever done a massive jigsaw puzzle where at some point all that’s left is blue sky, and you don’t think you’ll ever be able to fit the right pieces together? The old software would basically give up on these hard parts of the genome. That’s the problem with genome assembly,” said Michael Schatz, a Bloomberg Distinguished Professor of computer science and biology at Johns Hopkins and co-corresponding author of the paper. “Our new program, using the latest sequencing data and the latest assembly algorithms, knows how to work through those parts to get a more complete picture.”
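To make the jigsaw analogy concrete, here is a minimal Python sketch of the general idea of fitting overlapping segments back together. It is a textbook-style greedy overlap merge on a made-up 16-letter sequence, not the pipeline described in the paper; the function names (`overlap`, `greedy_assemble`), the `min_len` threshold and the example reads are all hypothetical and chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler: repeatedly merge the two pieces with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps: these pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical example: three overlapping reads from a made-up sequence.
reads = ["AGTTCAGG", "ATGGCCTA", "CCTAAGTT"]
print(greedy_assemble(reads))

# Reads with no shared ends cannot be joined and remain separate pieces.
print(greedy_assemble(["AAAA", "CCCC"]))
```

In this toy case the three reads merge into a single sequence, while the second call shows pieces that cannot be placed. Real genomes contain long repeated stretches where many pieces appear to fit equally well, which is the "blue sky" problem Schatz describes and the reason a simple approach like this one falls short.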
To test their technology, researchers mapped the genome of the zebra finch, a songbird whose genome had already been sequenced to study brain development. The new technology was far better at reassembling segments of the genome, creating a more accurate and complete map.
“We plan to continue working with the Vertebrate Genomes Project to sequence the genomes of at least one species across all 275 vertebrate orders,” Nekrutenko said. “This will also support the Earth BioGenome Project’s goal of producing reference genomes for the approximately 1.8 million known eukaryotic species over the next decade.”
Previously, only some research groups would have had access to the resources to assemble these genomes, the researchers said. But now anyone with access to the internet can use Galaxy’s graphical interface to produce this complex information.
“In some ways, we’re building an evolutionary time machine,” Schatz said. “We can trace how vertebrates evolved over time and eventually gave rise to genes and sequences that are uniquely found in humans. Having the genes of our evolutionary cousins mapped out will help us better understand ourselves.”
In addition to LaRiviere and Nekrutenko, the research team involved in this project at Penn State includes software engineer Marius van den Beek and system architect Nate Coraor. The research team was co-led by Erich Jarvis and Giulio Formenti at Rockefeller University and includes researchers from more than 25 institutions worldwide.
Editor’s Note: A version of this story was originally published by Johns Hopkins.