New Long Read Assemblers for De Novo Genomes Promise Speed, Scalability

GenomeWeb | Andrew P. Han | Aug 16, 2019

NEW YORK – Newly released algorithms can assemble de novo human genomes from long read sequencing data in just a few hours’ time. 

Shasta, an in-memory computing-driven algorithm developed by researchers at the Chan Zuckerberg Initiative (CZI) and tested by researchers from the University of California, Santa Cruz, can complete a de novo human genome assembly in under six hours, the authors wrote, for an average cost of $70 per sample. 

Using reads generated by the Oxford Nanopore Technologies PromethION sequencing instrument, the researchers were able to create “near chromosome-level” scaffolds for eleven genomes. While Shasta produced less-contiguous assemblies (contig N50s between 19.3 and 37.8 megabases) than some other long read assemblers, it had fewer misassemblies, the authors wrote. They posted their study to bioRxiv July 26.

And earlier in July, two Pacific Biosciences veterans, now working on their own, described Peregrine, an assembler that uses an indexing scheme to assemble reads that meet certain accuracy and length requirements. Using previously generated datasets of PacBio long reads, the authors reported that they were able to assemble a genome with 30x coverage in 100 minutes of wall clock time. The contig N50 was greater than 20 megabases. They also posted a preprint to bioRxiv.
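For readers unfamiliar with the contiguity metric quoted for both assemblers, N50 is commonly defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly. The following is a minimal sketch of that standard definition, not code from either preprint; the function name `n50` and the toy contig lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Standard definition (assumed here, not taken from either preprint):
    sort contigs from longest to shortest and return the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up contig lengths in megabases (not real data).
toy_lengths_mb = [40, 30, 20, 10]
print(n50(toy_lengths_mb))  # 40 + 30 = 70 >= 50, so N50 is 30
```

A higher N50 means that more of the assembly sits in long contigs, which is why the figure is reported alongside misassembly counts: contiguity alone says nothing about whether the joins are correct.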

Developers for both algorithms said they hoped their assemblers could increase the pace of genomic research and help researchers find new structural variants. 

“Shasta and other tools are cheap and quick, designed with the intent to be on the cloud,” said Benedict Paten, a computational geneticist at UC-Santa Cruz and an author of the Shasta preprint. “They really give us the power to scale out nanopore sequencing. We’re easily talking about assembling hundreds of de novo genomes in the next couple years.”
