Meeting the challenges of non-referenced genome assembly from short-read sequence data
Author(s): M. Parks; A. Liston; R. Cronn
Source: Acta Horticulturae. 859: 323-332
Publication Series: Scientific Journal (JRNL)
Station: Pacific Northwest Research Station
Description: Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) because sequence is returned in relatively short reads. Nonetheless, there is great interest in applying MPST to genome sequencing in non-model organisms. We have developed a bioinformatics pipeline to assemble short-read sequence data into nearly complete chloroplast genomes using a combination of de novo and reference-guided assembly, while decreasing reliance on a reference genome. Initially, short-read sequences are assembled into larger contigs using de novo assembly. De novo contigs are then aligned to the corresponding reference genome of the most closely related taxon available and merged to form a consensus sequence. The consensus sequence and reference are in turn 'merged' such that aligned de novo sequence remains unaffected while missing sequence is filled in using the reference sequence. This chimeric reference is then utilized in reference-guided assembly to align the original short-read data, resulting in a draft plastome. Using two established Pinus reference plastomes, our method has been effective in the assembly of 33 chloroplast genomes within the genus Pinus, and results with four species representing other genera of Pinaceae suggest the method will be of general use in land plants, particularly once limitations of PCR-based chloroplast enrichment are overcome.
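The step in which the consensus and reference are 'merged' into a chimeric reference can be illustrated with a minimal Python sketch. This is not the authors' pipeline code. It assumes the de novo consensus and the reference are already available as two rows of a pairwise alignment of equal length, that positions without de novo coverage are marked "N" or "-" in the consensus row, and that "-" in the reference row marks sequence present only in the de novo contigs. The function name `build_chimeric_reference` and the `missing` parameter are hypothetical.

```python
def build_chimeric_reference(consensus: str, reference: str, missing: str = "-Nn") -> str:
    """Merge an aligned de novo consensus with its aligned reference.

    Assumption: both strings are rows of the same pairwise alignment, so they
    have equal length. Where the consensus has a called base it is kept
    unchanged; where it is missing (any character in `missing`), the
    reference base is used instead.
    """
    if len(consensus) != len(reference):
        raise ValueError("aligned sequences must have equal length")

    merged = []
    for cons_base, ref_base in zip(consensus, reference):
        # Prefer de novo sequence; fall back to the reference only when
        # the consensus has no information at this column.
        base = ref_base if cons_base in missing else cons_base
        # Drop columns where the chosen row is an alignment gap, so the
        # result is an ungapped sequence.
        if base != "-":
            merged.append(base)
    return "".join(merged)


if __name__ == "__main__":
    aligned_consensus = "ACGT--NNTTGA"  # toy data, not from the paper
    aligned_reference = "ACCTGGAATT-A"
    print(build_chimeric_reference(aligned_consensus, aligned_reference))
```

In this toy input the third column differs between the two rows and the de novo base is kept, while the uncovered stretch in the middle is filled from the reference. The resulting ungapped sequence would then serve as the target for reference-guided alignment of the original reads.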
- This article was written and prepared by U.S. Government employees on official time, and is therefore in the public domain.
Citation: Parks, M.; Liston, A.; Cronn, R. 2010. Meeting the challenges of non-referenced genome assembly from short-read sequence data. Acta Horticulturae. 859: 323-332.
Keywords: next-generation sequencing, massively parallel sequencing, Pinus, Illumina
- Pyrosequencing of the northern red oak (Quercus rubra L.) chloroplast genome reveals high quality polymorphisms for population management
- Multiplexed Fragaria chloroplast genome sequencing
- Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology