Whole genome sequencing (WGS) refers to the comprehensive examination of a genome by reading and stitching together short fragments to determine an organism’s complete chromosomal (nuclear) and mitochondrial DNA sequence. De novo sequencing refers to sequencing a novel genome when a reference or template sequence is not available. Sequencing reads are assembled as contigs (contiguous consensus sequences from collections of overlapping reads). Once a de novo genome has been completely sequenced, assembled and annotated, a draft or common reference sequence is generated. Focused sequencing approaches such as exome or targeted resequencing are frequently used to determine genomic variations such as single-nucleotide polymorphisms (SNPs), copy number variations (CNVs), re-arrangements and indels. While WGS is well suited on its own to determine genomic variations, sequencing depth afforded by focused or targeted resequencing is currently significantly more cost effective.
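To make the idea of contigs concrete, the sketch below merges overlapping reads with a toy greedy strategy: it repeatedly joins the two sequences that share the longest suffix-to-prefix overlap. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats); production assemblers rely on overlap or de Bruijn graphs and handle sequencing errors. The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            # No remaining pair overlaps: what is left are separate contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up reads
    print(greedy_assemble(toy_reads))
```

Reads that share no sufficient overlap with anything else remain as separate contigs, which is how gaps in coverage appear as breaks in a draft assembly.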
Typically, WGS generates a single consensus sequence. This sequence does not distinguish between variants on homologous chromosomes. Genome phasing identifies alleles on both maternal and paternal chromosomes, offering haplotype information. Phased sequencing is important in genetic disorders where there are disruptions to alleles in cis and trans positions on a chromosome. It’s ideal in studies where variant linkage and allele expression are important. For phasing applications we recommend a 10X Genomics approach that includes GemCode/Chromium + Illumina HiSeq 2x125 reads. Deliverables of this service include 186 Gb of sequencing data, which corresponds to approximately 48x genome coverage.
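The relationship between sequencing yield and depth can be sketched as mean coverage = usable bases / genome size. The helper below is a rough illustration only. The genome size, read counts and `usable_fraction` (the share of raw output that survives filtering and alignment) are assumed inputs chosen for the example, not figures taken from any particular service.

```python
def total_bases(read_pairs, read_length, paired=True):
    """Raw yield in bases for a run of `read_pairs` fragments.

    For paired-end data (e.g. 2x125) each fragment contributes two reads.
    """
    ends = 2 if paired else 1
    return read_pairs * ends * read_length


def mean_coverage(bases, genome_size, usable_fraction=1.0):
    """Average depth: usable bases divided by genome size (both in bases)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be between 0 and 1")
    return bases * usable_fraction / genome_size


if __name__ == "__main__":
    # Illustrative numbers: 400 million 2x125 read pairs on a 2.5 Gb genome.
    yield_bases = total_bases(400_000_000, 125)
    print(mean_coverage(yield_bases, 2_500_000_000))
    print(mean_coverage(yield_bases, 2_500_000_000, usable_fraction=0.8))
```

Because depth scales inversely with genome size, the same yield gives very different coverage for a bacterial genome than for a mammalian one, so it is worth running the arithmetic for your own organism before ordering.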
Targeted sequencing is one of the most popular applications of next generation sequencing. Targeted sequencing can be broken into three different approaches:
We describe the advantages/disadvantages of whole exome sequencing vs. whole genome sequencing in a recent blog post.
See Genohub's up-to-date list of available whole genome sequencing services.
While whole genome sequencing and re-sequencing represent ~90% of all DNA based sequencing applications, it’s important not to lose sight of the myriad new protocols available to count or detect epigenomic features. These include genotyping, measuring DNA-protein interactions and epigenetic markers. Several examples of these protocols are listed below:
This Beckman-based, high-throughput library prep kit is designed for use with the Beckman Coulter Biomek FXP liquid handler. The kit contains enough reagents to construct 96 libraries in less than 6 hours, or 3 hours if size selection is not performed. The automated protocol contains three SPRI size selection options for recovering 150-350 bp, 250-450 bp and 350-700 bp insert sizes. Supported sample inputs include at least 1 µg of sheared DNA, genomic DNA, cDNA and amplicons. While the kit cost is reasonable, you will need an upfront investment to purchase the Biomek FXP liquid handler.
Amplification biases and dropouts in coverage in GC- and AT-rich genomic regions are the main reasons why users would want to use this kit. While several polymerases claim to decrease gaps in coverage and handle GC/AT rich regions, the standard against which each polymerase is benchmarked is a PCR-free library. Launched in 2011, this kit is based on the Kozarewa et al., 2009 paper which first described the approach. Taking advantage of adapters which contain flow cell and primer binding regions, the user is able to stop the library construction process after adapter ligation. Reduced library bias and fewer gaps in coverage allow users to prepare libraries from difficult samples, ranging from small bacterial genomes to whole human genomes. To accommodate an amplification-free library, the user will have to supply 500 ng to 2 µg of genomic DNA. The procedure takes approximately 5 hours, with 4 hours of hands-on time. While eliminating amplification may lead you to think the procedure is faster, users are required to perform qPCR post ligation for quantitation. Yields after ligation are typically sub-nanomolar, requiring careful dilutions before flow cell loading. Users are able to multiplex up to 96 samples using single indexed barcodes (6 or 8 base index).
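Because post-ligation yields are low, it helps to convert a measured concentration into molarity and work out dilutions explicitly. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA and the C1V1 = C2V2 dilution rule. The function names, the example concentrations and the target loading concentration are illustrative assumptions, not values from any kit manual; always follow the loading guidance for your instrument.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/µL) to nanomolar.

    nM = (ng/µL * 1e6) / (660 * mean fragment length in bp)
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (stock µL, diluent µL) needed to reach `target_nm` (C1V1 = C2V2)."""
    if target_nm > stock_nm:
        raise ValueError("cannot dilute to a higher concentration than the stock")
    stock_ul = target_nm * final_volume_ul / stock_nm
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical library: 0.2 ng/µL with a 450 bp mean fragment size.
    stock = library_molarity_nm(0.2, 450)
    print(f"stock: {stock:.3f} nM")
    print(dilution_volumes(stock, target_nm=0.5, final_volume_ul=20))
```

The check in `dilution_volumes` matters for PCR-free libraries in particular: if the stock is already below the concentration you intended to load, no dilution can fix it and the library has to be concentrated or remade.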
The NEXTflex Rapid DNA-Seq kit is a faster and more versatile kit compared to its predecessor, the NEXTflex DNA-Seq kit. The kit accommodates DNA inputs from 1 ng to 1 µg and can produce sequenceable libraries in under 2 hours with as few as 6 cycles of PCR. While similar in library completion time to Nextera, the Rapid DNA-Seq kit is ligation based and does not use transposases. The End Repair and Adenylation steps are combined into a single reaction, reducing time and bead cleanups. The kit contains 5 bead-based size selection options post ligation: 300-400 bp, 350-500 bp, 400-600 bp, 500-700 bp and 650-800 bp. The kit utilizes “enhanced adapter ligation technology” and offers compatibility with clinical FFPE and degraded DNA samples. As with the earlier NEXTflex DNA-Seq kit, this kit is compatible with up to 96 barcoded adapters.
As one of the most widely adopted library preparation kits on the market, the TruSeq Nano kit has been thoroughly validated for use with many different genome types. The procedure takes about 1 day to perform with ~8 hours of hands-on time. Users are able to multiplex up to 24 samples using single indexed barcodes (6 base index) or up to 96 using dual indexed barcodes (8 base indices).
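Multiplexing works because each pooled library carries a short index that is read alongside the insert and used to sort reads back to their sample. The sketch below shows one simple way this assignment could be done, tolerating a single mismatch in the index read. The index sequences and sample names are invented for illustration and are not actual kit barcodes; real demultiplexing is normally handled by the sequencer vendor's software.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indices, max_mismatches=1):
    """Return the sample whose index uniquely matches `index_read`.

    A match means at most `max_mismatches` differing bases. Reads that match
    no index, or more than one, are left unassigned (None).
    """
    hits = [
        sample
        for sample, index in sample_indices.items()
        if hamming(index_read, index) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    # Made-up 6-base indices, one per pooled sample.
    indices = {"sample_A": "AACCGG", "sample_B": "GGTTAA", "sample_C": "CTCTCT"}
    for read in ("AACCGG", "AACCGT", "TTTTTT"):
        print(read, "->", assign_sample(read, indices))
```

Allowing a mismatch only makes sense when every pair of indices in the pool differs at three or more positions; otherwise a single sequencing error could make a read ambiguous or assign it to the wrong sample.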
The Nextera XT kit was designed for preparing sequence-ready libraries from samples consisting of small genomes (bacteria, archaea, viruses), PCR amplicons and plasmids. Library preparation takes 90 minutes and only requires 1 ng of sample input. The kit uses a single transposase enzymatic reaction to simultaneously fragment DNA and add adapters, and recommends as few as 12 cycles of PCR. The kit contains a unique quantification method using beads to normalize library amounts prior to pooling and sequencing. This reduces the need to perform a lengthy qPCR step to measure library concentration. The kit has barcoding options allowing the user to pool up to 96 samples together.
Amplification biases and dropouts in coverage in GC- and AT-rich genomic regions are the main reasons why users would want to use this kit. While several polymerases are now claiming to decrease gaps in coverage and handle GC/AT rich regions, the standard against which each polymerase is benchmarked is a PCR-free library. Launched in 2013, this kit is based on the Kozarewa et al., 2009 paper which first described the approach. Taking advantage of adapters which contain flow cell and primer binding regions, the user is able to stop the library construction process after adapter ligation. Reduced library bias and fewer gaps in coverage allow users to prepare libraries from difficult samples, ranging from small bacterial genomes to whole human genomes. To accommodate an amplification-free library, the user will have to supply 1-2 µg of genomic DNA. The procedure takes approximately 5 hours, with 4 hours of hands-on time. While eliminating amplification may lead you to think the procedure is faster, the mandatory qPCR post ligation offsets the time saved. Yields after ligation are typically sub-nanomolar, requiring careful dilutions before flow cell loading. Users are able to multiplex up to 24 samples using single indexed barcodes (6 base index) or up to 96 using dual indexed barcodes (8 base indices).
The Kapa DNA kit contains a novel enzyme, Kapa HiFi HotStart DNA Polymerase, which enables amplification across a wide range of genomic species with varying GC content. The enzyme is claimed to reduce sequence bias and improve the uniformity of sequence coverage. The kit manual asks for 1-5 µg of sheared input dsDNA and follows a ligation-based approach to adding adapters and constructing Illumina compatible libraries. Kits are available in 10 and 50 reaction sizes, with options to order them with Kapa's Real Time PCR Library quantification kits. If your sample genome has considerable GC content resulting in dropouts in coverage, the Kapa HiFi HotStart DNA Polymerase is a popular and effective option for amplification. Adapter barcodes are not provided with the Illumina compatible kits.
Outperforming its predecessor, the NEBNext DNA library kit, the NEBNext Ultra reduces library preparation time from 6 to under 3 hours, while allowing as little as 5 ng of input DNA. While the kit does not include barcoded adapters, it is compatible with NEB’s Multiplex Oligos for Illumina indexing. The indexed adapters contain a stem loop structure and are ligated immediately after adenylation. The loop of the adapter contains a modified base that is cleaved using NEB’s USER enzyme, revealing primer binding sites for amplification. 24 barcoded adapters are available for multiplexing applications. The kit also contains NEBNext High Fidelity 2X PCR Master Mix, which is designed to reduce GC bias.
This kit has been replaced with the NuGen Encore Rapid Library Kit.
The Encore Rapid Library Kit is designed to produce libraries from as little as 100 ng of double stranded DNA or double stranded cDNA without PCR amplification. This workflow makes it compatible with several applications, including RNA-Seq, Digital Gene Expression (DGE), genomic DNA sequencing and amplicon sequencing. The Encore Rapid system is designed for integration into NuGen’s Ovation RNA-Seq System v2 and Ovation RNA-Seq FFPE systems. The absence of amplification steps makes this protocol suited for analyzing genomes that have high GC content. Multiplexing is possible by purchasing a separate barcode module. This allows the user to multiplex up to 96 samples using inline or dedicated barcode designs.
The Ion Plus Fragment Kit is designed to produce Ion Torrent (PGM) compatible libraries with as little as 100 ng of input DNA. The kit contains a proprietary Ion Shear™ enzyme which enzymatically shears genomic DNA, eliminating the need to do this mechanically. As a result, the procedure can be performed in less than 2 hours.
The PacBio Template Prep Kits require as little as 250 ng of sheared DNA input to create libraries with insert sizes from 250 bp to under 3 kb and from 3 kb to 10 kb. The kit utilizes unique universal hairpin adapters (SMRTbell) that are ligated onto double stranded DNA fragments. The SMRTbell template preparation method creates a structurally linear and topologically circular DNA morphology, enabling consensus sequencing of the same template. Once templates or libraries are constructed, single molecule, real time sequencing can begin.
Paired-end Illumina sequencing refers to sequencing both ends of a DNA fragment, while single-end or single-read sequencing refers to sequencing from one end of a fragment. Single-end sequencing is usually sufficient for counting applications. For de novo whole genome sequencing, phased sequencing or targeted sequencing, paired-end sequencing is recommended, as paired reads are more likely to align well to a reference genome. Paired-end reads form longer contigs for de novo sequencing and help fill in gaps in the consensus sequence. DNA alignments across repetitive regions are improved with paired-end reads, as is the detection of rearrangements, indels and variants. The cost differences and the importance of paired-end vs. single-end for RNA applications can be found in our sequencing guide.
Data from a DNA-Seq run can be delivered as raw or 'analyzed'. Below are the data deliverables you can expect from a whole genome sequencing service: