Whole Genome Sequencing and Re-sequencing Guide

Library and data analysis recommendations, kits, services and costs

Whole Genome Sequencing (WGS)

Whole genome sequencing (WGS) refers to the comprehensive examination of a genome by reading and stitching together short fragments to determine an organism’s complete chromosomal (nuclear) and mitochondrial DNA sequence. De novo sequencing refers to sequencing a novel genome when a reference or template sequence is not available. Sequencing reads are assembled into contigs (contiguous consensus sequences built from collections of overlapping reads). Once a de novo genome has been completely sequenced, assembled and annotated, a draft or common reference sequence is generated. Focused sequencing approaches such as exome or targeted re-sequencing are frequently used to determine genomic variations such as single-nucleotide polymorphisms (SNPs), copy number variations (CNVs), rearrangements and indels. While WGS on its own is well suited to detecting genomic variations, reaching high sequencing depth is currently significantly more cost effective with focused or targeted re-sequencing.

Phased sequencing

Typically, WGS generates a single consensus sequence that does not distinguish between variants on homologous chromosomes. Genome phasing assigns alleles to the maternal and paternal chromosomes, providing haplotype information. Phased sequencing is important in genetic disorders where it matters whether disruptions to alleles lie in cis (on the same chromosome copy) or in trans (on opposite copies). It is ideal for studies where variant linkage and allele-specific expression are important. For phasing applications we recommend a 10X Genomics approach that combines GemCode/Chromium library preparation with Illumina HiSeq 2x125 reads. Deliverables of this service include 186 Gb of sequencing data, which is approximately 48x genome coverage.

Targeted re-sequencing

Targeted sequencing is one of the most popular applications of next generation sequencing. Targeted sequencing can be broken into three different approaches:

  1. Exome sequencing
  2. Amplicon based targeting of genes
  3. Probe based hybridization and targeting of genes

We describe the advantages/disadvantages of whole exome sequencing vs. whole genome sequencing in a recent blog post.

See Genohub's up-to-date list of available whole genome sequencing services.


Other Applications of DNA-Seq


While whole genome sequencing and re-sequencing represent ~90% of all DNA based sequencing applications, it’s important not to lose sight of the many newer protocols available to count or detect genomic and epigenomic features. These include genotyping and the measurement of DNA-protein interactions and epigenetic markers. Several examples of these protocols are listed below:

DNA-protein interactions
  • DNase-Seq
  • MNase-Seq
  • X-ChIP
  • ChIP-Seq
  • FAIRE-Seq
  • ATAC-Seq
  • ChIA-PET
  • Hi-C
  • 3C, 4C, 5C
  • Capture-C
  • HiTS-FLIP
Epigenetics
  • Bisulfite-Seq
  • Methyl-Seq
  • RRBS
  • PBAT
  • MeDIP
  • oxBS-Seq
  • TAB-Seq
  • MBDCap-Seq
  • BisChIP-Seq
Genotyping
  • RAD-Seq
  • ddRAD-Seq
  • nextRAD
  • Capture-Seq
  • ezRAD
Low input DNA-Seq
  • MDA
  • DOP-PCR
  • OS-Seq
  • MALBAC
  • Nuc-Seq

DNA-Seq Workflow


  1. DNA Extraction- The first step in any DNA-seq workflow is purifying DNA from cellular, plasma, viral or microbiome samples. Isolation of DNA must be optimized so that the purified product has high yield, purity and integrity. Methods to extract DNA from a sample can be broken down into the following categories:
    • Organic extraction
    • Silica membrane
    • Filter plate
    • Magnetic beads
    The DNA extraction method used is critical, especially for hard to extract samples. Specialized procedures are available for DNA extraction from buccal swabs, cultured cells, tissue, food and feed, cell free DNA in plasma, and formalin-fixed, paraffin-embedded (FFPE) samples. If you need help, fill out our complimentary consultation form and we'll be happy to offer our recommendations.

  2. DNA QC- Once DNA has been extracted, it needs to be quantified and checked for quality. DNA concentration is usually measured on a spectrophotometer or a fluorometric detection system (e.g. Qubit). DNA is typically run on an agarose gel to examine size and integrity. In lieu of agarose gels, several microfluidic instruments are available that produce an electropherogram showing concentration, yield and size. Examples include the Bioanalyzer, TapeStation, LabChip and Fragment Analyzer.

  3. Ordering Sequencing and/or Library Prep Services- Quotes for DNA-Seq services can be obtained instantly on Genohub.

  4. DNA Sample Submission- Typically 100 to 1000 nanograms of DNA are required for whole genome or whole exome sequencing. Targeted panels or amplicon based sequencing can use as little as 1 to 10 ng of input material. Other applications will have specific input requirements. See our guide for recommendations on shipping DNA samples.

  5. DNA Library Preparation- DNA (or fragment) library preparation methods are available for sequencing whole genomes as well as targeted regions within genomes, e.g. ChIP-seq, Methyl-seq and Amplicon-seq. The two main DNA library preparation methods are ligation, where adapters are ligated onto end-repaired inserts, and transposition, where a transposase simultaneously fragments and tags DNA in a single reaction called “tagmentation” (Nextera chemistry). Tagmentation improves upon ligation based methods by combining several library prep steps into one reaction. However, the tagmentation protocol is very sensitive to the amount and length of starting DNA used. Conditions such as temperature and reaction time must be tightly controlled, and attention must be paid to biases introduced by any enzymatic protocol. Library preparation kits based on ligation and on tagmentation are described below.

  6. Sequencing- Parameters for your sequencing run will depend on your experiment. For whole genome sequencing of a human genome, we generally recommend at least 30x coverage using reads of at least 2x150 bp. Longer PacBio or Roche 454 reads combined with short Illumina reads are useful for obtaining longer contigs and closing gaps in a genome. See our coverage guide for more information.

  7. Data Analysis- Data analysis requirements vary based on your application. They range from processing sequencing reads from an instrument to aggregating and mining data across multiple sample types. Data analysis can be categorized into three broad stages: primary, secondary and tertiary analysis. To learn more about the types of DNA data analysis available, we recommend reading our bioinformatics data analysis listings page.
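
The coverage recommendation in the sequencing step above can be turned into a read count with the usual relation coverage = (number of reads x read length) / genome size. The sketch below is only an illustration: the function names are hypothetical, and the 3.1 Gb human genome size is an assumed approximate value rather than a figure from this guide.

```python
def read_pairs_for_coverage(genome_size_bp, coverage, read_length_bp, paired=True):
    """Number of reads (or read pairs, if paired=True) needed for a target mean coverage."""
    bases_needed = genome_size_bp * coverage
    # A paired-end fragment contributes two reads of read_length_bp each.
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    # Integer ceiling division, so the target coverage is always met.
    return -(-bases_needed // bases_per_fragment)


def mean_coverage(total_bases, genome_size_bp):
    """Mean depth of coverage obtained from a given number of sequenced bases."""
    return total_bases / genome_size_bp


# Assumption: approximate haploid human genome size in base pairs.
HUMAN_GENOME_BP = 3_100_000_000

pairs = read_pairs_for_coverage(HUMAN_GENOME_BP, coverage=30, read_length_bp=150)
print(f"{pairs:,} read pairs of 2x150 bp for 30x coverage")
```

This is a mean depth only. Duplicate reads, unmapped reads and uneven coverage all reduce the usable depth, so providers typically sequence somewhat more than the theoretical minimum.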


DNA-Seq Library Preparation Kits


    Beckman SPRIworks HT (Illumina-compatible)

    This high throughput library prep kit is designed for use with the Beckman Coulter Biomek FXP liquid handler. The kit contains enough reagents to construct 96 libraries in less than 6 hours, or 3 hours if size selection is not performed. The automated protocol offers three SPRI size selection options for recovering 150-350 bp, 250-450 bp and 350-700 bp insert sizes. Supported sample inputs include at least 1 µg of sheared DNA, genomic DNA, cDNA or amplicons. While the kit cost is reasonable, you will need an upfront investment to purchase the Biomek FXP liquid handler.

    Protocol Overview:

    1. End repair
    2. Adenylation
    3. Ligation
    4. Optional bead size selection
    5. PCR (Automation optional, T-robot on Biomek FXP)

    Bioo Scientific NEXTflex PCR-Free (Illumina-compatible)

    Avoiding amplification bias and dropouts in coverage in high GC and AT rich genomic regions is the main reason users choose this kit. While several polymerases claim to decrease gaps in coverage and handle GC/AT rich regions, the standard against which each polymerase is benchmarked is a PCR-free library. Launched in 2011, this kit is based on the approach first described by Kozarewa et al., 2009. Because the adapters already contain the flow cell and primer binding regions, the user can stop the library construction process after adapter ligation. Reduced library bias and fewer gaps in coverage allow users to prepare libraries from difficult genomes, ranging from small bacterial genomes to whole human genomes. To accommodate an amplification free library, the user must supply 500 ng to 2 µg of genomic DNA. The procedure takes approximately 5 hours, with 4 hours of hands-on time. While eliminating amplification may lead you to think the procedure is faster, users are required to perform qPCR after ligation for quantitation. Yields after ligation are typically sub-nanomolar, requiring careful dilutions before flow cell loading. Users are able to multiplex up to 96 samples using single indexed barcodes (6 or 8 base index).

    Protocol Overview:

    1. Acoustic DNA shearing
    2. End-repair
    3. Adenylation
    4. Ligation
    5. Gel-free or gel size-selection
    6. qPCR quantitation
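
To see why PCR-free yields end up in the sub-nanomolar range and what a loading dilution involves, a mass concentration can be converted to molarity with the standard approximation of 660 g/mol per base pair of double stranded DNA. The sketch below is illustrative only: the function names and the example concentration, fragment length and target values are hypothetical, and qPCR remains the quantitation method this protocol calls for.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA library concentration from ng/uL to nM.

    Assumes an average mass of 660 g/mol per base pair of double stranded DNA.
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (stock_ul, diluent_ul) for a C1*V1 = C2*V2 dilution."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical library: 0.1 ng/uL with a mean fragment length of 500 bp.
stock = ng_per_ul_to_nM(0.1, 500)
print(f"library concentration: {stock:.3f} nM")

# Hypothetical target of 0.02 nM (20 pM) in a final volume of 1000 uL.
stock_ul, diluent_ul = dilution_volumes(stock, 0.02, 1000)
print(f"mix {stock_ul:.1f} uL library with {diluent_ul:.1f} uL diluent")
```

The appropriate loading concentration depends on the instrument and chemistry, so the target used here should be replaced with the value your sequencing provider specifies.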

    Bioo Scientific NEXTflex Rapid DNA-Seq (Illumina-compatible)

    The NEXTflex Rapid DNA-Seq kit is faster and more versatile than its predecessor, the NEXTflex DNA-Seq kit. The kit accommodates DNA inputs between 1 ng and 1 µg and can produce sequenceable libraries in under 2 hours with as few as 6 cycles of PCR. While similar in library completion time to Nextera, the Rapid DNA-Seq kit is ligation based and does not use transposases. The End Repair and Adenylation steps are combined into a single reaction, reducing time and bead cleanups. The kit offers 5 bead based size selection options post ligation: 300-400 bp, 350-500 bp, 400-600 bp, 500-700 bp and 650-800 bp. The kit utilizes “enhanced adapter ligation technology” and offers compatibility with clinical FFPE and degraded DNA samples. As with the earlier NEXTflex DNA-Seq kit, this kit is compatible with up to 96 barcoded adapters.

    Protocol Overview:

    1. End repair & adenylation
    2. Ligation
    3. PCR

    Illumina TruSeq Nano Kit

    As one of the most widely adopted library preparation kits on the market, the TruSeq Nano kit has been thoroughly validated for use with many different genome types. The procedure takes about 1 day to perform, with ~8 hours of hands-on time. Users are able to multiplex up to 24 samples using single indexed barcodes (6 base index) or up to 96 using dual indexed barcodes (8 base indices).

    Protocol Overview:

    1. Acoustic DNA shearing
    2. End-repair
    3. Adenylation
    4. Ligation
    5. Gel-free or gel size selection
    6. PCR
    7. QC

    Illumina Nextera XT

    The Nextera XT kit was designed for preparing sequence ready libraries from samples consisting of small genomes (bacteria, archaea, viruses), PCR amplicons and plasmids. Library preparation takes 90 minutes and only requires 1 ng of sample input. The kit uses a single transposase enzymatic reaction to simultaneously fragment and add adapters and recommends as few as 12 cycles of PCR. The kit contains a unique quantification method using beads to normalize library amounts prior to pooling and sequencing. This reduces the need to perform a lengthy qPCR step to measure library concentration. The kit has barcoding options allowing the user to pool up to 96 samples together.

    Protocol Overview:

    1. Tagmentation of genomic DNA
    2. PCR amplification
    3. Library normalization and pooling

    Illumina TruSeq DNA PCR-Free

    Avoiding amplification bias and dropouts in coverage in high GC and AT rich genomic regions is the main reason users choose this kit. While several polymerases now claim to decrease gaps in coverage and handle GC/AT rich regions, the standard against which each polymerase is benchmarked is a PCR-free library. Launched in 2013, this kit is based on the approach first described by Kozarewa et al., 2009. Because the adapters already contain the flow cell and primer binding regions, the user can stop the library construction process after adapter ligation. Reduced library bias and fewer gaps in coverage allow users to prepare libraries from difficult genomes, ranging from small bacterial genomes to whole human genomes. To accommodate an amplification free library, the user must supply 1 to 2 µg of genomic DNA. The procedure takes approximately 5 hours, with 4 hours of hands-on time. While eliminating amplification may lead you to think the procedure is faster, the mandatory qPCR after ligation offsets the time saved. Yields after ligation are typically sub-nanomolar, requiring careful dilutions before flow cell loading. Users are able to multiplex up to 24 samples using single indexed barcodes (6 base index) or up to 96 using dual indexed barcodes (8 base indices).

    Protocol Overview:

    1. Acoustic DNA shearing
    2. End-repair
    3. Adenylation
    4. Ligation
    5. Gel-free or gel size selection
    6. qPCR quantitation

    Kapa DNA Library (Illumina-compatible)

    The Kapa DNA kit contains a novel enzyme, Kapa HiFi HotStart DNA Polymerase, which enables amplification across a wide range of genomes with varying GC content. The enzyme is claimed to reduce sequence bias and improve uniformity of sequence coverage. The kit manual asks for 1-5 µg of sheared input dsDNA and follows a ligation based approach to adding adapters and constructing Illumina compatible libraries. Kits are available in 10 and 50 reaction sizes, with the option to order Kapa's Real Time PCR Library quantification kits alongside. If your sample genome has GC content high enough to cause dropouts in coverage, the Kapa HiFi HotStart DNA Polymerase is a popular and effective option for amplification. Adapter barcodes are not provided with their Illumina compatible kits.

    Protocol Overview:

    1. End repair
    2. A-tailing
    3. Adapter ligation
    4. PCR

    NEB NEBNext Ultra DNA (Illumina-compatible)

    Outperforming its predecessor, the NEBNext DNA library kit, the NEBNext Ultra reduces library preparation time from 6 hours to under 3 hours while allowing as little as 5 ng of input DNA. While the kit does not include barcoded adapters, it is compatible with NEB’s Multiplex Oligos for Illumina indexing. The indexed adapters contain a stem loop structure and are ligated immediately after adenylation. The loop of the adapter contains a modified base that is cleaved using NEB’s USER enzyme, revealing primer binding sites for amplification. 24 barcoded adapters are available for multiplexing applications. The kit also contains NEBNext High Fidelity 2X PCR Master Mix, which is designed to reduce GC bias.

    Protocol:

    1. End repair / dA-Tailing
    2. Adapter ligation
    3. USER adapter cleavage
    4. PCR

    NuGEN Encore NGS Library

    This kit has been replaced by the NuGEN Encore Rapid Library Kit.


    NuGEN Encore Rapid Library (Illumina-compatible)

    The Encore Rapid Library Kit is designed to produce libraries from as little as 100 ng of double stranded DNA or double stranded cDNA without PCR amplification. This workflow makes it compatible with several applications, including RNA-Seq, Digital Gene Expression (DGE), genomic DNA sequencing and amplicon sequencing. The Encore Rapid system is designed for integration with NuGEN’s Ovation RNA-Seq System v2 and Ovation RNA-Seq FFPE systems. The absence of amplification steps makes this protocol well suited to analyzing genomes that have high GC content. Multiplexing is possible by purchasing a separate barcode module, which allows the user to multiplex up to 96 samples using inline or dedicated barcode designs.

    Protocol:

    1. Fragment
    2. End-repair
    3. Add adapters and ligate
    4. Final repair

    Life Tech Ion Plus Fragment (for the Ion Torrent platform)

    The Ion Plus Fragment Kit is designed to produce Ion Torrent (PGM) compatible libraries with as little as 100 ng of input DNA. The kit contains a proprietary Ion Shear™ enzyme which enzymatically shears genomic DNA, eliminating the need to do this mechanically. As a result, the procedure can be performed in less than 2 hours.

    Protocol:

    1. Enzymatic shearing of DNA
    2. End-repair
    3. Blunt ended ligation of barcoded adapters
    4. Nick repair
    5. PCR amplification

    PacBio DNA Template Prep (for the PacBio platform)

    The PacBio Template Prep Kits require as little as 250 ng of sheared DNA input to create libraries with insert sizes of 250 bp to < 3 kb or 3 kb to 10 kb. The kit uses universal hairpin adapters (SMRTbell) that are ligated onto double stranded DNA fragments. The SMRTbell template preparation method creates a template that is structurally linear but topologically circular, enabling repeated (consensus) sequencing of the same template. Once templates or libraries are constructed, single molecule, real time sequencing can begin.

    Protocol:

    1. Fragmentation
    2. End-repair
    3. A-tailing
    4. Ligation of adapters
    5. Annealing of sequencing primer to templates
    6. Polymerase binding


    When to use paired-end or single reads in DNA-Seq applications


    Paired-end Illumina sequencing refers to sequencing both ends of a DNA fragment, while single-end or single read sequencing refers to sequencing from one end of a fragment. Single-end sequencing is usually sufficient for counting applications. For de novo whole genome sequencing, phased sequencing or targeted sequencing, paired-end sequencing is recommended. Paired-end reads form longer contigs in de novo assembly and help fill in gaps in the consensus sequence, and when a reference genome is available they are more likely to align unambiguously. Alignments across repetitive regions are improved with paired-end reads, as is detection of rearrangements, indels and other variants. The cost differences and the importance of paired-end vs. single-end reads for RNA applications can be found in our sequencing guide.


    Data analysis expectations


    Data from a DNA-Seq run can be delivered as raw or 'analyzed'. Below are the data deliverables you can expect from a whole genome sequencing service:

    1. Raw Data- Raw reads are typically delivered in FASTQ file format, which stores each read together with its Phred quality scores.
    2. Quality of run- FastQC offers quality control checks on raw sequencing data so you can determine whether to proceed with further analysis. These include base and sequence quality scores, GC content, N content, length distribution, duplication levels, over-represented sequences and k-mer content.
    3. Variant Calls and Alignments- Mapped reads are provided in BAM file format, while variant calls, including SNVs, CNVs, indels and SVs, are provided in VCF file format.
    4. Annotations- Detailed information about breakpoints and interpretation of variants is often provided in a CNS file format.
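
As a rough illustration of the raw data deliverable, the sketch below reads four-line FASTQ records and decodes their quality strings into Phred scores. It assumes the Phred+33 encoding used by current Illumina output. The function names and the example record are hypothetical, and real files are better handled with an established parser.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header.strip()[1:], sequence, quality


def phred_scores(quality_string, offset=33):
    """Decode a quality string into integer Phred scores (assumes Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


# Hypothetical single-record FASTQ, held in memory for the example.
example = io.StringIO("@read1\nACGT\n+\nII5#\n")
for read_id, sequence, quality in parse_fastq(example):
    scores = phred_scores(quality)
    print(read_id, sequence, scores)  # read1 ACGT [40, 40, 20, 2]
```

A Phred score of 20 corresponds to a 1 in 100 chance of an incorrect base call, and a score of 30 to 1 in 1,000, which is why per-base quality is among the first things checked in a run QC report.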



    Whole human genome sequencing and re-sequencing services and costs


    A. Whole genome library preparation and sequencing services cost between $1,700 and $1,900 per sample

    1. 30x coverage guaranteed
    2. 2 week guaranteed turnaround time
    3. 2x150 bp paired end sequencing
    4. Sample QC, library prep QC and DNA library preparation
    5. Data delivered as FASTQ files with SNP/INDEL, copy number variation, and structural variation reports
    6. Unlimited data storage

    Search for Whole Genome Sequencing Services


    B. Whole animal and plant genome library preparation and sequencing services cost between $1,700 and $1,800 per sample

    1. 700 million paired end reads per sample guaranteed
    2. 2 week guaranteed turnaround time
    3. Includes sample QC, library prep and DNA library preparation
    4. Data delivered as FASTQ files
    5. Unlimited data storage

    Search for Whole Plant and Animal Genome Sequencing Services