The exome is the sum of all regions in the genome composed of exons. Exons are DNA regions that are
retained in mature messenger RNA, as opposed to introns, which are transcribed but then removed by
splicing.
Exome sequencing is a capture based method developed to identify variants in the coding region of
genes that affect protein function. The typical workflow required to sequence and analyze an exome
is as follows:
- Nucleic acid isolation, also known as sample preparation.
- Fragment DNA for capture and short read NGS. See DNA Fragmentation Methods.
- Construct a library.
- Target and capture exons using biotinylated probes.
- Amplify captured targets.
- Quality control using qPCR.
- Sequence using an instrument with a 2x100 or 2x150 read length.
- Analyze captured information, call variants.
While exome capture methods using PCR, hybrid capture and
molecular inversion probes exist, the most common and efficient strategies are in-solution capture
methods. In-solution capture utilizes pools of oligonucleotides or probes bound to magnetic beads,
whose sequence has been designed to hybridize to exon regions. After binding to genomic DNA, these
probes are pulled down and washed, allowing exon regions to be selectively sequenced. Several
commercial kits for exome capture are described below.
While there are approximately 180,000 exons in the human genome, constituting less than 2% of total
sequence, the exome contains ~80-90% of known disease-causing variants, making it a cost-effective
alternative to whole genome sequencing. When performing exome-seq, users should consider not only
average on-target coverage but also the local coverage of particular sites of interest. When
choosing between exome and whole genome sequencing (WGS), consider that exome capture probes can be
designed against particular genomic regions where typical WGS coverage is not enough for SNP
calling. Exome sequencing is also more affordable, enabling the analysis of more individuals and
populations. With WGS, you can detect variants in regions not covered by exome capture, allowing for
the identification of structural and non-coding variants associated with disease.
See Genohub's up-to-date list of available
Exome sequencing and library prep services
Comparison of Exome Capture Kits
| Exome-Seq Kits | Targeted Region | Number of Probes | Probe Type | Genomic DNA input required | Adapter addition | Probe Length (mer) | Probe Design | Price per capture (negotiable) | Designed on build | Hybridization time (hours) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agilent SureSelect XT2 V6 Exome | 60 Mb | ~758,086 | biotinylated cRNA baits | 100 ng | Ligation | 120 | Non-overlapping, paired-end reads used to fill gaps | $270 | GRCh37 (hg19) | 16 |
| Agilent SureSelect XT2 V5 Exome | 51 Mb | ~655,872 | biotinylated cRNA baits | 100 ng | Ligation | 120 | Non-overlapping, paired-end reads used to fill gaps | $200 | GRCh37 (hg19) | 16 |
| IDT xGEN Exome Panel | 39 Mb | 429,826 | biotinylated DNA baits | 500 ng | Ligation | not described | Non-overlapping | $250 | GRCh37 (hg19) | 4 |
| Illumina Nextera Rapid Capture Expanded Exome | 62 Mb | >340,000 | biotinylated DNA bait | 50 ng | Transposase | 95 | Non-overlapping (adjacent to each other) | $250 | GRCh37 (hg19) | 24-48 |
| Roche Nimblegen SeqCap EZ Exome v3.0 | 64 Mb | >2,100,000 | biotinylated DNA bait | 1 ug | Ligation | 60 - 90 | Overlapping baits | $600 | GRCh37 (hg19) | 72 |
Exome Kit Descriptions and Protocol Overview
Agilent HaloPlex
The HaloPlex Exome Kit contains 2.5 million probes designed to cover human coding regions. The kit is
designed for targeting smaller capture regions quickly. The target size is 37 Mb, covering 21,522
genes and 357,999 exons. The overall workflow takes 1.5 days and requires 250 ng of input DNA.
Protocol Overview:
- The procedure begins with a genomic DNA digest using restriction enzymes
- Hybridization of HaloPlex probe library to DNA digests. Hybridization results in genomic DNA
fragment circularization and incorporation of Illumina indexes and flow cell binding motifs
- DNA probe hybrids are captured using streptavidin coated magnetic beads binding to biotinylated
probe DNA
- Targeted fragments are amplified, producing a sequence ready, target enriched library
Agilent SureSelect
The Agilent SureSelect Exome kits use an in-solution capture method that utilizes long, 120 mer
biotinylated cRNA baits to enrich exome regions from genomic DNA fragments.
Protocol Overview:
- Starting with genomic DNA, samples are sheared resulting in small DNA fragments
- Libraries are prepared with Illumina compatible adapters and indices
- Biotinylated cRNA baits are incubated with the library for 16 hours
- Targeted regions are selected using magnetic streptavidin beads
- Targeted regions are amplified, producing a sequence ready library
Agilent SureSelect QXT
The SureSelect QXT kit combines a transposase based library preparation method with SureSelect’s
well characterized target capture system. Enrichment probes are 120 nt long baits that allow the
capture of exomes, gene panels or custom targets. The biggest advantage of QXT is the hybridization
time, only 90 minutes. The protocol is similar to SureSelect (described above), but only requires 50
ng of starting material and takes less than one day to complete. As of 9/2014, we don’t know of a
publication that has used this kit, and it hasn’t been compared head to head with other kits as in
the references below. If you’ve used this kit, we’d love to hear your feedback.
IDT xGEN Exome
The IDT xGEN Exome panel consists of 429,826 biotinylated probes designed to capture 19,396 genes (a
39 Mb region). The probes are individually synthesized and quality controlled as opposed to array
synthesized probes, reducing truncations in the pools. While the protocol requires at least 500 ng
of a constructed library, hybridization times are relatively short (4 hours) compared to competitor
exome panels. The xGEN panel only targets coding sequences (CDS) in the RefSeq database.
Protocol Overview:
- Blocking oligos are prepared, combined with a DNA library and dried down
- 5' biotinylated capture probes are added and hybridized to the library for 4 hours
- Streptavidin beads are prepared and added to the hybridized targets
- Unbound DNA is removed by washing
- Remaining fragments are amplified, enriching targeted regions
Illumina Nextera Rapid Capture Exome
Nextera exome kits come in two different panels: Nextera Rapid Capture Exome and Nextera Rapid
Capture Expanded Exome. The former hybridizes to 45 Mb of targeted coding sequence while the
expanded panel delivers 62 Mb of exons, untranslated regions (UTRs) and miRNA. The entire library
prep and hybridization capture requires only 50 ng of genomic DNA and takes 5 hours to complete,
making it one of the more user-friendly exome capture kits.
Protocol Overview:
- Libraries are prepared from as little as 50 ng of genomic material with Nextera's transposase
based chemistry
- DNA libraries are denatured
- Denatured single stranded DNA is hybridized to biotinylated probes designed to regions on the
exome
- Fragments are enriched with streptavidin beads and eluted
- Fragments are amplified, producing sequence ready enriched targets
Illumina TruSeq Exome
The TruSeq Exome Enrichment Kits are an in-solution sequence capture system designed for isolating
human exon regions. Kits include 340,427 95 mer probes designed against the human NCBI37/hg19
reference genome. The probe set is designed to enrich 201,121 exons spanning 20,794 genes of
interest. The kit covers 64 Mb of the human genome, with each 95 mer probe targeting libraries that
are 300-400 bp. In addition to major exon regions, the kit provides coverage of non-coding DNA in
exon flanking regions, including UTRs and promoters.
Protocol Overview:
- The workflow begins with creating pooled indexed libraries from up to 6 samples
- Sample libraries are denatured into single stranded DNA and hybridized to 95 mer biotin labelled
probes
- Pools are then enriched with streptavidin magnetic beads and pulled from solution using a
magnet
- Enriched DNA fragments are eluted from the beads and hybridized for a second enrichment
reaction
- Fragments are amplified, producing sequence ready enriched targets
Roche Nimblegen SeqCap
SeqCap EZ Exome Library Kits utilize an in-solution based capture method to enable enrichment of the
entire human exome. The kit targets 64 Mb of the human genome using 2.1 million long
oligonucleotides.
Protocol Overview:
- The protocol begins with genomic DNA library preparation
- The library is hybridized to the SeqCap oligo pool
- Magnetic beads are used to pull down the captured genomic DNA fragments
- Enriched DNA fragments are eluted from the beads and hybridized for a second enrichment
reaction
- Unbound fragments are washed and the enriched fragment pool is amplified, producing sequence
ready enriched targets
MYcroarray MYbaits
MYbaits are a fully customizable in-solution DNA capture system using custom biotinylated RNA baits.
Baitsets are priced in modules of 20,000 unique bait sequences, which is particularly affordable for
designs targeting hundreds or thousands of loci. MYbaits kits are compatible with any type of
barcoded NGS library.
Protocol Overview:
- A genomic DNA library is heat denatured and hybridized to the RNA baits, typically overnight (or
longer for degraded or very rare targets)
- After hybridization, the biotinylated baits bound to captured genomic DNA are pulled out of
solution using streptavidin-coated magnetic beads
- Non-specifically bound DNA is washed away and captured DNA is eluted from the RNA baits using
heat
- Post-captured DNA is amplified, producing sequence ready enriched targets
Factors Affecting Capture Efficiency
Unlike PCR amplification, where two primers must anneal near a target in the genome and amplify it
with specificity, exon capture involves hybridization of a single probe, usually attached to a
magnetic bead. After hybridization, you rely on the pulldown of that target, followed by
amplification. There is significantly more variability with hybridization capture, which results in
on-target rates that are typically no better than 75% (see calculating on-target rates below).
Several other factors affect capture efficiency, including:
- GC rich regions - UTRs and promoter regions are typically very GC rich, often lowering
capture efficiency and increasing variability between these regions and other more balanced
ones.
- Quality of DNA - Poor quality DNA, typical of extractions from FFPE, can introduce
bias as certain regions tend to be more fragmented than others. If capture isn't balanced, this
results in bias and complications during downstream SNP calling and other forms of analysis.
- Quantity of DNA - Low input amounts of DNA usually require many more PCR cycles to
obtain enough library for efficient hybridization of capture probes. More PCR cycles can result
in a significant number of PCR duplicates, making conclusions from data analysis less
informative. While traditionally 1 ug of a reasonably diverse library was required for capture,
newer capture protocols now only require 50 ng.
- Pseudogenes - Can reduce evenness of coverage.
- Fragment or insert size - The insert sizes required by the kits listed above vary with the size
of the probes used in capture. Fragmentation of DNA should be tuned to meet those size
requirements for efficient capture.
- Repeat elements - Will reduce the evenness with which reads are distributed across the
exome, resulting in the need for more sequencing to call de novo SNPs.
Regardless of the kit you choose, variability in capture efficiency and coverage is inherent in
exome sequencing. That being said, many of these variables are controllable. Whenever you improve
sequence depth, breadth and evenness of coverage, de novo variants are a lot easier to call. In the
next few sections we highlight how to calculate the amount of sequencing you need for your exome-seq
project. The commercially available capture kits are described in detail above.
If you're looking for exome-seq services, start your search on our
NGS Matching Engine.
An experienced Genohub scientist will help you choose the right kit and coverage for your
application.
Calculating the Amount of Sequencing You Need for Your Exome Study
Calculate On-Target Rate or Enrichment Efficiency
Enrichment efficiency = Passing Filter (PF) reads mapped to target ÷ Total number of PF reads mapped to reference
If using the correct blocking oligos, this rate usually ranges from 0.65 to 0.75.
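The ratio above can be sketched in a few lines of Python. The function name and the read counts are hypothetical and chosen only for illustration; in practice both counts would come from your alignment metrics.

```python
def enrichment_efficiency(pf_reads_on_target: int, pf_reads_mapped: int) -> float:
    """Fraction of mapped passing-filter (PF) reads that fall on the capture target."""
    if pf_reads_mapped <= 0:
        raise ValueError("pf_reads_mapped must be positive")
    if not 0 <= pf_reads_on_target <= pf_reads_mapped:
        raise ValueError("on-target reads must be between 0 and the mapped read count")
    return pf_reads_on_target / pf_reads_mapped


# Hypothetical counts: 35 million on-target reads out of 50 million mapped PF reads.
rate = enrichment_efficiency(35_000_000, 50_000_000)
print(f"{rate:.2f}")  # 0.70
```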
Calculate Mean Normalized Coverage
Mean normalized coverage is equivalent to how much sequencing is required to yield a given percentage
of targeted bases at a particular read depth.
Normalized Coverage = Coverage at each base position ÷ Average Coverage over all base positions
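As a rough sketch of how this could be computed from per-base depths, the Python below normalizes each depth by the mean and then reports the normalized coverage that a chosen fraction of targeted bases meets or exceeds. Reading "mean normalized coverage" as that threshold is an assumption on our part, and the function names, the 90% fraction and the depth values are all hypothetical.

```python
import math


def normalized_coverage(depths):
    """Divide each per-base depth by the mean depth over all positions."""
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("mean depth is zero; nothing to normalize")
    return [d / mean_depth for d in depths]


def normalized_coverage_at_fraction(depths, fraction=0.9):
    """Normalized coverage met or exceeded by at least `fraction` of bases."""
    norm = sorted(normalized_coverage(depths), reverse=True)
    # Index of the base that completes the requested fraction (small tolerance for float error).
    k = max(0, math.ceil(fraction * len(norm) - 1e-9) - 1)
    return norm[k]


# Hypothetical per-base depths over ten targeted positions (mean depth = 55).
depths = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(f"{normalized_coverage_at_fraction(depths, 0.9):.2f}")  # 0.36
```

Here nine of the ten bases have a depth of at least 20, which is 0.36 of the mean depth of 55.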
Determine How Much Exome Sequencing Data You Need
First, identify the mean sequencing coverage required.
Mean Sequencing Coverage = Desired Coverage ÷ Normalized Coverage
For example, if your desired coverage is 20X and mean normalized coverage is 0.2, the mean
sequencing coverage would be:
20X ÷ 0.2 = 100X
Determine Amount of Sequencing You Need to Meet Your Coverage Requirement
Required Number of PF Mapped Sequence = Targeted Bases × Mean Sequencing Coverage ÷ Enrichment Efficiency
Let’s assume you’re using the Roche Nimblegen SeqCap EZ Exome v3.0 kit, which has a targeted region
of 64 Mb (see table above). The mean sequencing coverage you require, calculated above is 100X, and
let’s say your on-target rate or enrichment efficiency is ~0.70.
So, 64 Mb × 100 ÷ 0.70 ≈ 9.1 Gb of passing filter and mapped sequence.
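The two formulas in this section can be chained in a short Python sketch that reproduces the worked example (20X desired coverage, 0.2 normalized coverage, a 64 Mb target and 0.70 enrichment efficiency). The function names are our own and purely illustrative.

```python
def mean_sequencing_coverage(desired_coverage: float, normalized_coverage: float) -> float:
    """Mean coverage needed so that the desired depth is reached at the given normalized coverage."""
    if normalized_coverage <= 0:
        raise ValueError("normalized_coverage must be positive")
    return desired_coverage / normalized_coverage


def required_pf_mapped_bases(targeted_bases: float, mean_coverage: float,
                             enrichment_efficiency: float) -> float:
    """Bases of PF, mapped sequence needed, inflated to account for off-target reads."""
    if not 0 < enrichment_efficiency <= 1:
        raise ValueError("enrichment_efficiency must be in (0, 1]")
    return targeted_bases * mean_coverage / enrichment_efficiency


mean_cov = mean_sequencing_coverage(20, 0.2)
bases = required_pf_mapped_bases(64e6, mean_cov, 0.70)
print(f"{mean_cov:.0f}X mean coverage, {bases / 1e9:.1f} Gb required")
# 100X mean coverage, 9.1 Gb required
```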
Recommended Read Length for your Exome Study
While not a hard requirement, we generally recommend paired-end 2x100 read lengths for exome capture
sequencing. ~80% of exons are <200 bp in length (Sakharkar, 2004), so a 2x100 read should be ideal
for most experiments.
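To translate a sequencing requirement into a number of reads at a given read length, a simple division is enough. The sketch below uses the ~9.1 Gb figure from the worked example and a 2x100 run; the function name is hypothetical, and the estimate ignores bases lost to adapter trimming, quality filtering or overlapping read pairs.

```python
import math


def read_pairs_needed(required_bases: float, read_length: int = 100, paired: bool = True) -> int:
    """Number of reads (or read pairs, if paired) needed to yield `required_bases`."""
    bases_per_fragment = read_length * (2 if paired else 1)
    return math.ceil(required_bases / bases_per_fragment)


pairs = read_pairs_needed(9.1e9, read_length=100)
print(f"{pairs / 1e6:.1f} million read pairs")  # 45.5 million read pairs
```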
Basic Recommendations for Calling Variants and Analyzing Depth of Coverage
Exome variant analysis recommendations
- Trim adapter sequences using Cutadapt
- Input files: FASTA, FASTQ
- Output files: same as input
- Align reads to a human genome build using BWA, post-process data using SAMtools, remove
duplicate reads and convert to a BAM file using Picard
- Input files: FASTA, FASTQ
- Output files: BAM
- Call germline SNPs and indels using GATK HaplotypeCaller to produce a VCF file
- Input files: BAM
- Output: VCF
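The steps above can be sketched as a list of command lines assembled in Python. This is a dry run that only prints the commands; nothing is executed. The file names, thread count, read group, `picard.jar` location and the adapter sequence (a common Illumina adapter prefix) are assumptions for illustration, and the flags shown reflect our understanding of recent Cutadapt, BWA, SAMtools, Picard and GATK4 releases, so check them against the versions you have installed.

```python
import shlex


def build_pipeline(sample, r1, r2, reference, adapter="AGATCGGAAGAGC", threads=4):
    """Return trimming, alignment, duplicate removal and variant calling commands as argument lists."""
    trimmed_r1 = f"{sample}.trimmed_R1.fastq.gz"
    trimmed_r2 = f"{sample}.trimmed_R2.fastq.gz"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    vcf = f"{sample}.vcf.gz"
    # GATK expects read groups; bwa interprets the literal "\t" sequences as tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Trim adapters from both mates of each pair.
        ["cutadapt", "-a", adapter, "-A", adapter,
         "-o", trimmed_r1, "-p", trimmed_r2, r1, r2],
        # 2. Align to the reference (assumed to be indexed already).
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", sam, reference, trimmed_r1, trimmed_r2],
        # 3. Coordinate-sort and convert to BAM.
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, sam],
        # 4. Remove duplicate reads.
        ["java", "-jar", "picard.jar", "MarkDuplicates", f"I={sorted_bam}",
         f"O={dedup_bam}", f"M={metrics}", "REMOVE_DUPLICATES=true"],
        ["samtools", "index", dedup_bam],
        # 5. Call germline SNPs and indels.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam, "-O", vcf],
    ]


for cmd in build_pipeline("sample1", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "GRCh37.fa"):
    print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to hand the same lists to `subprocess.run` once the paths and flags have been verified for your environment.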
Exome coverage analysis recommendations
- Use an existing exome BED file or use Bedtools to create intervals in the genomic regions you’d
like to analyze
- Compute read depth of your genomic intervals using GATK DepthOfCoverage
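For a quick look at depth over target intervals without GATK, the Python sketch below summarizes per-base depth inside BED intervals. It is a swapped-in alternative, not GATK DepthOfCoverage: it assumes three-column, tab-separated per-base depth lines (chromosome, 1-based position, depth) such as those written by `samtools depth`, non-overlapping BED intervals, and inputs small enough to hold in memory. The function names and example data are hypothetical.

```python
from collections import defaultdict


def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    targets = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        targets[chrom].append((int(start), int(end)))
    return targets


def coverage_summary(bed_lines, depth_lines, min_depth=20):
    """Return (mean depth, fraction of target bases with depth >= min_depth)."""
    depth = {}
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, d = line.split()[:3]
        depth[(chrom, int(pos))] = int(d)

    total = covered = depth_sum = 0
    for chrom, intervals in parse_bed(bed_lines).items():
        for start, end in intervals:
            # BED start is 0-based, so the 1-based positions are start+1 .. end.
            for pos in range(start + 1, end + 1):
                d = depth.get((chrom, pos), 0)  # positions absent from the depth file count as 0
                total += 1
                depth_sum += d
                covered += d >= min_depth
    if total == 0:
        raise ValueError("no target bases found in BED input")
    return depth_sum / total, covered / total


bed = ["chr1\t100\t104"]
depths = ["chr1\t101\t30", "chr1\t102\t25", "chr1\t103\t10"]
mean_depth, frac = coverage_summary(bed, depths, min_depth=20)
print(f"mean depth {mean_depth:.2f}, {frac:.0%} of target bases at >=20x")
# mean depth 16.25, 50% of target bases at >=20x
```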
Off-Target Reads (reads that don't align to the target region)
Exome sequencing tends to produce a large portion of off-target reads. In some cases as much as 40%
of the data produced by capture methods are not of exon regions. The three main types of off-target
reads include:
- Intron and intergenic reads
- Viral and bacterial genome reads
- Mitochondrial reads
Poorly designed biotinylated probes (baits), fragmented input DNA, small capture regions,
inefficient hybridization conditions, and spurious adapter-to-adapter annealing are the reasons such
a large portion of exome reads can end up being off-target. When calculating the number of reads
required to meet mean on-target exon coverage, off-target rates need to be considered.
Off-target reads can include functionally important genomic regions such as promoters, conserved
non-coding sequences, untranslated regions (UTRs) and microRNA. A recent review (Samuels et al.)
suggests data mining and extracting useful information from non-exonic reads is valuable. Tools for
analyzing these off-target reads are also described.
Considerations for Whole Exome Sequencing:
1. What sequencing instrument and read length should I choose for exome-seq?
To obtain the coverage typically required for whole human exome sequencing, we recommend the
Illumina HiSeq or NextSeq 500 platforms. You should use a minimum read length of 2x100, or 200 bp
in total. While sequencing at a longer read length gives you slight improvements in single
nucleotide variant (SNV) calls, the improvements are modest and not necessarily worth the extra time
and cost. Similarly for INDELs, the number of variants identified does not increase significantly
beyond a 2x100 bp sequencing run. Longer read lengths do increase mean coverage, and the coverage
gained from longer reads grows with sequencing depth. So if you’re trying to obtain 200x coverage,
consider a longer read length. If you’re aiming for 100x coverage, unless the costs are roughly
the same, we recommend starting with a 2x100 bp sequencing run.
2. How much sequencing coverage do I need for exome sequencing?
For accurate variant identification we recommend at least 100x coverage. In many cases higher
coverage will reduce the chance of a false positive SNV call. Higher coverage is also necessary to
increase sensitivity and improve the detection of rare variants. If you need advice on the depth of
sequencing required for your exome project, fill out our complimentary consultation form and
describe your exome sequencing project. A member of the Genohub scientific staff will assist you.
3. How do I calculate the sequencing coverage or depth required for my whole exome sequencing
study?
Several factors must be considered when calculating coverage for exome sequencing. These include:
- Average on target rate or probe enrichment efficiency
- Mean normalized coverage
- Desired coverage vs. mean coverage
See the detailed example in calculating exome coverage.
4. Which exome sequencing capture kit should I use for my study?
We’ve summarized the specifications for each capture kit in this table. In
general, we’ve found the Roche Nimblegen SeqCap v3.0 + UTR kit contains the
largest designed target and coding regions. Nimblegen SeqCap v3.0 and Agilent SureSelect XT2 v5 +UTR
exome capture kits also tend to have lower off target enrichment compared to Illumina Nextera Rapid
Capture (expanded exome). The Agilent SureSelect XT2 v5 + UTR exome kits generally have been shown
to have the highest accuracy in SNV detection and best GC-rich region enrichment.
While we’ve just made some recommendations above, all three kits are frequently used in many
facilities and results with each will likely be adequate for most applications. In many cases the
amount of DNA you have will be a deciding factor on the kit that you should use. With some kits you
can start with as little as 50 ng of input material. Others require at least 1 ug. Most exome
capture kits allow you to barcode and pool your samples prior to capture, reducing capture reagent
costs. Several older kit versions are available to users who prefer capturing each barcoded library
individually.
5. How can I compare the annotation and exome capture design between each kit?
You can download annotated files providing information on genomic regions covered by the capture
probes and genes included in these regions:
6. Should I choose Whole Genome Sequencing (WGS) or Whole Exome Sequencing (WES) for my
project?
There are advantages and disadvantages to both WGS and WES. With the decreasing cost of WGS, these
need to be carefully considered. We've laid out each of these in an earlier blog post:
Whole Genome Sequencing (WGS) vs. Whole Exome Sequencing (WES).
Exome Sequencing Service Categories:
A. Standard Exome Sequencing Services - Prices Range from $550 - $800 per sample
- 100x sequencing coverage
- Appropriate for non-cancer based applications
- Transposase or Ligation based library preparation
- Options to choose between IDT xGEN and Illumina Nextera Rapid Capture
Search for Standard Whole Exome Sequencing Services
B. Deep or High Coverage Exome Sequencing Service - Prices range from $760 - $1,800 per
sample
- 200x sequencing coverage
- Appropriate for case samples or tumor normal pairs
- Ligation based library preparation
- Options to choose between Agilent SureSelect, NimbleGen SeqCap, IDT and Illumina Nextera
Search for High Depth Whole Exome Sequencing Services
C. Clinical Grade Whole Human Exome Sequencing - Prices range from $850 - $1,800 per sample
- Performed in a CLIA certified facility
Search for Clinical Whole Exome Sequencing Services
Published References Comparing Exome Capture Kits:
- Asan et al: Comprehensive comparison of three commercial human whole-exome capture platforms.
Genome Biol. 2011 Sep 28;12(9):R95.
- Bainbridge, M.N. et al., 2010. Whole exome capture in solution with 3Gbp of data. Genome
Biology.
- Clark et al: Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011
Sep 25. doi: 10.1038/nbt.1975
- Kingsmore, S.F. & Saunders, C.J., 2011. Deep Sequencing of Patient Genomes for Disease
Diagnosis: When Will It Become Routine? Science Translational Medicine, 3(87), p.1-4. Review of
Bainbridge et al and discussion of WGS and targeted or Exome-Seq. They also suggest that an
exome costs 5-15 fold less than a WGS.
- Maxmen, A., 2011. Exome Sequencing Deciphers Rare Diseases. Cell, 144, p.635-637. A review of
the Undiagnosed Diseases Program at NIH. Exome-Seq and high-resolution microarrays for
genotyping. They mention the team’s first reported discovery of a new disease, which was
published in The New England Journal of Medicine.
- Natsoulis, G. et al., 2011. A Flexible Approach for Highly Multiplexed Candidate Gene Targeted
Resequencing. PloS one, 6(6).
- Parla et al: A comparative analysis of exome capture. Genome Biol. 2011 Sep 29;12(9):R97.
- Sakharkar et al: Distributions of exons and introns in the human genome. In Silico Biol. 2004;
4(4):387-93.
- Samuels, D.C., 2013. Finding the lost treasures in exome sequencing data. Trends in Genetics,
29, p.593-599.
- Sulonen et al: Comparison of solution-based exome capture methods for next generation
sequencing. Genome Biol. 2011 Sep 28;12(9):R94.