Exome is a term used to describe the sum of all regions in the genome composed of exons. Exons are the DNA regions retained in mature messenger RNA, as opposed to introns, which are removed during splicing.
Exome sequencing is a capture-based method developed to identify variants in the coding region of genes that affect protein function. The typical workflow required to sequence and analyze an exome is as follows:
While exome capture methods using PCR, hybrid capture and molecular inversion probes exist, the most common and efficient strategies are in-solution capture methods. In-solution capture utilizes pools of oligonucleotides or probes bound to magnetic beads, whose sequence has been designed to hybridize to exon regions. After binding to genomic DNA, these probes are pulled down and washed, allowing exon regions to be selectively sequenced. Several commercial kits for exome capture are described below.
While there are approximately 180,000 exons in the human genome, constituting less than 2% of total sequence, the exome contains ~80-90% of known disease-causing variants, making it a cost-effective alternative to whole genome sequencing. When performing exome-seq, users should consider not only average on-target coverage but also the local coverage of particular sites of interest. When choosing between exome and whole genome sequencing (WGS), consider that exome sequencing has the advantage that oligonucleotides are designed to particular genomic regions where typical coverage with WGS is not enough for SNP calling. It is also more affordable, enabling the analysis of more individuals and populations. With WGS, you can detect variants in regions not covered by exome capture, allowing for the identification of structural and non-coding variants associated with disease.
See Genohub's up-to-date list of available Exome sequencing and library prep services
|Exome-Seq Kits||Targeted Region||Number of Probes||Probe Type||Genomic DNA input required||Adapter addition||Probe Length (mer)||Probe Design||Price per capture (negotiable)||Designed on build||Hybridization time (hours)|
|Agilent SureSelect XT2 V6 Exome||60 Mb||~758,086||biotinylated cRNA baits||100 ng||Ligation||120||Non-overlapping, paired-end reads used to fill gaps||$270||GRCh37 (hg19)||16|
|Agilent SureSelect XT2 V5 Exome||51 Mb||~655,872||biotinylated cRNA baits||100 ng||Ligation||120||Non-overlapping, paired-end reads used to fill gaps||$200||GRCh37 (hg19)||16|
|IDT xGEN Exome Panel||39 Mb||429,826||biotinylated DNA baits||500 ng||Ligation||not described||Non-overlapping||$250||GRCh37 (hg19)||4|
|Illumina Nextera Rapid Capture Expanded Exome||62 Mb||>340,000||biotinylated DNA bait||50 ng||Transposase||95||Non-overlapping (adjacent to each other)||$250||GRCh37 (hg19)||24-48|
|Roche Nimblegen SeqCap EZ Exome v3.0||64 Mb||>2,100,000||biotinylated DNA bait||1 ug||Ligation||60 - 90||Overlapping baits||$600||GRCh37 (hg19)||72|
The HaloPlex Exome Kit contains 2.5 million probes designed to cover human coding regions. The kit is designed for targeting smaller capture regions quickly. The target size is 37 Mb, covering 357,999 exons across 21,522 genes. The overall workflow takes 1.5 days and requires 250 ng of input DNA.
The Agilent SureSelect Exome kits use an in-solution capture method that utilizes long, 120-mer biotinylated cRNA baits to enrich exome regions from genomic DNA fragments.
The SureSelect QXT kit combines a transposase based library preparation method with SureSelect’s well characterized target capture system. Enrichment probes are 120 nt long baits that allow the capture of exomes, gene panels or custom targets. The biggest advantage of QXT is the hybridization time, only 90 minutes. The protocol is similar to SureSelect (described above), but only requires 50 ng of starting material and takes less than one day to complete. As of 9/2014, there hasn’t been a publication we know of that has used this kit and it hasn’t been compared head to head as others have in the references below. If you’ve used this kit, we’d love to hear your feedback.
The IDT xGEN Exome panel consists of 429,826 biotinylated probes designed to capture 19,396 genes (a 39 Mb region). The probes are individually synthesized and quality controlled as opposed to array synthesized probes, reducing truncations in the pools. While the protocol requires at least 500 ng of a constructed library, hybridization times are relatively short (4 hours) compared to competitor exome panels. The xGEN panel only targets coding sequences (CDS) in the RefSeq database.
Nextera exome kits come in two different panels: Nextera Rapid Capture Exome and Nextera Rapid Capture Expanded Exome. The former hybridizes to 45 Mb of targeted coding sequence while the expanded panel delivers 62 Mb of exons, untranslated regions (UTRs) and miRNA. The entire library prep and hybridization capture requires only 50 ng of genomic DNA and takes 5 hours to complete, making it one of the more user-friendly exome capture kits.
The TruSeq Exome Enrichment Kits are an in-solution sequence capture system designed for isolating human exon regions. Kits include 340,427 95-mer probes constructed against the human NCBI37/hg19 reference genome. The probe set is designed to enrich 201,121 exons spanning 20,794 genes of interest. The kit covers 64 Mb of the human genome, with each 95-mer probe targeting libraries that are 300-400 bp. In addition to major exon regions, the kit provides coverage of non-coding DNA in exon flanking regions, including UTRs and promoters.
SeqCap EZ Exome Library Kits utilize an in-solution based capture method to enable enrichment of the entire human exome. The kit targets 64 Mb of the human genome using 2.1 million long oligonucleotides.
MYbaits are a fully customizable in-solution DNA capture system using custom biotinylated RNA baits. Baitsets are priced in modules of 20,000 unique bait sequences, which is particularly affordable for designs targeting hundreds or thousands of loci. MYbaits kits are compatible with any type of barcoded NGS library.
Unlike PCR amplification, where two primers must anneal near a target in the genome and amplify it with specificity, exon capture involves hybridization of a single probe, usually attached to a magnetic bead. After hybridization, you rely on the pulldown of that target, followed by amplification. There is significantly more variability with hybridization capture, which results in on-target rates that are typically no better than 75% (see calculating on-target rates below). Several other factors affect the efficiency of capture; these include:
It is important to remember that, regardless of the kit you choose, variability in capture efficiency and coverage is inherent in exome sequencing. That being said, many of these variables are controllable. Whenever you improve sequencing depth, breadth and evenness of coverage, de novo variants are a lot easier to call. In the next few sections we highlight how to calculate the amount of sequencing you need for your exome-seq project and go into detail about the capture kits commercially available.
If you're looking for exome-seq services, start your search on our NGS Matching Engine. An experienced Genohub scientist will help you choose the right kit and coverage for your application.
Enrichment efficiency = Passing Filter (PF) reads mapped to target / Total number of PF reads mapped to reference.
If using the correct blocking oligos, this rate usually ranges from 0.65 – 0.75.
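As a minimal sketch of the ratio above, the following Python computes enrichment efficiency from two read counts. The function name and the example counts are hypothetical, chosen only for illustration; in practice the counts would come from your alignment statistics.

```python
def enrichment_efficiency(pf_reads_on_target: int, pf_reads_mapped: int) -> float:
    """PF reads mapped to target divided by PF reads mapped to the reference."""
    if pf_reads_mapped <= 0:
        raise ValueError("pf_reads_mapped must be positive")
    if not 0 <= pf_reads_on_target <= pf_reads_mapped:
        raise ValueError("on-target reads must be between 0 and the mapped total")
    return pf_reads_on_target / pf_reads_mapped


# Hypothetical counts: 35 million on-target reads out of 50 million mapped PF reads.
efficiency = enrichment_efficiency(35_000_000, 50_000_000)
print(f"{efficiency:.2f}")  # 0.70
```

A value of 0.70 falls within the 0.65 – 0.75 range quoted above.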
Mean normalized coverage is equivalent to how much sequencing is required to yield a given percentage of targeted bases at a particular read depth.
Normalized Coverage = Coverage at each base position / Average Coverage over all base positions
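The normalized coverage formula can be sketched in Python as follows. The per-base depths are made-up values, and `fraction_at_or_above` is a hypothetical helper we add only to show how a normalized coverage threshold relates to the percentage of targeted bases that meet it.

```python
from typing import List, Sequence


def normalized_coverage(depths: Sequence[float]) -> List[float]:
    """Divide each per-base depth by the average depth over all positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("average coverage is zero; nothing to normalize")
    return [d / mean_depth for d in depths]


def fraction_at_or_above(values: Sequence[float], threshold: float) -> float:
    """Fraction of positions whose normalized coverage meets the threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v >= threshold) / len(values)


# Hypothetical per-base depths for four target positions (average depth = 100).
depths = [10, 100, 150, 140]
norm = normalized_coverage(depths)
print(norm)                             # [0.1, 1.0, 1.5, 1.4]
print(fraction_at_or_above(norm, 0.2))  # 0.75
```

In this toy example, 75% of positions reach a normalized coverage of 0.2 or more.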
First, identify the mean sequencing coverage required.
Mean Sequencing Coverage = Desired Coverage / Normalized Coverage.
Let’s say your desired coverage is 20X and mean normalized coverage is 0.2.
The mean sequencing coverage would be
20X / 0.2 = 100X.
Required Number of PF Mapped Sequence = (Targeted Bases) x (Mean Sequencing Coverage) / Enrichment Efficiency.
Let’s assume you’re using the Roche Nimblegen SeqCap EZ Exome v3.0 kit, which has a targeted region of 64 Mb (see table above). The mean sequencing coverage you require, calculated above is 100X, and let’s say your on-target rate or enrichment efficiency is ~0.70.
(64Mb)(100X) / 0.70 = 9.1 Gb of passing filter and mapped sequence.
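The worked example above can be reproduced with a short Python sketch. The function and parameter names are our own, introduced for illustration; the numbers are the ones used in the example (64 Mb target, 20X desired coverage, normalized coverage of 0.2, enrichment efficiency of 0.70).

```python
def required_pf_mapped_bases(
    targeted_bases: float,
    desired_coverage: float,
    normalized_coverage: float,
    enrichment_efficiency: float,
) -> float:
    """Bases of PF, mapped sequence needed to reach the desired coverage."""
    if normalized_coverage <= 0 or enrichment_efficiency <= 0:
        raise ValueError("normalized coverage and efficiency must be positive")
    # Mean Sequencing Coverage = Desired Coverage / Normalized Coverage
    mean_sequencing_coverage = desired_coverage / normalized_coverage
    # Required sequence = Targeted Bases x Mean Sequencing Coverage / Efficiency
    return targeted_bases * mean_sequencing_coverage / enrichment_efficiency


bases = required_pf_mapped_bases(
    targeted_bases=64e6,        # 64 Mb target
    desired_coverage=20,        # 20X
    normalized_coverage=0.2,
    enrichment_efficiency=0.70,
)
print(f"{bases / 1e9:.1f} Gb")  # 9.1 Gb
```

Swapping in the targeted region of a different kit from the table, or your own measured enrichment efficiency, gives the corresponding estimate.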
While not a hard requirement, we generally recommend paired-end 2x100 read lengths for exome capture sequencing. ~80% of exons are <200 bp in length (Sakharkar, 2004), so a 2x100 read should be ideal for most experiments.
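To translate a sequence requirement into a number of 2x100 read pairs, a rough conversion is to divide the required bases by the bases per pair. The sketch below assumes every sequenced base passes filter and maps to the reference, which will not hold in practice, so treat the result as a lower bound. The function name is hypothetical.

```python
import math


def read_pairs_needed(required_bases: int, read_length: int = 100) -> int:
    """Lower-bound number of paired-end read pairs for a given base requirement."""
    if required_bases < 0 or read_length <= 0:
        raise ValueError("required_bases must be >= 0 and read_length > 0")
    bases_per_pair = 2 * read_length  # 2x100 run = 200 bases per pair
    return math.ceil(required_bases / bases_per_pair)


# Using the 9.1 Gb figure from the example above.
pairs = read_pairs_needed(9_100_000_000, read_length=100)
print(f"{pairs:,} read pairs")  # 45,500,000 read pairs
```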
Exome sequencing tends to produce a large proportion of off-target reads. In some cases, as much as 40% of the data produced by capture methods does not come from exon regions. The three main types of off-target reads include:
Poorly designed biotinylated probes (baits), fragmented input DNA, small capture regions, inefficient hybridization conditions, and spurious adapter to adapter annealing are the reasons such a large portion of exome reads can end up being off-target. When calculating the number of reads required to meet mean on target exon coverage, off-target rates need to be considered.
Off-target reads can include functionally important genomic regions such as promoters, conserved non-coding sequences, untranslated regions (UTRs) and microRNA. A recent review (Samuels et al.) suggests data mining and extracting useful information from non-exonic reads is valuable. Tools for analyzing these off-target reads are also described.
To obtain the coverage typically required for whole human exome sequencing, we recommend the Illumina HiSeq or NextSeq 500 platforms. You should use a minimum read length of 2x100, or 200 bp. While sequencing at a longer read length gives you slight improvements in single nucleotide variant (SNV) calls, the improvements are modest and not necessarily worth the extra time and cost. Similarly for INDELs, the number of variants identified does not increase significantly beyond a 2x100 bp sequencing run. Longer read lengths do increase mean coverage, and the improvement grows with greater sequencing depth. So if you’re trying to obtain 200x coverage, consider a longer read length. If you’re aiming for 100x coverage, unless the costs are roughly the same, we recommend starting with a 2x100 bp sequencing run.
For accurate variant identification we recommend at least 100x coverage. In many cases higher coverage will reduce the chance of a false positive SNV call. Higher coverage is also necessary to increase sensitivity and improve the detection of rare variants. If you need advice on the depth of sequencing required for your exome project, we’re happy to help. Fill out our complimentary consultation form and describe your exome sequencing project. A member of the Genohub scientific staff will assist you.
Several factors must be considered when calculating coverage for exome sequencing. These include:
See the detailed example in calculating exome coverage.
We’ve summarized the specifications for each capture kit in this table. In general, we’ve found the Roche Nimblegen SeqCap v3.0 + UTR kit contains the largest designed target and coding regions. Nimblegen SeqCap v3.0 and Agilent SureSelect XT2 v5 +UTR exome capture kits also tend to have lower off target enrichment compared to Illumina Nextera Rapid Capture (expanded exome). The Agilent SureSelect XT2 v5 + UTR exome kits generally have been shown to have the highest accuracy in SNV detection and best GC-rich region enrichment.
While we’ve just made some recommendations above, all three kits are frequently used in many facilities and results with each will likely be adequate for most applications. In many cases the amount of DNA you have will be a deciding factor on the kit that you should use. With some kits you can start with as little as 50 ng of input material. Others require at least 1 ug. Most exome capture kits allow you to barcode and pool your samples prior to capture, reducing capture reagent costs. Several older kit versions are available to users who prefer capturing each barcoded library individually.
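Since input DNA is often the deciding factor, a quick way to shortlist kits is to filter the table above by its "Genomic DNA input required" column. The sketch below hard-codes those table values (1 ug is written as 1000 ng); the `KITS` list and the function name are our own illustration, and the figures should be checked against current vendor documentation before use.

```python
from typing import List

# (kit name, input required in ng), taken from the table above.
KITS = [
    ("Agilent SureSelect XT2 V6 Exome", 100),
    ("Agilent SureSelect XT2 V5 Exome", 100),
    ("IDT xGEN Exome Panel", 500),
    ("Illumina Nextera Rapid Capture Expanded Exome", 50),
    ("Roche Nimblegen SeqCap EZ Exome v3.0", 1000),  # 1 ug
]


def kits_for_input(available_ng: float) -> List[str]:
    """Return kits whose listed input requirement is met by the available DNA."""
    return [name for name, required_ng in KITS if required_ng <= available_ng]


for kit in kits_for_input(100):
    print(kit)
# Agilent SureSelect XT2 V6 Exome
# Agilent SureSelect XT2 V5 Exome
# Illumina Nextera Rapid Capture Expanded Exome
```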
You can download annotated files providing information on genomic regions covered by the capture probes and genes included in these regions:
There are advantages and disadvantages to both WGS and WES. With the decreasing cost of WGS, these need to be carefully considered. We've laid out each of these in an earlier blog post: Whole Genome Sequencing (WGS) vs. Whole Exome Sequencing (WES).