The term RNA-Seq refers to a next generation sequencing approach that offers a snapshot of the
entire transcriptome or messenger RNA (mRNA) profile at a given moment in time. RNA-Seq allows for
the detection of transcript isoforms, allele specific gene expression, gene fusions, and single
nucleotide variants, all without the need for knowing anything about the sample’s sequence
composition. Strictly speaking, the term RNA-Seq is a misnomer, as in most workflows the RNA itself
is not directly sequenced. Single RNA strands are converted to complementary DNA (cDNA) and then
turned into double stranded DNA before being sequenced. So while the starting input material is RNA,
the material loaded on the sequencing instrument is DNA.
Whole transcriptome sequencing by RNA-Seq involves a snapshot measurement of the complete complement
of transcripts in a cell. These include mRNA and all non-coding RNAs. By looking
at the whole transcriptome, researchers are able to determine global expression levels of each
transcript, identify exons and introns, and map their boundaries. In addition, RNA-Seq can be used for
the identification of splicing variants. To accurately look at the whole transcriptome, most library
preparation protocols start with the removal of ribosomal RNA (rRNA), which otherwise takes up
the majority of all sequencing reads. Assuming you’re not interested in ribosomal RNA, removing
these transcripts allows more of the sequencing reads to be focused on the transcripts you’re
actually interested in, giving you improved sensitivity toward lowly expressed
transcripts.
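As a rough illustration of why rRNA removal matters, the sketch below estimates how many reads remain for non-ribosomal transcripts at a given rRNA fraction. The function name and the fractions used (0.85 before depletion, 0.05 after) are illustrative assumptions, not measured values.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Estimate the reads left for non-rRNA transcripts.

    rrna_fraction is the assumed share of reads consumed by rRNA (0 to 1).
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Hypothetical 20M-read run, without and with rRNA depletion.
without_depletion = informative_reads(20_000_000, 0.85)
with_depletion = informative_reads(20_000_000, 0.05)
print(without_depletion, with_depletion)
```

Under these assumed fractions, the same run yields several times more reads on the transcripts of interest once rRNA is removed.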
Messenger RNA-Seq or mRNA-Seq is a targeted RNA-Seq protocol that enriches for all polyadenylated
(poly-A) transcripts of the transcriptome. mRNA-Seq is used to study transcription in
disease states as well as expression in a variety of research-based applications. Only around 1-2% of
the entire transcriptome consists of poly-A tailed RNA, which represents the coding part of the genome. By
targeting mRNA, sequencing depth is improved as resources are dedicated to the sequencing of coding
genes. This makes identifying rare variants and lowly expressed mRNA transcripts easier.
While whole transcriptome and mRNA-seq represent ~90% of all RNA-based sequencing applications, it’s
important not to lose sight of the myriad new protocols available to detect transcription events,
RNA-protein interactions, RNA modifications, RNA structure and low input RNA. Several examples of
these are listed here:
- RNA Extraction- Total RNA is extracted by phenol chloroform, gel or column
enrichment. Depending on the sample type, there are several considerations one should make
while extracting total RNA for the purposes of next generation sequencing. These include
ensuring that your RNA is free of organic solvents that might inhibit downstream library
preparation, that it is not fragmented into sizes too short for library preparation
and sequencing, and that its quality is not diminished by your extraction method. If
you need help, fill out our complimentary consultation
form and we'll be happy to offer our recommendations.
- RNA QC- RNA sample quality will vary significantly depending on the source from which total
RNA was extracted. RNA from FFPE or degraded samples can have poor integrity,
while total RNA extracted from fresh tissue can be in superb condition. Running total RNA on
a gel or electrophoresis instrument to detect the 28S and 18S rRNA bands is the most common method for
checking RNA quality. The 28S rRNA band should be approximately twice as intense as
the 18S rRNA band on a gel. RNA integrity number (RIN) values can be measured using a Bioanalyzer
or a similar electrophoresis instrument.
- RNA Sample Submission- Typically 100 ng – 1 µg of total RNA is required for
mRNA and whole transcriptome sequencing. See our guide for recommendations on shipping RNA
samples.
- Order Sequencing and/or Library Prep Services- Quotes for RNA-Seq services
can be obtained instantly on Genohub.
- RNA Library Preparation- The type of RNA library preparation performed
depends on whether you’re interested in examining the entire transcriptome or just coding
transcripts in the case of mRNA-seq. If examining the transcriptome, it’s recommended that
rRNA and potentially other non-coding RNAs be removed from your total RNA (see methods for
removing rRNA from total RNA). If you’re interested in interrogating mRNA, poly T
oligonucleotides fixed to magnetic beads are added to total RNA and selectively bind to
messenger RNAs. Anything not bound is removed during a wash step. mRNAs are eluted from the
beads and used in the first step of library preparation. While this doesn’t completely
eliminate non-coding RNA, it does significantly reduce the proportion of rRNA in your final
sequencing results. After your depletion or selection strategy has been chosen, RNA-seq
library preparation includes a reverse transcription step where RNA is converted to
cDNA, after which sequencing adapters are added, typically by ligation. The steps for library preparation include:
- Total RNA isolation
- mRNA enrichment or rRNA depletion from total RNA
- RNA Fragmentation
- Reverse transcription - 1st strand synthesis
- Second strand synthesis
- Ligation of adapters
- Amplification
Commonly used RNA-Seq library preparation kits for whole transcriptome or mRNA-Seq can be found
below.
- Sequencing- Parameters for your sequencing run will depend on your experiment. As a
general recommendation, for differential expression profiling we recommend at least 10-25M
single-end 1x50 or 1x100 reads. For de novo assembly or alternative splicing, we recommend around 100M
paired-end 2x100 or 2x150 reads. See Transcriptome Sequencing in our coverage guide
for more information.
- Data Analysis- Again, analysis requirements will depend on your experiment.
Typically RNA-seq data is filtered, mapped and assembled. Quantification of splice variants,
junctions and differential expression are commonly performed to interpret data. For general pipeline
recommendations, go to: RNA-Seq Data Analysis Recommendations
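The general sequencing recommendations in the list above can be captured in a small lookup to help size a run. This is only a sketch: the dictionary keys and the `plan_run` helper are hypothetical names, and the lower bound of the 10-25M range is used for differential expression.

```python
# Figures copied from the Sequencing step above; keys are hypothetical labels.
RECOMMENDED = {
    "differential_expression": {"reads_millions": 10, "layouts": ("1x50", "1x100")},
    "de_novo_assembly": {"reads_millions": 100, "layouts": ("2x100", "2x150")},
    "alternative_splicing": {"reads_millions": 100, "layouts": ("2x100", "2x150")},
}


def plan_run(application: str, n_samples: int) -> dict:
    """Return suggested read layouts and the total reads (in millions) for a project."""
    rec = RECOMMENDED[application]
    return {
        "layouts": rec["layouts"],
        "total_reads_millions": rec["reads_millions"] * n_samples,
    }


print(plan_run("differential_expression", 12))
```

The total is a minimum; experiments targeting lowly expressed transcripts may need more depth per sample.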
RNA-Seq Library Preparation Kits
The NEXTflex Rapid Directional RNA-Seq Kit is optimized for starting inputs between 10 ng – 1 µg of
total RNA or 10 – 100 ng of purified mRNA or ribosomal depleted RNA. The kit utilizes a
“directional” or “stranded” approach that identifies from which of two DNA strands a given RNA
transcript was derived. When RNA is copied back into cDNA during RNA-Seq library prep, the
information about which of the strands was copied into RNA is lost unless a “directional” or
“stranded” approach is used to preserve its identity. Strandedness is useful for transcript
annotation, increases the percentage of alignable reads and provides insight into antisense
transcription. Stranded character is retained due to the directionality of the adapters added.
During second strand synthesis, dUTP is used in place of dTTP. Just before PCR, Uracil DNA
Glycosylase (UDG) is used to catalyze excision of the uracil bases, so the uracil-containing
second strand is not amplified.
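To make the strand logic concrete, here is a minimal sketch of how the transcript strand could be inferred from an aligned read in a dUTP-based library. It assumes the usual convention for this library type, in which read 1 aligns antisense to the original transcript and read 2 aligns sense; the function name is hypothetical, and the convention should be confirmed against your kit's documentation.

```python
def transcript_strand(read_strand: str, is_read1: bool = True) -> str:
    """Infer the strand of the originating transcript from an aligned read.

    Assumes a dUTP-based stranded library: only first-strand cDNA survives,
    so read 1 is antisense to the RNA and read 2 is sense.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    flipped = "-" if read_strand == "+" else "+"
    return flipped if is_read1 else read_strand


print(transcript_strand("+"))                  # read 1 on '+': transcript on '-'
print(transcript_strand("+", is_read1=False))  # read 2 on '+': transcript on '+'
```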
This kit’s unique characteristics include a reverse transcription step that’s performed at 50ºC
instead of 42ºC. The NEXTflex Rapid Thermostable enzyme included in the kit reduces secondary
structure in RNA before reverse transcription and improves yield and read-through in complex GC regions. The
protocol combines end-repair with second strand synthesis, removing the need for a separate step.
The kit is available with magnetic mRNA poly(A) beads and up to 48 RNA barcodes.
Protocol Overview:
- Poly A selection of mRNA or ribosomal depletion of rRNA
- Fragmentation with a high divalent cation buffer
- Random hexamer priming
- First strand synthesis with Actinomycin D (to prevent DNA-dependent DNA synthesis)
- Second strand synthesis (using dUTP instead of dTTP)
- A-tailing
- Adapter ligation
- UDG treatment (removes the dUTP-containing second strand)
- PCR amplification
The NEXTflex Rapid RNA-Seq Kit is similar to the Rapid Directional RNA-Seq kit above, but does not
provide strand information. Second strand synthesis does not contain dUTP and UDG is not used prior
to PCR. The kit does contain the same thermostable reverse transcriptase step and fast workflow.
Magnetic mRNA beads and up to 48 barcodes are also available.
Similar to the NEXTflex Rapid Directional RNA-Seq kit, the NEXTflex qRNA-Seq kit uses a stranded approach to
determine from which of two DNA strands a given RNA transcript was derived. The main feature of this
kit is the ‘q’ in qRNA-Seq, which stands for quantitative. This kit utilizes a set of
9,216 molecular labels during ligation so that each fragmented molecule is tagged with a
distinct index. The labels help differentiate between PCR duplicates and unique fragments.
De-duplication is typically performed using start and stop sites alone, in which case distinct fragments
that share the same start and stop sites are collapsed and lost during counting. Using stochastic labels, this kit gives a more
accurate representation of transcript expression. The protocol is similar to the previously
mentioned Directional RNA-Seq kit, but includes molecular indices and up to 96 sample barcodes which
are added during PCR.
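The difference between position-only de-duplication and label-aware de-duplication can be sketched as follows. The fragment tuples, label sequences and function names are made up for illustration and do not reflect the kit's actual label set or software.

```python
def count_by_position(fragments):
    """Count unique fragments using start and stop sites only."""
    return len({(start, end) for start, end, _label in fragments})


def count_by_position_and_label(fragments):
    """Count unique fragments using start, stop and molecular label."""
    return len({(start, end, label) for start, end, label in fragments})


# (start, end, molecular label); all values are hypothetical.
fragments = [
    (100, 250, "AACG"),  # original molecule
    (100, 250, "AACG"),  # PCR duplicate of the molecule above
    (100, 250, "TTGC"),  # different molecule with the same coordinates
    (300, 480, "GGAT"),
]

print(count_by_position(fragments))            # 2: the distinct molecule is lost
print(count_by_position_and_label(fragments))  # 3: the distinct molecule is kept
```

Only the true PCR duplicate is collapsed when the label is part of the key.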
Clontech SMARTer
SMART stands for “Switching Mechanism at 5’ End of RNA Template”. The procedure allows you to add
known sequence to the 3’ and 5’ ends of the cDNA fragment without ligation. Generally, ligation-based
approaches yield a maximum of 40-50% doubly ligated product. By using a ligation-free approach, the
user is able to start with significantly lower inputs of material. The Clontech SMARTer kit allows you
to start with as little as 1-2 ng of total RNA, which makes this kit ideal for laser capture
microdissection, cells sorted with flow cytometry or other techniques that yield small input RNA
amounts. The benefits of SMART technology are that it enriches for full length transcripts and
maintains true representation of the original mRNA transcripts, factors that are critical for
transcriptome sequencing and gene expression analysis.
Protocol Overview:
- A modified oligo(dT) or random hexamer primer primes the first strand synthesis reaction.
- When SMARTScribe RT reaches the 5’ end of the mRNA, the enzyme’s terminal transferase activity
adds a few additional nucleotides to the 3’ end of the cDNA.
- The SMARTer oligo base pairs with the non-templated nucleotide stretch, creating an extended template.
- SMARTScribe RT switches templates and continues replicating to the end of the oligonucleotide.
- The resulting single stranded cDNA contains the 5’ end of the mRNA as well as sequence
complementary to the SMARTer oligonucleotide.
- Covaris shearing of full length cDNA
- Library Preparation
Epicentre ScriptSeq
The ScriptSeq kit has been developed for preparing libraries from 500 pg - 50 ng of either ribosomal
depleted or polyA enriched RNA. Directional and paired end libraries can be constructed using this
kit for Illumina platforms.
Protocol Overview:
- RNA Fragmentation
- Random hexamer tagging
- RNA removal
- Annealing of 3’-end blocked terminal tagging oligo
- cDNA synthesis
- DNA purification
- PCR amplification and barcode addition
Gnomegen RNA Profiling Kit
The Gnomegen RNA Profiling Kit is designed specifically for identifying the 3’ regions of RNA
transcripts. It is marketed as a cost effective alternative to whole transcriptome sequencing, as it
focuses mainly on identifying changes in RNA expression levels. It is not designed for RNA discovery,
annotation or identification of alternative splicing sites. The input material required for library
construction is as low as 1 ng of total RNA, and library construction can be performed in 1 day.
Protocol Overview:
- Fragmentation of Total RNA
- 5’ adapter ligation
- Reverse transcription with a polyA RT primer
- PCR
Illumina TruSeq RNA (Discontinued)
The TruSeq RNA Kit from Illumina is designed for generating mRNA libraries from total RNA. The kit
is optimized for starting inputs between 0.1 - 4 µg of total RNA, but the user may also start with
the entire fraction of mRNA isolated from 0.1 - 4 µg of total RNA. The kit comes with polyA beads
for the purification procedure. All reagents in the kit are master mixed for ease of use during
library construction. The v2 version of this kit contains 12 indexing adapters, with up to 24
available. There are no other significant differences between v1 and v2. Illumina recommends the use
of high quality starting material, particularly total RNA with an RNA integrity number (RIN) greater
than or equal to 8. Alternatively, the user may run their sample on a gel and look for a 28S band
that is twice as intense as the 18S band. The kit contains all the components you need for RNA-Seq
library prep, except for the reverse transcriptase enzyme. Illumina recommends the purchase of
SuperScript II for first strand synthesis.
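The input quality guidance above (RIN of at least 8, or a 28S band about twice as intense as the 18S band) can be written as a simple check. This is a sketch only: the function name is hypothetical, and the 0.2 tolerance on the 28S:18S ratio is an assumption rather than a vendor specification.

```python
def passes_input_qc(rin=None, ratio_28s_18s=None,
                    min_rin=8.0, target_ratio=2.0, tolerance=0.2):
    """Return True if the sample meets the RIN or 28S:18S guidance.

    RIN is used when available; otherwise the gel band intensity ratio is used.
    The tolerance below the target ratio is an assumed value.
    """
    if rin is not None:
        return rin >= min_rin
    if ratio_28s_18s is not None:
        return ratio_28s_18s >= target_ratio - tolerance
    raise ValueError("provide rin or ratio_28s_18s")


print(passes_input_qc(rin=9.1))            # True
print(passes_input_qc(rin=6.5))            # False
print(passes_input_qc(ratio_28s_18s=1.2))  # False
```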
Protocol Overview:
- PolyA selection of mRNA
- Fragmentation with a high divalent cation buffer
- Random hexamer priming
- First strand synthesis (using SuperScript II)
- Second strand synthesis
- End repair
- A-tailing
- Adapter ligation
- PCR
Illumina TruSeq Stranded
The TruSeq Stranded kit is essentially the same as the Illumina TruSeq RNA kit, except that it
provides “directional” or “stranded” information that identifies from which of two DNA strands a
given RNA transcript was derived. When RNA is copied back into cDNA during RNA-Seq library prep, the
information about which of the strands was copied into RNA is lost unless a “directional” or
“stranded” approach is used to preserve its identity. Strandedness is useful for transcript
annotation, increases the percentage of alignable reads and provides insight into antisense
transcription. Stranded character is retained due to the directionality of the adapters. The P7
adapter is on the 3’ end of the cDNA strand. During second strand synthesis, dUTP is used in place
of dTTP, eliminating the second strand during amplification since the PCR polymerase used cannot
read through uracil.
The kit is optimized for 0.1 - 4 µg of total RNA input and may be used with either polyA or
ribosomal depletion beads. The kit includes 12 adapter barcodes with a total of 24 available in the
“low-throughput” version of the kit. The “high-throughput” version contains 96 unique dual-index
combinations. The move to dual indexed adapters is a strategy that decreases the number of adapters
that need to be synthesized to make 96. Essentially, 12 adapters containing unique indices are
annealed with 8 adapters containing another set of unique indices, allowing for 96 possible
combinations while only 20 adapters need to be synthesized. For the user, this combinatorial scheme
offers no particular benefit of “dual” over “single” indices. In fact, its main disadvantage is that
several sequencing cycles are lost reading “dark regions”.
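The arithmetic behind the dual-index design can be checked with a few lines of code. The adapter names below are hypothetical placeholders, not Illumina's index identifiers.

```python
from itertools import product

# Two hypothetical adapter sets: 12 with one index set, 8 with another.
set_a = [f"A{n:02d}" for n in range(1, 13)]
set_b = [f"B{n:02d}" for n in range(1, 9)]

# Every pairing of one adapter from each set is a distinct sample barcode.
combinations = list(product(set_a, set_b))

print(len(set_a) + len(set_b))  # 20 adapters synthesized
print(len(combinations))        # 96 dual-index combinations
```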
The TruSeq Stranded mRNA kit comes with the polyA beads, while the TruSeq Stranded Total RNA kit
contains ribosomal depletion beads. The ribo-depletion beads are essentially Epicentre’s Ribo-Zero
and contain biotinylated probes that selectively bind and remove rRNA. By depleting rRNA, the user
is able to sequence not only polyadenylated transcripts, but also a broad range of non-coding
transcripts, including long intergenic non-coding RNA (lincRNA), small nuclear RNA (snRNA), small
nucleolar RNA (snoRNA) and other species. The TruSeq Stranded Total RNA Kit can be purchased with
Ribo-Zero Gold beads, designed for rRNA removal across human, mouse and rat species; Ribo-Zero
Globin, for removal of rRNA and globin in a single step; and Ribo-Zero Plant, for specific removal
of cytoplasmic, mitochondrial and chloroplast rRNA from leaf, seed and root tissue.
Protocol Overview:
- PolyA selection of mRNA or ribosomal depletion of rRNA
- Fragmentation with a high divalent cation buffer
- Random hexamer priming
- First strand synthesis (using SuperScript II) with Actinomycin D (to prevent DNA-dependent DNA synthesis)
- Second strand synthesis (using dUTP instead of dTTP)
- End repair
- A-tailing
- Adapter ligation
- PCR (polymerase cannot read through uracil)
NEB NEBNext Ultra Directional RNA
The NEB NEBNext Ultra Directional RNA kit requires 100 ng - 1 µg of Total RNA or 10 - 100 ng of
purified mRNA or ribosomal depleted RNA as starting input. The protocol is similar to Illumina’s
TruSeq Stranded kit.
See Genohub's up-to-date list of available library prep services for the following applications:
- RNA (polyA-selected) Illumina library prep services
- RNA (rRNA-depleted) Illumina library prep services
If you're doing your own library preparation, see the list of facilities that offer Illumina
sequencing.
RNA-Seq Data Analysis Recommendations
Evaluating the quality of your data and extracting biological meaning is one of the most
important steps in any RNA-Seq application. It’s important to discuss your project with an
experienced bioinformatician or learn about the best tools to properly analyze your data. We do not
recommend the one-pipeline-fits-all approach offered by several commercial black-box software
providers. Each RNA tool (and there are many) needs to be carefully examined to see if it is the
right tool for your application. With that disclaimer, we recommend starting with the Tuxedo suite
of software for differential gene and transcript expression analysis of RNA-seq experiments (1).
Tuxedo's tools enable short-read mapping, identification of splice junctions, transcript and isoform
detection. The tools are open source and most importantly use peer-reviewed statistical methods
(2-5).
There are three main components of a Tuxedo-based RNA-Seq data analysis pipeline:
- Bowtie2 (2)
- TopHat (3)
- Cufflinks (4-5)
Bowtie2 is a fast, memory efficient aligner designed to quickly align large sets of short reads to
the genome. Bowtie is the basis for tools like TopHat and Cufflinks.
TopHat is a fast splice junction mapper that can be used with Bowtie or Bowtie2. TopHat uses Bowtie
to map RNA-seq reads and then analyzes the mapping results to identify splice junctions between
exons.
Cufflinks uses alignment data from TopHat to assemble RNA-seq reads into transcripts, estimate their
abundance, and measure differential expression and regulation of the transcriptome. Cuffdiff
can measure differential expression at the CDS, gene, isoform and transcription start site (TSS) level.
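The order in which the three tools are chained can be sketched by assembling their command lines. This is a minimal example under stated assumptions: file names, the sample name and the `tuxedo_commands` helper are hypothetical, only basic options (`-p` for threads, `-o` for the output directory) are shown, and the commands are printed rather than executed.

```python
import shlex


def tuxedo_commands(genome_fasta, index_prefix, reads, sample, threads=4):
    """Return command lines for indexing, spliced alignment and assembly."""
    tophat_out = f"{sample}_thout"
    return [
        # Build the Bowtie2 index that TopHat aligns against.
        ["bowtie2-build", genome_fasta, index_prefix],
        # Map reads and identify splice junctions.
        ["tophat", "-p", str(threads), "-o", tophat_out, index_prefix, *reads],
        # Assemble transcripts and estimate abundance from TopHat's alignments.
        ["cufflinks", "-p", str(threads), "-o", f"{sample}_clout",
         f"{tophat_out}/accepted_hits.bam"],
    ]


for cmd in tuxedo_commands("genome.fa", "genome", ["s1_R1.fq", "s1_R2.fq"], "s1"):
    print(shlex.join(cmd))
```

Differential expression with Cuffdiff would follow once every sample has been aligned and assembled; consult each tool's manual for the options appropriate to your library type.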
How Many Replicates are Needed for RNA-Seq?
To properly interpret read count differences in RNA-Seq data, you should consider where variation can
be introduced. When thinking about the replicates you need for an RNA-Seq study, you're likely
thinking about biological replicates. As a general answer, we recommend at least 6 biological
replicates per sample group. For a more detailed and ‘experimentally validated’ answer, we recommend
you read How Many Replicates are Sufficient for Differential Gene Expression. Replicates that capture
experimental variation should also be considered. Sources of variation include:
- Sample variation - When you extract total RNA from a sample, only a small
fraction of nucleic acid is actually sampled and represented in your library. This causes
sampling variation and should be considered in your analysis.
- Technical variation - Library preparation for RNA-seq is a series of
coordinated enzymatic reactions that may each contribute to variation between libraries.
Technical variation should be controlled for in your experiment.
- Biological variation - This type of variation is what you are actually
interested in measuring. The number of biological replicates you need for a whole transcriptome
or mRNA-seq experiment depends on what you are trying to compare or measure statistically.
Paired or Single-end Reads for RNA-Seq?
In paired-end sequencing, both ends of each cDNA fragment are sequenced, as opposed to single-end
sequencing where only one end is sequenced. Advantages of paired-end sequencing for RNA applications
include better alignment and transcript assembly. As a result, paired-end sequencing is recommended
for the following RNA-seq applications:
- De novo assembly
- Discovery of novel non-coding RNAs
- Splice isoform detection
- Resolution at the 3’ end of your transcript
- Identification of polycistronic mRNAs and operons
Single-end sequencing is sufficient for differential expression studies, where you’re interested in
examining a profile of all coding transcripts in a sample. For counting applications, sequencing
both ends of each fragment is not critical.
Methods for removing rRNA from Total RNA
The most common methods for removing rRNA from total RNA are:
Oligo-bead based. Biotin labeled oligonucleotides complementary to rRNA or other
non-coding RNAs are mixed with your total RNA, hybridized and pulled down by streptavidin beads.
RNase H cleavage. DNA oligos designed to hybridize to rRNA are incubated with total
RNA before RNase H is introduced. RNase H belongs to a family of non-sequence-specific endonucleases and
catalyzes cleavage of the 3’O-P (phosphodiester) bonds of RNA in DNA/RNA duplexes. After a cleanup, rRNA
is no longer available for reverse transcription.
Priming. Reverse transcription primer sets can be designed, e.g. oligo(dT) primers
or 'not so random' priming, to specifically avoid rRNA and other non-coding products.
CRISPR/Cas9. CRISPR has been shown in at least one publication (6) to specifically
cleave rRNA targets and deplete them from sequencing libraries.
Considerations for Whole Transcriptome and mRNA Sequencing:
1. What sequencing instruments are recommended for RNA-seq, specifically whole transcriptome or
mRNA-sequencing?
We recommend the Illumina NextSeq, HiSeq 2500 and HiSeq 3K/4K instruments. They each offer enough
throughput and a variety of read lengths to make them suitable for all RNA
sequencing applications.
2. Will my RNA-Seq results be stranded or directional?
Yes, in most cases. Most commercially available library preparation kits use techniques to preserve strand
information. Directional or stranded RNA-seq identifies from which of two DNA strands a given RNA
transcript was derived. When RNA is copied into cDNA during reverse transcription, the information about
which of the strands was copied into RNA is lost unless a “directional” or “stranded” approach is
used to preserve its identity. Strandedness is useful for transcript annotation, increases the
percentage of alignable reads and provides insight into antisense transcription.
3. What technique offers greater read-depth, mRNA-seq or whole transcriptome sequencing?
mRNA-seq offers greater read depth than whole transcriptome sequencing. mRNA-seq is a technique that
enriches poly-adenylated RNA so sequencing reads are focused on a subset of RNA, giving you higher
sequencing depth. For whole transcriptome work, rRNA depletion will remove rRNA, focusing reads on
mRNA and non-coding RNAs other than rRNA.
4. How long of a sequencing read should I use for mRNA-seq and whole transcriptome
sequencing?
Longer read lengths are important for de novo transcript assembly and identifying transcript
isoforms. We recommend paired-end 2x100 or 2x150 read lengths for these applications. For mRNA
differential gene expression studies, a long read length is typically not required when there is an
available reference genome. For these studies we recommend single-end 1x50 or 1x100 read
lengths.
Direct RNA Sequencing - Actual Sequencing of RNA not DNA
To date, almost all RNA-seq studies, including all Illumina-based RNA sequencing, involve the
sequencing of cDNA. In 2009, Helicos™ published a paper (7) describing their ability to directly
sequence RNA; however, this technology is not widely commercially available. In 2015, Oxford Nanopore
announced the ability to sequence RNA strands on the MinION™ and PromethION™ without needing to
convert to double stranded DNA. Directly sequencing RNA offers the following advantages:
- Direct sequencing of RNA eliminates biases associated with reverse transcription and provides a more
accurate measure of transcript profiles.
- Modified RNA bases can be directly measured, whereas with cDNA conversion they are lost.
- It’s faster than cDNA library preparation, as no reverse transcription step is necessary.
Whole Transcriptome and mRNA Sequencing Services:
A. Standard differential mRNA expression library and sequencing services cost between $210 - $300 per sample
- 10 million reads per sample
- Minimum 1x50 single end read length
- mRNA library preparation with poly(A) beads
- Ligation based library preparation
- Appropriate for counting applications
Search for Standard Differential mRNA Expression Sequencing Services
B. Whole transcriptome library and sequencing services cost between $330 - $400 per sample
- 50 million paired end reads per sample (25M reads in each direction)
- Appropriate for examining mRNA and non-coding transcripts
- Ligation based library preparation
- Appropriate for more contextual examinations of the transcriptome
Search for Whole Transcriptome Sequencing Services
C. High depth RNA sequencing services cost between $780 - $900 per sample
- 200 million paired end reads per sample (100M reads in each direction)
- Paired-end reads that are 2x75 or greater in length
- Ideal for transcript discovery, splice site identification, gene fusion detection, de novo
transcript assembly
Search for High Depth RNA Sequencing Services
References
1. Trapnell, C., et al. (2013) Differential analysis of gene regulation at transcript resolution
with RNA-seq. Nature Biotechnology 31, 46-53.
2. Langmead, B., Salzberg, S. (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9,
357-359.
3. Trapnell, C., et al. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics
25(9), 1105-1111.
4. Trapnell, C., et al. (2010) Transcript assembly and quantification by RNA-Seq reveals
unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology
28(5), 511-515.
5. Trapnell, C., et al. (2012) Differential gene and transcript expression analysis of RNA-seq
experiments with TopHat and Cufflinks. Nature Protocols 7, 562-578.
6. Gu, W., et al. (2016) Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to
remove unwanted high-abundance species in sequencing libraries and molecular counting
applications. Genome Biology 17, 41.
7. Ozsolak, F., et al. (2009) Direct RNA Sequencing. Nature 461, 814-818.