RNA-Seq is a next-generation sequencing approach that offers a snapshot of the entire transcriptome, or messenger RNA (mRNA) profile, at a given moment in time. RNA-Seq allows for the detection of transcript isoforms, allele-specific gene expression, gene fusions, and single nucleotide variants, all without prior knowledge of the sample’s sequence composition. Strictly speaking, the term RNA-Seq is a misnomer, as the RNA is not sequenced directly. Single RNA strands are converted to complementary DNA (cDNA) and then turned into double-stranded DNA before being sequenced. So while the starting material is RNA, the material loaded on the sequencing instrument is DNA.
Most applications of RNA-Seq fall under two broad categories:
Whole transcriptome sequencing by RNA-Seq involves a snapshot measurement of the complete complement of transcripts in a cell. These include mRNA and all non-coding RNAs. By looking at the whole transcriptome, researchers are able to determine global expression levels of each transcript, identify exons and introns, and map their boundaries. In addition, RNA-Seq can be used to identify splicing variants. To accurately look at the whole transcriptome, most library preparation protocols start with the removal of ribosomal RNA (rRNA), which otherwise takes up the majority of all sequencing reads. Assuming you’re not interested in ribosomal RNA, removing these transcripts allows more of the sequencing reads to be focused on the transcripts you actually care about, giving you improved sensitivity toward lowly expressed transcripts.
Messenger RNA-Seq or mRNA-Seq is a targeted RNA-Seq protocol that enriches for all polyadenylated (poly-A) transcripts of the transcriptome. mRNA-Seq is used to study transcription in disease states as well as expression in a variety of research applications. Only around 1-2% of the entire transcriptome consists of poly-A tailed RNA, which largely represents the coding part of the genome. By targeting mRNA, sequencing depth is improved as resources are dedicated to the sequencing of coding genes. This makes identifying rare variants and lowly expressed mRNA transcripts easier.
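As a rough illustration of why rRNA depletion or poly-A enrichment improves effective depth, the arithmetic below estimates how many reads remain for transcripts of interest. The function name, the run size and both rRNA fractions are hypothetical placeholders chosen for this sketch, not measured values; the real fraction depends on the sample and the protocol.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Estimate the number of reads left for non-rRNA transcripts.

    rrna_fraction is the share of reads consumed by rRNA. It is an
    assumed, illustrative input; measure it for your own libraries.
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Hypothetical run of 20 million reads.
without_depletion = informative_reads(20_000_000, 0.80)  # assumed 80% rRNA reads
with_depletion = informative_reads(20_000_000, 0.05)     # assumed 5% residual rRNA

print(without_depletion)  # 4000000
print(with_depletion)     # 19000000
```

Under these assumed fractions, the same run yields several times more informative reads once rRNA is removed, which is the sensitivity gain described above.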
While whole transcriptome and mRNA-seq represent ~90% of all RNA-based sequencing applications, it’s important not to lose sight of the myriad of new protocols available to detect transcription events, RNA-protein interactions, RNA modifications, RNA structure and low-input RNA. Several examples of these are listed here:
Commonly used RNA-Seq library preparation kits for whole transcriptome or mRNA-Seq can be found below.
The NEXTflex Rapid Directional RNA-Seq Kit is optimized for starting inputs between 10 ng - 1 µg of total RNA or 10 - 100 ng of purified mRNA or ribosomal-depleted RNA. The kit utilizes a “directional” or “stranded” approach that identifies from which of two DNA strands a given RNA transcript was derived. When RNA is copied back into cDNA during RNA-Seq library prep, the information about which of the strands was copied into RNA is lost unless a “directional” or “stranded” approach is used to preserve its identity. Strandedness is useful for transcript annotation, increases the percentage of alignable reads and provides insight into antisense transcription. Stranded character is retained due to the directionality of the adapters added. During second strand synthesis, dUTP is used in place of dTTP. Just before PCR, Uracil DNA Glycosylase (UDG) is used to catalyze excision of the uracil bases, cleaving the uridine-containing strand so that it is not amplified.
This kit’s unique characteristics include a reverse transcription step that’s performed at 50 °C instead of 42 °C. The NEXTflex Rapid Thermostable enzyme included in the kit reduces secondary structure in RNA before reverse transcription and improves yield and read-through in complex GC regions. The protocol combines end-repair with second strand synthesis, removing the need for a separate step. The kit is available with magnetic mRNA poly(A) beads and up to 48 RNA barcodes.
The NEXTflex Rapid RNA-Seq Kit is similar to the Rapid Directional RNA-Seq kit above, but does not provide strand information. Second strand synthesis does not contain dUTP and UDG is not used prior to PCR. The kit does contain the same thermostable reverse transcriptase step and fast workflow. Magnetic mRNA beads and up to 48 barcodes are also available.
Similar to the NEXTflex Rapid Directional RNA-Seq kit, this kit uses a stranded approach to determine from which of two DNA strands a given RNA transcript was derived. The main feature of this kit has to do with the ‘Q’ in qRNA-Seq, which stands for quantitative. This kit utilizes a set of 9,216 molecular labels during ligation to ensure that each fragmented molecule is tagged with a unique index. The unique labels help differentiate between PCR duplicates and genuinely unique fragments. De-duplication is typically performed using start and stop sites alone, so distinct fragments that happen to share start and stop sites are collapsed and lost during counting. Using stochastic labels, this kit gives a more accurate representation of transcript expression. The protocol is similar to the previously mentioned Directional RNA-Seq kit, but includes molecular indices and up to 96 sample barcodes, which are added during PCR.
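The difference between coordinate-only de-duplication and label-aware de-duplication can be sketched in a few lines. This is a minimal illustration, not the kit's actual analysis software: the `Read` record, the function names and the four-base labels are invented for the example, and real pipelines also have to handle sequencing errors in the labels.

```python
from collections import namedtuple

# Hypothetical aligned-read record: coordinates plus the molecular label.
Read = namedtuple("Read", ["chrom", "start", "end", "label"])


def count_by_position(reads):
    """Count unique fragments using only start and stop coordinates."""
    return len({(r.chrom, r.start, r.end) for r in reads})


def count_by_position_and_label(reads):
    """Count unique fragments using coordinates plus the molecular label."""
    return len({(r.chrom, r.start, r.end, r.label) for r in reads})


reads = [
    Read("chr1", 100, 250, "ACGT"),  # original molecule A
    Read("chr1", 100, 250, "ACGT"),  # PCR duplicate of A (same label)
    Read("chr1", 100, 250, "TTAG"),  # distinct molecule, same coordinates
    Read("chr1", 300, 450, "ACGT"),  # a different fragment
]

print(count_by_position(reads))            # 2
print(count_by_position_and_label(reads))  # 3
```

Coordinate-only counting merges the third read with the first two, while the label keeps it as a separate molecule and still removes the true PCR duplicate.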
SMART stands for “Switching Mechanism at 5’ End of RNA Template”. The procedure allows you to add known sequence to the 3’ and 5’ ends of the cDNA fragment without ligation. Generally, ligation-based approaches yield a maximum of 40-50% doubly ligated product. By using a ligation-free approach, the user is able to start with significantly lower inputs of material. The Clontech SMARTer kit allows you to start with as little as 1-2 ng of total RNA, which makes this kit ideal for laser capture microdissection, cells sorted by flow cytometry or other techniques that yield small amounts of input RNA. The benefits of SMART technology are that it enriches for full-length transcripts and maintains a true representation of the original mRNA transcripts, factors that are critical for transcriptome sequencing and gene expression analysis.
The ScriptSeq kit has been developed for preparing libraries from 500 pg - 50 ng of either ribosomal depleted or polyA enriched RNA. Directional and paired end libraries can be constructed using this kit for Illumina platforms.
The Gnomegen RNA profiling is designed specifically for identifying the 3’ regions of RNA transcripts. It is marketed as a cost effective alternative to whole transcriptome sequencing, as it focuses mainly on identifying changes in RNA expression levels. It is not designed for RNA discovery, annotation or identification of alternative splicing sites. The input material required for library construction is as low as 1 ng of total RNA and can be performed in 1 day.
The TruSeq RNA Kit from Illumina is designed for generating mRNA libraries from total RNA. The kit is optimized for starting inputs between 0.1 - 4 µg of total RNA, but the user may also start with the entire fraction of mRNA isolated from 0.1 - 4 µg of total RNA. Needless to say, the kit comes with polyA beads for the purification procedure. All reagents in the kit are master mixed for ease of use during library construction. The v2 version of this kit contains 12 indexing adapters, with up to 24 available. There are no other significant differences between v1 and v2. Illumina recommends the use of high quality starting material, particularly total RNA with a high RNA integrity number (RIN) greater than or equal to 8. Alternatively, the user may run their sample on a gel and look for a 28S band that is twice as intense as the 18S band. The kit contains all the components you need for RNA-Seq library prep, except for the reverse transcriptase enzyme. Illumina recommends the purchase of SuperScript II for first strand synthesis.
The TruSeq Stranded kit is essentially the same as the Illumina TruSeq RNA kit, except that it provides “directional” or “stranded” information that identifies from which of two DNA strands a given RNA transcript was derived. When RNA is copied back into cDNA during RNA-Seq library prep, the information about which of the strands was copied into RNA is lost unless a “directional” or “stranded” approach is used to preserve its identity. Strandedness is useful for transcript annotation, increases the percentage of alignable reads and provides insight into antisense transcription. Stranded character is retained due to the directionality of the adapters. The P7 adapter is on the 3’ end of the cDNA strand. During second strand synthesis, dUTP is used in place of dTTP, eliminating the second strand during amplification since the PCR polymerase used cannot read through the incorporated uracil.
The kit is optimized for 0.1 - 4 µg of total RNA input and may be used with either polyA or ribosomal depletion beads. The kit includes 12 adapter barcodes with a total of 24 available in the “low-throughput” version of the kit. The “high-throughput” version contains 96 unique dual indexed adapters. The move to dual indexed adapters is a strategy that decreases the number of adapters that need to be synthesized to make 96. Essentially 12 adapters containing unique indices are annealed with 8 adapters containing another set of unique indices, allowing for 96 possible combinations, with the need to only synthesize 20 adapters. There are no particular user benefits of “dual” over “single” indices. In fact, the main disadvantage is that using “dual” indices, there are several sequencing cycles that are lost, reading “dark regions”.
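The combinatorial arithmetic behind dual indexing can be checked with a short sketch. The index names below are placeholders invented for the example; the real index sequences come from the kit documentation.

```python
from itertools import product

# Placeholder names for the two index sets (hypothetical, not kit sequences).
set_a = [f"A{n:02d}" for n in range(1, 13)]  # 12 adapters with unique indices
set_b = [f"B{n:02d}" for n in range(1, 9)]   # 8 adapters with unique indices

# Every pairing of one index from each set labels one sample.
index_pairs = list(product(set_a, set_b))
adapters_to_synthesize = len(set_a) + len(set_b)

print(len(index_pairs))         # 96
print(adapters_to_synthesize)   # 20
```

A single-index design would need one adapter per sample, so 96 samples would require 96 adapters rather than 20.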
The TruSeq Stranded mRNA kit comes with polyA beads, while the TruSeq Stranded Total RNA kit contains ribosomal depletion beads. The ribo-depletion beads are essentially from Epicentre (Ribo-Zero) and contain biotinylated probes that selectively bind and remove rRNA. By depleting rRNA, the user is able to sequence not only polyA transcripts, but also a broad range of non-coding transcripts, including long non-coding RNA (lncRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA) and other species. The TruSeq Stranded Total RNA Kit can be purchased with Ribo-Zero Gold beads, designed for rRNA removal across human, mouse and rat species; Ribo-Zero Globin, for removal of rRNA and globin in a single step; and Ribo-Zero Plant, for specific removal of cytoplasmic, mitochondrial and chloroplast rRNA from leaf, seed and root tissue.
The NEB NEBNext Ultra Directional RNA kit requires 100 ng - 1 µg of Total RNA or 10 - 100 ng of purified mRNA or ribosomal depleted RNA as starting input. The protocol is similar to Illumina’s TruSeq Directional RNA-Seq Kit.
See Genohub's up-to-date list of available library prep services for the following applications:
RNA (polyA-selected) Illumina library prep services
RNA (rRNA-depleted) Illumina library prep services
Evaluating the quality of your data and extracting biological meaning from it is one of the most important steps in any RNA-Seq application. It’s important to discuss your project with an experienced bioinformatician or learn about the best tools to properly analyze your data. We do not recommend the one-pipeline-fits-all approach offered by several commercial black-box software providers. Each RNA tool (and there are many) needs to be carefully examined to see if it is the right tool for your application. With that disclaimer, we recommend starting with the Tuxedo suite of software for differential gene and transcript expression analysis of RNA-seq experiments (1). Tuxedo's tools enable short-read mapping, identification of splice junctions, and transcript and isoform detection. The tools are open source and, most importantly, use peer-reviewed statistical methods (2-5).
The three main components of a Tuxedo-based RNA-Seq data analysis pipeline are:
Bowtie2 is a fast, memory-efficient aligner designed to quickly align large sets of short reads to the genome. Bowtie is the basis for tools like TopHat and Cufflinks.
TopHat is a fast splice junction mapper that can be used with Bowtie or Bowtie2. TopHat uses Bowtie to map RNA-seq reads and then analyzes the mapping results to identify splice junctions between exons.
Cufflinks uses alignment data from TopHat to assemble RNA-seq reads into transcripts, estimate their abundance, and measure differential expression and regulation of the transcriptome. Cuffdiff can measure differential expression at the CDS, gene, isoform and TSS levels.
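The way the three components chain together can be sketched as a set of command builders. This is a minimal outline under stated assumptions, not a complete or tuned pipeline: every file name, the index prefix, the condition labels and the thread count are placeholders, the executables are assumed to be on the PATH, and the script only prints the commands rather than running them. Real analyses usually add options such as a reference annotation or a library type, so check each tool's manual before use.

```python
import shlex

# All file names, the index prefix and the thread count are placeholders.


def bowtie2_build_cmd(genome_fasta, index_prefix):
    """Index the reference genome so Bowtie2/TopHat can align against it."""
    return ["bowtie2-build", genome_fasta, index_prefix]


def tophat_cmd(index_prefix, fastqs, out_dir, threads=4):
    """Spliced alignment; pass one FASTQ for single-end, two for paired-end."""
    return ["tophat", "-p", str(threads), "-o", out_dir, index_prefix, *fastqs]


def cufflinks_cmd(bam, out_dir, threads=4):
    """Assemble aligned reads into transcripts and estimate abundance."""
    return ["cufflinks", "-p", str(threads), "-o", out_dir, bam]


def cuffdiff_cmd(gtf, labels, bams_by_condition, out_dir, threads=4):
    """Differential expression; replicates of one condition are comma-joined."""
    groups = [",".join(bams) for bams in bams_by_condition]
    return ["cuffdiff", "-p", str(threads), "-o", out_dir,
            "-L", ",".join(labels), gtf, *groups]


steps = [
    bowtie2_build_cmd("genome.fa", "genome"),
    tophat_cmd("genome", ["ctrl_1.fq", "ctrl_2.fq"], "ctrl_tophat"),
    # TopHat writes its alignments to <out_dir>/accepted_hits.bam.
    cufflinks_cmd("ctrl_tophat/accepted_hits.bam", "ctrl_cufflinks"),
    cuffdiff_cmd("genes.gtf", ["ctrl", "treated"],
                 [["ctrl_tophat/accepted_hits.bam"],
                  ["treated_tophat/accepted_hits.bam"]],
                 "diff_out"),
]

for cmd in steps:
    print(shlex.join(cmd))  # inspect, then run with subprocess.run(cmd, check=True)
```

Building the commands as lists keeps the arguments unambiguous and lets each one be inspected before anything is executed.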
To properly interpret read count differences in RNA-Seq data, you should consider where variation can be introduced. When thinking about the replicates you need for an RNA-Seq study, you're likely thinking about biological replicates. As a general answer, we recommend at least 6 biological replicates per sample group. For a more detailed and ‘experimentally validated’ answer, we recommend you read How Many Replicates are Sufficient for Differential Gene Expression. Replicates for experimental variation should also be considered. These include:
In paired-end sequencing, both ends of a transcript are sequenced as opposed to single-end sequencing where one end is sequenced. Advantages of paired-end sequencing for RNA applications include better alignment and transcript assembly. As a result, paired end sequencing is recommended for the following RNA-seq applications:
Single end sequencing is sufficient for differential expression studies, where you’re interested in examining a profile of all coding transcripts in a sample. For counting applications, sequencing both ends of a transcript is not critical.
The most common methods for removing rRNA from total RNA are:
Oligo-bead based. Biotin-labeled oligonucleotides complementary to rRNA or other non-coding RNAs are mixed with your total RNA, hybridized and pulled down by streptavidin beads.
RNase H cleavage. DNA oligos designed to hybridize to rRNA are incubated with total RNA before RNase H is introduced. RNase H belongs to a family of non-specific endonucleases and catalyzes cleavage of 3’O-P (phosphodiester) bonds of RNA in DNA/RNA duplexes. After a cleanup, rRNA is no longer available for reverse transcription.
Priming. Reverse transcription primer sets can be designed, e.g. oligo(dT) primers or 'not so random priming', to specifically avoid rRNA and other non-coding products.
CRISPR/Cas9. CRISPR has been shown in at least one publication (6) to specifically cleave rRNA targets, and knock them down.
We recommend the Illumina NextSeq, HiSeq 2500 and HiSeq 3K/4K instruments. Each offers enough throughput and a sufficient variety of read lengths to be suitable for all RNA sequencing applications.
Yes, almost all commercially available library preparation kits use techniques to preserve strand information. Directional or stranded RNA-seq identifies from which of two DNA strands a given RNA transcript was derived. When RNA is copied into cDNA during reverse transcription, the information about which of the strands was copied into RNA is lost unless a “directional” or “stranded” approach is used to preserve its identity. Strandedness is useful for transcript annotation, increases the percentage of alignable reads and provides insight into antisense transcription.
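How strand information is recovered during analysis depends on the library chemistry. The sketch below assumes the usual convention for dUTP-based stranded libraries (the library type TopHat calls fr-firststrand), in which read 1 aligns antisense to the original transcript and read 2 aligns sense. Other chemistries can use the opposite orientation, so the function name and its convention are assumptions to verify against your kit's documentation.

```python
def transcript_strand(alignment_strand: str, is_read1: bool) -> str:
    """Infer the transcript strand from one aligned read.

    Assumes a dUTP-type stranded library: read 1 maps to the strand
    opposite the transcript, read 2 maps to the transcript strand.
    """
    if alignment_strand not in ("+", "-"):
        raise ValueError("alignment_strand must be '+' or '-'")
    flipped = "-" if alignment_strand == "+" else "+"
    return flipped if is_read1 else alignment_strand


# Read 1 aligned to the '+' strand implies a transcript on the '-' strand.
print(transcript_strand("+", is_read1=True))   # -
# Its mate, read 2, aligned to the '-' strand agrees.
print(transcript_strand("-", is_read1=False))  # -
```

With an unstranded library no such rule exists, which is why sense and antisense transcripts from the same locus cannot be told apart.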
mRNA-seq offers greater read depth than whole transcriptome sequencing. mRNA-seq is a technique that enriches poly-adenylated RNA so sequencing reads are focused on a subset of RNA, giving you higher sequencing depth. For whole transcriptome work, rRNA depletion will remove rRNA, focusing reads on mRNA and non-coding RNAs.
Longer read lengths are important for de novo transcript assembly and identifying transcript isoforms. We recommend paired 2x100 or 2x150 read lengths for these applications. For mRNA differential gene expression studies, a long read length is typically not required when there is an available reference genome. For these studies we recommend single end 1x50 or 1x100 read lengths.
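For planning purposes, these recommendations can be captured in a small lookup table. The application keys and the function name below are labels invented for this sketch; the values simply restate the recommendations given above.

```python
# Keys are hypothetical labels; values restate the recommendations in the text.
READ_RECOMMENDATIONS = {
    "de_novo_assembly": ("paired-end", ("2x100", "2x150")),
    "isoform_identification": ("paired-end", ("2x100", "2x150")),
    "differential_expression": ("single-end", ("1x50", "1x100")),
}


def recommend_reads(application: str):
    """Return (layout, read length options) for a known application label."""
    try:
        return READ_RECOMMENDATIONS[application]
    except KeyError:
        raise ValueError(f"no recommendation for {application!r}") from None


layout, lengths = recommend_reads("differential_expression")
print(layout, lengths)  # single-end ('1x50', '1x100')
```

The differential expression entry assumes a reference genome is available, as stated above.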
To date, almost all RNA-seq studies, including all Illumina-based RNA sequencing, involve the sequencing of cDNA. In 2009, Helicos™ published a paper (7) describing their ability to directly sequence RNA; however, this technology is not widely commercially available. In 2015, Oxford Nanopore announced the ability to sequence RNA strands on the MinION™ and PromethION™ without needing to convert to double-stranded DNA. Directly sequencing RNA offers the following advantages: