RNASequel – accurate and repeat tolerant realignment of RNA-seq reads

RNA-seq is a key technology for understanding the biology of the cell because it can profile transcriptional and post-transcriptional regulation at single-nucleotide resolution. Compared to DNA sequencing alignment algorithms, RNA-seq alignment algorithms have a diminished ability to accurately detect and map base pair substitutions, gaps, discordant pairs and repetitive regions. These shortcomings adversely affect experiments that require a high degree of accuracy, notably the detection of RNA editing.


RNA-seq realignment schematic. A spliced read aligner is used to identify sample-specific novel splice junctions that are used to generate a splice junction index. Read 1 and read 2 from each read pair are independently mapped to the genome and splice junction index using a contiguous read aligner. Low-quality alignments are removed, the genomic and splice junction alignments are merged, and the read pairs are resolved using an empirically determined fragment size distribution.
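To make the idea of a splice junction index concrete, here is a minimal sketch of one common way to build such an index: for each junction, the exonic sequence on either side of the intron is joined into a single contiguous sequence that a contiguous read aligner can map against. This is an illustration under stated assumptions, not RNASequel's actual implementation. The function name `build_junction_index`, the coordinate convention, and the `flank` parameter are all hypothetical.

```python
def build_junction_index(genome, junctions, flank=4):
    """Join the exonic flanks of each junction into one contiguous sequence.

    genome:    dict mapping chromosome name -> sequence string
    junctions: iterable of (chrom, intron_start, exon_start) with 0-based
               coordinates (assumed convention): intron_start is the first
               intronic base, exon_start is the first base of the
               downstream exon.
    flank:     number of exonic bases kept on each side (hypothetical
               parameter; in practice this relates to the read length).
    """
    index = {}
    for chrom, intron_start, exon_start in junctions:
        seq = genome[chrom]
        left = seq[max(0, intron_start - flank):intron_start]   # upstream exon end
        right = seq[exon_start:exon_start + flank]               # downstream exon start
        index[f"{chrom}:{intron_start}-{exon_start}"] = left + right
    return index


# Toy genome: exon (AAAACCCC), intron (GT....AG), exon (GGGGTTTT).
genome = {"chr1": "AAAACCCCGTNNNNAGGGGGTTTT"}
junction_index = build_junction_index(genome, [("chr1", 8, 16)], flank=4)
print(junction_index)
```

A read that spans the junction aligns to the joined sequence without a gap, and the alignment can then be translated back to genomic coordinates using the junction name.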

Researchers at the University of Toronto have developed RNASequel, a software package that runs as a post-processing step in conjunction with an RNA-seq aligner and systematically corrects common alignment artifacts. Its key innovations are a two-pass splice junction alignment system that includes de novo splice junctions and the use of an empirically determined estimate of the fragment size distribution when resolving read pairs. Using two simulated datasets, they demonstrate that RNASequel produces improved alignments when used in conjunction with STAR or TopHat2. They then show that RNASequel improves the identification of adenosine-to-inosine RNA editing sites on biological datasets. This software will be useful in applications requiring the accurate identification of variants in RNA sequencing data, the discovery of RNA editing sites and the analysis of alternative splicing.
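The role of the fragment size distribution in resolving read pairs can be sketched as follows. This is a simplified illustration, not the scoring scheme used by RNASequel: it assumes each mate has a list of candidate alignments given as hypothetical `(position, score)` tuples, approximates the fragment size as the distance between the two positions, and adds the log-probability of that size to the two alignment scores. The names `empirical_fragment_dist` and `resolve_pair` and the pseudocount are assumptions made for the example.

```python
import math
from collections import Counter


def empirical_fragment_dist(sizes, pseudocount=1e-6):
    """Return a log-probability function estimated from observed fragment sizes."""
    counts = Counter(sizes)
    total = sum(counts.values())

    def log_prob(size):
        # The pseudocount keeps unseen sizes finite but strongly penalized.
        return math.log(counts.get(size, 0) / total + pseudocount)

    return log_prob


def resolve_pair(cands1, cands2, log_prob):
    """Pick the candidate pair with the best combined score.

    cands1, cands2: lists of (position, alignment_score) for read 1 and read 2.
    """
    best, best_score = None, -math.inf
    for p1, s1 in cands1:
        for p2, s2 in cands2:
            fragment = abs(p2 - p1)  # simplified fragment size
            total = s1 + s2 + log_prob(fragment)
            if total > best_score:
                best, best_score = ((p1, s1), (p2, s2)), total
    return best


# Fragment sizes taken from confidently aligned pairs (toy values).
log_prob = empirical_fragment_dist([200, 200, 210, 190, 200])

read1 = [(1000, 50)]
read2 = [(1200, 48), (5000, 50)]  # second hit scores higher but is implausibly far
best_pair = resolve_pair(read1, read2, log_prob)
print(best_pair)
```

In this toy case the nearby candidate for read 2 is chosen even though the distant one has a slightly higher alignment score, because a fragment of that length was never observed in the empirical distribution. This is the kind of tie-breaking that helps with reads falling in repetitive regions.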


Alignment rates as percentages of the total number of pairs for the first (A) and second (B) simulated datasets with the indicated alignment methods. For a description of the alignment types see the benchmarking methods description.

Availability – RNASequel, implemented in C++, is available under the GNU General Public License from: https://github.com/GWW/RNASequel.

Wilson GW, Stein LD. (2015) RNASequel: accurate and repeat tolerant realignment of RNA-seq reads. Nucleic Acids Res [Epub ahead of print].
