Ambiguous splice sites distinguish circRNA and linear splicing in the human genome

Identifying splice sites is critical both for gene annotation and for determining which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete the annotation of introns and exons in genomes without external ontologies, i.e., ab initio. However, whether the genomic positions where splicing occurs can be reconstructed from full-length transcripts, even if sampled in the absence of noise, depends on the genome sequence composition. If they cannot, there are provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome.

Stanford University researchers provide a formal definition of the splice site ambiguity that arises from the genomic sequence by introducing the notion of an equivalent junction: the set of local genomic positions that yield the same RNA sequence when joined through RNA splicing. They show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear and circRNA junctions, respectively. The observed fractions of equivalent junctions and the frequencies of many individual motifs are statistically significant when compared against the null distribution, computed either by simulation or in closed form. The frequency of equivalent junctions establishes a fundamental limit on ab initio reconstruction of RNA transcripts without appealing to the ontology of “GT-AG” boundaries defining introns. Said differently, completely ab initio reconstruction is impossible at the vast majority of splice sites in annotated circRNAs and linear transcripts.
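The definition of an equivalent junction can be illustrated with a minimal sketch. This is not the authors' script; the function names, the toy sequence, and the 0-based coordinate convention (donor = last base kept upstream, acceptor = first base kept downstream) are assumptions made for illustration only. The sketch slides the junction left and right for as long as the spliced RNA sequence stays unchanged.

```python
def spliced(genome, donor, acceptor):
    """Sequence obtained by joining genome[..donor] to genome[acceptor..].

    Assumed convention: 0-based indices, donor is the last base kept
    upstream, acceptor is the first base kept downstream.
    """
    return genome[:donor + 1] + genome[acceptor:]


def equivalent_junction(genome, donor, acceptor):
    """Return all (donor, acceptor) pairs giving the same spliced sequence
    as the input pair when both positions are shifted by the same offset."""
    n = len(genome)

    # Shift left: the base leaving the upstream side must equal the base
    # entering the downstream side.
    left = 0
    while (donor - left >= 0 and acceptor - left - 1 >= 0
           and genome[donor - left] == genome[acceptor - left - 1]):
        left += 1

    # Shift right: the base entering the upstream side must equal the base
    # leaving the downstream side.
    right = 0
    while (donor + right + 1 < n and acceptor + right < n
           and genome[donor + right + 1] == genome[acceptor + right]):
        right += 1

    return [(donor + k, acceptor + k) for k in range(-left, right + 1)]


# Toy sequence (hypothetical): exon ending in CAG, a GT...AG intron,
# and a downstream exon starting with GT.
toy = "ACAGGTAAGTCTTTCAGGTC"
pairs = equivalent_junction(toy, donor=3, acceptor=17)
```

In this toy case the annotated junction is only one of several position pairs that produce the identical spliced sequence, which is the ambiguity the paper quantifies. The same shifting logic applies to a backsplice (circular) junction, where the donor lies downstream of the acceptor in the genome.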

The set of junctions built for each gene and used for the equivalent junction analysis


For each gene, the set of junctions considered for the equivalent junction analysis is obtained as the union of all junction sets across different transcripts transcribed from the gene.
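The per-gene union described above might be built as in the following sketch. The input layout (a gene identifier paired with a sorted list of exon intervals per transcript) and the function name are hypothetical, and only junctions between consecutive exons of a transcript are collected, which is an assumption rather than a statement of the authors' procedure.

```python
from collections import defaultdict


def junctions_per_gene(transcripts):
    """Collect, for each gene, the union of junctions over its transcripts.

    transcripts: iterable of (gene_id, exons), where exons is a list of
    (start, end) tuples sorted by position (assumed input layout).
    A junction is recorded as (end of upstream exon, start of downstream exon).
    """
    gene_junctions = defaultdict(set)
    for gene_id, exons in transcripts:
        # Pair each exon with the next one in the same transcript.
        for (_, up_end), (down_start, _) in zip(exons, exons[1:]):
            gene_junctions[gene_id].add((up_end, down_start))
    return dict(gene_junctions)


# Two hypothetical transcripts of the same gene that share one junction.
example = [
    ("geneA", [(100, 200), (300, 400), (500, 600)]),
    ("geneA", [(100, 200), (500, 600)]),
]
result = junctions_per_gene(example)
```

Storing junctions in a set means a junction shared by several transcripts is counted once per gene.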

Availability: Two Python scripts that generate an equivalent junction sequence for each junction are available at: https://github.com/salzmanlab/Equivalent-Junctions.

Dehghannasiri R, Szabo L, Salzman J. (2018) Ambiguous splice sites distinguish circRNA and linear splicing in the human genome. Bioinformatics [Epub ahead of print].
