RNA sequencing has been proposed as a means of increasing diagnostic rates in studies of undiagnosed rare inherited disease. Recent studies have reported diagnostic improvements in the range of 7.5–35% by profiling splicing, gene expression quantification and allele specific expression. To-date however, no study has systematically assessed the presence of gene-fusion transcripts in cases of germline disease. Fusion transcripts are routinely identified in cancer studies and are increasingly recognized as having diagnostic, prognostic or therapeutic relevance. Isolated reports exist of fusion transcripts being detected in cases of developmental and neurological phenotypes, and thus, systematic application of fusion detection to germline conditions may further increase diagnostic rates. However, current fusion detection methods are unsuited to the investigation of germline disease due to performance biases arising from their development using tumor, cell-line or in-silico data.
Mayo Clinic researchers describe a tailored approach to fusion candidate identification and prioritization in a cohort of 47 undiagnosed, suspected inherited disease patients. They modify an existing fusion transcript detection algorithm by eliminating its cell line-derived filtering steps, and instead, prioritize candidates using a custom workflow that integrates genomic and transcriptomic sequence alignment, biological and technical annotations, customized categorization logic, and phenotypic prioritization.
The researchers demonstrate that our approach to fusion transcript identification and prioritization detects genuine fusion events excluded by standard analyses and efficiently removes phenotypically unimportant candidates and false positive events, resulting in a reduced candidate list enriched for events with potential phenotypic relevance. They describe the successful genetic resolution of two previously undiagnosed disease cases through the detection of pathogenic fusion transcripts. Furthermore, we report the experimental validation of five additional cases of fusion transcripts with potential phenotypic relevance.
Fusion candidate BLAST categorization rationale
Putative fusion sequences were BLASTN aligned to the human genome and transcriptome to enable categorization. A) Candidates aligning to abundant hematological genes (Globins, T-cell receptors) were not considered further due to their overrepresentation in blood samples and observed overrepresentation in fusion analysis results. These might represent artifacts or transient biological events. B & C) Full length candidates producing unbroken alignments against the human transcriptome or genome were classified as likely known transcripts or genomic sequence respectively. D) When the candidate produced no alignment against the human genome or transcriptome or only a part alignment was possible, the candidate was classified as a likely artifact, potentially containing low quality or non-human sequence including adapters. E) When the candidate produced multiple alignments within the gene boundaries of a single gene but did not align completely to a known transcript, it was classified as a potential novel transcript of a known gene. This category has the also potential to capture aberrant single-gene events. F) When the candidate produced two hits to separate immunoglobulins the event was classed as potentially representing immune diversity. Alternatively these may be generated by alignment artifacts due to high homology between immunoglobulin genes. G) When two distinct alignments were produced against two different chromosomes, the candidate was defined as a potential interchromosomal fusion. Fused genes with known homology were flagged to enable additional checking for alignment artifacts. H) When the candidate aligned to two distinct genes or regions on a single chromosome, it was classified as a potential intrachromosomal fusion. Fused genes with known homology were flagged to enable additional checking for alignment artifacts. Intrachromosomal candidates occurring between neighboring genes were annotated as potential read-through events. These events could represent true fusions or aberrant transcriptional events but might also represent biologically normal events that occur due to co-transcription of neighboring genes that have yet to be re-classified as single genes. Interchromosomal and intrachromosomal candidates were annotated as homologous when the two hits occurred against known homologous genes based on the Duplicated Genes Database (http://dgd.genouest.org/). Such instances might represent artifacts due to misalignment between closely homologous genes or might equally represent true aberrant events, preferentially occurring due to homology at the genomic sequence level.
The approach described here can be implemented to enable the detection of phenotypically relevant fusion transcripts in studies of rare inherited disease. Fusion transcript detection has the potential to increase diagnostic rates in rare inherited disease and should be included in RNA-based analytical pipelines aimed at genetic diagnosis.