New dual RNA-seq database measures gene activity in bacteria and their human hosts

A free-to-use database developed by researchers at the CRG and the UAB can help scientists and clinical experts quickly identify the host defense genes that multiple types of disease-causing bacteria trigger or evade, which could aid the development of new types of antibiotics.

The database, called DualSeqDB, can also be used to detect whether different types of bacteria use similar genes to infect humans and other mammals, highlighting promising therapeutic targets for the development of new antibiotics. It is described in a new research paper published in the journal Nucleic Acids Research.

“Pneumonia, tuberculosis and other curable diseases caused by bacteria are still a major problem today, and although many of these are easily treated using antibiotics where they are available, the bacteria continue to evolve and some are becoming increasingly resistant to our antibiotic repertoire”, says Benjamin Lang, postdoctoral researcher at the Centre for Genomic Regulation (CRG) and one of the creators of the database.

According to the authors, scientists and clinical experts can use the database to identify promising new drug targets by comparing how infection processes work across unrelated bacterial species. Researchers can also add their own datasets to DualSeqDB so that they can be more easily accessed and used by other researchers around the world.

“The public health implications of running out of antibiotic treatment options are severe, especially as multi-resistant strains of common pathogens are appearing in clinics worldwide,” says Dr. Lang. “Studying gene activity during infection processes using sequencing is one of the most highly cost-effective methods of discovering new drug targets.”

DualSeqDB was built by identifying and combining previously published datasets. Using a single, well-defined pipeline, the teams reprocessed the raw data from the different sources so that the resulting gene expression measurements were directly comparable.
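
In practice, standardizing means running every study through the same commands. The sketch below assembles the command lines for one single-end sample, using the tools named in the pipeline figure caption (Trimmomatic, HISAT2, Bowtie2, featureCounts). It is an illustration, not the authors' scripts. The file names, the hypothetical `build_commands` helper, the single-end layout and the `ILLUMINACLIP` thresholds are all assumptions. It also assumes that genome indices were built beforehand (for example with `hisat2-build` and `bowtie2-build`).

```python
# Hypothetical sketch: one standardized command set per single-end dual RNA-seq sample.
# File names, thresholds and the helper itself are illustrative assumptions.
import shlex
from pathlib import Path
from typing import List


def build_commands(sample: str, fastq: Path, host_index: str, pathogen_index: str,
                   host_gtf: Path, pathogen_gtf: Path, adapters: Path,
                   trimmomatic_jar: Path = Path("trimmomatic.jar")) -> List[List[str]]:
    """Return the commands for one sample, in execution order."""
    trimmed = f"{sample}.trimmed.fq"
    host_sam = f"{sample}.host.sam"
    unmapped = f"{sample}.host_unmapped.fq"
    pathogen_sam = f"{sample}.pathogen.sam"
    return [
        # 1. Adapter removal (single-end mode; 2:30:10 are assumed ILLUMINACLIP thresholds)
        ["java", "-jar", str(trimmomatic_jar), "SE", "-phred33",
         str(fastq), trimmed, f"ILLUMINACLIP:{adapters}:2:30:10"],
        # 2. Map to the host index; --un writes the reads that fail to align
        ["hisat2", "-x", host_index, "-U", trimmed, "--un", unmapped, "-S", host_sam],
        # 3. Map only the host-unmapped reads to the pathogen index
        ["bowtie2", "-x", pathogen_index, "-U", unmapped, "-S", pathogen_sam],
        # 4. Count reads per gene, separately for each organism.
        #    Bacterial annotations may need a different feature type (-t) or attribute (-g).
        ["featureCounts", "-a", str(host_gtf), "-o", f"{sample}.host.counts.txt", host_sam],
        ["featureCounts", "-a", str(pathogen_gtf), "-o", f"{sample}.pathogen.counts.txt",
         pathogen_sam],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands("sample1", Path("sample1.fastq.gz"), "idx/host", "idx/pathogen",
                              Path("host.gtf"), Path("pathogen.gtf"), Path("TruSeq3-SE.fa")):
        print(shlex.join(cmd))
```

Every sample passes through the same function, and only the paths change. This keeps parameters from drifting between studies, which is the point of standardizing. In a real run, each list could be passed to `subprocess.run(cmd, check=True)`.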

“To our knowledge, we have created the first database that contains information about how pathogens and their natural hosts simultaneously change their gene expression during infection”, says Javier Macho Rendón, predoctoral researcher at the Universitat Autònoma de Barcelona (UAB).

During infection, pathogens switch on specific genes that help them replicate within the host and ensure their survival. In turn, the host activates complex mechanisms to recognize and kill the pathogens.

Researchers can track this arms race during the infection process using sequencing data from RNA transcripts simultaneously gathered from both bacteria and host. This ‘dual RNA-seq’ allows researchers to identify new traits at a molecular level that would otherwise remain undetected.
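
Because host and bacterial transcripts are sequenced from one mixed sample, each read must first be assigned to an organism. The toy sketch below mimics the order used in the DualSeqDB pipeline described in the figure caption: every read is tried against the host first, and only the leftover reads are tried against the pathogen. Exact substring matching stands in for a real aligner such as HISAT2 or Bowtie2. The `partition_reads` function and the short sequences are hypothetical.

```python
# Toy illustration of host-first read assignment in dual RNA-seq.
# Exact substring matching stands in for a real aligner (an assumption for clarity).
from typing import Dict, List


def partition_reads(reads: List[str], host_genome: str,
                    pathogen_genome: str) -> Dict[str, List[str]]:
    assigned: Dict[str, List[str]] = {"host": [], "pathogen": [], "unmapped": []}
    for read in reads:
        if read in host_genome:
            # Step 1: every read is tried against the host first.
            assigned["host"].append(read)
        elif read in pathogen_genome:
            # Step 2: only reads the host did not claim reach the pathogen.
            assigned["pathogen"].append(read)
        else:
            assigned["unmapped"].append(read)
    return assigned


host = "ATGCGTACGTTAGC"
pathogen = "TTGACCGTTAGCAA"
result = partition_reads(["GCGTAC", "ACCGTT", "GTTAGC", "CCCCCC"], host, pathogen)
print(result)
# {'host': ['GCGTAC', 'GTTAGC'], 'pathogen': ['ACCGTT'], 'unmapped': ['CCCCCC']}
```

`GTTAGC` occurs in both toy genomes, but it is credited to the host only because the host is searched first. Any sequential host-then-pathogen mapping scheme has the same ordering effect for sequences shared between organisms.
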

DualSeqDB was created by Gian Tartaglia and Benjamin Lang at the Centre for Genomic Regulation (CRG) and Javier Macho Rendón, Marc Ramos Llorens and Marc Torrent at the Universitat Autònoma de Barcelona (UAB).


Pipeline used to process raw sequencing data from dual RNA-seq studies. Raw sequencing data were downloaded in FASTQ format from the corresponding repositories, and adapter sequences were removed with Trimmomatic. Pathogen and host genomes and annotations were downloaded to build genome indices. Trimmed FASTQ files were then mapped to the host genome index with HISAT2, and the unmapped reads were subsequently mapped to the pathogen genome index with Bowtie2. From this point onward, pathogen and host reads were analyzed in parallel. Mapped reads were quantified with featureCounts using the respective annotation files, producing a matrix of read counts. This matrix, containing control and treated samples, was then used as input to the DESeq2 R package for differential expression analysis. DESeq2 calculated the gene expression changes (measured as log2 fold change, log2 FC) and the corresponding P-values, adjusted for multiple testing with the Benjamini–Hochberg procedure.
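
The final step of the caption combines several statistical ideas. The Python sketch below illustrates three of them on a toy count matrix: DESeq-style median-of-ratios normalization, a naive log2 fold change between control and treated samples, and Benjamini–Hochberg adjustment of p-values. It is a simplified stand-in, not DESeq2. DESeq2 fits negative binomial generalized linear models with shrunken dispersion estimates and tests the coefficients (by default with a Wald test), and none of that is reproduced here. The function names, the pseudocount and the counts are illustrative assumptions.

```python
# Simplified illustration of the statistics named above; NOT DESeq2 itself.
# DESeq2 fits negative binomial models; here we only show median-of-ratios
# normalization, a naive log2 fold change and Benjamini-Hochberg adjustment.
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] = reads of gene g in sample s.
    Genes with a zero in any sample are skipped (assumes at least one remains)."""
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def naive_log2_fc(counts: List[List[int]], factors: Sequence[float],
                  control: Sequence[int], treated: Sequence[int],
                  pseudocount: float = 0.5) -> List[float]:
    """log2(mean normalized treated / mean normalized control) for each gene."""
    result = []
    for row in counts:
        norm = [c / f for c, f in zip(row, factors)]
        mean_c = sum(norm[i] for i in control) / len(control)
        mean_t = sum(norm[i] for i in treated) / len(treated)
        result.append(math.log2((mean_t + pseudocount) / (mean_c + pseudocount)))
    return result


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up matrix: rows = genes, columns = 2 control then 2 treated samples.
    counts = [[10, 12, 40, 44], [100, 90, 100, 110], [5, 6, 5, 4]]
    factors = size_factors(counts)
    print(naive_log2_fc(counts, factors, control=[0, 1], treated=[2, 3]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The BH step walks from the largest p-value down, so adjusted values never decrease as rank increases. This is what distinguishes the procedure from simply multiplying each p-value by m/rank.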


“Given the current momentum of sequencing technologies in research and clinics, we expect that our database will grow continuously and become a comprehensive repository that will help us in the fight against infectious diseases,” concludes Dr. Marc Torrent, Associate Professor at the Department of Biochemistry and Molecular Biology at the UAB.

Source: Universitat Autònoma de Barcelona

Availability – DualSeqDB is freely available at http://www.tartaglialab.com/dualseq.

Macho Rendón J, Lang B, Ramos Llorens M, Gaetano Tartaglia G, Torrent Burgas M. (2020) DualSeqDB: the host-pathogen dual RNA sequencing database for infection processes. Nucleic Acids Res, gkaa890.
