Shape analysis of high-throughput transcriptomics experiment data

The recent growth of high-throughput transcriptome technologies has been paralleled by the development of statistical methodologies to analyze the data they produce. Some of these newly developed methods assume that the observed data, or a transformation of the data, are relatively symmetric with light tails, an assumption usually summarized by a Gaussian random component. This assumption is very difficult to assess for small sample sizes.

In this article, researchers from the University of Maryland use L-moments statistics as the basis for exploratory data analysis, assessment of distributional assumptions, and hypothesis testing of high-throughput transcriptomic data. In particular, they use L-moments ratios to assess the shape (skewness and kurtosis) of high-throughput transcriptome data. Based on these statistics, they propose an algorithm for identifying genes whose distributions differ markedly from the majority in the data. The researchers also illustrate how this framework can be used to characterize the robustness of distributional assumptions. Applying it to RNA-seq data, they find that methods based on the simple test for differential expression analysis using L-moments as weights are robust.
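
To make the shape statistics concrete, here is a minimal sketch of how L-moments ratios can be computed for one gene's expression values. It uses the standard unbiased sample estimators built from probability-weighted moments, which is an assumption on our part: the summary above does not say which estimator the authors use. The function names `sample_l_moments` and `l_moment_ratios` are hypothetical and are not taken from the paper or its software.

```python
import numpy as np


def sample_l_moments(x):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Assumption: standard unbiased estimators based on
    probability-weighted moments b0..b3 of the sorted sample.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("at least 4 observations are needed for l4")
    i = np.arange(1, n + 1)  # ranks 1..n of the sorted values

    # Probability-weighted moments
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum(
        (i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x
    ) / n

    # L-moments as linear combinations of the b_r
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


def l_moment_ratios(x):
    """Return (L-skewness, L-kurtosis) = (l3 / l2, l4 / l2)."""
    _, l2, l3, l4 = sample_l_moments(x)
    if l2 == 0:
        raise ValueError("L-scale is zero; the ratios are undefined")
    return l3 / l2, l4 / l2


# Illustrative values only (not data from the paper), sample size 6
symmetric = [1, 2, 3, 4, 5, 6]
right_skewed = [1, 1, 1, 2, 3, 10]
t3_sym, t4_sym = l_moment_ratios(symmetric)
t3_skew, t4_skew = l_moment_ratios(right_skewed)
```

For the evenly spaced sample the L-skewness is zero, while the sample with one large value gives a clearly positive L-skewness. Computing such a pair of ratios per gene gives a two-dimensional summary of shape that remains usable at small sample sizes.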

Interpretation of the SO-plot. Based on a sample of size 6, examples of the 4 main types of sample shape (bottom row) and where they occur on the SO-plot (top row) are shown.

Okrah K, Corrada Bravo H. (2015) Shape analysis of high-throughput transcriptomics experiment data. Biostatistics [Epub ahead of print].
