Single-cell RNA sequencing has enabled the decomposition of complex tissues into functionally distinct cell types. Investigators typically assign cells to cell types either through unsupervised clustering followed by manual annotation or by ‘mapping’ to existing data. However, manual interpretation scales poorly to large datasets, mapping approaches require purified or pre-annotated reference data, and both are prone to batch effects.
To overcome these issues, researchers from the British Columbia Cancer Research Centre have developed CellAssign, a probabilistic model that leverages prior knowledge of cell-type marker genes to annotate single-cell RNA sequencing data into predefined or de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while controlling for batch and sample effects. The developers demonstrate the advantages of CellAssign through extensive simulations and analysis of tumor microenvironment composition in high-grade serous ovarian cancer and follicular lymphoma.
Overview of CellAssign
a, CellAssign takes raw count data from a heterogeneous scRNA-seq population, along with a set of known marker genes for the cell types under study. During inference, each cell is probabilistically assigned to a cell type without manual annotation or intervention, while accounting for batch- or sample-specific effects. b, An overview of the CellAssign probabilistic graphical model, showing the random variables and data that form the model along with the distributional assumptions. c, Descriptions of the random variables used in the CellAssign probabilistic model, along with their prior distributions.
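To make the inputs described in panel a more concrete, the following is a minimal Python sketch of marker-based probabilistic assignment. It is not the CellAssign implementation or its API. It is a deliberately simplified stand-in that assumes a Poisson likelihood, a uniform prior over cell types, and fixed values for the hypothetical parameters `base_rate` and `log_fold_change`. It also omits batch and sample covariates and the other random variables and priors shown in the figure. The function name `assign_cell_types` and the toy data are illustrative assumptions only.

```python
import numpy as np

def assign_cell_types(counts, markers, size_factors,
                      base_rate=1.0, log_fold_change=2.0):
    """Toy marker-based assignment (illustrative, not the CellAssign model).

    counts:       (cells, genes) raw count matrix
    markers:      (genes, types) binary matrix, 1 if gene marks that type
    size_factors: (cells,) per-cell scaling factors
    Returns a (cells, types) matrix of assignment probabilities.
    """
    counts = np.asarray(counts, dtype=float)
    markers = np.asarray(markers, dtype=float)
    size_factors = np.asarray(size_factors, dtype=float)

    # Assumed log expected expression per gene and type: marker genes are
    # raised by a fixed log fold change over a shared baseline.
    log_mu_gene = np.log(base_rate) + log_fold_change * markers

    # Add the per-cell size factor, giving shape (cells, genes, types).
    log_mu = np.log(size_factors)[:, None, None] + log_mu_gene[None, :, :]

    # Poisson log-likelihood summed over genes. The log(y!) term is dropped
    # because it is identical for every cell type.
    log_lik = (counts[:, :, None] * log_mu - np.exp(log_mu)).sum(axis=1)

    # Uniform prior over types; normalise with a numerically stable softmax.
    log_lik -= log_lik.max(axis=1, keepdims=True)
    posterior = np.exp(log_lik)
    return posterior / posterior.sum(axis=1, keepdims=True)

# Toy data: gene 0 marks type 0, gene 1 marks type 1.
markers = [[1, 0],
           [0, 1]]
counts = [[10, 1],   # cell 0 expresses gene 0 strongly
          [0, 8]]    # cell 1 expresses gene 1 strongly
posterior = assign_cell_types(counts, markers, size_factors=[1.0, 1.0])
print(posterior.argmax(axis=1))  # [0 1]
```

Each row of `posterior` sums to one, so a cell whose counts are ambiguous receives split probabilities rather than a hard label. That is the sense in which the assignment is probabilistic.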