Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying heterogeneity and dynamic changes in cell populations. Clustering scRNA-seq data is essential for identifying new cell types and studying their characteristics. University of Pennsylvania researchers developed CellBIC (single Cell BImodal Clustering) to cluster scRNA-seq data based on modality in the gene expression distribution. Unlike classical bottom-up approaches that rely on a distance metric, CellBIC performs hierarchical clustering in a top-down manner. CellBIC outperformed the bottom-up hierarchical clustering approach and other recently developed clustering algorithms while maintaining the hierarchical structure of cells. Importantly, CellBIC identifies type 2 diabetes-specific and age-specific β cell signatures characterized by SIX3 and CDH2, respectively.
CellBIC implements top-down hierarchical clustering using bimodal expression patterns
(A) Step 1: Boolean membership is obtained using a Gaussian mixture model. (B) Step 2: A gene group is selected based on the Boolean membership. Only genes observed significantly in one mode are included. (C) Step 3: A membership matrix is obtained using the selected gene set. Cells are divided into two groups based on the membership matrix. (D) Top-down clustering is performed by applying steps A-C recursively. (E) A membership matrix obtained by CellBIC when using human pancreatic α and β cells (3). (F) Classical bottom-up hierarchical clustering using human pancreatic α and β cells (3). The point at which to cut the tree is not well defined for bottom-up hierarchical clustering.
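The recursion described in panels A-D can be illustrated with a minimal Python sketch. This is not the CellBIC implementation. It assumes a two-component, one-dimensional Gaussian mixture fitted per gene by EM, a made-up gene selection rule (the thresholds min_sep and min_frac are hypothetical), and a simple majority vote over the membership matrix to divide the cells. All function names and the toy data are illustrative only.

```python
import math
import random

def boolean_membership(values, n_iter=50):
    """Step 1 (sketch): fit a 1-D two-component Gaussian mixture by EM.
    Returns (labels, separation); labels[i] is True if cell i is in the
    higher-mean mode, separation is the mean gap in pooled-SD units."""
    lo, hi = min(values), max(values)
    if hi - lo < 1e-9:                      # constant gene: no modes to separate
        return [False] * len(values), 0.0
    mu, var, weight = [lo, hi], [((hi - lo) / 4) ** 2] * 2, [0.5, 0.5]
    resp = [0.5] * len(values)              # responsibility of component 1
    for _ in range(n_iter):
        for i, x in enumerate(values):      # E-step
            dens = [weight[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            total = dens[0] + dens[1]
            resp[i] = (dens[1] / total if total > 0
                       else float(abs(x - mu[1]) < abs(x - mu[0])))
        for k in (0, 1):                    # M-step
            r = [p if k == 1 else 1 - p for p in resp]
            n_k = sum(r)
            if n_k < 1e-9:
                continue
            mu[k] = sum(ri * x for ri, x in zip(r, values)) / n_k
            var[k] = max(sum(ri * (x - mu[k]) ** 2
                             for ri, x in zip(r, values)) / n_k, 1e-6)
            weight[k] = n_k / len(values)
    labels = [p > 0.5 for p in resp]
    if mu[1] < mu[0]:                       # keep True = higher-mean mode
        labels = [not v for v in labels]
    separation = abs(mu[1] - mu[0]) / math.sqrt((var[0] + var[1]) / 2)
    return labels, separation

def select_genes(expr, cells, min_sep=3.0, min_frac=0.2):
    """Step 2 (sketch): keep genes with two well-separated, populated modes.
    The thresholds are assumptions, not CellBIC's actual criterion."""
    selected = {}
    for gene, row in expr.items():
        labels, sep = boolean_membership([row[c] for c in cells])
        frac = sum(labels) / len(cells)
        if sep >= min_sep and min_frac <= frac <= 1 - min_frac:
            selected[gene] = labels
    return selected

def split_cells(expr, cells):
    """Step 3 (sketch): build the membership matrix, then split by majority vote."""
    rows = list(select_genes(expr, cells).values())
    if not rows:
        return None
    ref, oriented = rows[0], []
    for row in rows:                        # flip genes anti-correlated with ref
        agree = sum(a == b for a, b in zip(row, ref)) / len(ref)
        oriented.append(row if agree >= 0.5 else [not v for v in row])
    votes = [sum(col) / len(oriented) for col in zip(*oriented)]
    left = [c for c, v in zip(cells, votes) if v > 0.5]
    right = [c for c, v in zip(cells, votes) if v <= 0.5]
    return (left, right) if left and right else None

def cluster_top_down(expr, cells, min_size=10, depth=0, max_depth=3):
    """Panel D (sketch): apply steps 1-3 recursively. A leaf is a list of
    cell indices; an internal node is a tuple of two subtrees."""
    if len(cells) < 2 * min_size or depth >= max_depth:
        return cells
    split = split_cells(expr, cells)
    if split is None or min(len(split[0]), len(split[1])) < min_size:
        return cells
    return tuple(cluster_top_down(expr, part, min_size, depth + 1, max_depth)
                 for part in split)

# Toy data (fabricated for illustration): cells 0-19 and 20-39 form two groups.
random.seed(0)
expr = {
    "gene_a": [random.gauss(10, 0.5) for _ in range(20)] + [random.gauss(1, 0.5) for _ in range(20)],
    "gene_b": [random.gauss(1, 0.5) for _ in range(20)] + [random.gauss(9, 0.5) for _ in range(20)],
    "gene_noise": [random.gauss(5, 1.0) for _ in range(40)],
}
tree = cluster_top_down(expr, list(range(40)), max_depth=1)
print([len(part) for part in tree])
```

Because each split is decided by the genes themselves, the recursion stops on its own when no gene remains bimodal within a group, which is one way to read the caption's remark that a bottom-up tree has no well-defined cut point. In this sketch the stopping rule also depends on the assumed min_size and max_depth parameters.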