Table 2 Summary of the clustering analysis approaches for scRNA-seq data

From: Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer’s disease: review, recommendation, implementation and application

| Method | Clustering strategy | Dimension reduction | Similarity | Notes |
| --- | --- | --- | --- | --- |
| **Expression-based** | | | | |
| SC3 [111] | Consensus k-means over multiple similarity matrices | PCA | Euclidean distance; Spearman's correlation; Pearson's correlation | Joint calculation of multiple similarity matrices increases the computational burden |
| SIMLR [112] | A Gaussian kernel is jointly learned on Euclidean and Spearman similarities to infer block structure in the cell-cell similarity matrix | t-SNE on the learned cell-cell similarity | Euclidean distance; Spearman's correlation; Pearson's correlation | Searches for consensus block structures across multiple similarity measures |
| DBSCAN [113] | Density-based clustering | User's choice (t-SNE is usually preferred) | NA | Results may vary because of the stochasticity of t-SNE |
| PhenoGraph [114] | k-nearest-neighbor graph | NA | Jaccard index; Euclidean distance | The Jaccard index is used to prune spurious links; GN modularity is optimized with the Louvain algorithm |
| SNN-Cliq [115] | Shared k-nearest-neighbor graph | NA | Euclidean distance | A search for small maximal cliques is performed first; quasi-cliques connecting the detected maximal cliques are then identified to recover dense subnetworks |
| MetaCell [116] | k-nearest-neighbor graph | NA | Pearson's correlation | A series of regularizations yields a balanced, symmetrized, weighted graph, followed by a variant of k-means on the graph |
| scvis [117] | Model-based deep generative modeling trains a deep neural-network model | Deep neural network | NA | The log-likelihood of the noise model serves as the loss for training a deep autoencoder-based model |
| scVI [96] | Model-based deep generative modeling trains a deep neural-network model | Deep neural network | NA | Similar to scvis, with additional noise parameters: dropout reads via ZINB and library sizes as Gaussian noise |
| DESC [118] | Neural-network-based dimension reduction followed by Louvain-based iterative clustering | Deep neural network | NA | An autoencoder learns cluster-specific gene expression and removes technical variance (e.g., batch effects) when it is smaller than biological variance; GPU support scales to millions of cells; Louvain clustering combined with t-distribution-based cluster assignment iteratively refines clusters in the bottleneck layer |
| **Genotype-based** | | | | |
| demuxlet [110] | Supervised clustering of cells based on genotypes | NA | NA | The likelihood of a cell belonging to an individual is calculated from alternate-allele frequencies |
| Vireo [108] | Supervised clustering of cells based on genotypes | NA | NA | Variational Bayesian inference estimates the number of unique individuals with distinct genotypes; each cell is assigned to the individual with the maximum likelihood |
| scSplit [109] | Unsupervised clustering of cells based on an allele-fraction model | NA | NA | Expectation-maximization (EM) fits the allele-fraction model to the probability of observing alternate alleles from each individual |
| Souporcell [72] | Mixture modeling | NA | NA | Uses minimap2 instead of the STAR aligner to optimize variant calling in scRNA-seq reads; a mixture model over allele fractions clusters cells in genotype space |
| DENDRO [107] | Phylogeny reconstruction based on genetic divergence among cells | NA | NA | Intended for tumoral heterogeneity; genetic divergence is modeled with nuisance variables such as dropout rates and library sizes |
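To make the consensus idea behind SC3 concrete, the following is a minimal sketch (not the published SC3 implementation): several distance matrices are computed, k-means is run on the leading PCA components of each, and the fraction of runs in which two cells co-cluster forms a consensus matrix that is clustered in turn. The toy expression matrix, the number of components, and the cluster count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# toy expression matrix: 60 cells x 30 genes, two shifted groups of 30 cells
X = np.vstack([rng.normal(0, 1, (30, 30)), rng.normal(3, 1, (30, 30))])

def distance_matrices(X):
    """Cell-cell distances under the three similarity measures used by SC3."""
    euc = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    pear = 1 - np.corrcoef(X)                  # rows of X are cells
    rho, _ = spearmanr(X, axis=1)              # rank correlation between cells
    return [euc, pear, 1 - rho]

k = 2
labelings = []
for D in distance_matrices(X):
    comps = PCA(n_components=5).fit_transform(D)   # PCA of each distance matrix
    labelings.append(KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(comps))

# consensus matrix: fraction of runs in which each pair of cells co-clusters
C = np.mean([(l[:, None] == l[None, :]).astype(float) for l in labelings], axis=0)
final = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(C)
```

Clustering the consensus matrix, rather than any single distance matrix, is what dampens the sensitivity to the choice of similarity measure; the cost noted in the table is that every similarity matrix must be computed and clustered.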
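The caveat in the DBSCAN row, that clustering a stochastic 2-D embedding inherits the embedding's run-to-run variability, can be illustrated with scikit-learn's `DBSCAN` on a stand-in for a t-SNE layout. The blob coordinates and the `eps`/`min_samples` settings below are assumptions for the sketch, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# stand-in for a 2-D t-SNE embedding: two tight blobs plus a few scattered points
emb = np.vstack([
    rng.normal([0.0, 0.0], 0.2, (50, 2)),
    rng.normal([5.0, 5.0], 0.2, (50, 2)),
    rng.uniform(-2.0, 7.0, (5, 2)),   # sparse points, typically labeled noise
])
# density-based clustering: dense regions become clusters, and points with too
# few neighbors within eps are assigned the noise label -1
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(emb)
```

Because t-SNE is stochastic, rerunning the embedding moves the blobs and can merge or split density peaks, which is why the table warns that DBSCAN results on such embeddings can vary between runs even with fixed `eps` and `min_samples`.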
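For the genotype-based group, a minimal sketch of scSplit-style unsupervised demultiplexing (not the scSplit implementation): EM alternates between assigning cells to individuals by the binomial likelihood of their alternate-allele counts (E-step) and re-estimating each individual's per-variant allele fractions from the soft assignments (M-step). The simulated genotypes, read depths, and the fixed number of individuals are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_var, n_cells = 100, 40
# hypothetical ground truth: two individuals with distinct allele fractions per variant
true_af = rng.choice([0.0, 0.5, 1.0], size=(2, n_var))
who = np.repeat([0, 1], n_cells // 2)                    # which individual each cell is from
depth = rng.poisson(5, size=(n_cells, n_var)) + 1        # reads covering each variant
alt = rng.binomial(depth, np.clip(true_af[who], 0.01, 0.99))  # alternate-allele reads

def em_demux(alt, depth, k=2, n_iter=50):
    """EM on a binomial allele-fraction model; returns a hard assignment per cell."""
    af = rng.uniform(0.2, 0.8, size=(k, alt.shape[1]))   # allele fractions per individual
    for _ in range(n_iter):
        # E-step: binomial log-likelihood of each cell under each individual
        ll = alt @ np.log(af).T + (depth - alt) @ np.log(1 - af).T
        ll -= ll.max(axis=1, keepdims=True)
        resp = np.exp(ll)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted alternate-allele fraction per individual
        af = np.clip((resp.T @ alt) / (resp.T @ depth), 0.01, 0.99)
    return resp.argmax(axis=1)

assign = em_demux(alt, depth)
```

The same likelihood appears in the supervised tools (demuxlet, Vireo) with the allele fractions fixed by known genotypes; the unsupervised setting estimates them jointly with the assignments, which is why scSplit and Souporcell need no reference genotypes.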