Clustering single-cell RNA-Seq (scRNA-Seq) data from multiple samples or conditions is often challenged by confounding factors, such as batch effects and biologically relevant variability. Existing batch-effect removal methods typically rely on the strong assumption that the composition of cell populations is nearly identical across samples. Here we present CIDER, a meta-clustering workflow based on inter-group similarity measures. A prototype of this method was first applied in Hu et al., Cancer Cell 2020.
For more information, please see our preprint. The published version and new citation information will be available soon.
If you have scRNA-Seq data corrected by an integration algorithm (e.g. Seurat-CCA, Harmony, Scanorama), you can use CIDER to evaluate whether the biological populations are correctly aligned.
Before running the CIDER evaluation functions, make sure that you have a Seurat object (e.g. `seu.integrated`) with the corrected PCs in `seu.integrated@reductions$pca@cell.embeddings`. Seurat-CCA automatically puts the corrected PCs there. If another method is used, the corrected PCs can be added with:

```r
seu.integrated@reductions$pca@cell.embeddings <- corrected.PCs
```
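When corrected PCs come from outside Seurat, the matrix needs cells in rows, named and ordered like the cells of the Seurat object. The sketch below uses a hypothetical base-R helper, `align_corrected_pcs` (not part of CIDER or Seurat), to reorder such a matrix and stop early if any cell is missing. With a real object, `cells` would be `colnames(seu.integrated)`.

```r
# Hypothetical helper (not part of CIDER or Seurat): reorder a matrix of
# corrected PCs (cells in rows) to follow a given cell order, and stop
# if any cell has no corrected PCs.
align_corrected_pcs <- function(pcs, cells) {
  if (is.null(rownames(pcs))) {
    stop("corrected PCs need cell names as row names")
  }
  absent <- setdiff(cells, rownames(pcs))
  if (length(absent) > 0) {
    stop(sprintf("%d cell(s) have no corrected PCs", length(absent)))
  }
  pcs[cells, , drop = FALSE]
}

# Toy example: three cells and two PCs, stored in a different order.
pcs <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
              dimnames = list(c("c3", "c1", "c2"), c("PC_1", "PC_2")))
aligned <- align_corrected_pcs(pcs, cells = c("c1", "c2", "c3"))
aligned[, "PC_1"]  # c1 = 2, c2 = 3, c3 = 1
```

Failing loudly here is cheaper than silently attaching PCs to the wrong cells.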
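```r
library(CIDER)
# initial clustering of the integrated data with HDBSCAN
seu.integrated <- hdbscan.seurat(seu.integrated)
# IDER-based similarity between the resulting groups
ider <- getIDEr(seu.integrated, verbose = FALSE)
# per-cell similarity scores and empirical p values, stored in the object
seu.integrated <- estimateProb(seu.integrated, ider)
```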
The evaluation scores (IDER-based similarity and empirical p values) can be visualised with the `scatterPlot` function. A detailed evaluation tutorial is available.
```r
library(cowplot)  # provides plot_grid()
p1 <- scatterPlot(seu.integrated, "tsne", colour.by = "similarity")
p2 <- scatterPlot(seu.integrated, "tsne", colour.by = "pvalue")
plot_grid(p1, p2, ncol = 2)
```
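For intuition about the p values, here is a generic empirical p value in base R: the share of scores from a null distribution that are at least as large as the observed score, with a +1 correction so it is never exactly 0. The helper `empirical_pvalue`, the one-sided direction (larger is more extreme) and the uniform toy null are all assumptions for illustration; how CIDER builds its null distribution is described in the preprint and is not reproduced here.

```r
# Generic one-sided empirical p value (hypothetical helper, not CIDER's code).
empirical_pvalue <- function(observed, null_scores) {
  (1 + sum(null_scores >= observed)) / (1 + length(null_scores))
}

set.seed(1)
null_scores <- runif(999, min = -0.2, max = 0.4)  # toy null similarities
empirical_pvalue(0.9, null_scores)  # 0.001: no null score reaches 0.9
empirical_pvalue(0.1, null_scores)  # much larger: 0.1 is typical of the null
```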
`seu` is a Seurat object with the initial clustering annotation stored in the `initial_cluster` column of its metadata and the batch information in `Batch`. The asCIDER example here contains two steps: computing the IDER-based similarity matrix (`getIDEr`) and performing the final clustering (`finalClustering`).
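```r
library(CIDER)
# step 1: IDER-based similarity matrix between initial clusters across batches
ider <- getIDEr(seu, group.by.var = "initial_cluster", batch.by.var = "Batch")
# step 2: final clustering based on that similarity matrix
seu <- finalClustering(seu, ider, cutree.h = 0.45)
```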
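To see what cutting a dendrogram at a fixed height does, the base-R sketch below groups four made-up clusters (two cell types in two batches) from a toy similarity matrix, using `1 - similarity` as the distance and average linkage. Both choices are assumptions for illustration and are not claimed to be what `finalClustering` does internally; only the idea of a cut height such as 0.45 comes from the example above.

```r
# Toy similarity matrix between four initial clusters (values made up).
groups <- c("T_b1", "T_b2", "B_b1", "B_b2")
sim <- matrix(c(1.00, 0.85, 0.10, 0.05,
                0.85, 1.00, 0.08, 0.12,
                0.10, 0.08, 1.00, 0.90,
                0.05, 0.12, 0.90, 1.00),
              nrow = 4, dimnames = list(groups, groups))

# Turn similarity into a distance and build a dendrogram of the groups.
hc <- hclust(as.dist(1 - sim), method = "average")

# Cut at a fixed height: groups merged below it share a meta-cluster.
meta <- cutree(hc, h = 0.45)
meta  # T_b1 and T_b2 share one meta-cluster, B_b1 and B_b2 the other
```

A lower cut height splits more groups apart; a higher one merges more of them.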
Here is the code used to compute the similarity matrix within one batch.
```r
library(CIDER)

# Make sure the data have a column called "Batch", and assign the same value
# to all cells, for example:
seu$Batch <- "onebatch"

# Run getDistMat. The input needs to be list(seu); tmp.initial.clusters names
# the metadata column holding the groups between which the similarity matrix
# is computed. The output is the similarity matrix.
dist <- getDistMat(seu_list = list(seu), tmp.initial.clusters = "cell_type")
```
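As a simplified picture of an inter-group similarity, the base-R sketch below contrasts each group with the other cells of its own batch, so a shift affecting a whole batch cancels out, and then compares the resulting profiles by cosine similarity. The helper `ider_like_similarity` is hypothetical and uses a plain difference of mean log-expression as a stand-in for the differential-expression step; CIDER's actual IDER computation is described in the preprint and may differ. It assumes every batch contains at least two groups, and it also runs when all cells share a single batch label.

```r
# Simplified IDER-like similarity (hypothetical helper, not CIDER's code).
# expr: genes x cells matrix of log-normalised expression.
# group, batch: per-cell labels, one per column of expr.
ider_like_similarity <- function(expr, group, batch) {
  labels <- paste(group, batch, sep = "_")
  keys <- unique(labels)
  # one column per group-batch pair: group mean minus rest-of-batch mean
  profiles <- sapply(keys, function(k) {
    in_group <- labels == k
    background <- batch == batch[in_group][1] & !in_group
    rowMeans(expr[, in_group, drop = FALSE]) -
      rowMeans(expr[, background, drop = FALSE])
  })
  # cosine similarity between all pairs of profiles
  unit <- sweep(profiles, 2, sqrt(colSums(profiles^2)), "/")
  crossprod(unit)
}

# Toy data: groups A and B in two batches; batch b2 is shifted upwards.
set.seed(42)
group <- rep(c("A", "B"), times = 2, each = 5)
batch <- rep(c("b1", "b2"), each = 10)
expr <- matrix(rnorm(4 * 20, sd = 0.1), nrow = 4,
               dimnames = list(paste0("g", 1:4), paste0("cell", 1:20)))
expr[1, group == "A"] <- expr[1, group == "A"] + 2
expr[2, group == "B"] <- expr[2, group == "B"] + 2
expr[, batch == "b2"] <- expr[, batch == "b2"] + 1
sim <- ider_like_similarity(expr, group, batch)
round(sim, 2)  # A_b1 vs A_b2 close to 1, A_b1 vs B_b2 close to -1
```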
A dedicated function for this use case is under construction.
Please use Issues to report bugs or seek help. Thank you!
Z. Hu, A. A. Ahmed, C. Yau. An interpretable meta-clustering framework for single-cell RNA-Seq data integration and evaluation. bioRxiv 2021.03.29.437525; doi: https://doi.org/10.1101/2021.03.29.437525