Annotating unknown clusters in scRNAseq data using CIPR: A cross-species reference comparison

scRNAseq

Author

Atakan Ekiz

Published

July 27, 2020

We have recently published a Shiny (R) app called Cluster Identity Predictor (CIPR) in BMC Bioinformatics that helps annotate unknown clusters in single cell RNAseq (scRNAseq) datasets.

CIPR works by comparing the gene expression signatures of unknown clusters against signatures from known cell types. It ships with 7 reference datasets (2 from mouse and 5 from human) and also accepts a custom reference dataset (see more here). The pre-loaded reference files are as follows:

Immunological Genome Project (ImmGen) microarray data from sorted mouse immune cells. This dataset contains 296 samples from 20 different cell types (253 subtypes).

Mouse RNAseq data from sorted cells reported in Benayoun et al. (2019). This dataset contains 358 sorted immune and nonimmune samples from 18 different lineages (28 subtypes).

Blueprint/Encode RNAseq data that contains 259 sorted human immune and nonimmune samples from 24 different lineages (43 subtypes).

Human Primary Cell Atlas that contains microarray data from 713 sorted immune and nonimmune cells (37 main cell types and 157 subtypes).

DICE (Database for Immune Cell Expression(/eQTLs/Epigenomics)) that contains 1561 human immune samples from 5 main cell types (15 subtypes).

Human microarray data from sorted hematopoietic cells reported in Novershtern et al. (2011). This dataset contains data from 211 samples and 17 main cell types (38 subtypes).

Human RNAseq data from sorted cells reported in Monaco et al. (2019). This dataset contains 114 samples originating from 11 main cell types (29 subtypes).
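To make the idea of comparing a cluster's signature against a reference more concrete, here is a minimal Python sketch of a dot-product style comparison on log fold-change (logFC) values. It is an illustration only, not CIPR's actual code (CIPR itself is written in R): the function names `reference_logfc` and `identity_scores` are hypothetical, and the expression values are made-up toy numbers. The assumption is that each reference gene is centered on its mean across all reference cell types, and that a cluster's score for a cell type is the sum of the products of matching logFC values.

```python
from statistics import mean


def reference_logfc(reference):
    """Center each gene's log-expression on its mean across reference cell types."""
    # Only keep genes measured in every reference profile.
    genes = set.intersection(*(set(profile) for profile in reference.values()))
    gene_means = {g: mean(profile[g] for profile in reference.values()) for g in genes}
    return {
        cell_type: {g: profile[g] - gene_means[g] for g in genes}
        for cell_type, profile in reference.items()
    }


def identity_scores(cluster_logfc, reference):
    """Score a cluster against each reference cell type with a logFC dot product."""
    scores = {}
    for cell_type, ref_lfc in reference_logfc(reference).items():
        shared = cluster_logfc.keys() & ref_lfc.keys()
        scores[cell_type] = sum(cluster_logfc[g] * ref_lfc[g] for g in shared)
    return scores


# Toy reference: log-scale expression per cell type (invented values).
reference = {
    "B cell": {"CD19": 8.0, "MS4A1": 9.0, "NKG7": 1.0, "CD3E": 1.0},
    "NK": {"CD19": 1.0, "MS4A1": 1.0, "NKG7": 9.0, "CD3E": 2.0},
    "T cell": {"CD19": 1.0, "MS4A1": 2.0, "NKG7": 4.0, "CD3E": 9.0},
}

# Toy differential expression for one unknown cluster (invented values).
cluster_logfc = {"CD19": 2.5, "MS4A1": 3.0, "NKG7": -1.5, "CD3E": -2.0}

scores = identity_scores(cluster_logfc, reference)
best = max(scores, key=scores.get)
print(best, scores)
```

Genes that are up in both the cluster and a reference cell type (or down in both) add to the score, while genes that move in opposite directions subtract from it, so the best-matching cell type ends up with the highest score.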

I routinely use CIPR for analyzing scRNAseq datasets and I was curious to see how predictions would change if I used different reference datasets on the same experimental data. To satisfy my curiosity, I tested CIPR on scRNAseq data from Tirosh et al. published in Science in 2016. This highly cited study characterizes the immune landscape in human melanoma tumors.

Using the Seurat R package, I found 15 single cell clusters (numbered from 0 to 14) in this dataset, and I examined the identity scores predicted by CIPR using differentially expressed genes (logFC comparison method). The top predictions are summarized in the table below, which shows the unknown clusters (first column) and the predictions from the different reference datasets (note that the ImmGen reference originates from mouse while the others are human references):

Cluster	ImmGen	BP-Encode	HPCA	DICE	Hematopoietic diff	Presorted
0	CD8 Eff	CD8 Tem	Tgd	CD8 Nai-act	CD8 Tem	CD8 Tem
1	CD4 Nai-Early act	Treg	Treg-CD4 Tcm	CD4 Th17-Nai act	Nai CD4	Tfh
2	CD8 Nai-Early act	CD8 Tem	CD8 Tcm	CD4 Th1/17	CD8 Tcm-Tem	MAIT-gdT
3	Nai CD4	CD4	Nai CD4	Treg	Nai CD4	Nai CD8
4	Neut	Mono-Neut	Mono	Mono	Mono	Neut
5	Pre-T	NK	Tgd	NK	NK	MAIT-gdT
6	B cell	B cell	B cell	B cell	B cell	B cell
7	B cell	B cell	B cell	B cell	B cell	B cell
8	NK	NK	NK	NK	NK	NK
9	Stromal-Eosino	Muscle-B cell	Gametocyte-B cell	B cell	Erythroid-B cell	B cell
10	CD8 Eff	CD8 Tcm-Tem	CD4 Tem	CD4 Th1/17	CD8 Tem	CD8 Eff
11	Mac	Epithelial-Mac	Tissue stem-DC/Mac	Mono	pDC	pDC
12	Neut	Muscle-Neut	Tissue stem-DC	Mono	pDC	Neut
13	B cell	B cell	B cell	B cell	B cell	B cell
14	B cell	B cell	B cell	B cell	B cell	B cell

At a glance, some clusters are annotated consistently while others vary more from one reference to the next. Interestingly, the natural killer (NK) cell cluster (8) and four B cell clusters (6, 7, 13, and 14) were identified identically by all the reference datasets, including the mouse ImmGen reference. This suggests that NK cells and B cells are characterized by distinct gene signatures that are conserved between mice and humans!

On the other hand, the myeloid clusters (4, 11, and 12, variously called macrophages, monocytes, neutrophils, or dendritic cells) show high variance, suggesting that these cells have overlapping gene signatures that make them difficult to tease apart. Along these lines, myeloid cell subsets may also have a higher degree of species-specificity.
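One rough way to put a number on this consistency is to count how many references agree on the top label for each cluster. The Python sketch below does that for a few rows transcribed from the table above. The `agreement` helper is a hypothetical name, and exact string matching is a simplifying assumption: it treats related labels such as "Mono" and "Mono-Neut" as different, so it understates agreement wherever the references use different nomenclature.

```python
from collections import Counter

# Top predictions per cluster, transcribed from the table above
# (column order: ImmGen, BP-Encode, HPCA, DICE, Hematopoietic diff, Presorted).
predictions = {
    4: ["Neut", "Mono-Neut", "Mono", "Mono", "Mono", "Neut"],
    6: ["B cell", "B cell", "B cell", "B cell", "B cell", "B cell"],
    8: ["NK", "NK", "NK", "NK", "NK", "NK"],
    11: ["Mac", "Epithelial-Mac", "Tissue stem-DC/Mac", "Mono", "pDC", "pDC"],
    12: ["Neut", "Muscle-Neut", "Tissue stem-DC", "Mono", "pDC", "Neut"],
}


def agreement(labels):
    """Return the most common label and the fraction of references reporting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


for cluster, labels in predictions.items():
    label, fraction = agreement(labels)
    print(f"cluster {cluster}: {label} ({fraction:.0%} of references)")
```

With these rows, the NK and B cell clusters reach full agreement, while none of the myeloid clusters gets more than half of the references to agree on a single label.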

If you have used CIPR in your studies, let me know how it worked for you!