Cellular composition in three brain cohorts from two brain regions
We analyzed three cohorts each consisting of post-mortem brains from AD and control subjects (Table S1), namely the Rush Religious Orders Study and Memory and Aging Project dorsolateral prefrontal cortex (DLPFC) [7, 8], Mayo Clinic temporal cortex (TCX-Mayo) [4, 12], and Mount Sinai VA Medical Center Brain Bank temporal cortex (TCX-MSBB) [18]. We generated the TCX-Mayo RNAseq dataset, and downloaded DLPFC and TCX-MSBB RNAseq datasets from the AMP-AD knowledge portal on Synapse (www.synapse.org).
Cell proportions (Table S2) were estimated for DLPFC, TCX-Mayo and TCX-MSBB datasets independently using the digital sorting algorithm (DSA) method [16] and the top 100 marker genes (Table S3) obtained from R package BRETIGEA [15] for each of the following cell types – neuron, oligodendrocyte, microglia, oligodendrocyte progenitor cell (OPC), astrocyte and endothelial cell.
Inspection of the pairwise correlations between marker genes (Fig. 1a) revealed that OPC markers have poor median pairwise Pearson correlation values (0.12 in DLPFC, 0.11 in TCX-Mayo and 0.06 in TCX-MSBB). Among the other five cell types, neuronal markers have the highest median correlation (0.68 in DLPFC, 0.78 in TCX-Mayo and 0.67 in TCX-MSBB), and microglia markers have the lowest (0.37 in DLPFC, 0.42 in TCX-Mayo and 0.44 in TCX-MSBB). In addition, a computer simulation study (Fig. S1) demonstrated that the estimated proportions of OPC were not robust to the selection of marker genes. Therefore, we did not include OPC in downstream analyses in this study.
In all three datasets, neuronal cell proportion estimates were significantly lower in AD compared to controls (Fig. 1b). The magnitude of this decrease was the greatest for TCX-Mayo (AD mean proportion = 28.0%, control = 35.7%; ratio of AD:control cell proportions = 0.78), followed by TCX-MSBB (AD = 42.3%, control = 49.3%; ratio = 0.87) and DLPFC (AD = 42.4%, control = 47.4%; ratio = 0.89). The estimated proportions of microglia were significantly higher in AD vs. controls for all datasets, with higher magnitude in TCX-Mayo (AD:control ratio = 1.19) and TCX-MSBB (AD:control ratio = 1.19) than in DLPFC (AD:control ratio = 1.06). The estimated proportions of astrocytes and endothelial cells were significantly higher in AD vs. controls for the DLPFC and TCX-Mayo datasets, although the magnitude was greater in TCX-Mayo (AD:control ratios = 1.40 and 1.30, respectively) than in DLPFC (1.07 and 1.14, respectively) for both cell types. Oligodendrocyte proportions were significantly higher in AD in DLPFC (AD:control ratio = 1.14) and TCX-MSBB (AD:control ratio = 1.27), but remained unchanged in TCX-Mayo (ratio = 0.94). Collectively, these findings demonstrate that the proportions of CNS cell types differ between post-mortem AD and control brains for most cell types. Although these proportional changes with AD are mostly consistent across the different studies, their extent varies across brain regions, with TCX tending towards a higher magnitude of neuronal loss and microglia proliferation than DLPFC. It should be emphasized that the cell proportion changes estimated here are relative values, rather than absolute cell proportion changes between AD and control subjects.
Differential expression analyses
In this study, three computational approaches were applied to identify cell intrinsic differential expression in individual cell types (CI-DEGs, Tables S4-S6), namely CellCODE [14], PSEA [13] and our method WLC. Differentially expressed genes from bulk brain tissue (bulk-DEGs) were identified through linear regression without adjusting for cellular composition (Table S7). For each of the DLPFC, TCX-Mayo and TCX-MSBB datasets, we obtained bulk-DEGs, as well as CI-DEGs from each of the three algorithms for the neuronal, oligodendrocytic, microglial, astrocytic and endothelial cell types.
We compared bulk-DEGs across the three datasets (Fig. 2a, top panel). Similarly, CI-DEGs from CellCODE, PSEA and WLC were compared across datasets (Fig. 2a, lower panels), with CI-DEGs counted as shared between datasets only when they were detected in the same designated cell type. All DEGs were identified at a nominal p-value cutoff of 0.05, and shared CI-DEGs have the same direction of change in the compared datasets. The ratio of overlap between any two datasets over all DEGs, i.e. the number of genes in the overlapping areas of the Venn diagram over the total number (Fig. 2a, top panel), is 30.0% (1711/5697) for up-regulated bulk-DEGs and 34.8% (2214/6371) for down-regulated bulk-DEGs. This ratio of overlap in bulk-DEGs is much higher than that in CI-DEGs (2.7, 4.7 and 10.0% for up-regulated genes from CellCODE, PSEA and WLC, respectively; 3.1, 6.8 and 9.3% for down-regulated genes from CellCODE, PSEA and WLC, respectively).
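The overlap ratio defined above (genes found in at least two datasets, divided by all genes found in any dataset) can be sketched as follows. The function name and the toy gene sets are hypothetical; only the definition of the ratio comes from the text.

```python
def pairwise_overlap_ratio(deg_sets):
    """Return (n_shared, n_total, ratio) for a collection of DEG sets.

    `deg_sets` maps dataset name -> set of DEGs with one direction of
    change. A gene counts as shared if it appears in at least two
    datasets, i.e. it lies in any overlapping area of the Venn diagram.
    """
    datasets = list(deg_sets.values())
    union = set().union(*datasets)
    shared = {g for g in union if sum(g in s for s in datasets) >= 2}
    return len(shared), len(union), len(shared) / len(union)


# Toy gene sets (invented for illustration only).
toy = {
    "DLPFC": {"g1", "g2", "g3", "g4"},
    "TCX-Mayo": {"g3", "g4", "g5"},
    "TCX-MSBB": {"g4", "g6", "g7"},
}
shared_n, total_n, ratio = pairwise_overlap_ratio(toy)
```

Applied to the reported counts, the same division gives 1711/5697 = 30.0% for up-regulated and 2214/6371 = 34.8% for down-regulated bulk-DEGs.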
Consensus CI-DEGs between DLPFC and TCX
To obtain the consensus CI-DEGs that are shared between DLPFC and TCX brain regions, we selected those CI-DEGs that are detected in “DLPFC and TCX-Mayo” or in “DLPFC and TCX-MSBB” under any of the three algorithms (Fig. S2). We combined all such genes, which collectively comprised the consensus CI-DEGs for each cell type (Fig. 2b). Similarly, consensus bulk-DEGs were the combined set of bulk-DEGs shared between “DLPFC and TCX-Mayo” or “DLPFC and TCX-MSBB”.
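The selection rule above reduces to simple set operations. The sketch below assumes that a gene must be shared between DLPFC and a TCX dataset within the same algorithm, and that the input sets are already restricted to one cell type and one direction of change; both points are our reading of the text, and the data layout and names are hypothetical.

```python
def consensus_ci_degs(calls):
    """Combine CI-DEGs shared between DLPFC and either TCX dataset.

    `calls[algorithm][dataset]` is a set of CI-DEGs for one cell type and
    one direction of change (hypothetical layout). Sharing is evaluated
    within each algorithm, then the results are pooled across algorithms.
    """
    consensus = set()
    for per_dataset in calls.values():
        tcx_any = per_dataset["TCX-Mayo"] | per_dataset["TCX-MSBB"]
        # Equivalent to (DLPFC & TCX-Mayo) | (DLPFC & TCX-MSBB).
        consensus |= per_dataset["DLPFC"] & tcx_any
    return consensus


# Toy calls (invented): g4 is shared only between the two TCX datasets,
# so it is not a DLPFC-TCX consensus gene.
toy_calls = {
    "CellCODE": {"DLPFC": {"g1", "g2"}, "TCX-Mayo": {"g2"}, "TCX-MSBB": set()},
    "PSEA": {"DLPFC": {"g3"}, "TCX-Mayo": {"g4"}, "TCX-MSBB": {"g3", "g4"}},
    "WLC": {"DLPFC": {"g5"}, "TCX-Mayo": {"g6"}, "TCX-MSBB": {"g6"}},
}
consensus = consensus_ci_degs(toy_calls)
```

The same function would yield consensus bulk-DEGs if it were given a single entry holding the three bulk-DEG sets.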
Most consensus CI-DEGs are from neuronal cells (N = 559), followed by oligodendrocytes (N = 260), whereas microglia contributed the fewest (N = 101). The majority (65.5% or 366/559) of neuronal CI-DEGs are down-regulated in AD, and the majority (66.0% or 140/212) of endothelial CI-DEGs are up-regulated in AD, with the other cell types lying in between. Some of these CI-DEGs are also among the 1000 marker genes of the corresponding cell type from BRETIGEA [15]; 14.7% or 82/559 of neuronal CI-DEGs are also neuronal markers, 25.4% or 66/260 of oligodendrocyte CI-DEGs are also oligodendrocyte markers, and the other cell types lie in between.
With regard to consensus bulk-DEGs (Fig. S3), 28.2% of them (885/3135) are cell type markers: 10.4% neuronal, 5.6% oligodendrocyte, 3.4% microglia, 4.8% astrocyte and 4.0% endothelial markers. The above observations indicate that computational deconvolution algorithms can identify CI-DEGs among both marker genes and non-marker genes. Importantly, the proportion of non-marker genes is greater among CI-DEGs than among bulk-DEGs. This suggests that, compared to bulk-DEGs, CI-DEGs may be capturing a greater proportion of expression changes that are not due to mere cell population changes.
We also compared the consensus bulk-DEGs with consensus CI-DEGs (Fig. S4). We determined that only a small fraction (15.0% or 29/193) of the up-regulated neuronal CI-DEGs were also present in the up-regulated bulk-DEGs, although the overlap is still significant (Fig. 2c). In comparison, most of the up-regulated CI-DEGs of the other four cell types were included in the up-regulated bulk-DEGs. On the other hand, most (84.2% or 308/366) of the down-regulated neuronal CI-DEGs were also present in the down-regulated bulk-DEGs, whereas most of the down-regulated CI-DEGs of the other four cell types were absent from this group. Since the bulk-DEG analysis did not adjust for neuronal loss and gliosis in AD (Fig. 1b), its ability to identify up-regulated neuronal genes and down-regulated glial genes is likely to be compromised. For the same reason, its detection of down-regulated neuronal genes and up-regulated glial genes may be falsely inflated.
Enriched GO terms of consensus CI-DEGs between DLPFC and TCX
To identify pathways implicated by CI-DEGs that are robust across brain regions, we performed Gene Ontology (GO) enrichment analysis [20, 21] for the consensus CI-DEGs, assessing separately those that are up vs. down in AD subjects (Tables S8-S17). Figure 3 illustrates the top two enriched GO terms by enrichment p-value, after filtering out terms that encompass fewer than four CI-DEGs or that are cellular compartments.
Consensus CI-DEGs revealed biological pathways that are perturbed in AD in specific brain cell types. Some of these pathways have previously been implicated in AD and others are novel. Down-regulated neuronal CI-DEGs were enriched in neuropeptide hormone activity (GO:0005184) and hormone activity (GO:0005179) pathways, which include VGF (a.k.a. neuroendocrine regulatory peptide 1) [22] and corticotropin releasing hormone (CRH) [23] (Table S13). Consensus up-regulated neuronal CI-DEGs were significantly enriched in potassium channel activity (GO:0005267) and regulation of ion transport (GO:0043269) pathways (Table S8). The latter GO term encompasses most of the genes from the former, and also includes other genes involved in neuronal functions such as the glutamate ionotropic receptor NMDA type subunit 1, GRIN1 [24] and SYT1, which encodes the synaptic vesicle protein, synaptotagmin [25].
For the glial CI-DEGs as well, many of the most significant GO terms are related to key functions of the respective cell types. The top enriched pathway of down-regulated CI-DEGs in oligodendrocytes is myelination (GO:0042552), including myelin basic protein (MBP) [4], plasmolipin (PLLP) [4, 5], myelin and lymphocyte protein (MAL), and myelin-associated glycoprotein (MAG) [26] (Table S14). Up-regulated CI-DEGs of oligodendrocytes are enriched in ceramide biosynthetic process (GO:0046513), including ceramide synthase 4 (CERS4) and UDP glycosyltransferase 8 (UGT8) [5] (Table S9). Ceramide is a constituent of sphingomyelin, a sphingolipid that is particularly abundant in the myelin sheath, and is also a multi-functional signaling molecule [27, 28]. Hence, the down-regulated and the up-regulated oligodendroglial consensus CI-DEGs highlight different components of myelin biology that are perturbed in AD.
Similarly, microglial, astrocytic and endothelial CI-DEGs also highlight processes pertinent to the functions of these cell types. Microglial up-regulated CI-DEGs are enriched in inflammatory response (GO:0006954) and leukocyte activation (GO:0002696), which include the complement C3a receptor 1 (C3AR1) [29], interleukin 18 (IL18) [30, 31] and CCAAT enhancer binding protein alpha (CEBPA) [32] genes (Table S10).
Astrocytes, a cell type that plays a critical role in maintaining brain energy dynamics [33] and metabolism [34], show enrichment of oxidoreductase activity (GO:0016491) and drug metabolic process (GO:0017144) in down-regulated CI-DEGs, which include the genes glutathione S-transferase mu 2 (GSTM2) [35] and thioredoxin 2 (TXN2) [36] (Table S16). Astrocytic up-regulated consensus CI-DEGs are enriched for the cell-cell junction assembly process (GO:0007043) (Table S11), including the astrocytic gap junction protein connexin43 (GJA1) [37], which was identified as a key regulator associated with AD related outcomes. The other top GO process for astrocytic up-regulated consensus CI-DEGs is adenylate cyclase-inhibiting G protein-coupled receptor signaling pathway (GO:0007193), which harbors adenylate cyclase 8 (ADCY8), involved in memory functions [38].
Finally, endothelial cells, which are crucial in maintaining blood-brain barrier integrity [39, 40], show enrichment of up-regulated DEGs in cytoskeleton organization (GO:0007010) and actin filament-based process (GO:0030029) (Table S12).
Importantly, some CI-DEGs highlight protein translation as a top perturbed biological pathway. Down-regulated microglial consensus CI-DEGs show enrichment in processes involved in protein translation (GO:0006614 and GO:0006613), which include ribosomal protein encoding genes [41, 42, 43] (Table S15). Similarly, down-regulated endothelial consensus CI-DEGs also harbor ribosomal protein encoding genes, with enrichment in protein translation related GO processes (GO:0006413 and GO:0006613) (Table S17).
Comparison of CI-DEGs from computational deconvolution vs. snRNAseq
We determined the extent to which each of the three computational deconvolution algorithms could detect CI-DEGs from bulk tissue by comparing their results with those obtained in a published snRNAseq study [19]. The ROSMAP dataset utilized in our study has both bulk RNAseq from DLPFC (bulk-DLPFC) and snRNAseq (snDLPFC) in a subset of its participants [19]. We compared the bulk-DLPFC data deconvoluted with the three different algorithms with the published snDLPFC [19] data. Endothelial CI-DEGs were not available from the snRNAseq study; therefore, overlap of results could be assessed for only four cell types.
We tested the overlap between the top CI-DEGs for each cell type obtained from deconvoluted bulk-DLPFC and those from snDLPFC, ranked by their p-values (Fig. 4a). We evaluated the overlap for a range of top CI-DEGs, up to the top 1000 genes. Overlap for CI-DEGs that are either up (Fig. 4a, upper panel) or down (Fig. 4a, lower panel) in AD was assessed separately. Hence, overlapping genes had both similar ranks and the same direction of effect in the deconvoluted bulk-DLPFC and snDLPFC analyses. We established the significance of overlap using simulations for a range of top ranked genes (N = 200, 600 and 1000) (Table S18).
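The text does not describe how the simulations were set up, so the following is only one plausible sketch of a top-N overlap test: it counts the genes shared between two top-N lists and derives an empirical p-value by repeatedly shuffling one ranking over the same gene universe. The random-ranking null, the function names and the toy rankings are all assumptions made for illustration.

```python
import random


def top_n_overlap(ranked_a, ranked_b, n):
    """Number of genes present in the top n of both ranked lists."""
    return len(set(ranked_a[:n]) & set(ranked_b[:n]))


def empirical_overlap_p(ranked_a, ranked_b, n, n_sim=1000, seed=0):
    """Observed top-n overlap and its empirical p-value.

    Null model (an assumption, not taken from the study): ranked_b is a
    random permutation of the same genes. The +1 terms keep the p-value
    strictly above zero.
    """
    rng = random.Random(seed)
    observed = top_n_overlap(ranked_a, ranked_b, n)
    shuffled = list(ranked_b)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if top_n_overlap(ranked_a, shuffled, n) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Toy rankings (invented): 200 genes, the first 15 ranked identically.
genes = [f"g{i}" for i in range(200)]
bulk_rank = genes
sn_rank = genes[:15] + genes[15:][::-1]
observed, p_value = empirical_overlap_p(bulk_rank, sn_rank, n=20)
```

In practice the two rankings would be the up-regulated (or down-regulated) genes of one cell type, ordered by p-value, from the deconvoluted bulk-DLPFC and the snDLPFC analyses, with N set to 200, 600 or 1000.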
Neuronal CI-DEGs retained their significance of overlap across all comparisons and for all algorithms, except for the top 1000 up-regulated neuronal CI-DEGs deconvoluted with PSEA. Microglial CI-DEGs had the fewest significant overlaps for their top ranked genes. For astrocytic and oligodendrocytic top ranked CI-DEGs, the significance of overlap fell between the neuronal and microglial results (Table S18). These findings are reflective of the abundance of these cell types, with neurons, the most abundant, having the most overlap for the top ranked CI-DEGs between deconvoluted bulk-DLPFC and snDLPFC.
Amongst these comparisons, we determined that the significance of overlap was best for all algorithms for the top ranked 600 genes. Using WLC deconvoluted results, the overlap for the top 600 CI-DEGs from bulk-DLPFC and snDLPFC is statistically significant for all eight comparisons (Fig. 4a). For the top 600 genes, overlap with CellCODE results is significant for all except down-regulated oligodendrocyte and up-regulated astrocyte CI-DEGs. For PSEA, none of the microglia CI-DEGs had significant overlap. PSEA results for the top 600 genes were otherwise significant for all but up-regulated oligodendrocyte and down-regulated astrocyte genes.
We also performed a comparison of CI-DEGs identified at nominal significance (p-value < 0.05) with each algorithm from bulk-DLPFC to nominally significant snDLPFC results (Fig. 4b, Table S19). As with the above comparison, genes that are either up or down in both deconvoluted bulk-DLPFC and snDLPFC data were analyzed separately for each cell type.
Not surprisingly, down-regulated neuronal CI-DEGs have the greatest overlap (537/3732 or 14.4% for WLC, 292/3213 or 9.1% for CellCODE, 415/3516 or 11.8% for PSEA). These overlaps are significant for all three algorithms (Table S19). Down-regulated CI-DEGs in microglia show the lowest proportion of overlap (9/723 or 1.2% for WLC, 4/609 or 0.66% for CellCODE, 16/820 or 2.0% for PSEA), which is not significant (empirical p-value > 0.05). The patterns of significant overlap detected with WLC (all but down-regulated microglia) and PSEA (all but microglial results and up-regulated oligodendrocytes) were similar, whereas CellCODE results had significant overlaps only for the neuronal CI-DEGs (Table S19).