Sex dependent glial-specific changes in the chromatin accessibility landscape in late-onset Alzheimer’s disease brains

Background In the post-GWAS era, there is an unmet need to decode the underpinning genetic etiologies of late-onset Alzheimer’s disease (LOAD) and translate the associations to causation. Methods We conducted ATAC-seq profiling using NeuN sorted-nuclei from 40 frozen brain tissues to determine LOAD-specific changes in chromatin accessibility landscape in a cell-type specific manner. Results We identified 211 LOAD-specific differential chromatin accessibility sites in neuronal-nuclei, four of which overlapped with LOAD-GWAS regions (±100 kb of SNP). While the non-neuronal nuclei did not show LOAD-specific differences, stratification by sex identified 842 LOAD-specific chromatin accessibility sites in females. Seven of these sex-dependent sites in the non-neuronal samples overlapped LOAD-GWAS regions including APOE. LOAD loci were functionally validated using single-nuclei RNA-seq datasets. Conclusions Using brain sorted-nuclei enabled the identification of sex-dependent cell type-specific LOAD alterations in chromatin structure. These findings enhance the interpretation of LOAD-GWAS discoveries, provide potential pathomechanisms, and suggest novel LOAD-loci. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13024-021-00481-0.


Graphical Abstract
Background Large multi-center genome-wide association studies (GWAS) have identified associations between numerous genomic loci and late-onset Alzheimer's disease (LOAD) [1][2][3][4][5][6]. The most recent GWAS meta-analysis reported a total of 25 LOAD-GWAS regions [7]. However, the precise disease-responsible genes, the specific causal genetic variants, and the molecular mechanisms of action that mediate their pathogenic effects are yet to be explained. Most LOAD-GWAS SNPs are in noncoding (intergenic and intragenic) genomic regions, and thus may have a gene regulatory function. Supporting this hypothesis, differential gene expression in brain tissues have been described between LOAD and healthy controls [8][9][10]. Moreover, several expression quantitative trait loci (eQTL) studies in brain tissues from cognitively normal [11] and LOAD [12][13][14][15] individuals reported overlaps with LOAD-GWAS loci. Collectively, these observations suggest that dysregulation of gene expression plays an important role in LOAD pathogenesis [16]. Noteworthy, integration of findings from LOAD epigenome wide association studies (EWAS) and GWAS also identified a number of shared loci [17][18][19][20][21][22][23][24], providing further support for the role of gene dysregulation in LOAD pathogenesis.
To date, most brain expression, DNA-methylation, and chromatin studies have used brain tissue homogenates that represent multiple cell-types, i.e. various types of neurons and types of glia cells. The heterogeneity of brain tissue makes it difficult to determine the specific brain cell-type responsible for the changes in gene expression and in epigenome landscape. The mixture of cell-types could potentially mask signals corresponding to a particular cell-type, especially if the causal cell types comprise a small fraction of the entire sample. An additional shortcoming of studying bulk brain tissues is the bias related to sample-to-sample differences in the celltype composition of the tissue. The obstacle of variability in the neuron:glia proportions across samples is even more pronounced when analyzing disease-affected brain tissues that underwent the neurodegeneration processes of neuronal loss and gliosis. Single cell experimental approaches can circumvent these limitations; however, these methods are costly for studying larger sample sizes and have been underutilized in the field of LOAD genetics. Frozen tissues pose additional technical challenges as it is difficult to isolate intact cell bodies. Fluorescence Activated Nuclei Sorting (FANS) was developed to extract, purify and sort nuclei (vs. cells) from frozen brain tissues [25] using nuclei neuronal markers, such as NeuN, to greatly reduce cellular heterogeneity found in bulk tissues, and allow characterization of neuronal (NeuN+) and non-neuronal (NeuN-) cell populations. Recently, two studies used FANS to analyze LOADspecific changes in DNA-methylation on a whole-genome level [26] and across the APOE locus [27]. Furthermore, two new studies used single-cell (sc)RNA-seq from cortex of LOAD patients. The first found that the strongest LOAD-associated changes appeared early in pathological progression and were highly cell-type specific [28], and the second identified LOAD-associated gene dysregulation in specific cell subpopulations, particularly for APOE and transcription factors [29]. These results further highlight the importance of cell-type specific analysis of human brain tissues to understanding of the molecular underpinnings and cellular basis of LOAD.
In this study, we performed NeuN-FANS from 40 archived frozen human brain samples (19 LOAD,21 normal), and used the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to characterize the chromatin accessibility landscape in neuronal and non-neuronal nuclei. We identified over 170,000 chromatin accessibility differences between neuronal and non-neuronal nuclei. We also report LOAD-specific differences in chromatin accessibility in both neurons and non-neurons. Interestingly, while the neuronal changes appeared to be independent of sex, in the non-neuronal cells LOAD differences in chromatin accessibility were detected only in females. LOAD-specific differences in chromatin accessibility significantly overlap with known LOAD GWAS regions, and also point to new candidate LOAD loci. These results provide new insights into the mechanistic and sex-specific pathogenesis of this disorder.

Human brain tissue samples
Following quality-control filtering, the final dataset was generated using fresh-frozen temporal cortex from neurologically healthy controls (n = 21), mild LOAD (n = 16), and severe LOAD (n = 3) patients. These samples were obtained from the Kathleen Price Bryan Brain Bank (KPBBB) at Duke University, and the demographics for this cohort are included in Table 1 and detailed in Supplementary Table 2. Clinical diagnosis of LOAD was pathologically confirmed using Braak staging (AT8 immunostaining) and amyloid deposition assessment (4G8 immunostaining) for all LOAD samples. Braak staging was used to define mild (stages IV and below) vs. severe (stages V and VI) LOAD. All donors are Caucasians with APOE 33; post mortem interval (PMI) averaged 7.15 h (standard error of the mean (SEM) 0.81). The archived frozen tissues have high-quality DNA as required for genomic analyses, and RNA (RIN ≥ 8) suitable for quantitative RNA analyses. The project was approved by the Duke Institutional Review Board (IRB). The methods described were carried out in accordance with the relevant guidelines and regulations. Tissue samples were processed in random pairs of one normal and one LOAD patient. Tissue homogenization, nuclei extraction, FANS, and tagmentation were performed on each pair on the same day. Library preparation and sequencing was performed blinded to age, sex, and pathology. Out of this group of samples four female donors were analyzed by snRNA-seq, two of which were neurologically healthy controls and two age matched mild LOAD patients (Braak Stage III). These four samples were ages 79-90 with PMI averaged 8.02 h (SEM = 1.99) (Supplementary Table 2, marked '*'). Tissue samples were processed for snRNA-seq on the same day and in the same 10X Genomics microfluidics chip.

Tissue dissociation and nuclei extraction
Methods were performed according to established protocols [25,30,31] with some modifications. Briefly, 50 mg of frozen temporal cortex (gray matter) was thawed for 10 min on ice in lysis buffer (0.32 M Sucrose, 5 mM CaCl2, 3 mM magnesium acetate, 0.1 mM EDTA, 10 mM Tris-HCl pH 8, 1 mM DTT, 0.1% Triton X-100). The tissue was gently dissociated and homogenized in a 7 ml dounce tissue homogenizer (Corning) with approximately 25 strokes of pestle A in 45 s, then filtered through a 100 μm cell strainer. The filtered lysate was transferred to a 14 × 89 mm polypropylene ultracentrifuge tube, carefully underlaid with sucrose solution (1.8 M sucrose, 3 mM magnesium acetate, 1 mM DTT, 10 mM Tris-HCl, pH 8) and subjected to ultracentrifugation at 107,000 RCF for approximately 30 min at 4°C. Supernatant and the debris interphase were carefully aspirated, and 100 ul PBS (−Mg2+, −Ca2+) was added to the nuclei pellet. After a 5-min incubation on ice, nuclei were gently resuspended and transferred to a 1.5 ml polypropylene microcentrifuge tube for staining. For snRNA-seq downstream experiment, the above protocol was modified to optimize sample preparation for single-nuclei sorting. Briefly, 50 mg frozen temporal cortex (gray matter) was thawed for 20 min on ice in lysis buffer and ultracentrifuged at 107,000 RCF for approximately 10 min at 4°C. Supernatant and the debris interphase were carefully aspirated, and 500 ul wash and resuspension buffer (1X PBS, 1% BSA, 0.2 U/ul RNase Inhibitor) was added to the nuclei pellet. After a 5-min incubation on ice, nuclei were gently resuspended and centrifuged at 300 RCF for 5 min at 4°C. The supernatant was again aspirated and 500 ul wash and resuspension buffer without RNase Inhibitor (1X PBS, 1% BSA) was added to the nuclei pellet. After a 1-min incubation on ice, the nuclei were gently resuspended and filtered through a 35 um strainer. Ten microliters nuclei were taken for quality assessment and counting prior to library preparation.

Immunofluorescence microscopy
After homogenization and sucrose gradient ultracentrifugation, a portion of the nuclei was counted, resuspended in 4% PFA, stained, plated on 12 mm coverslips at 10,000 nuclei per coverslip, incubated 20 min at room temperature, mounted, and imaged on a confocal microscope.

Fluorescence-activated nuclei sorting (FANS) of neuronal and non-neuronal nuclei
Sorting was performed using a MoFlo Astrios flow cytometer (Beckman Coulter) equipped with a 70 μm nozzle, operating at 35 psi. Standard gating procedures were used. Briefly, the first gate allowed separation of intact nuclei from debris. The second gate allowed us to identify individual nuclei, and exclude doublets and other aggregates. The third and fourth gates distinguish between PE+ and PE-nuclei and allowed us to sort and separate NeuN+ nuclei from NeuN-nuclei. Nuclei were sorted into 1 ml PBS (−Mg2+, −Ca2+) in a 2 ml polypropylene tube pre-coated with 200 ul of 5% BSA and rotated at 20 rpm at 4C.

Omni-ATAC on FANS nuclei
Approximately 100,000-700,000 sorted nuclei (Supplementary Table 2) were used for ATAC-seq library preparation as described in the Omni-ATAC protocol [32]. Libraries were quantified by Qubit, and size distribution was inspected by Bioanalyzer (Agilent Genomic DNA chip, Agilent Technologies). Barcoded ATAC-seq libraries were combined into pools of 6 libraries and sequenced on an Illumina HiSeq 4000 sequencer (50 bp, single read) at the Duke Sequencing and Genomic Technologies shared resource.

Omni-ATAC-seq on bulk tissue
In addition to performing ATAC-seq on NeuN+/− sorted nuclei, we also compared to ATAC-seq performed on total nuclei isolated from frozen tissue. Approximately 50,000 nuclei from 25 mg of pulverized frozen tissue were used for the transposition reaction applying the Omni-ATAC protocol as described previously [32].
Data processing pipeline ATAC-seq libraries made from 83 glia samples and 80 neuron samples were sequenced on Hi-seq 4000. Raw fastq sequencing files were first processed through cutadapt (v 1.9.1) to remove adaptors and bases with quality scores < 30. Filtered reads were aligned to hg19 by Bow-tie2 (v 2.1.0) using default parameters [33]. Bam files were sorted by samtools (v 0.1.18) and duplicates were removed by Picard MarkDuplicates. Sequences that overlapped ENCODE "blacklist" regions (https://sites. google.com/site/anshulkundaje/projects/blacklists) were removed, and narrow open chromatin peaks were called by Model-based Analysis of ChIP-seq (MACS v 2.1), with parameters --nomodel --shift − 100 --ext 200, using a FDR cutoff of q < 0.01 or 0.05 [34]. For visualization on the UCSC browser, BigWig files were generated using wigToBigWig (v 4).

Evaluation of variables affecting chromatin peaks
Prior to the differential analyses described below, we considered the effect of 37 variables that could impact the quality of the ATAC-seq results for each sample. These variables were collected from key features of the ATAC-seq processing pipeline, as well as individual sample characteristics such as case-control status, sex, and age. For example, the metadata for each sample included transposase batch, nuclei sorting date, mass, mean GC percentage of sequenced reads, mean mapped read length, alignment quality metrics, subject age at death, sex, diagnosis, and postmortem interval (PMI). Two subjects were missing PMI, therefore we used the R package MICE [36] to impute missing values using the classification and regression trees methodology.

Covariate selection
For differential peak analyses, selection of covariates for adjustment was carefully tailored to each comparative analysis to account for the variable number of peaks between groups and to minimize false positives in peak calling. We performed a linear regression of all metadata variables against the first 10 principal components (PCs) of the TMM normalized peak quantification in an effort to identify covariates to be utilized in the differential expression analyses. In an iterative process, we selected one variable (preferentially a variable directly related to the ATAC-seq experiment, explaining one of the largest proportions of variance and with few parameters), regressed its effect on the peak quantifications and performed a new principal component analysis independent of the selected variable(s). We repeated this procedure until there were no more Bonferroni significant (q < 0.05) variables associated with the peak PCs. For the comparison of neuronal and non-neuronal nuclei, we selected the following variables for our differential chromatin analysis: NRF, number of nuclei, normalized peak calls, nuclei sort date, alignment percentage, passing base pairs, age, and donor. For the comparison of LOAD vs. control neuronal nuclei, we selected normalized peak calls, alignment percentage, nuclei sort date, and number of nuclei as covariates. For the comparison of LOAD vs. control non-neuronal nuclei, we included the number of nuclei, NRF, nuclei sort date, and alignment percentage as covariates in the differential chromatin analysis. When performing a sex-stratified analysis of LOAD vs. control samples in the NeuN+ nuclei, we selected alignment percentage in the female-only subset and nuclei sort date in the male-only subset. Finally, in the sexstratified differential chromatin analysis of LOAD vs. control NeuN-nuclei, we included normalized peak calls as a covariate in the female-only subset, and normalized peak calls and percent GC content as covariates in the male-only subset. In the supplemented comparison of LOAD vs. control NeuN-nuclei in females only (nine additional non-neuronal samples), percent GC content was included as a covariate.

Differential chromatin accessibility analyses
Differential ATAC-seq peaks were detected using EdgeR package (version 3.22.3), which models counts using a negative binomial distribution [37]. ATAC-seq reads corresponding to chrX and chrY were excluded due to unequal numbers of female and male donors. For each comparison, counts of reads within peaks which were merged from all of the replicates were extracted from BigWig files and normalized by weighted trimmed mean of M-values [38] (TMM). Quasi-likelihood F-tests (QLF) was performed to determine differential sites at cut off adjusted p values < 0.05. As an approximate error model, QLF works more robustly and gives more reliable Type I error rate control than the other options, especially when there are smaller numbers of replicates (EdgeR User Guides, Bioconductor package vignettes). Three levels of case-to-control chromatin accessibility comparisons were performed ( Since the number of female samples was less than that of male, we included 9 additional NeuN-datasets (those that did not have matching NeuN+ data from the same donor but met QC criteria) into Level 3 and processed an additional comparison: (E) NeuN-females,14 cases vs 13 control. For all these comparisons we included covariates (described in "covariate selection") in the EdgeR analysis with model design as open chromatin accessibility~Disease (LOAD state) + covariates. Importantly, for each analysis, we remerged and recalled the chromatin peaks. Thus, the separate analyses were not direct subsets of each other.

Genomic distribution analysis
ATAC-seq peaks were categorized as promoters (transcription start site to upstream 1 kb), 1st exon, intragenic (excluding 1st exon), 5′ UTR, 3′ UTR, and intergenic, based on human gene hg19 NCBI RefSeq gene information from the UCSC genome browser.

Gene ontology analysis
Differential open chromatin regions were characterized by GREAT (v 3.0.0) (http://bejerano.stanford.edu/great/ public/html/), using the hg19 genome as background regions. Genes associated to open chromatin regions were determined by default "basal plus extension" settings (i.e., 5 kb upstream, 1 kb downstream, plus distal up to 1000 kb).

Overlap with LOAD GWAS loci
The tag SNP for 25 LOAD loci were obtained from the literature [7]. Genome coordinates +/− 100 kb surrounding each GWAS tag SNP were used to identify any differential open chromatin region that mapped within the region. Permutation analyses were performed by randomly selecting the same numbers of sites from the union set of ATAC-seq peak calls. For each comparison, we performed 10,000 permutations to estimate the empirical P-value.

Motif search
Transcription factor motif enrichments were evaluated using HOMER [39] (v4. 10.3) and MEME Suite [40]. For each comparison, we used the centered 300 bp of open chromatin regions as input, and the union peak calls as background. GC matching was applied to the background peaks to ensure that this did not lead to spurious results. We primarily used all peaks as the background, but to check for accuracy, we also randomly selected 10,000 peaks as the background from all peaks. For this analysis, we focused on the known motif search rather than search for de novo motifs.

Differential expression analysis of ROSMAP data
Gene expression data from AMP-AD was obtained from the Religious Orders Study and Memory and Aging Project (ROSMAP) [41,42]. This study includes RNA-seq data from the dorsolateral prefrontal cortex of 724 subjects [43,44]. limma was used to perform the differential analysis in R on normalized FPKM values obtained from RNA-seq of ROSMAP bulk tissue samples. Samples with cogdx of 1 (no cognitive impairment, n = 201) and 4 (Alzheimer's dementia and no other case of cognitive impairment, n = 222) were included in the analysis.

Single-nuclei library preparation and sequencing
Libraries were prepared for snRNA-seq using the Chromium Single Cell 3′ Reagent Kits v3 (10X Genomics) according to the manufacturer's instructions. Nuclei were diluted in nuclease-free water to a concentration of 1 million nuclei / ml in a final volume of 100 ul, and transferred to an 8-strip tube on ice. Sixteen thousand nuclei were added to the Master Mix for a targeted nuclei recovery of 10,000. Libraries were pooled onto a single S1 flow cell, and sequenced using the Illumina NovaSeq 6000 system to obtain paired-end 2 × 100 bp reads. Sequencing saturations ranged from 59.3 to 81.2%.

snRNA-seq data analysis
CellRanger software version 3.1.0 (10X Genomics) was used to demultiplex raw Illumina base call files into FASTQ files. A pre-mRNA GTF file was generated with the pre-built GRCh38 3.0.0 human reference using the Linux utility awk, in order to be compatible with Cell-Ranger and to include intronic reads from nuclear RNA in UMI counts for each gene and barcode. The CellRanger count pipeline was run to align reads to the pre-mRNA GRCh38 reference and gene expression matrices generated separately for each of the four samples were merged together into a single matrix.
(See figure on previous page.) Fig. 1 Isolation of nuclei from frozen brain samples and analysis of ATAC-seq data. Human postmortem frontal cortex was dissociated, nuclei were isolated and stained with the nuclear stain DAPI and a monoclonal NeuN antibody conjugated to PE. a Nuclei were first sorted based on their forward and side scatter from all possible events (R1 gate). b Single nuclei were further sorted based on their size from the doublets or larger clumps of nuclei (R2 gate). c DAPI positive single cells were gated as either NeuN-PE positive (neurons, R3 gate) or NeuN-PE negative (glia, R4 gate). d Post-sort data showing the purity of the separation between neuronal and non-neuronal nuclei. e Fluorescence image showing unsorted nuclei stained for NeuN (red) and DAPI (blue). The scale bar represents 100um. f Proportion of neuronal nuclei from each sample. Error bars show standard error of the mean. g Overview schematic of Levels 1, 2, 3, and 3b of differential analysis. Level 1 compares neuronal vs. nonneuronal for 21 normal and 19 LOAD samples. Level 2 compares normal vs. LOAD for each neuronal and non-neuronal subpopulation. Level 3 compares LOAD samples separated by female and male. Level 3b is the same comparison done after adding 9 female non-neuronal samples (3 normal and 6 LOAD) For quality control and filtering, the quickPerCellQC function from the R package scater v1.14.6 [45] was used to identify low-quality cells based on QC metrics (UMI count, number of genes detected, percentage of UMIs that mapped to the mitochondrial genome). Cells in which 229 or more genes were detected and less than 17.4% of UMIs mapped to the mitochondrial genome were used in downstream analyses. Out of 21,262 nuclei in the initial dataset 18,032 remained after filtering with a median number of 2332 detected genes per nucleus. The R package Seurat v3.1.1 standard workflow was used for integration of multiple samples to combine the four samples into a unique dataset [46]. Prior to integration, gene expression was normalized for each sample by scaling by the total number of transcripts, multiplying by 10,000, and then log transforming (log-normalization). We then identified the 2000 genes that were most variable across each sample, controlling for the relationship between mean expression and variance. Next, we identified anchor genes between pairs of samples using the FindIntegrationAnchors function that were then passed to the IntegrateData function to harmonize the four samples.
We scaled the integrated dataset before running a Principal Component Analysis (PCA). To distinguish principal components (PCs) for further analysis, we used the JackStraw method to determine statistically significant PCs and found that up to 30 PCs were enriched in genes with a PC score that was unlikely to have been observed by chance. We then utilized the shared nearest neighbor (SNN) modularity optimization-based clustering algorithm implemented in Seurat for identifying clusters of cells. This was performed using the Find-Neighbors function with 30 PCs, followed by the FindClusters function with the Louvain algorithm using a 0.4 resolution. This allowed us to assign cells into a total of 21 clusters. We applied the uniform manifold approximation and projection (UMAP) method on the cell loadings of the previously selected 30 PCs to visualize the cells in two dimensions and to separate nuclei into clusters.
Differential expression to identify cluster markers that are conserved between the samples was performed using the Seurat function FindConservedMarkers for each cluster on the normalized gene expression before integration. R package SingleR v1.0.1 [47] was used to annotate cell types based on correlation profiles with 713 bulk microarray samples from the Human Primary Cell Atlas [48] (HPCA) as reference expression data. Four major cell types were detected by the SingleR method using HPCA: macrophages, astrocytes, neurons, and endothelial cells. Because of the specific expression of microglial markers PTPRC and CSF1R in the macrophage cluster, as well as the differences in biological systems used in the HPCA reference, we manually refined the HPCA annotation of this specific cluster to microglia. Identification of neuronal vs. non-neuronal clusters for some differential expression analyses was performed by determining the proportion of cells expressing the neuronal marker NeuN/RBFOX3. Clusters with more than 50% of cells expressing NeuN/RBFOX3 were defined as neuronal clusters. Control samples showed an average of 2417 (SD 169) neuronal and 2319 (SD 1149) nonneuronal nuclei, whereas LOAD samples showed 1329 (SD 294) neuronal and 2952 (SD 1783) non-neuronal nuclei. NeuN neuronal clusters and neuronal clusters identified by SingleR using the HPCA reference were very consistent with only one additional cluster in NeuN neuronal clusters (out of a total of 16 clusters). Differential expression analyses between female LOAD nuclei and control nuclei within specific groups of nuclei were performed using the Wilcoxon Rank Sum test as implemented in the FindMarkers function in the Seurat package. 'min.pct' and 'logfc threshold' arguments were set to 0 to allow for the testing of a majority of genes in each analysis.

Study sample and quality control analyses
Nuclei isolated from archived frozen temporal cortex of 51 individuals with LOAD (n = 26) or controls (n = 25) (Supplementary Table 1) were stained and sorted using a PE-conjugated monoclonal NeuN antibody (Fig. 1a-d).
Staining of pre-sorted nuclei with nuclear membrane markers was confirmed by immunofluorescence (Fig.  1e). We observed a smaller proportion of sorted neuronal (NeuN+) nuclei from total isolated nuclei in the severe LOAD cases (Table 1, Fig. 1f), as expected due to neuronal cell loss and gliosis, hallmarks of LOAD pathological progression [49,50]. We next performed ATACseq using the NeuN+ and NeuN-sorted nuclei populations. A total of 90 neuronal and non-neuronal datasets passed quality control criteria (see Methods), which were derived from 40 individuals (19 LOAD cases and 21 controls) that had matched neuronal and non-neuronal data, and an additional 9 individuals (3 LOAD cases and 6 controls) that had only non-neuronal data. The final cohort of 40 consisted of 22 males and 18 females with similar post-mortem intervals (PMI); all donors were Caucasians and homozygous for APOE e3 (Supplementary Table 2). Subsequent analyses compared different sub-groups (Fig. 1g). Correlations of all potential numerical covariates showed expected patterns of co-linearity ( Supplementary Fig. 1). The samples displayed a range of QC metrics, with some samples displaying a higher signal to noise (Supplementary Table 1). Repeated experiments on a subset of samples demonstrated that signal to noise was reproducible and thus a sample-specific characteristic (Supplementary Table 1). In addition to randomizing sample preparation (see Methods), we found that no metadata variables were significantly associated with case-control status ( Supplementary Fig. 2), indicating an absence of batch effects.
Differential analysis of ATAC-seq data between neuronal and non-neuronal nuclei To demonstrate that we can robustly identify chromatin accessibility differences between major cell types of the brain, we first compared the ATAC-seq profiles between neuronal and non-neuronal nuclei groups from the entire study sample (Level 1, Fig. 1g). PCA of all samples showed that 37.5% of the total variance was explained by the cell type (neuronal versus non-neuronal, Supplementary Fig. 3, Supplementary  Fig. 11). Performing quantitative differential chromatin accessibility analysis using EdgeR (FDR q < 0.05), we found that 87,570 regions were more accessible in the neuronal population, 83,171 regions were more accessible in the nonneuronal population, and 54,484 regions were not detected as differential (Fig. 2a, Supplementary Table 10). Representative screenshots show differential ATAC-seq peaks around genes known to be expressed specifically in neuronal (Fig.  2b) or non-neuronal (Fig. 2c) cell types. A higher percentage of sites more accessible in non-neuronal population mapped to promoters, which could be explained by the more distal regulation of neuronal cell types or the greater diversity and heterogeneity of the cell types composing the non-neuronal population ( Supplementary Fig. 4). Using GREAT gene ontology analysis [51], neuronal-specific regions were enriched with genes associated with neuronal function, while non-neuronal specific regions were enriched with genes implicated in glial function (Supplementary Table 3).
Motif analysis for regions with increased chromatin accessibility in neuronal nuclei showed enrichment for the transcription factor recognition sites of Early Growth Response 2 (EGR2), Regulatory Factor X1 (RFX1), and Myocyte Enhancer Factor 2C (MEF2C) [52][53][54], while motif analysis for regions with increased chromatin accessibility in non-neuronal nuclei showed enrichment for the SRY-related HMG-box (SOX) transcription factor family [55] (Supplementary Table 4). When compared to a previous NeuN sorted-nuclei ATAC-seq study [56] using a smaller number (n = 8) of healthy brain samples, we found a substantial degree of overlap ( Supplementary Fig. 5a) with similar peak length distribution ( Supplementary Fig. 5b). However, the 5x larger number of samples used in our study identified > 80,000 additional significantly differential chromatin sites.

Comparison of sorted nuclei vs. bulk brain tissue homogenate
To our knowledge, no study has directly compared chromatin accessibility differences between bulk brain tissue, neuronal, and non-neuronal fractions. Bulk brain ATAC-seq data was generated using pulverized frozen whole tissue homogenate samples from six individuals for whom high-quality ATAC-seq data was collected from neuronal and non-neuronal populations (Supplementary Table 5). Binary peak overlap analysis shows that while there are some regions that are only accessible in bulk tissue, a substantially larger number of sites are uniquely detected in the nuclei populations of neuronal and non-neuronal only (Fig. 2d). This observation justifies the performance of ATAC-seq analysis by brain celltype to reduce cellular heterogeneity as it allows the identification of more cell type specific signals. Quantitative differential chromatin accessibility (EdgeR comparisons ( Supplementary Fig. 6) and GO annotations (Supplementary Table 6 -8) showed that chromatin accessibility sites specifically identified in bulk tissue, neuronal and non-neuronal populations displayed many biologically relevant pathways.

Identification of LOAD-associated differences in chromatin accessibility
To determine the association of LOAD status with changes in chromatin accessibility, we performed differential ATAC-seq analysis of 19 LOAD compared to 21 healthy controls stratified by brain cell-type (neuronal and non-neuronal only populations; Level 2, Fig. 1g). The comparison using the neuronal nuclei data resulted in 211 neuronal chromatin differences (FDR q < 0.05) between LOAD and control (Fig. 3a, Supplementary  Table 11), while the analysis of the non-neuronal nuclei detected no differential chromatin between LOAD and control (Fig. 3b). Motif analysis using the 141 sites that showed decreased chromatin accessibility in LOAD neuronal nuclei were enriched for transcription factor motifs including Wilms Tumor protein (WT1), Early Growth Response 1 (Egr1), Retinoic acid receptor gamma (RARg), and Kruppel like factor 14 (KLF14) (Fig.  3c) that have been reported relevant to brain function [57][58][59][60]. No significant motif enrichment was detected from 70 sites (FDR q < 0.05) with increased chromatin accessibility in neuronal LOAD samples.

Identification of sex-dependent chromatin accessibility differences between LOAD and control samples
Since sex has an effect on LOAD onset and progression [61], we performed a sex-stratified differential analysis of LOAD compared to control (Level 3, Fig. 1g) to examine whether the effect of sex on LOAD risk is mediated, at least in part, by chromatin remodeling. The differential analysis of the neuronal groups did not yield significant differences between LOAD and normal when stratified by sex for either females or males (FDR q > 0.05) (Fig. 4a-b). Analysis of the non-neuronal nuclei identified 24 chromatin accessibility differences in female LOAD compared to control (Fig. 4c, Supplementary Table 12), while no significant differences were identified when the analysis was performed in males (Fig. 4d). To further investigate this trend, we increased the sample size with 9 additional non-neuronal samples (6 female controls and 3 female LOAD cases). Differential analysis using ATAC-seq data from the larger female non-neuronal dataset resulted in 842 differential sites between LOAD and control (FDR q < 0.05) (Fig. 4e, Supplementary Table 13).
Comparison of these results with the LOAD associated chromatin accessibility sites obtained from the differential analysis of female neuronal nuclei showed that while there are differences detected in both female neuronal and glial cells in LOAD cases versus controls, we detected larger effect sizes in the female glia cells (Supplementary Fig. 12a). In addition, several differential regions identified in female non-neuronal nuclei were also observed in male non-neuronal nuclei with similar trends, however, these did not reach statistical significance ( Supplementary Fig. 12b). Nonetheless, these associations in the female group showed a stronger effect size ( Supplementary Fig. 12b).
We performed motif enrichment using the 203 sites that were more accessible in non-neuronal female LOAD and found significantly enriched motifs for the SOX family (Fig. 4f). The analysis of 639 sites that displayed decreased accessibility in non-neuronal female LOAD was enriched for transcription factors known to be highly associated with glia or neuron functions, such as RONIN, SOX9, YY1 and ELK4 [62][63][64][65] (Fig. 4f).
To determine whether binding of transcription factors identified in female non-neuronal cells (Level 3, Fig. 1g) has downstream effects on expression of (See figure on previous page.) Fig. 2 Level 1 comparison of ATAC-seq data from neuronal vs non-neuronal nuclei. a MA plot showing differential ATAC-seq sites between neuronal (blue) vs. non-neuronal regions (red). Red dots represent ATAC-seq peaks that are significantly different between groups (FDR q < 0.05). b ATAC-seq data around non-neuronal-specific genes SLC25A18 (upper panel) and ACSBG1 (lower panel). Boxes highlight peaks that are more accessible in neuronal (red) or non-neuronal (blue) nuclei. c ATAC-seq data around neuron-specific genes MEF2C (upper panel) and SLA (lower panel). All regions indicate hg19 coordinates. d Venn diagram of ATAC-seq peaks detected in whole tissue, sorted neuron and sorted non-neuron nuclei from 6 donor-matching samples Fig. 3 Level 2 comparison of ATAC-seq data between LOAD cases and controls. MA plots of differential chromatin sites for a neuronal and b non-neuronal nuclei. Red dots represent differential ATAC-seq sites with FDR q < 0.05. c Motifs that are enriched in neuronal ATAC-seq sites that are less accessible in LOAD samples. Size of red dots were increased for visibility target genes, we used the publicly available ROSMAP bulk RNA-seq data to perform differential expression analyses of those genes for LOAD vs. Normal. We found that several genes exhibited changes in the expected direction, including ZAC1 (PLAGL1), YY1, and SOX2 (Supplementary Table 15). This provides further mechanistic insights into gene dysregulation underlying LOAD pathogenesis in females. Last, we also used the chromatin accessibility differential sites identified in female non-neuronal LOAD (Level 3) for GO analysis and found pathways involved in immune response and myelination (Supplementary Table 9).

Overlap of LOAD-specific differential chromatin accessibility sites with LOAD GWAS regions
To determine the relationship of the LOAD-specific chromatin accessibility sites we compared these data with LOAD GWAS regions [7]. We defined LOAD GWAS regions by anchoring on the top 25 associated SNPs +/− 100 kb, and cataloged LOAD-GWAS regions of 200 kb each. Importantly, we were comparing genomic regions of chromatin accessibility to the GWAS regions, not chromatin QTLs, to GWAS loci. Of the 211 LOAD-specific differential chromatin accessibility sites in neuronal samples (FDR q < 0.05, Fig. 3a), we identified five sites that overlap four of the 25 LOAD GWAS regions ( Table 2). We show representative examples of overlapping LOAD-specific differential chromatin accessibility sites surrounding the PTK2B and CLU GWAS loci (Fig. 5a, Supplementary Fig. 7). Using permutation testing, we did not detect a significant enrichment with LOAD GWAS regions compared to regions selected at random (Supplementary Fig. 8a-b).
Of the 842 differential sites in non-neuronal female LOAD (FDR q < 0.05, Fig. 4e), we detected nine differential sites that overlap seven of the 25 LOAD-GWAS regions ( Table 2). Using permutation analysis (see Methods), we found significant enrichment in the amount of overlap with sites less accessible in nonneuronal female LOAD (P < 0.05, Supplementary Fig. 8cd). Representative examples of differential chromatin accessibility sites between LOAD cases and controls overlapping representative LOAD-GWAS regions are shown for the loci APOE (Fig. 5b, Supplementary  Fig. 7b), and IQCK (Fig. 5c, Supplementary Fig. 7c). While differences around the APOE and IQCK locus are more pronounced in females, we also detect subtle differences in the same direction in males (Fig. 5b-c, Supplementary Fig. 7b-c). The identification of LOAD differences in chromatin accessibility in the APOE region, suggests a possible regulatory mechanism independent of the e4 allele effect at this locus.
Overlap of LOAD-specific differential chromatin accessibility sites in female non-neurons with single nuclei RNA-seq data Next, we performed functional validation of key findings from the chromatin accessibility differential sites in female LOAD non-neuronal cells that overlapped LOAD-GWAS regions ( Table 3, Supplementary Fig. 9). We focused the analysis on two representative LOAD differential chromatin accessibility sites in female LOAD nonneuronal cells that overlapped LOAD-GWAS loci: (1) LOAD more accessible sites surrounding the ECHDC3 locus (annotated by the proximate gene to the associated SNP) and (2) LOAD less accessible sites surrounding the ABCA7 locus. The loci were defined by anchoring on Table 2 Neuron control vs LOAD differential peaks overlapped with 25 LOAD-GWAS regions as well as female glia control vs LOAD differential peaks overlapped with 25 Fig. 5 Differential LOAD specific ATAC-seq peaks around LOAD-GWAS regions. Screenshots of ATAC-seq data around a PTK2B, CLU, b APOE, and c IQCK loci. Box plots show ATAC-seq read counts for individual ATAC-seq peaks (blue frames highlight significant differential peaks for cases vs. controls, gray frames show control peaks that are not differential between cases and controls). Box plots are color coded for non-neuronal (blue) neuronal (red), female (no fill), and male (gray fill). All tracks show hg19 coordinates and all y-axes on tracks range from 0 to 250. For box plots, the line within each box represents the median, and the top and bottom borders of the box represent the 25th and 75th percentiles, respectively. The top and bottom whiskers of the box plots represent the 75th percentile plus 1.5 times the interquartile range and the 25th percentile minus 1.5 times the interquartile range, respectively  Fig. 9 the associated SNPs +/− 500 kb, rs7920721 (ECHDC3) and rs3752246 (ABCA7) using the UCSC Genome Browser [66] (http://genome.ucsc.edu/) GRCh38/hg38 assembly released December 2013 (Table 3, Supplementary Fig. 9). Single-nuclei (sn)RNA-seq data was collected from female brains using a small subset group of two LOAD and two control (Supplementary Table 2, marked with '*'), and differential expression analysis of female LOAD nuclei compared to control nuclei was performed for all genes mapped with these regions in three cell-type specific groups: 1) NeuN-non-neuronal, 2) Human Primary Cell Atlas (HPCA)-annotated astrocyte, and 3) HPCA-annotated microglia. Pseudogenes, RNA genes, and novel transcripts were excluded from the analysis. Out of 54 total genes within 1 Mb of the two SNPs, 8 within the ECHDC3 locus and 46 within the ABCA7 locus ( Supplementary Fig. 9), many were upor down-regulated as predicted by chromatin accessibility profiles (i.e. increased chromatin accessibility overlapping promoters and enhancers was associated with upregulation and vice versa). Overall 28 (51.9%), 23 (42.6%), and 24 (44.4%) genes of the NeuN-, astrocyte, and microglia clusters, respectively, showed differential expression with trends corresponding to the changes in chromatin accessibility. Of those genes, some showed nominal significance (6 (21.4%) for NeuN-, 3 (13.0%) for astrocyte, and 5 (20.8%) for microglia clusters), but did not reach adjusted statistical significance, while other trends did not reach nominal significance. This could be explained by the small sample size and/or number of cells in the clusters. Furthermore, we examined the consistency between our dataset and a similar recently reported dataset [28]. To address this question we conducted differential expression analysis for HPCAannotated neurons, astrocytes, and microglia and compared the results for the top 15 significant genes, i.e. 5 from each cell type identified by the previous study [28]. We found that out of these 15 genes, our results show the same directionality for 13 genes, 10 of which also had significant adjusted p-values (Supplementary  Table 14), providing further validation of our findings. Table 3 catalogues the genes surrounding the ECHD C3 and ABCA7 loci, excluding 26 genes that were not detected or had low fold change values. Out of seven genes positioned within the 1 Mb region of the LOAD associated ECHDC3 SNP, we found that CELF2 was significantly upregulated in LOAD HPCA-annotated microglia clusters (mean log (fold change) = 0.47, adjusted p = 4.90 × 10 − 16 , Table 3). In addition, out of 21 genes mapped within the 1 Mb region of the LOAD associated ABCA7 SNP, CIRBP was significantly downregulated in LOAD NeuN-non-neuronal clusters (mean log (fold change) = − 0.71, adjusted p = 1.52 × 10 − 57 , Table 3) and.
HPCA-annotated astrocyte clusters (mean log (fold change) = − 0.80, adjusted p = 1.27 × 10 − 42 , Table 3). These results demonstrated that alteration in chromatin accessibility in LOAD-GWAS loci correlates with changes in gene expression. Furthermore, the results suggested that the target genes within LOAD-GWAS loci affected by LOAD specific changes in chromatin state may not be simply interpreted as the most proximate gene to the associated SNP.
Our pilot analysis to identify the intersection of LOAD-and glia-specific ATAC-seq sites and snRNAseq patterns discovered novel LOAD regulatory elements that lead to gene expression changes in LOAD. However, this pilot analysis was limited by the small number of samples (2 cases and 2 control) and therefore validation of the findings warrant further analysis using a larger RNA-seq dataset from LOAD patients and controls. Furthermore, intersecting these data sets provided a functional validation for the top significant findings from the ATAC-seq experiments showing sexdependent LOAD-specific open and closed chromatin accessibility sites in non-neuronal cells. Collectively, these results suggest that integrating brain cell-type specific 'omics data is a powerful mechanistic strategy to discover regulatory elements that impact expression of disease genes in a cell-type specific manner.

Discussion
Decoding the genetic and genomic mechanisms of LOAD is a major challenge in the post-GWAS era, since the majority of the LOAD associated SNPs are in noncoding regions [1][2][3][4][5][6][7]. Noncoding disease-associated loci have been shown to be enriched for regulatory elements in tissues and cells relevant to the disease [56,[67][68][69]. Thus, post-GWAS research requires an in-depth characterization of cell-type specific DNA regulatory elements. Mapping chromatin accessibility has been widely used to identify the location of active DNA regulatory elements, including promoters, enhancers, and insulators [70][71][72][73]. We performed the first systematic interrogation of the chromatin accessibility landscape in neuronal and non-neuronal sorted nuclei from LOAD and healthy brains.
To our knowledge, this study represents the first and the most comprehensive dataset to date of chromatin accessibility in LOAD brains and matched controls. Furthermore, this study was performed with brain cell-type specific resolution, i.e. neuronal and non-neuronal cells. The study has four major findings for the field of LOAD epigenetics. First, we have generated a map of LOADassociated cell-type specific chromatin accessibility sites. Second, we provide a catalogue of female-specific disease-associated chromatin accessibility sites in nonneuronal cells. Third, we suggest a non-coding regulatory mechanism, namely chromatin accessibility, by which~25% of LOAD-GWAS loci may exert their effect. Fourth, we have demonstrated that LOAD associated changes in chromatin accessibility can result in gene dysregulation through their overlap with the transcriptome profile of LOAD-GWAS loci. Overall, these results suggest that the cis interactions between regulatory elements and key genes contribute, at least in part, to the development and/or progression of LOAD. Of note, 87% of the differentially accessible chromatin peaks in the female NeuN-analysis and 92% of the differentially accessible chromatin peaks in the NeuN+ analysis fall within known regions of topologically associating domains (TADs) from fetal brain [74]. This suggests that chromatin regions associated with risk for LOAD belong to highly interacting regulatory domains. Furthermore, several LOAD loci may exert their pathogenic effects in a cell-type specific manner while others act in multiple cell types to drive LOAD pathogenesis. Alternatively, these cell-type specific and common changes in chromatin structure may represent secondary effects as consequences of disease-related processes such as neurodegeneration and gliosis.
A growing number of epigenome-wide association studies (EWAS) in LOAD have profiled DNA methylation, hydroxymethylation, and histone acetylation marks (H4K16ac), and assessed associations with LOAD risk and other related endophenotypes including the burden of pathology [17][18][19][20][21][22][23][24]. These studies have been a powerful approach to validate known LOAD loci, discover new candidate genes, and identify disease-related pathways [17][18][19][20][21][22][23][24]. However, the majority of the LOAD EWAS datasets were generated using bulk brain tissues, and the heterogeneous cell-type populations of brain tissue samples pose technical and biological limitations. Our results showed that a substantially larger number of ATAC-seq sites are uniquely detected in the neuronal and non-neuronal cells, compared to bulk tissues obtained from the same donors. This observation demonstrates the importance of cell-type specific epigenomic studies relative to bulk tissues, as it reduces cellular heterogeneity allowing the identification of more cell-type specific signals. Collectively, our outcomes open a new window for the exploration of the particular cell-types that contribute to LOAD pathogenesis, and the genes and pathways that mediate the cell-type specific pathogenic effects.
These data advance the mechanistic understanding of LOAD, and moreover, uncover new candidate LOAD loci. To date, in addition to this study only five others (two transcriptome, two DNA-methylation, and one histone acetylation) have compared genomic signatures stratified by different cell types in the LOAD brain using sorted-and single-nuclei based methodologies [27-29, 70, 75]. Altogether, our and other studies demonstrated the importance of applying these cell-type specific approaches in molecular analyses of brain tissues and highlight the impact of transitioning into single-cell based 'omics studies in LOAD functional genomic research. In addition, we show that integrating our cell-type specific ATAC-seq data with our scRNA-seq LOAD data corroborate the interpretation of the results and provide functional validation. By aligning the two datasets, we identified LOAD-specific female-dependent nonneuronal regulatory elements around two differentially expressed genes in LOAD glia cells. Moreover, analysis of chromatin accessibility differential sites in female LOAD non-neuronal cells that overlapped LOAD-GWAS regions not only provided functional validation and established the link between LOAD -GWAS -ATAC-seq and -snRNA-seq but also demonstrated that genes other than the most proximate to the associated SNP may play a role in LOAD pathogenesis. These results exemplify the potential of integration of cell-type specific datasets to validate known LOAD loci and also to identify new candidate genes. In future studies, a larger sample size may allow conducting a chromatin accessibility QTL study to determine colocalization with GWAS loci, and may also provide additional power to detect differential sites in non-neuronal nuclei as in a recent study [76]. In addition, investigating the relationship between neuropathology severity and cell typespecific chromatin accessibility signatures in a larger cohort will be crucial to understanding how gene regulation in specific cell types is impacted by the progression of the disease.
Several bulk tissue ChIP-seq studies have used functional genomics and integrative systems biology approaches to infer cell types. Consistent with our findings, these epigenomic studies strongly suggest that nonneuronal cell types contribute to LOAD-specific histone marks associated with active regulatory elements (promoters and enhancers). It was reported that LOAD GWAS loci were enriched in enhancer elements specific to immune cells [77] and tangle-associated H3K9ac signals located in both promoters and enhancers were significantly associated with modules classified as nonneuronal [78]. Recently, two studies used FANS-sorted nuclei followed by ChIP-seq and demonstrated that microglia were the non-neuronal cell type contributing to LOAD epigenomic signatures. Nott et al. found that LOAD SNP heritability was most significant in microglial enhancers [79], and Ramamurthy et al. showed that hyperacetylated peaks in microglia colocalize more with LOAD SNPs than the histone acetylome of other cell types [75]. Collectively, ours and others' studies point to nonneuronal epigenomic dysregulation, likely microglial, as a major contributing factor to LOAD pathogenesis.
Sex is an important factor in LOAD etiology and there are sex differences in disease risk, progression and clinicopathological phenotypes [80]. Foremost, there is a sexdependent difference in LOAD prevalence and almost two-thirds of LOAD patients are female [81]. Historically, it has been attributed to the longer average life expectancy in women [82], however, recent research suggests that other factors, such as the sudden drop in the level of sex hormones (estrogens) in women at menopause, contribute to the differences in susceptibility between males and females [61]. It was also reported that women manifest faster disease progression and cognitive decline, increased brain atrophy and pathological burden largely driven by neurofibrillary tangles [61], and a more advanced disease stage as indicated by CSF biomarkers, especially higher concentrations of total tau and phosphorylated tau. Although several conflicting reports suggested opposite trends [83], the effect of sex on LOAD has been widely accepted. Nonetheless, the molecular mechanisms underlying the role of sex as a risk factor in LOAD are understudied. Our study provides new insights into these gaps in knowledge showing sexdependent changes in chromatin structure between LOAD and control brains. We identified hundreds of LOAD differential chromatin accessibility sites specific to females, which overlap nearly one-third of all LOAD GWAS regions. Since the majority of differential chromatin accessibility sites do not overlap LOAD-GWAS regions, these represent novel candidate LOAD loci. Moreover, a female-specific effect on LOAD-associated changes in chromatin accessibility appeared exclusively in glial cells and resulted in nearly three-fold overrepresentation of sites that were more closed in female LOAD patient samples. However, we cannot rule out the possibility that LOAD-associated changes in chromatin accessibility also occur in glia from male LOAD patients, but because of the plausible much smaller effect size in males, we could be underpowered to detect significant associations in our male cohort. These results warrant further investigation to determine the effects of sexdependent chromatin remodeling on dysregulation of gene expression in the context of LOAD. In this respect, the recent scRNA-seq study [28] also reported sexdependent LOAD effects and specifically observed a sexspecific differential transcriptional response to LOAD pathology, enrichment of females cells in LOADassociated cell subpopulation, and higher expression in females for the marker genes of LOAD-associated cellular subpopulations [28]. Future integration of diverse 'omics datasets stratified by sex will decipher the underpinning mechanisms of sex differences in LOAD by establishing the cross interactions between sex-dependent chromatin structure and function, transcriptome and LOAD phenotypic outcomes.
One of the loci that showed sex-dependent effects in our study is the APOE linkage disequilibrium (LD) region. The e4 allele of the APOE gene is the first identified, most highly replicated, and the strongest genetic risk factor for LOAD [84,85]. Furthermore, LOAD GWAS have confirmed strong associations with the APOE LD genomic region, and no other LOADassociation remotely approached the same level of significance [1,3,86]. Interestingly, female carriers of APOE e4 have an increased risk of LOAD versus male and the adverse effect of APOE e4 on LOAD biomarkers was generally stronger in women versus men [87][88][89][90][91][92]. Overall the APOE LD region displayed more open chromatin in glia versus neurons as expected since this gene is much more highly expressed in glia. Interestingly, we found decreased chromatin accessibility at multiple sites in female LOAD glia across the APOE region. This result provides molecular clues to the observations that APOE e4 allele confers a greater risk for LOAD in women than in men [87][88][89][90][91]. In addition, significant downregulation of APOE in astrocytes from LOAD brains, although not sex-dependent, was reported by scRNA-seq analysis [28,29]. This evidence is consistent with the trend we found of more closed chromatin in LOAD samples, providing functional validation to our result_ENREF_28 [28]. In summary, we proposed molecular insights based on chromatin structure that may explain, at least in part, some aspects related to the role of APOE in LOAD.
In conclusion, this LOAD genomic research pioneers the approach of brain cell-type specific chromatin accessibility profiling and lays the foundation for additional sorted-and single-nuclei 'omics analyses in LOAD. Our outcomes warrant further investigations using a larger sample size to enhance the discovery smaller LOAD-associated effects on chromatin accessibility and to allow the utilization of a single peakset and set of covariates to perform multiple testing. Future and ongoing studies using even more advanced single-cell technologies will generate complementary 'omics datasets with finer cell-type resolution from larger wellcharacterized LOAD cohorts. Data sharing via publicly available portals, such as Accelerating Medicines Partnership-Alzheimer's Disease (AMP-AD), will facilitate integrative single-cells 'omics towards moving forward our understanding of the underpinning genetic drivers and molecular mechanisms of LOAD.

Conclusions
This study shows that LOAD-associated changes in chromatin accessibility in the brain occur on a cell typespecific level and may explain up to 25% of GWAS loci conferring risk for LOAD. We found that changes in accessibility in non-neuronal cells were sex-dependent and specific to female samples, thus providing a plausible molecular basis for the increased risk for and accelerated progression of LOAD in females. We also identified correlations between LOAD-associated accessibility changes and dysregulation of gene expression within the same cell type. Lastly, we provide evidence suggesting that in some LOAD loci the precise disease-causing gene is not merely the nearest gene to LOAD SNPs.  Table 6. Differential analysis of Whole tissue vs neuron/non-neuron mixed. Terms in blue represent association with glial function and the ones in red represent association with neuronal function. Terms in green represent association with general brain function. GO terms enriched in DE sites of nonneuron vs whole tissue. Supplementary Table 7. GO terms enriched in DE sites of neuron vs whole tissue. Supplementary Table 8. GO terms enriched in sites more accessible in mixed neuron/non-neuron vs whole tissue. Supplementary Table 9. GO term enrichment of LOAD down differential sties in female glia controls vs. cases, FDR < 0.05. Terms in blue are associated with glial function. Supplementary Table 10. Level 1 (neuronal vs. non-neuronal) differentially accessible peaks. Supplementary Table 11. Level 2 (LOAD vs. Normal neuronal) differentially accessible peaks. Supplementary Table 12. Level 3 (LOAD vs. Normal nonneuronal female) differentially accessible peaks. Supplementary  Table 13. Level 3b (LOAD vs. Normal non-neuronal female) differentially accessible peaks. Supplementary Table 14. Consistency between our snRNA-seq LOAD vs. Normal differential expression results and those of a recent publication. Our results are derived from HPCA-annotated neurons, astrocytes, and microglia. Dark orange = significantly up-regulated in LOAD, light orange = non-significantly up-regulated in LOAD, dark blue = significantly down-regulated in LOAD, light blue = non-significantly down-regulated in LOAD. Supplementary Table 15. Differential expression analysis of ROSMAP data for Level 3 female non-neuronal cells with and without covariates.
Additional file 2 : Supplementary Fig. 1: Pearson correlations of all potential numerical covariates and the first ten principal components of the potential numerical covariates. Supplementary Fig. 2. Association of meta variables with case-control status at a Bonferroni significance level of q < 0.05. The vertical line indicates Bonferroni significance. Linear regression was performed for numerical variables (n = 27) and chi-square tests were conducted for categorical variables (n = 2, excluding diagnosis). Supplementary Fig. 3. Scree plot depicting the proportion of total variance explained by each principal component of peaks in all samples passing QC. Supplementary Fig. 4. Level 1 comparison: Genomic distribution of Top 5000 differential sites of neuron/non-neuron. Box plots show the median, 25th percentile, and 75th percentile. Box plot whiskers show the 75th and 25th percentiles plus and minus 1.5 times the interquartile range, respectively. Supplementary Fig. 8. Permutation (n = 10,000) controls for differential sites overlap with GWA LOAD sites. Red lines represent LOAD sites numbers overlapped by differential sites. a. neuron LOAD up, randomly chosen 537 peak calls (pvalue = 0.13); b. neuron LOAD down, randomly chosen 947 sites (pvalue = 0.06); c. female non-neuron LOAD up, randomly chosen 1000 sites (pva = 0.98); d. female non-neuron LOAD down, randomly chosen 2352 sites (pva = 0.05). Supplementary Fig. 9. All genes within 1 Mb of LOAD-GWAS loci found to be more (top) or less (bottom) accessible by ATAC-seq of FANS-sorted nuclei. SNPs (light blue lines) were anchored in the center of the region and red boxes indicate genes found to be significantly dysregulated by snRNA-seq. Pseudogenes, RNA genes, and novel transcripts are excluded. Figures generated using the UCSC Genome Browser (http://genome.ucsc. edu/) GRCh38/hg38 assembly released December 2013. Supplementary  Fig. 10. Clustering of 90 replicates for data selection. (a) Hierarchical clustering. (b) K-mean clustering(K = 2). Supplementary Fig. 11. Scatterplot of PC1 vs. PC2 peaks showing association between PC1 and cell type (p = 5.11 × 10-45). Supplementary Fig. 12. (a) Differential sites found in the female glia normal vs LOAD comparison (blue and red circles) were mapped to MA plots of female neuron normal vs LOAD comparison (black dots). For the female glia normal vs LOAD comparison, blue circles represent differential sites which were more accessible in LOAD while red circles represent differential sites which were less accessible in LOAD. (b) Differential sites found in 27 replicates of female glia normal vs LOAD comparison were mapped to log fold changes plot of 18 replicates of female glia normal vs LOAD comparison vs. 22 replicates of male glia normal vs LOAD comparison. Black dots represent sites shared by the above two comparisons. Blue circles represent differential sites which were corresponding to those in female glia normal vs LOAD and more open in LOAD. Red circles represent differential sites which were corresponding to those in female glia normal vs LOAD and less open in LOAD. The dashed line represents y = x. The solid line represents linear regression of all of data points. and the Illinois Department of Public Health. Additional phenotypic data can be requested at www.radc.rush.edu.