Proteo-genomics of soluble TREM2 in cerebrospinal fluid provides novel insights and identifies novel modulators for Alzheimer’s disease

Triggering receptor expressed on myeloid cells 2 (TREM2) plays a critical role in microglial activation, survival, and apoptosis, as well as in Alzheimer’s disease (AD) pathogenesis. We previously reported the MS4A locus as a key modulator of soluble TREM2 (sTREM2) in cerebrospinal fluid (CSF). To identify additional novel genetic modifiers of sTREM2, we performed the largest genome-wide association study (GWAS) of CSF sTREM2 to date, in 3,350 individuals of European ancestry, and identified four loci. Through multi-ethnic fine mapping, we identified two independent missense variants (p.M178V in MS4A4A and p.A112T in MS4A6A) that drive the association in the MS4A locus and show an epistatic effect on sTREM2 levels and AD risk. The novel TREM2 locus on chromosome 6 contains two rare missense variants (rs75932628 p.R47H, P=7.16×10⁻¹⁹; rs142232675 p.D87N, P=2.71×10⁻¹⁰) associated with sTREM2 and AD risk. The third novel locus, in the TGFBR2 and RBMS3 gene region (rs73823326, P=3.86×10⁻⁹), included a regulatory variant with a microglia-specific chromatin loop to the promoter of TGFBR2. Using cell-based assays, we demonstrate that overexpression and knock-down of TGFBR2, but not RBMS3, lead to significant changes in sTREM2. The last novel locus is located in the APOE region (rs11666329, P=2.52×10⁻⁸), but we demonstrated that this signal is independent of APOE genotype. This signal colocalized with a cis-eQTL of NECTIN2 in the brain cortex and a cis-pQTL of NECTIN2 in CSF. Overexpression of NECTIN2 led to an increase in sTREM2, supporting the genetic findings. To our knowledge, this is the largest study to date aimed at identifying genetic modifiers of CSF sTREM2. This study provides novel insights into the MS4A and TREM2 loci, two well-known AD risk genes, and identifies TGFBR2 and NECTIN2 as additional modulators involved in TREM2 biology.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13024-023-00687-4.

TREM2 is an innate immune response receptor and type I transmembrane protein that is highly expressed in microglia [9]. TREM2 plays important roles in microglial activation, survival, migration, and phagocytosis [10]. Microglia have been implicated in AD via phagocytosing dead cells, eliminating Aβ plaques, and pruning synaptic connections [11-13]. Therefore, dysregulated microglial function in the brain due to TREM2 risk variants may increase AD risk. In Trem2−/− 5XFAD mice, failure of microglial migration to Aβ plaques augmented insoluble Aβ40 and Aβ42 accumulation and increased neural dystrophy [14]. These pathological alterations and impaired cognitive function were rescued in human TREM2 bacterial artificial chromosome (BAC) transgenic mice [15]. In contrast, TREM2 is detrimental to tau pathology, and Trem2 deficiency protects against neurodegeneration in PS19 human tau transgenic mice [16]. Together, these studies demonstrate that TREM2 plays an important but complex role in AD pathology.
A soluble form of TREM2 (sTREM2) in cerebrospinal fluid (CSF) has emerged as an important biomarker for AD progression and pathogenesis. The full-length TREM2 protein consists of an extracellular ectodomain, a transmembrane domain, and an intracellular domain [17]. Among the three major TREM2 transcripts found in human brains [5,18], an alternatively spliced transcript (ENST00000338469) excludes exon 4, which encodes the transmembrane domain, and produces sTREM2 [5]. In addition, sTREM2 can be produced by proteases including ADAM17, ADAM10, or γ-secretases [19]. We and other groups have shown that CSF sTREM2 is elevated in cognitively normal individuals compared with individuals in early disease stages, but lower than in individuals with AD at later stages [20-22]. In autosomal dominant AD, changes in CSF sTREM2 occur 5 years before the expected onset of AD [23]. CSF sTREM2 is positively correlated with CSF tau and tau phosphorylated at threonine 181 (P-tau), but not with Aβ42, indicating that sTREM2 is associated with neurodegeneration after amyloid accumulation [21,22,24]. Higher CSF sTREM2 has also been associated with slower cognitive decline in AD [25,26]. However, other proteins that are part of the TREM2 and sTREM2 pathways, and the downstream mechanisms by which these proteins lead to AD, remain unknown.
We previously performed a GWAS for CSF sTREM2 and identified the MS4A locus on chromosome 11, which includes MS4A4A and MS4A6A among other genes, as a key modulator of CSF sTREM2 levels [27,28]. We demonstrated that MS4A4A and TREM2 colocalized to lipid rafts at the plasma membrane and that MS4A4A modified sTREM2 in a dose-dependent manner. However, in that study we were not able to identify the functional variant driving the association in the MS4A locus, nor to determine whether any additional functional genes in this or other loci modify sTREM2 levels.
In this study, we performed a GWAS of CSF sTREM2 levels in a large cohort of 3,350 individuals of non-Hispanic European ancestry (EURs) and identified four loci. To further pinpoint the functional variants and nominate the functional genes underpinning the identified loci, we then performed post-GWAS analyses, including multi-ethnic fine mapping with 250 non-European individuals (non-EURs). For each of the identified loci, we then pursued stepwise conditional analyses, colocalization analyses, annotation with brain cell type-specific enhancer-promoter interactions, in vitro functional validation, and Mendelian randomization analysis.
As sTREM2 levels were measured with two different platforms, we examined consistency across studies and platforms and performed sensitivity analyses excluding specific cohorts. VMAP used MSD, whereas the remaining seven cohorts (Knight ADRC, ADNI, DIAN, ACE, Barcelona-1, PPMI, and Stanford) used SomaScan. Although sTREM2 was measured with MSD in VMAP, results from VMAP were consistent with those of the remaining seven cohorts at the chr 6, 11, and 19 loci (Table S1; the SNPs in the chr 6 locus had low frequency and were not included for this cohort). In addition, we performed sensitivity analyses by running the meta-analysis with and without the VMAP cohort. These two approaches showed very similar results, with almost identical effect sizes and comparable P values for the nine SNPs (Table S2), indicating that the findings were robust and not affected by the different platform used in VMAP.
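The with/without-VMAP comparison rests on pooling per-cohort estimates for each SNP. A minimal sketch of a fixed-effect inverse-variance meta-analysis with Cochran's Q heterogeneity statistic follows. It assumes per-cohort betas and standard errors are already available, uses made-up numbers, and is not the meta-analysis software used in the study; the function name `ivw_meta` is hypothetical.

```python
import math
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort SNP effects.

    Returns the pooled beta, its standard error, a two-sided P value,
    Cochran's Q heterogeneity statistic, its degrees of freedom, and I^2.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = 2 * NormalDist().cdf(-abs(beta / se))
    # Cochran's Q: weighted squared deviations of cohort estimates from the pooled one
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "df": df, "i2": i2}


# Hypothetical per-cohort estimates for one SNP; the last entry plays the role
# of a cohort measured on a different platform.
cohort_betas = [-0.30, -0.26, -0.29, -0.31, -0.24]
cohort_ses = [0.04, 0.06, 0.08, 0.07, 0.09]

with_all = ivw_meta(cohort_betas, cohort_ses)
without_last = ivw_meta(cohort_betas[:-1], cohort_ses[:-1])
print(f"all cohorts:   beta={with_all['beta']:.3f} se={with_all['se']:.3f} Q={with_all['q']:.2f}")
print(f"drop last one: beta={without_last['beta']:.3f} se={without_last['se']:.3f}")
```

A P value for Q comes from a chi-square distribution with `df` degrees of freedom (for example `scipy.stats.chi2.sf(q, df)`), which the Python standard library does not provide.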
As our datasets were enriched for AD cases, we wanted to determine whether any of the associations were influenced by disease status. To do this, we performed five different sensitivity analyses of the sentinel variants: 1) association analyses adjusted for AD status (n=1,972); 2) association analyses adjusted for biomarker positivity based on the ATN classification (n=1,600); 3) association analyses adjusted for Clinical Dementia Rating (CDR; n=1,639); 4) association analyses using only biomarker-negative (A-T-) individuals (n=841); and 5) association analyses using only biomarker-positive (A+T+) individuals (n=759). We found a very high correlation of the effect sizes across these analyses (Fig. S4, Pearson corr=1, P<5.3×10⁻⁴). In addition, the effect sizes of these five models were not significantly different from those of our main model (Table S3). Our findings indicate that the genetic regulation of CSF sTREM2 was not affected by clinical or biomarker status.

Fig. 1 Workflow of proteo-genomics of soluble TREM2 (sTREM2) in cerebrospinal fluid (CSF) and association results using 3,350 EUR samples. A The study design had two stages: the first stage was a GWAS using 3,350 European samples from eight cohorts and 12,621,222 autosomal genetic variants, and the second stage was multi-ethnic fine mapping using 250 non-European (non-EUR) samples from eight cohorts and 8,909,120 autosomal genetic variants. CSF sTREM2 was measured by SomaScan or MSD. In the first-stage GWAS, using an additive linear model adjusting for age at CSF draw, sex, genotype platform/cohort, and 10 PCs, we identified 4 loci associated with CSF sTREM2 levels: chromosome 3 RBMS3-TGFBR2 (novel), chromosome 6 TREM2 (novel), chromosome 11 MS4A (known), and chromosome 19 APOE (novel), as shown in the Manhattan plot and LocusZoom plots. For these 4 loci, we then conducted post-GWAS analyses. First, we used multi-ethnic fine mapping to detect the true causal variants underlying each locus. For each of the four loci, we then performed stepwise conditional analyses to identify independent variants. To identify the functional genes underlying the three novel loci, we performed colocalization analyses of each locus with the AD GWAS, GTEx eQTL, and MetaBrain eQTL. The regulatory roles of these loci were annotated with the brain cell type-specific enhancer-promoter interaction map. For the chromosome 3 RBMS3-TGFBR2 locus, in vitro functional validation using overexpression of TGFBR2 and RBMS3 in human primary macrophages was conducted. The overall genetic overlap between the AD PRS and CSF sTREM2 was estimated using multivariate linear regression. Finally, to determine whether CSF sTREM2 is causal for AD, two-sample Mendelian randomization was performed using the CSF sTREM2 GWAS as exposure and the latest AD GWAS as outcome. B Manhattan plot of the GWAS for CSF sTREM2 in European individuals (EURs). P values are two-sided raw P values estimated from a linear additive model. The blue solid horizontal line denotes the genome-wide significance level (P=5×10⁻⁸), and the red solid horizontal line represents the suggestive significance level (P=1×10⁻⁶). The x-axis depicts genomic coordinates by chromosome number and the y-axis denotes the negative log10-transformed P value for each genetic variant. C LocusZoom plots of the GWAS of CSF sTREM2 at chromosomes 3, 6, 11, and 19. The x-axis depicts genomic coordinates and the y-axis denotes the negative log10-transformed P value for each genetic variant.
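The additive model described in the Fig. 1 legend can be sketched as an ordinary least-squares regression of the trait on allele dosage plus covariates. This is a minimal pure-Python sketch on simulated data, not the study's pipeline: it omits the genotyping platform/cohort indicators and the 10 PCs for brevity, and the helper names are hypothetical. The sensitivity models above would correspond to adding AD status, ATN positivity, or CDR as extra covariate columns, or to restricting the rows to the biomarker-negative or biomarker-positive subset.

```python
import math
import random
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, X):
    """Ordinary least squares: coefficients and standard errors for y ~ X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)
    return coef, [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]


def snp_association(trait, dosage, covariates):
    """Additive model trait ~ 1 + dosage + covariates; returns (beta, se, P) for dosage."""
    X = [[1.0, g] + list(c) for g, c in zip(dosage, covariates)]
    coef, se = ols(trait, X)
    # Normal approximation to the t test, adequate for thousands of samples
    p = 2 * NormalDist().cdf(-abs(coef[1] / se[1]))
    return coef[1], se[1], p


# Simulated data (not study data): dosage of a variant with allele frequency 0.3,
# age and sex as covariates, and a true per-allele effect of -0.3 SD.
rng = random.Random(1)
n = 2000
dosage = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
covars = [(rng.uniform(55, 85), rng.randint(0, 1)) for _ in range(n)]
trait = [-0.3 * g + 0.01 * age + 0.1 * sex + rng.gauss(0, 1)
         for g, (age, sex) in zip(dosage, covars)]

beta, se, p = snp_association(trait, dosage, covars)
print(f"beta={beta:.3f} se={se:.3f} P={p:.2e}")
```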

Tissue specificity is detected in two of the four identified genetic loci
To determine whether the four identified genetic loci are specific to CSF, we examined their association with plasma sTREM2 in 35,559 Icelanders (Table S4). The chromosome 6 TREM2 locus (in cis) and the chromosome 11 MS4A locus showed highly significant associations in plasma (P<1.5×10⁻²⁵⁰; Table S4), indicating shared genetic regulation between CSF and plasma. Notably, our two new signals, on chromosome 3 (TGFBR2/RBMS3) and chromosome 19 (NECTIN2/APOE), did not show any association with plasma sTREM2 (P>0.17), suggesting that these are CSF-specific signals and reinforcing the notion that it is important to study relevant tissues other than plasma for AD.
Multi-ethnic fine mapping identifies two independent functional variants and genes in MS4A modifying sTREM2 and AD risk
The MS4A locus on chromosome 11 showed the most significant association for CSF sTREM2 levels (Fig. 1B). This locus included 488 genetic variants reaching genome-wide significance (all with P<5×10⁻⁸; Table S5).
To determine whether there was an additional independent signal in this locus, we performed conditional analysis. After conditioning on the sentinel variant rs72918674, a secondary signal located within an intron of MS4A4A was identified (rs10897026, MAF=0.32; β=-0.28, P=2.98×10⁻³¹ before conditioning; β=-0.19, P=6.38×10⁻¹⁶ after conditioning; Fig. 2A). There were no additional independent signals beyond the two tagged by rs72918674 and rs10897026. The linkage disequilibrium (LD) structure of this region revealed that these two signals belong to two distinct LD blocks (r²=0.06 between the two index variants; Fig. 2B and Table S5). For both signals, all eight cohorts contributed consistently to the association, without any evidence of heterogeneity (heterogeneity P=0.36 for rs72918674 and P=0.78 for rs10897026; Fig. 2C). To identify the most likely functional variant(s), we performed functional annotation of the associated variants. The primary signal, rs72918674, is in LD with two missense variants in MS4A6A (rs7232 p.T185S; rs583791 p.A112T; first LD block; Table S5; Fig. S5A; Fig. 2B). The secondary signal, rs10897026, is in LD with two missense variants in MS4A4A (rs10750931 p.K52E; rs6591561 p.M178V; second LD block; Table S5; Fig. S5A; Fig. 2B) as well as with two missense variants in MS4A4E (rs674971 p.A146S; rs7929057 p.G77D).
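The conditional analysis above can be illustrated with a small simulation: a second variant that is partly correlated with the sentinel is tested alone and then with the sentinel as a covariate, and pairwise LD is approximated by the squared correlation of allele dosages. This is a hedged, pure-Python sketch, not the tool used in the study (stepwise conditional analyses are often run with dedicated software such as GCTA-COJO); all helper names and simulated effect sizes are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation; its square approximates LD r^2 when computed on dosages."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(values, on):
    """Residuals of `values` after a simple linear regression on `on` (with intercept)."""
    mo, mv = sum(on) / len(on), sum(values) / len(values)
    slope = (sum((o - mo) * (v - mv) for o, v in zip(on, values))
             / sum((o - mo) ** 2 for o in on))
    return [v - mv - slope * (o - mo) for o, v in zip(on, values)]


def regress_centered(y, g, n_params):
    """Slope, SE and two-sided P for mean-centered y on mean-centered g."""
    sgg = sum(v * v for v in g)
    beta = sum(a * b for a, b in zip(g, y)) / sgg
    rss = sum((b - beta * a) ** 2 for a, b in zip(g, y))
    se = math.sqrt(rss / (len(y) - n_params) / sgg)
    return beta, se, 2 * NormalDist().cdf(-abs(beta / se))


def marginal_test(y, g):
    my, mg = sum(y) / len(y), sum(g) / len(g)
    return regress_centered([v - my for v in y], [v - mg for v in g], n_params=2)


def conditional_test(y, g, sentinel):
    """Effect of g adjusted for the sentinel variant (Frisch-Waugh-Lovell theorem)."""
    return regress_centered(residualize(y, sentinel), residualize(g, sentinel), n_params=3)


# Simulated haplotypes (not study data): allele B copies allele A half of the time,
# giving r of about 0.5 between the variants; both variants affect the trait.
rng = random.Random(7)
n = 3000
sentinel, second, trait = [], [], []
for _ in range(n):
    a_count = b_count = 0
    for _ in range(2):  # two haplotypes per person
        a = rng.random() < 0.3
        b = a if rng.random() < 0.5 else rng.random() < 0.3
        a_count += a
        b_count += b
    sentinel.append(a_count)
    second.append(b_count)
    trait.append(0.3 * a_count - 0.2 * b_count + rng.gauss(0, 1))

print("LD r^2:", round(pearson_r(sentinel, second) ** 2, 3))
print("marginal:", marginal_test(trait, second))         # diluted by LD with the sentinel
print("conditional:", conditional_test(trait, second, sentinel))
```

In this simulation the second variant's marginal effect is close to zero because its own effect and the sentinel's effect partly cancel through LD; conditioning on the sentinel recovers the independent effect, which is the situation a secondary signal represents.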
While some of these missense variants may be the causal variants modifying sTREM2 levels, the remaining variants would simply be in LD with the functional variants. As LD structure varies across populations due to random genetic drift, mutation, and recombination events [29], analyses in other ancestries can help distinguish the functional variants from those in LD. To achieve this goal of fine mapping, we performed an association analysis in 250 non-EURs (Table S6 and Fig. S1C). In the first LD block, which contained the primary signal, one missense variant, p.A112T in MS4A6A (P=6.96×10 - ), remained significant at P<0.05. This missense variant had an LD r² of 0.363 with the primary index variant in the non-EUR population (Table S7 and Fig. 2B). The effect sizes for these variants were consistent across populations (Table S5). In the second LD block, which contained the secondary signal, only one missense variant, p.M178V in MS4A4A (P=1.75×10 - ), was significant in the non-EURs, with a consistent effect size (β=-0.278 in EURs vs. -0.209 in non-EURs). These results suggest that the MS4A6A p.A112T and MS4A4A p.M178V variants may be the functional SNPs driving the association in this locus. However, the effects of the two variants are in opposite directions: the minor allele of p.A112T is associated with higher CSF sTREM2, and the minor allele of p.M178V with lower CSF sTREM2. In both cases, the allele associated with higher CSF sTREM2 levels is associated with lower AD risk.
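Consistency of effect sizes across ancestries, as for p.M178V above, can be checked with a simple z test on the difference between two independent estimates. The sketch below is illustrative only: the text reports the two betas but not their standard errors, so the SEs used here are hypothetical placeholders, and this test is not presented as the one used in the study.

```python
import math
from statistics import NormalDist


def effect_difference(beta_a, se_a, beta_b, se_b):
    """Two-sided z test for a difference between two independent effect estimates."""
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, 2 * NormalDist().cdf(-abs(z))


# Betas for p.M178V from the text; the standard errors are hypothetical placeholders.
z, p = effect_difference(-0.278, 0.02, -0.209, 0.09)
same_direction = (-0.278 < 0) == (-0.209 < 0)
print(f"z={z:.2f}, P={p:.2f}, same direction: {same_direction}")
```

Because the non-EUR sample is much smaller, its standard error dominates the denominator, so a non-significant difference mainly says the smaller sample cannot distinguish the two effects.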
We hypothesized that these two independent signals in this locus could have a synergistic effect. We therefore performed an epistasis analysis for CSF sTREM2 and AD risk by including MS4A6A p.A112T, MS4A4A p.M178V, and their interaction term in a linear model. We found a significant interaction between these two variants for both CSF sTREM2 levels (P=0.002; Table S8 and Fig. 2E) and AD risk (P=0.011; Table S9 and Fig. 2E). This indicates that these two missense variants jointly affect CSF sTREM2 levels as well as AD risk. However, the molecular mechanism underlying this interaction needs further investigation.
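An epistasis test of this kind adds the product of the two allele dosages to the linear model and asks whether its coefficient differs from zero. The pure-Python sketch below does this on simulated data for a quantitative trait only; the allele frequencies, effect sizes, and helper names are assumptions, covariates are omitted, and it is not the study's code.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, X):
    """Ordinary least squares: coefficients and standard errors for y ~ X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)
    return coef, [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]


# Simulated genotypes (not study data) for two variants in different LD blocks,
# with main effects in opposite directions and a positive interaction.
rng = random.Random(3)
n = 4000
rows, trait = [], []
for _ in range(n):
    g1 = sum(rng.random() < 0.35 for _ in range(2))  # stand-in for one variant
    g2 = sum(rng.random() < 0.30 for _ in range(2))  # stand-in for the other
    rows.append([1.0, g1, g2, g1 * g2])  # intercept, main effects, interaction
    trait.append(0.2 * g1 - 0.25 * g2 + 0.1 * g1 * g2 + rng.gauss(0, 1))

coef, se = ols(trait, rows)
z_int = coef[3] / se[3]
print(f"interaction beta={coef[3]:.3f} (se {se[3]:.3f}), z={z_int:.2f}")
```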

The association with sTREM2 at the TREM2 locus is driven by TREM2 missense variants
The TREM2 locus on chromosome 6 was the second most significant signal (Fig. 3A). This locus contained nine genetic variants reaching genome-wide significance, two of which are the TREM2 missense variants p.R47H and p.D87N. The index variant (rs12664332, MAF=0.006, β=-1.39, P=2.25×10⁻²⁰) was in LD (r²=0.799) with p.R47H (MAF=0.006, β=-1.349, P=7.16×10⁻¹⁹; top panel of Fig. 3A) as well as with seven other variants in this region (Fig. S6A). The conditional analysis identified an additional independent signal at the missense variant rs142232675 (p.D87N, MAF=0.003, β=-1.843, P=2.71×10⁻¹⁰), which lies in a different LD block (bottom panel of Fig. 3A; LD between p.R47H and p.D87N r²=0). The minor alleles of both missense variants were associated with lower CSF sTREM2 levels, with consistent effect sizes across six cohorts (Fig. 3B and Fig. 3C). Colocalization analysis with the latest AD GWAS [8] confirmed that this locus is the same as the one for AD risk (PP.H4=1.00; Fig. S6A and S6B). The two missense variants (p.R47H and p.D87N) in TREM2 were previously identified for AD risk, but not for CSF sTREM2 levels. In addition, a suggestive association was observed for the p.R62H variant, which has also been reported to be associated with AD risk (rs143332484, MAF=0.011, β=-0.55, P=6.02×10⁻⁷; Fig. 3B and Fig. 3C). This missense variant is not in LD with either rs142232675 p.D87N (r²=0) or rs75932628 p.R47H (r²=0.0001). Our previous study [27] included 800 samples and was unable to identify this locus due to the low frequency of these variants.
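The PP.H4 values reported here and below come from Bayesian colocalization. A minimal sketch of the approximate-Bayes-factor calculation behind the widely used coloc method is shown next. It assumes Wakefield approximate Bayes factors computed from per-variant betas and standard errors, coloc's default priors (p1=p2=10⁻⁴, p12=10⁻⁵), and a prior effect SD of 0.15 for a standardized trait; the regional betas are hypothetical and the code is not the study's pipeline.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor for association at one variant."""
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0-H4 for two traits in one region.

    H3 (two distinct causal variants) sums over all ordered pairs i != j
    directly, which avoids subtracting two nearly equal numbers.
    """
    lh = [
        0.0,                                       # H0: no association
        math.log(p1) + logsumexp(labf1),           # H1: trait 1 only
        math.log(p2) + logsumexp(labf2),           # H2: trait 2 only
        math.log(p1) + math.log(p2) + logsumexp(   # H3: different variants
            [a + b for i, a in enumerate(labf1)
             for j, b in enumerate(labf2) if i != j]),
        math.log(p12) + logsumexp(                 # H4: one shared variant
            [a + b for a, b in zip(labf1, labf2)]),
    ]
    total = logsumexp(lh)
    return {f"PP.H{k}": math.exp(v - total) for k, v in enumerate(lh)}


# Hypothetical five-variant region; all standard errors are 0.03.
SE = 0.03
PRIOR_SD = 0.15  # assumed prior SD of the effect size for a standardized trait
trait1 = [0.01, 0.02, 0.30, 0.05, 0.00]
shared = [0.00, 0.01, 0.25, 0.02, 0.01]    # peaks at the same variant as trait1
distinct = [0.00, 0.01, 0.02, 0.01, 0.25]  # peaks at a different variant

l1 = [log_abf(b, SE, PRIOR_SD) for b in trait1]
same_peak = coloc_posteriors(l1, [log_abf(b, SE, PRIOR_SD) for b in shared])
other_peak = coloc_posteriors(l1, [log_abf(b, SE, PRIOR_SD) for b in distinct])
print("shared peak: PP.H4 =", round(same_peak["PP.H4"], 3))
print("distinct peaks: PP.H3 =", round(other_peak["PP.H3"], 3))
```

A high PP.H4 supports one shared causal variant, while a high PP.H3 indicates two traits associated with the same region through different variants.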
Among the four genome-wide significant variants, two (rs73823314 and rs73823316) were in a regulatory region that binds transcription factors (TFs), based on Ensembl Variant Effect Predictor (VEP) annotation.
To identify the potential functional variant and gene, we examined brain cell type-specific enhancer-promoter interactome maps [34]. We found high peaks of epigenetic markers for TGFBR2, consistently measured with ATAC-seq, H3K27ac, and H3K4me3 (Fig. 4C). These epigenetic markers were observed only in microglia, indicating that TGFBR2 is actively regulated in microglia. More importantly, we identified microglia-specific dense chromatin loops that connect the regulatory variant rs73823314 (in LD with rs73823326, r²=1.0) to the promoter of TGFBR2. Altogether, this evidence suggests that TGFBR2 is the most likely functional gene affecting CSF sTREM2 levels.
As APOE ε2 (rs7412) and APOE ε4 (rs429358) are the genetic variants most significantly associated with sporadic AD, we wanted to determine whether the association with CSF sTREM2 levels was driven by these known APOE variants. When we conditioned on the full APOE genotype (or on APOE ε4 or APOE ε2 alone), the association of rs11666329 with CSF sTREM2 levels did not change significantly, remaining near genome-wide significance with a similar effect (β=-0.12, P=5.72×10⁻⁸; Fig. 5A), indicating that this association is independent of APOE genotype. The LD structure also confirmed that rs11666329 is not in LD with APOE ε2 (r²=0.0031) or APOE ε4 (r²=0.0007; Fig. 5B). In addition, to examine the association of sTREM2 levels with APOE, we coded APOE ε4 dosage as non-APOE ε4 (no ε4 alleles: ε2/ε2, ε2/ε3, and ε3/ε3), APOE ε4X (one copy: ε2/ε4 and ε3/ε4), and APOE ε44 (two copies: ε4/ε4). In Europeans, CSF sTREM2 levels tended to be higher as the number of APOE ε4 alleles increased (non-APOE ε4 vs APOE ε44; P=0.048). In non-Europeans, sTREM2 levels also showed a nominal association with APOE ε4 alleles (P=0.012), but in opposite directions for one copy and two copies, suggesting that this association is neither significant nor robust across populations (Fig. S8).
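The three-level ε4 coding above can be derived from the two APOE-defining SNPs. The sketch assumes genotypes are given as unphased allele pairs on the forward strand (rs429358-C marks ε4, rs7412-T marks ε2) and that the rare ε1 haplotype is absent, so allele counts are enough to assign the genotype; the function names are hypothetical.

```python
def apoe_genotype(rs429358, rs7412):
    """Infer an unphased APOE genotype from allele pairs at rs429358 and rs7412.

    e2 = rs429358-T/rs7412-T, e3 = T/C, e4 = C/C. Assumes the rare e1
    haplotype (rs429358-C with rs7412-T) is absent, so allele counts suffice.
    """
    n_e4 = sum(allele == "C" for allele in rs429358)
    n_e2 = sum(allele == "T" for allele in rs7412)
    n_e3 = 2 - n_e4 - n_e2
    if n_e3 < 0:
        raise ValueError("allele counts imply an e1 haplotype, not handled here")
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


def e4_dosage_group(genotype):
    """Collapse a genotype such as 'e3/e4' into the three e4-dosage groups used in the text."""
    n_e4 = genotype.split("/").count("e4")
    return {0: "non-APOE e4", 1: "APOE e4X", 2: "APOE e44"}[n_e4]


for snp1, snp2 in [("TT", "CC"), ("TC", "CT"), ("CC", "CC"), ("TT", "TC")]:
    g = apoe_genotype(snp1, snp2)
    print(snp1, snp2, "->", g, "->", e4_dosage_group(g))
```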
Next, we performed eQTL mapping to determine whether NECTIN2, APOE, or any other gene in the region is the most likely functional gene driving this association (Table S12, Table S13, and Table S14). We examined RNA expression of the five genes in this locus, NECTIN2 (also known as PVRL2), APOE, APOC1, TOMM40, and CLPTM1, across multiple tissues using eQTLGen [35], GTEx [36], MetaBrain [37], and microglia (MiGA) [38]. Among these five genes, we found the strongest eQTL evidence for NECTIN2 (Table S15). Both variants, rs57537848 and rs11666329, regulated NECTIN2 expression in all three tissues (brain cortex, P<2.3×10⁻¹¹; blood, P<8.2×10⁻¹⁰¹; microglia, P<7.93×10⁻³). There was strong evidence of colocalization (PP.H4=0.76; Table S14) with NECTIN2 mRNA levels, as well as with CSF NECTIN2 protein levels (Fig. 5D). While we observed nominal significance for TOMM40 in the brain cortex (P=4.9×10⁻³), this was much weaker than for NECTIN2. There was no eQTL evidence for the remaining three genes (APOE, APOC1, and CLPTM1) in any of the three tissues. In addition, brain cell type-specific annotation did not reveal any interactions between this locus and the promoter of APOE (Fig. S9).
We wanted to determine whether this variant, rs11666329, is also associated with AD risk, based on the latest GWAS [7,30]. The A allele of rs11666329 was associated with higher AD risk (β=0.168, P=1.52×10⁻⁶⁶; Fig. 5D) [7,30]. To address whether the association of this variant with AD risk is independent of APOE ε2 (rs7412) and ε4 (rs429358), we performed conditional analyses using GCTA-COJO, adjusting for rs7412 and rs429358 in the latest AD GWAS [30]. The association of rs11666329 with AD risk remained highly significant after conditioning on APOE (before conditioning: P=1.52×10⁻⁶⁶; after conditioning: P=7.12×10⁻³²; Table S16), indicating that this association is independent of APOE. Using the same analyses, the association of rs11666329 with NECTIN2 expression in cortex was also independent of APOE (before conditioning: P=2.78×10⁻⁵; after conditioning: P=3.32×10⁻⁵; Table S16).
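Summary-statistic conditioning of the kind GCTA-COJO performs can be illustrated, in a much simplified single-covariate form, from marginal z scores and the signed LD correlation between two variants. The sketch assumes standardized genotypes and small effects, so that both marginal standard errors are roughly equal; the inputs are hypothetical, and COJO itself fits a multi-variant joint model using a reference LD matrix rather than this two-variant formula.

```python
import math
from statistics import NormalDist


def conditional_z(z_target, z_covariate, r):
    """Approximate z score of a target variant conditioned on one other variant.

    Uses marginal z scores and the signed LD correlation r between the variants,
    assuming standardized genotypes and small effects, so both marginal SEs are
    roughly equal. The joint effect is (b1 - r*b2)/(1 - r^2), which gives
    z_cond = (z1 - r*z2)/sqrt(1 - r^2).
    """
    return (z_target - r * z_covariate) / math.sqrt(1 - r * r)


def two_sided_p(z):
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical inputs: a moderately associated target and a very strong covariate
# in weak LD (r^2 of about 0.0007 corresponds to |r| of about 0.026).
z1, z2 = 12.0, 60.0
for r in (0.0, 0.026, -0.026):
    zc = conditional_z(z1, z2, r)
    print(f"r={r:+.3f}: z_cond={zc:.2f}, P={two_sided_p(zc):.2e}")
```

Even weak LD can shift the conditional z score noticeably when the covariate's own signal is very strong, which is why a formal conditional analysis remains informative when r² is small.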
Finally, for functional validation, we used the same approach as for the chr3 RBMS3/TGFBR2 locus (Fig. S10A). Lentivirus-mediated overexpression of NECTIN2 resulted in a ~2.7-fold increase in intracellular NECTIN2 protein levels (P<0.0001) compared with control lentivirus-transduced PBMC-derived macrophages (Fig. 5E, Fig. S10B). Importantly, while intracellular TREM2 protein levels remained unchanged (Fig. 5F, Fig. S10B), extracellular sTREM2 levels were significantly elevated (~21% increase, P=0.0264; Fig. 5G) upon NECTIN2 overexpression, further supporting our finding that NECTIN2 modulates sTREM2 levels. For NECTIN2 knock-down, however, despite multiple experiments using shRNAs (four different shRNAs; see Materials and methods) at multiplicities of infection of 1 and 2, none led to a consistent and robust reduction in NECTIN2 protein levels in our cell model (Fig. S11).

The genetic architecture is shared between AD risk and CSF sTREM2 levels
To determine whether the overall genetic architecture of CSF sTREM2 levels overlaps with that of AD risk beyond the GWAS hits, we tested whether polygenic risk scores (PRS) for AD risk (with and without the APOE region) are associated with sTREM2 levels. PRS were computed using effects at genetic variants with P<5.00×10⁻⁸ for AD risk [8]. When variants in the APOE region were included in the PRS calculation, a significant negative association of PRS with CSF sTREM2 was observed (β=-0.047, P=3.57×10⁻³). When variants in the APOE region were removed from the PRS calculation, PRS was even more significantly associated with CSF sTREM2 (β=-0.088, P=1.57×10⁻⁷), suggesting a general genetic overlap between AD risk and sTREM2 levels. Larger GWAS using sTREM2 as an endophenotype may lead to the identification of novel AD risk variants involved in TREM2 biology.
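A polygenic score of this kind is a weighted sum of allele dosages over variants passing P<5×10⁻⁸ in the AD GWAS, computed with and without the APOE region. The sketch below assumes already LD-clumped summary statistics, uses a hypothetical APOE exclusion window and made-up variants, and leaves out the subsequent regression of CSF sTREM2 on the score; it illustrates the idea rather than the study's pipeline.

```python
GWAS_P_THRESHOLD = 5e-8
# Hypothetical APOE exclusion window (chromosome, start, end); the study's exact
# boundaries and genome build are not given here.
APOE_WINDOW = ("19", 44_400_000, 46_500_000)


def build_weights(summary_stats, exclude_window=None):
    """Select genome-wide significant variants as PRS weights.

    summary_stats: iterable of (chrom, pos, beta_for_effect_allele, p_value).
    LD clumping is assumed to have been done already.
    """
    weights = {}
    for chrom, pos, beta, p in summary_stats:
        if p >= GWAS_P_THRESHOLD:
            continue
        if exclude_window is not None:
            w_chrom, start, end = exclude_window
            if chrom == w_chrom and start <= pos <= end:
                continue
        weights[(chrom, pos)] = beta
    return weights


def prs(dosages, weights):
    """Weighted sum of effect-allele dosages; missing variants contribute 0 (a simplification)."""
    return sum(beta * dosages.get(key, 0.0) for key, beta in weights.items())


# Hypothetical AD GWAS summary statistics and one person's dosages.
stats = [
    ("19", 45_410_000, 1.20, 1e-300),  # inside the APOE window
    ("11", 60_000_000, -0.10, 1e-10),
    ("6", 41_000_000, 0.80, 3e-9),
    ("2", 1_000_000, 0.05, 0.01),      # not genome-wide significant
]
person = {("19", 45_410_000): 1, ("11", 60_000_000): 2, ("6", 41_000_000): 0}

print("PRS with APOE region:", prs(person, build_weights(stats)))
print("PRS without APOE region:", prs(person, build_weights(stats, APOE_WINDOW)))
```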

Mendelian randomization confirms the protective role of CSF sTREM2 for AD
To examine whether CSF sTREM2 levels are part of the causal pathway for developing AD, we performed two-sample Mendelian randomization (MR) analysis, using our GWAS results for CSF sTREM2 and the latest AD GWAS [7]. Eight variants were selected as independent instrumental variables after clumping. The variant rs11666329 on chromosome 19 was flagged as an outlier by MR-PRESSO and removed from this analysis. We used the remaining seven genetic variants as independent instrumental variables (Table S17) and performed five different MR analyses. All analyses provided significant associations, indicating that higher CSF sTREM2 levels lower AD risk (Fig. 6A). The MR-Egger result, which accounts for possible horizontal pleiotropy, remained significant (P=1.78×10⁻²). Therefore, we considered the MR results from the inverse variance weighted (IVW) approach appropriate. Based on this, we conclude that CSF sTREM2 is causal for AD, with higher CSF sTREM2 levels having a significant protective effect on AD risk (β=-0.236, P=1.36×10⁻⁹; Fig. 6A and 6B).
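The IVW and MR-Egger estimators referred to above can be written in a few lines from per-instrument summary statistics. This sketch assumes independent instruments, fixed-effect IVW with first-order weights, and made-up effect sizes; it does not reproduce MR-PRESSO outlier removal or the other MR methods, and it is not the study's software.

```python
import math
from statistics import NormalDist


def mr_ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted MR estimate (first-order weights)."""
    w = [x * x / s ** 2 for x, s in zip(bx, se_y)]
    beta = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y)) / sum(w)
    se = 1 / math.sqrt(sum(w))
    return beta, se, 2 * NormalDist().cdf(-abs(beta / se))


def mr_egger(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with an intercept.

    Instruments are first oriented so that each exposure effect is positive.
    Returns (slope, intercept, se_slope); a nonzero intercept suggests
    directional pleiotropy.
    """
    x = [abs(v) for v in bx]
    y = [b if a >= 0 else -b for a, b in zip(bx, by)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * sxx - sx * sx
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = rss / (len(x) - 2)  # residual scale from the weighted fit
    return slope, intercept, math.sqrt(sigma2 * sw / denom)


# Hypothetical SNP-exposure (sTREM2) and SNP-outcome (AD) effects for seven instruments.
bx = [0.30, -0.20, 0.50, 0.15, 0.40, -0.25, 0.35]
se_y = [0.02, 0.03, 0.02, 0.04, 0.03, 0.02, 0.03]
by = [-0.1 * x for x in bx]  # outcome effects proportional to exposure effects
print("IVW:", mr_ivw(bx, by, se_y))
print("Egger:", mr_egger(bx, by, se_y))
```

Some implementations do not let the Egger residual scale fall below 1 when computing the slope's standard error; the sketch uses the raw residual scale for simplicity.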
We further examined biomarker-based status (A-T-, A+T-, and A+T+) in the four cohorts with sample sizes n>50 in each group. CSF sTREM2 levels in A+T+ individuals were indeed higher than in A-T- individuals across the four cohorts (Knight ADRC, P=9.8×10⁻⁵; ADNI, P=2.6×10⁻⁵; ACE, P=1.0×10⁻⁷; and DIAN, P=2.6×10⁻⁵; Fig. 6C). However, CSF sTREM2 levels in A-T- individuals were higher than those in A+T- individuals, as reported in previous studies [26,39-41]. This highlights the complex dynamics of some of these proteins; similar patterns have been found, for example, for pTau, which initially increases and then decreases. In this case, these data and other studies indicate that higher sTREM2 levels are associated with lower AD risk and slower progression [39], but that in later disease stages sTREM2 levels are higher due to secondary inflammation. These data indicate that purely observational studies comparing levels between controls and cases may lead to false inferences. We observed that sTREM2 levels in ADNI and DIAN were lower, which is likely due to sample collection: tube type and the number of freeze-thaw cycles are known to influence overall protein levels, and sample collection was not consistent across these studies. However, we note that the association with AT status is consistent, and the genetic associations at all 11 variants were consistent with those from other cohorts.
Altogether, these results strongly support that higher sTREM2 levels are associated with a lower risk of AD.

Additional follow-up illuminates the biology of the CSF sTREM2 loci
To further characterize the sTREM2 loci, we performed protein-wide association analyses of 7,027 aptamers at the four sentinel variants to identify other proteins regulated by the identified loci, using an approach similar to one we used previously [42]. We identified three proteins (TREM2, IZUMO4, and A4GALT) associated with rs72918674 in the MS4A4A/MS4A6A locus and 47 proteins (including TREM2, ZNF483, ARL2, and ATE1) associated with rs11666329 in the NECTIN2/APOE locus (Table S18, Fig. S12A, S12B). These analyses were performed by querying the Online Neurodegenerative Trait Integrative Multi-Omics Explorer (ONTIME) (https://ontime.wustl.edu/), which includes summary statistics for more than 26,000 molecular traits.
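Scanning one sentinel variant against thousands of aptamers requires a multiple-testing threshold. The text does not state which correction was applied, so the sketch below assumes a Bonferroni threshold over 7,027 aptamers (0.05/7,027, about 7.1×10⁻⁶) and uses hypothetical protein names and P values; the function name is also hypothetical.

```python
def bonferroni_hits(pvalues, n_tests=None, alpha=0.05):
    """Return (threshold, hits): features with P below alpha / n_tests, most significant first."""
    n = n_tests if n_tests is not None else len(pvalues)
    threshold = alpha / n
    hits = sorted((name for name, p in pvalues.items() if p < threshold), key=pvalues.get)
    return threshold, hits


# Hypothetical P values for a few aptamers at one sentinel variant; the threshold
# accounts for all 7,027 aptamers tested, not just those listed here.
example = {"TREM2": 1e-40, "PROT_A": 3e-7, "PROT_B": 2e-5, "PROT_C": 0.4}
threshold, hits = bonferroni_hits(example, n_tests=7027)
print(f"threshold={threshold:.2e}", hits)
```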

Discussion
Here, we performed the largest genetic screen for CSF sTREM2 levels, integrating protein and genetic data from 3,350 individuals of European ancestry as well as 250 non-Europeans. We identified four genetic loci, including the known MS4A cluster on chromosome 11 [27] as well as three novel loci, two of which are specific to CSF and were not reported in large plasma studies.
We previously reported the MS4A cluster as a major regulator of CSF sTREM2 [27]. This study provides additional insights into the role of this locus in AD risk and TREM2. Specifically, we demonstrated that there are two independent signals in the MS4A locus and that the MS4A6A p.A112T and MS4A4A p.M178V variants are the most likely functional variants based on multi-ethnic fine mapping. We also demonstrated a significant epistatic effect between these two variants on AD risk and sTREM2 levels. Other variants are in LD with these two variants, some of which are coding or reported to be genetic regulators of gene expression levels (Table S5). It is possible that in each LD block the effect is driven by more than one variant, complicating the analysis and interpretation of this region. Additional functional studies will be needed to identify the exact functional variants.
Previously, we also demonstrated that genetic and pharmacologic regulation of MS4A4A modifies TREM2 levels, and therefore that MS4A4A is a potential therapeutic target for AD. Currently, there are several clinical trials that aim to modify sTREM2 levels and AD risk by targeting MS4A. For instance, Alector Inc. developed AL044, a humanized MS4A function-modulating monoclonal antibody for the treatment of AD. In preclinical in vivo studies, AL044 induced key microglial signaling pathways for proliferation, survival, lysosomal activity, migration, phagocytosis, and immune response. The data in this study go beyond our initial findings and indicate that targeting both MS4A4A and MS4A6A may have an even larger effect than targeting just one of these genes, which warrants future investigation.
Carriers of TREM2 risk variants (rs142232675 p.D87N, rs75932628 p.R47H, and rs143332484 p.R62H) are known to have higher AD risk [7,8]. With the largest cohort of CSF samples, we were able to detect the association of the TREM2 locus with CSF sTREM2 levels for the first time. We found significantly lower CSF sTREM2 in carriers of TREM2 risk variants (rs142232675 p.D87N and rs75932628 p.R47H). The p.R47H variant might reduce CSF sTREM2 levels through decreased solubility and cleavage [49]. However, this observation is opposite to the higher CSF sTREM2 in p.R47H carriers reported by Piccio et al. [21], Deming et al. [27], and Suárez-Calvet et al. [41]. There are three isoforms of sTREM2 (ENST00000373113, ENST00000373122, and ENST00000338469) in CSF. We do not have isoform information from the SomaLogic or ELISA measurements, and the discrepancy between our results and those of Suárez-Calvet et al. might be due to differences in the sTREM2 isoforms detected by SomaLogic in this study and by ELISA in Suárez-Calvet et al. [41]. We also cannot rule out a possible influence of p.R47H on sTREM2 structure and on sTREM2 binding to the aptamer in the SomaLogic platform. Therefore, a follow-up study quantifying the three isoforms in AD patients and healthy controls using orthogonal proteomic measures, including SomaLogic, Olink, ELISA, and mass spectrometry, would be informative for addressing these possibly conflicting findings.
The novel CSF-specific chromosome 3 locus (rs73823326) is located in an intergenic region between RBMS3 and TGFBR2. TGFBR2, at 3p24.1, encodes a transmembrane protein that plays a key role in signal transduction. Besides hepatic stellate cells, adipocytes, and endothelial cells, TGFBR2 is abundant in microglia according to the Human Protein Atlas (https://www.proteinatlas.org). Brain extracts from AD patients have been reported to have significantly lower levels of TGFBR2 than controls [50]. Consistent with this, reduced neuronal TGFBR2 signaling accelerated age-dependent neurodegeneration and promoted beta-amyloid accumulation in an animal model [50]. TGFBR2 shows high, microglia-specific levels of epigenetic markers, and the microglial activation induced by Tgfbr2 deficiency [51] further supports a regulatory role of TGFBR2 in microglia. The presence of microglia-specific dense chromatin loops that connect this locus to the promoter of TGFBR2 supports the hypothesis that TGFBR2 is the most likely functional gene underpinning this locus. Since TGFBR2 spans the cell membrane, it might regulate CSF sTREM2 by influencing proteolytic cleavage at the microglial cell membrane. Importantly, our in vitro cell-based studies confirmed that overexpression (and knockdown) of TGFBR2 in human primary macrophages increases (and decreases) extracellular sTREM2 levels. We did not observe any changes in sTREM2 levels upon RBMS3 overexpression. Taken together, the bidirectional change in sTREM2 levels due to altered TGFBR2 expression in PBMC-derived macrophages strongly implicates TGFBR2 as the functional gene in this locus.
Another novel CSF-specific signal is a locus (rs11666329) on chromosome 19 located in an intron of NECTIN2. Two well-known APOE variants (rs429358 and rs7412) reside 50 kb downstream but were not associated with CSF sTREM2. Both the LD structure and our conditional analyses confirmed that the association at this locus is independent of APOE genotype. Besides CSF sTREM2, the A allele of rs11666329 is associated with significantly lower CSF NECTIN2 protein levels, higher AD risk, and earlier age at onset of AD. Active epigenetic markers at this locus, including ATAC-seq, H3K27ac, and H3K4me3, were noted in microglia, neurons, and oligodendrocytes. NECTIN2, also known as CD112 or PVR-related 2 (PVRL2/PRR2), is a single-pass type I transmembrane protein [52], and its two splice variants (α and δ) are expressed in multiple tissues, including brain neurons, astrocytes, and microglia [53,54]. Global genetic ablation of NECTIN2 causes loss of neurons and nerve fibers in mouse brains at 6 months of age [55], indicating a protective role of NECTIN2 in neurodegeneration. Recently, NECTIN2 variants were shown to be associated with AD risk as well as with altered lipid metabolism and cardiovascular risk in people with type 2 diabetes mellitus [56]. Furthermore, one SNP in the human NECTIN2 gene is significantly associated with AD in African Americans, even after adjusting for the effects of APOE genotype [57]. This is in line with our finding that rs11666329 remains significantly associated with AD after conditioning on the two APOE variants. Besides sTREM2, 46 additional CSF proteins involved in amyloid-beta clearance, the innate immune system, autophagy, and other pathways were regulated by this locus. Taken together, our analysis is consistent with NECTIN2 being a novel modulator of CSF sTREM2 that may directly or indirectly impact AD development.
Our analyses nominate NECTIN2 as the functional gene for sTREM2 in the APOE region, but we cannot completely exclude that APOE is the functional gene or that it interacts with TREM2 at the protein level. APOE has been identified as a binding ligand for TREM2 [58,59], and the binding of APOE to TREM2 was associated with increased clearance of apoptotic neurons by microglia. Therefore, altered TREM2 protein structure caused by TREM2 missense variants, as well as reduced sTREM2 protein levels determined by variants at the four loci, might reduce the affinity of APOE for TREM2 and decrease the clearance of beta-amyloid from the brain [58]. This might be one of the mechanisms through which these loci contribute to AD risk.
Previous results indicated that the soluble form of TREM2 is protective against AD [27]. Our Mendelian randomization and polygenic risk score analyses not only support this hypothesis but also suggest that the mechanisms regulating sTREM2 levels are in fact part of the causal pathway of AD, independently of the TREM2 risk variants. Our analyses indicate that the variants regulating sTREM2 levels not only regulate AD risk but are also associated with slower memory decline and brain atrophy and with reduced amyloid and tau aggregation in AD patients [26,60]. Therefore, modulators of sTREM2 would be ideal candidates as novel AD therapeutic targets.
Despite the strengths of our study and the novel loci identified, the present study has several limitations. First, the sample size for CSF sTREM2 levels in non-European individuals is small. While we narrowed down the MS4A locus to two missense variants, we were not able to do the same for the remaining three loci: there was not enough power for either the chromosome 3 TGFBR2 locus or the chromosome 19 NECTIN2 locus, and the rare variants at the chromosome 6 TREM2 locus found in Europeans were not available in our non-European cohorts. Follow-up GWAS analyses in non-European populations with larger sample sizes would be valuable. Second, although in this study we were able to nominate two coding variants in the MS4A cluster, we were not able to determine whether they are loss- or gain-of-function. To assess the impact of the two missense variants, we used several prediction algorithms and scores, such as SIFT, PolyPhen, and the probability of loss-of-function intolerance (pLI) score. Both variants had a pLI score of zero, indicating a low likelihood of loss-of-function effects, in line with the benign predictions made by PolyPhen and SIFT for these two variants. Therefore, based on this information alone, we cannot predict whether these variants are loss- or gain-of-function. For both independent signals, we found a consistent association in which the allele associated with lower sTREM2 levels was also associated with higher AD risk. However, the effects on gene expression differed: the MS4A4A allele associated with lower sTREM2 was associated with lower MS4A4A expression in brain cortex and microglia but higher expression in blood, whereas the MS4A6A allele associated with lower sTREM2 levels was associated with higher MS4A6A expression in cortex, microglia, and blood (Table S10). Therefore, additional functional studies are needed to validate MS4A4A p.M178V and MS4A6A p.A112T as the functional variants and to determine whether they are loss- or gain-of-function.
Third, the goal of the cell-based assays included in this study was to validate the genetic findings and/or narrow down the most likely functional gene(s) at each locus. To do this, we used PBMC-derived macrophages as a proxy for microglia. Our results indicate that TGFBR2 at the chromosome 3 locus and NECTIN2 at the chromosome 19 locus are the most likely functional genes driving the GWAS signals. However, additional studies using microglia models, such as induced microglia-like cells (iMGL), are needed not only to validate these results but also to determine the specific mechanisms by which TGFBR2 and NECTIN2 modify sTREM2 levels. For TGFBR2, we were able to perform overexpression and knockdown experiments, which led to significant changes in sTREM2. Our analyses indicate that a two-fold change in TGFBR2 leads to approximately 11-16% changes in sTREM2 levels, suggesting that the impact of TGFBR2 on sTREM2 levels is limited. It is important to note that the changes in sTREM2 levels were observed only six days after transduction of the PBMC-derived macrophages, and larger changes in sTREM2 levels may accumulate over time. Therefore, additional analyses, ideally in iMGL, should be performed to fully understand the impact of TGFBR2 on sTREM2 levels. Based on the genetic analyses, the effect size of the TGFBR2 locus (β = -0.282) is similar in magnitude to that of the MS4A locus (β = 0.28-0.37, Table S5). Fourth, we were not able to perform successful knockdown experiments for NECTIN2 in PBMC-derived macrophages. Moving from genetic findings to in-depth functional characterization is a major task that requires multiple teams with diverse expertise, as it involves creating and optimizing reagents and models. Despite these challenges, in this study we were able to perform basic cell-based analyses that nominate TGFBR2 and NECTIN2 as novel genes implicated in TREM2 biology.
In summary, we performed the largest GWAS of CSF sTREM2 to date and identified four loci. In addition to the known MS4A gene cluster and a cis signal at the TREM2 locus, we identified two novel regulators involved in TREM2 biology, TGFBR2 and NECTIN2. These two genes, as well as MS4A4A and MS4A6A, are highly expressed in microglia and encode transmembrane proteins, suggesting that they may affect the proteolytic cleavage of TREM2 and could serve as novel therapeutic targets for AD.

Study design
The 3,600 participants across the eight cohorts were grouped into European (EURs; n = 3,350) and non-European (non-EURs; n = 250) ancestry groups. Ancestry was determined genetically by principal component analysis of genomic data anchored by the HapMap reference panel (Fig. S1). We performed two-stage GWAS analyses: a first-stage GWAS in the joint European ancestry sample (EURs) and a second stage of multi-ethnic fine mapping. The first-stage GWAS used 3,350 European samples from eight cohorts and 12,621,222 autosomal genetic variants, and the second-stage multi-ethnic fine mapping used 250 non-European (non-EURs) samples from eight cohorts and 8,909,120 autosomal genetic variants. CSF sTREM2 was measured by SomaScan or MSD (Table 1). In the first-stage GWAS, we used an additive linear model adjusting for age at CSF draw, sex, genotype platform/cohort, and 10 PCs.
For the significant loci, we then conducted post-GWAS analyses. First, we used multi-ethnic fine mapping to detect the causal variants underlying each locus. For each of the four loci, we then performed stepwise conditional analyses to identify independent genetic variants. To identify the functional genes underlying the three novel loci, we performed colocalization analyses of each locus with the AD GWAS, GTEx eQTLs, and MetaBrain eQTLs. The regulatory roles of these loci were annotated with a brain cell type-specific enhancer-promoter interaction map. For the chromosome 3 RBMS3-TGFBR2 locus, in vitro functional validation using overexpression of TGFBR2 and RBMS3 in human primary macrophages was conducted. The overall overlap in genetic architecture between CSF sTREM2 and AD risk was estimated from the association between a PRS for AD risk and CSF sTREM2. Finally, to determine whether CSF sTREM2 is causal for AD, two-sample Mendelian randomization was performed using the CSF sTREM2 GWAS as the exposure and the latest AD GWAS as the outcome.

Ethics statement
The Institutional Review Board of all participating institutions approved the study and research was performed in accordance with the approved protocols.Written informed consent was obtained from all participants or their family members.

Cohort demographics
The joint analyses in EURs and non-EURs included participants from the Charles F. and Joanne Knight Alzheimer Disease Research Center (Knight ADRC), Alzheimer's Disease Neuroimaging Initiative (ADNI), ACE Alzheimer Center Barcelona (ACE), a longitudinal observational study from the Memory and Disorder unit at the University Hospital Mutua de Terrassa (Barcelona-1), Dominantly Inherited Alzheimer Network (DIAN), Parkinson's Progression Markers Initiative (PPMI), Vanderbilt Memory and Aging Project (VMAP), and Stanford ADRC.
Samples were recruited from eight multi-ethnic cohorts. European participants, identified by principal component analysis (PCA), were used in the first stage. In total, 797 samples from Knight ADRC, 676 from ADNI, 435 from ACE, 187 from Barcelona-1, 172 from DIAN, 779 from PPMI, 135 from VMAP, and 169 from Stanford ADRC were included (Table 1). In the multi-ethnic fine-mapping analyses, PCA-identified non-European samples were analyzed, including 90 from Knight ADRC, 40 from ADNI, 8 from ACE, 6 from Barcelona-1, 31 from DIAN, 38 from PPMI, 7 from VMAP, and 30 from Stanford ADRC (Table S6).

Knight ADRC
The Charles F. and Joanne Knight Alzheimer Disease Research Center (Knight ADRC), housed at Washington University in St. Louis, is one of 30 ADRCs funded by the NIH. The goal of this collaborative research effort is to advance AD research, with the ultimate goal of treating or preventing AD. The subjects included in this study are from the Memory and Aging Project (MAP) supported by the Knight ADRC. As part of the project, subjects undergo annual psychometric testing and interviews, along with biennial or triennial PET, MRI, and CSF collection. Further details on the Knight ADRC and MAP can be found at https://knightadrc.wustl.edu/. In our discovery-stage analyses, 797 EUR samples, including 178 (22.33%) AD cases and 619 (77.67%) cognitively normal controls (hereinafter referred to as controls), were from the MAP cohort (Table 1). In multi-ethnic fine mapping, a total of 90 non-EUR samples, including 12 (13.33%) AD cases and 78 (86.67%) controls, were from the MAP cohort (Table S6).

ADNI
ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org. EUR samples (n = 676), including 512 (75.74%) AD cases and 164 (24.26%) controls, were included in the discovery stage, and non-EUR samples (n = 40) from ADNI, including 26 (65%) AD cases and 14 (35%) controls, were used in multi-ethnic fine mapping (Table S6).

ACE
The ACE study [61] comprises 4,120 AD cases and 3,289 control individuals. Cases were recruited from ACE Alzheimer Center Barcelona, Institut Català de Neurociències Aplicades (Catalonia, Spain). Diagnoses were established by a multidisciplinary working group, including neurologists, neuropsychologists, and social workers, according to the Diagnostic and Statistical Manual of Mental Disorders-IV criteria for dementia and the 2011 National Institute on Aging and Alzheimer's Association (NIA-AA) guidelines for defining AD. Control individuals were recruited from three centers: ACE (Barcelona, Spain), Valme University Hospital (Seville, Spain), and the Spanish National DNA Bank Carlos III (University of Salamanca, Spain) (www.bancoadn.org). EUR samples (n = 435) from ACE, including 238 (54.71%) AD cases and 197 (45.29%) controls, were used in the discovery stage. Non-EUR samples (n = 8), including 4 (50%) AD cases and 4 (50%) controls, were used in multi-ethnic fine mapping (Table S6).

Barcelona-1
Barcelona-1 [62] is a longitudinal observational study of ~300 subjects at baseline, carried out in the Memory and Disorder unit at the University Hospital Mutua de Terrassa, Terrassa, Barcelona, Spain. Cases include subjects diagnosed with AD dementia (ADD), non-AD dementias (non-ADD), mild cognitive impairment (MCI), or subjective memory complaints (SMC). Clinical information was collected at baseline and longitudinally, and lumbar puncture (LP) and amyloid PET were performed if subjects had a diagnosis of MCI, early-onset dementia (<65 years), or dementia with atypical clinical features. Our discovery stage in EURs (n = 187) included 59 (31.55%) dementia cases and 128 (68.45%) controls from Barcelona-1. The multi-ethnic fine-mapping stage in non-EURs included 6 (100%) controls.

DIAN
The Dominantly Inherited Alzheimer Network (DIAN), led by Washington University School of Medicine in St. Louis, focuses on the study of autosomal dominant AD (ADAD). It is a family-based, long-term observational study with standardized clinical and cognitive testing, brain imaging, and biological fluid collection (blood, cerebrospinal fluid), intended to identify changes in pre-symptomatic and symptomatic gene carriers who are expected to develop AD. Because ADAD has an earlier age at onset than sporadic AD, subjects in this cohort are younger on average than those in the other cohorts. The data used in this study are from data freeze 15 (DF15). Additional details on DIAN can be found at https://dian.wustl.edu/. In total, 118 (68.6%) ADAD cases and 54 (31.4%) controls were included in the discovery stage. The multi-ethnic fine-mapping stage included 22 (70.97%) ADAD cases and 9 (29.03%) controls.

PPMI
The Parkinson's Progression Markers Initiative (PPMI) [63] is an observational, international study designed to identify clinical, imaging, genetic, and biospecimen Parkinson's disease (PD) progression markers. This study is a public-private partnership of academic researchers, The Michael J. Fox Foundation for Parkinson's Research (MJFF), and pharmaceutical and biotech industry partners. The overall goal of PPMI is to investigate novel methods to establish longitudinal PD cohorts to examine clinical, imaging, genetic, and biospecimen PD progression markers that, individually or in combination, will rapidly demonstrate interval change in PD patients compared with healthy controls (HC), or in subsets of PD patients defined by baseline assessments, genetic mutations, progression milestones, and/or rate of clinical, imaging, or biospecimen change. In total, 460 (59.05%) PD cases and 319 (40.95%) prodromal cases and controls were used in the discovery stage. The multi-ethnic fine-mapping stage included 27 (71.05%) PD cases and 11 (28.95%) controls.

VMAP
The Vanderbilt Memory and Aging Project (VMAP), established in 2012, is a longitudinal study investigating vascular health and brain aging. At baseline, participants complete a physical and frailty examination, fasting blood draw, neuropsychological assessment, echocardiogram, cardiac MRI, and brain MRI. Detailed information can be found at https://www.vumc.org/vmac/vanderbiltmemory-aging-project. In the discovery stage, 51 (37.78%) dementia cases and 84 (62.22%) controls were included. The multi-ethnic fine-mapping stage included three (42.86%) dementia cases and four (57.14%) controls.

SomaScan and MSD for CSF sTREM2
CSF samples were collected after an overnight fast, processed, and stored at −80 °C for the SomaScan assay. CSF sTREM2 for Knight ADRC, ADNI, ACE, Barcelona-1, and DIAN was measured in a single batch using SomaScan v4.1 (7K), whereas CSF sTREM2 for PPMI and Stanford ADRC was assayed separately using SomaScan v4 (5K). An in-house Meso Scale Discovery (MSD) assay was used to quantify CSF sTREM2 in the VMAP cohort [64].
SomaScan is a multiplexed, single-stranded DNA aptamer-based platform from SomaLogic (Boulder, CO) [65]. Instead of physical units, protein levels are quantified in relative fluorescence units (RFU). To mitigate nuisance variation introduced by the readout, pipetting errors, and inherent sample variation, SomaLogic performed sample-level normalization within each plate, including hybridization control normalization, intraplate median signal normalization, and median signal normalization to an external reference. Adaptive normalization by maximum likelihood (ANML) was applied for the median signal normalization to an external reference. Finally, inter-plate calibration based on calibrator samples was performed to remove plate bias. Details of these normalization procedures are described by Candia J [66].
For both the normalized SomaScan readouts and the MSD measures, we further removed outlier data points, defined as log10-transformed levels (RFU for SomaScan) falling more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile.
To harmonize the data and handle heterogeneity due to different platforms, we normalized CSF sTREM2 levels using a log10 transformation followed by Z-scoring, performed separately for each platform. Because SomaScan data were obtained at the same time for five cohorts (Knight ADRC, ADNI, DIAN, ACE, and Barcelona-1), Z-scores were calculated for these five cohorts together. Z-scores were calculated separately for each of the remaining three cohorts (PPMI, Stanford ADRC, and VMAP). Data across all cohorts were then analyzed jointly. We also examined the sentinel and likely functional variants at each locus in each cohort separately to determine whether there was heterogeneity at each locus.
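The outlier filter and per-platform Z-scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used. The function name `harmonize`, the record layout, and the group labels are hypothetical. Outliers are assumed to be removed within each normalization group. Quartiles come from `statistics.quantiles` with its default "exclusive" method, which may differ slightly from other software.

```python
import math
import statistics
from collections import defaultdict

def harmonize(records):
    """records: iterable of (sample_id, group, raw_level) with raw_level > 0.

    `group` is the normalization unit (hypothetical labels): one label shared by
    cohorts assayed together, one label per separately assayed cohort.
    Returns {sample_id: z_score} for samples that survive the outlier filter.
    """
    by_group = defaultdict(list)
    for sample_id, group, raw in records:
        by_group[group].append((sample_id, math.log10(raw)))

    z_scores = {}
    for items in by_group.values():
        logged = [x for _, x in items]
        # Tukey fences: 1.5 x IQR beyond the first and third quartiles (log10 scale)
        q1, _, q3 = statistics.quantiles(logged, n=4)
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        kept = [(s, x) for s, x in items if lo <= x <= hi]

        # Z-score within the group, using only the retained samples
        values = [x for _, x in kept]
        mu, sd = statistics.mean(values), statistics.stdev(values)
        for s, x in kept:
            z_scores[s] = (x - mu) / sd
    return z_scores
```

In this scheme, the five cohorts assayed together on SomaScan 7K would share one group label, while PPMI, Stanford ADRC, and VMAP would each receive their own.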

Overlapping samples with the previous study
Our previous study [27] used ELISA-based sTREM2 measures, whereas the current study used sTREM2 measures from SomaScan, except for the VMAP cohort, which used a Meso Scale Discovery (MSD) assay. In our previous study, we performed a genome-wide analysis in the ADNI cohort (n = 813) and identified the MS4A locus. Six cohorts (Knight ADRC, DIAN, GHPH, SPIN, Clinic-IDIBAPS, and GHDEM; n = 580, Table S19) were then used to replicate the signals at the MS4A locus. In the current study, we included all eight cohorts (n = 3,350, Table 1) in the genome-wide analysis. This GWAS sample size is about four times that of our previous study (3,350 vs 813), which allowed us to replicate the MS4A locus and identify three additional loci. Three cohorts had sample overlap between this and the previous study (ADNI, n = 598; Knight ADRC, n = 125; and DIAN, n = 41). Four cohorts from our previous study (GHPH, SPIN, Clinic-IDIBAPS, and GHDEM; n = 344) were not included in this study because their sTREM2 was measured using ELISAs. Five cohorts (ACE, Barcelona-1, PPMI, VMAP, and Stanford ADRC) were included in this study but not in the previous study.

Genotyping and imputation
Genotypes from the eight cohorts were generated on different platforms. 1) Knight ADRC used five different arrays: the Illumina CoreExome-24 (CoreEx), Global Screening Array-24 (GSA), NeuroX2, OmniExpress-24 (OmniEx), and Human660W-Quad (X660W). ADNI used OmniEx, ACE was genotyped with the Affymetrix Axiom array, Barcelona-1 was genotyped with GSA and NeuroX2, and DIAN was genotyped with CoreEx. Only autosomal genetic variants were included in our analyses. The genotypes of these five cohorts were quality controlled (including a sex check) using PLINK v1.90b6.26 [69] and imputed by our group. Before imputation, variants and individuals with a call rate <98%, individuals with sex inconsistencies, and variants with Hardy-Weinberg equilibrium (HWE) P < 1 × 10⁻⁶ were excluded. Imputation based on GRCh38/hg38 coordinates was performed using the TOPMed Imputation Server (August 2021). After imputation, variants with Rsq < 0.3 were removed, and a hybrid dataset was created by replacing the remaining imputed genotypes with actual genotypes where available. 2) We obtained whole-genome sequencing data in VCF format (aligned to GRCh38/hg38) from PPMI; variants and individuals with a call rate <98% were removed. 3) Whole-genome sequencing data in binary PLINK format (aligned to GRCh38/hg38) were obtained from Stanford ADRC; variants with a call rate <95% and variants with HWE P < 5 × 10⁻⁸ were removed by the Stanford site. 4) VMAP samples were genotyped on the Illumina Infinium Expanded Multi-Ethnic Genotyping Array (MEGAX) chip on genome build GRCh37. Prior to imputation, the Vanderbilt site removed variants with a call rate <95% or minor allele frequency (MAF) <1%, and removed samples with a call rate <99% or an inconsistency between reported and genetic sex. Variant positions were lifted over to genome build GRCh38, and imputation was performed on the TOPMed Imputation Server using Minimac4, with Eagle for phasing. Imputed genetic data were filtered for imputation quality (Rsq > 0.8), biallelic SNPs, and MAF > 0.01.
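The variant-level thresholds above can be expressed as a simple filter. The sketch below is illustrative only, and several details are assumptions:

- The function names are hypothetical.
- Genotypes are assumed to be hard calls coded 0/1/2 (None for missing).
- The default thresholds are the values used for the five in-house imputed cohorts, plus the MAC > 10 filter applied after the cohorts were merged.
- The HWE test is a 1-df chi-square approximation, not the exact test that PLINK uses by default.

```python
import math

def hwe_chisq_p(genotypes):
    """1-df chi-square test of Hardy-Weinberg equilibrium.

    genotypes: alternate-allele counts (0, 1, 2) per sample; None marks a missing call.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    n_het = calls.count(1)
    n_hom_alt = calls.count(2)
    n_hom_ref = n - n_het - n_hom_alt
    p_ref = (2 * n_hom_ref + n_het) / (2 * n)
    q_alt = 1 - p_ref
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p_ref ** 2, 2 * n * p_ref * q_alt, n * q_alt ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def passes_variant_qc(genotypes, rsq=None, min_call_rate=0.98, min_hwe_p=1e-6,
                      min_rsq=0.3, mac_above=10):
    """Apply call-rate, HWE, imputation-quality (Rsq), and minor-allele-count filters."""
    calls = [g for g in genotypes if g is not None]
    if len(calls) / len(genotypes) < min_call_rate:
        return False
    if hwe_chisq_p(genotypes) < min_hwe_p:
        return False
    if rsq is not None and rsq < min_rsq:
        return False
    alt = sum(calls)
    mac = min(alt, 2 * len(calls) - alt)  # count of the minor allele
    return mac > mac_above
```

In practice, these filters run at different stages: call rate and HWE before imputation, Rsq after imputation, and MAC after merging. Here they are collected in one function only for readability.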
The genotypes of the eight cohorts were merged into one dataset using PLINK v1.90b6.26. We obtained pairwise genome-wide estimates of the proportion of identity-by-descent using PLINK v1.90b6.26 [69]. Unanticipated duplicates and cryptically related individuals (PI_HAT ≥ 0.20) were identified, and for each such pair the sample with the higher number of variants was retained. Variants with a minor allele count (MAC) > 10 were included in our analyses. The 10 principal components and genetic ancestry (3,350 European and 250 non-European individuals) were calculated using PLINK v1.90b6.26 [69]. APOE ε2, ε3, and ε4 isoforms were determined by genotyping rs7412 and rs429358.
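As a worked example of the last step, APOE isoforms can be inferred from allele counts at the two SNPs. ε4 is marked by the C allele of rs429358 and ε2 by the T allele of rs7412, as conventionally reported; ε3 carries neither. The sketch assumes unphased counts. Following common practice, it also assumes the rare ε1 haplotype is absent, so a double heterozygote is called ε2/ε4. The function name is hypothetical.

```python
def apoe_genotype(rs429358_c_count, rs7412_t_count):
    """Infer the APOE diplotype from unphased allele counts.

    rs429358_c_count: number of C alleles at rs429358 (0-2); each marks an e4 allele.
    rs7412_t_count:   number of T alleles at rs7412 (0-2); each marks an e2 allele.
    Assumes no e1 haplotype, so a double heterozygote is called e2/e4.
    """
    for count in (rs429358_c_count, rs7412_t_count):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    n_e4 = rs429358_c_count
    n_e2 = rs7412_t_count
    n_e3 = 2 - n_e4 - n_e2
    if n_e3 < 0:
        raise ValueError("this combination requires the rare e1 haplotype")
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)
```

For example, `apoe_genotype(1, 0)` returns `"e3/e4"` and `apoe_genotype(0, 2)` returns `"e2/e2"`.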

Statistical methods
In the first-stage GWAS, single-variant associations with CSF sTREM2 were tested jointly across the eight cohorts, comprising 3,350 European participants. Before conducting the single-variant association analyses, we compared CSF sTREM2 between females and males in EURs and in non-EURs using a two-tailed Student's t-test. We found sex differences in sTREM2 levels in Europeans, with lower levels in females than in males (P = 0.029; Fig. S14), as reported previously [21]. Although the direction was consistent in non-Europeans (lower sTREM2 in females), the difference was not significant. To control for sex differences and other possible confounders that could bias the estimates, we used an additive linear regression model in PLINK v2.0 [69] with age at CSF draw, sex, genotype platform/cohort, and 10 principal components as covariates. To determine whether the genetic signals had consistent effects on CSF sTREM2 across cohorts, we also performed association analyses in each of the eight cohorts separately using the same additive linear regression model, adjusting for age at CSF draw, sex, and 10 principal components.
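The additive model above regresses Z-scored sTREM2 on allele dosage (0-2 copies of the effect allele) plus covariates. The pure-Python sketch below shows what that regression computes for a single variant. It is not PLINK's implementation, and the function names are hypothetical. The P value uses a normal approximation to the t distribution, which is only reasonable at GWAS-scale sample sizes.

```python
import math

def _invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for a few dozen columns)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-10:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def additive_association(phenotype, dosage, covariates):
    """OLS of phenotype on dosage + covariates (an intercept is added).

    covariates: one list per sample, e.g. [age, sex, platform indicators..., PC1..PC10].
    Returns (beta, se, p) for the dosage term.
    """
    X = [[1.0, float(d)] + [float(c) for c in cov] for d, cov in zip(dosage, covariates)]
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, phenotype)) for i in range(k)]
    xtx_inv = _invert(xtx)
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y - sum(b * x for b, x in zip(coef, row))) ** 2 for row, y in zip(X, phenotype))
    se = math.sqrt(rss / (n - k) * xtx_inv[1][1])
    # Two-sided P from a normal approximation to the t distribution (large n only)
    p = math.erfc(abs(coef[1] / se) / math.sqrt(2))
    return coef[1], se, p
```

Under this model, β is the change in Z-scored sTREM2 per additional copy of the effect allele, holding the covariates fixed.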
To identify independent variants at each locus, we performed stepwise conditional analyses using PLINK v2.0 [69]. In brief, the top SNP at each locus was included as a covariate in the first round; if any SNP remained significant (P < 5 × 10⁻⁸) after this round, it was added to the covariate list. This was repeated until no significant SNP remained at the locus.
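The stepwise procedure can be sketched as a loop that repeatedly adds the most significant remaining SNP to the covariate set until none passes the threshold. The sketch is a simplified stand-in for the PLINK-based analysis, and several choices are assumptions:

- The function names are hypothetical.
- The age, sex, platform, and PC covariates are omitted for brevity.
- Candidate SNPs are ranked by the absolute t statistic of their conditional effect, which avoids ties when very small P values underflow.
- P values use a normal approximation.

```python
import math

def _inverse(matrix):
    """Gauss-Jordan inversion with partial pivoting (small k x k matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-10:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def _last_term_t(y, rows):
    """OLS of y on rows (each row includes an intercept); t statistic of the last column."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = _inverse(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(c * x for c, x in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    se = math.sqrt(rss / (n - k) * inv[-1][-1])
    return coef[-1] / se

def stepwise_conditional(y, snps, alpha=5e-8):
    """snps: {snp_id: dosage list}. Returns independent signals in order of selection."""
    selected = []
    while True:
        best_id, best_t = None, 0.0
        for snp_id, dosage in snps.items():
            if snp_id in selected:
                continue
            # Condition on every SNP already selected, then test this candidate
            rows = [[1.0] + [snps[s][i] for s in selected] + [dosage[i]] for i in range(len(y))]
            try:
                t = abs(_last_term_t(y, rows))
            except (ValueError, ZeroDivisionError):  # collinear with selected SNPs, or perfect fit
                continue
            if t > best_t:
                best_id, best_t = snp_id, t
        # Stop when the best remaining SNP is not significant (two-sided normal P)
        if best_id is None or math.erfc(best_t / math.sqrt(2)) >= alpha:
            return selected
        selected.append(best_id)
```

A SNP that is significant only because it is in LD with an already-selected SNP loses its signal once that SNP is in the model, so it is not reported as an independent signal.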
Because of linkage disequilibrium (LD), GWAS loci identified in European-ancestry samples contain both causal SNPs and variants in LD with them. Differing population histories across ancestral groups create allelic heterogeneity and differences in LD structure, so cross-ancestry fine mapping can help pinpoint the likely functional variants: GWAS signals shared across ancestries are more likely to be the functional causal variants. To this end, we fitted the additive linear regression model jointly in the 250 non-European participants. Because the number of non-Europeans in each cohort did not provide sufficient power, we did not analyze the eight cohorts separately.
Finally, a fixed-effect meta-analysis of EURs and non-EURs was performed using METAL [70]. Significant signals were defined as: 1) P < 5 × 10⁻⁸ in stage 1; 2) P < 0.05 in stage 2 with the same direction of effect as in stage 1; 3) P < 5 × 10⁻⁸ in the trans-ancestry meta-analysis with concordant directions of effect between stage 1 and stage 2.
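An inverse-variance weighted fixed-effect meta-analysis can be sketched as follows, together with Cochran's Q and I² for heterogeneity. This is a hedged illustration, not METAL itself. METAL's default scheme weights studies by sample size; the inverse-variance form shown here corresponds to its standard-error-based scheme. The function names are hypothetical, and P values use the normal distribution. The three criteria above are returned as separate flags.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of per-stage estimates.

    Returns (beta, se, p, q, i2): pooled effect, its standard error, two-sided P,
    Cochran's Q, and the I^2 heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, q, i2

def significance_flags(stage1, stage2, meta, gw=5e-8):
    """stage1, stage2, meta: (beta, p) tuples. Returns the three criteria as flags."""
    (b1, p1), (b2, p2), (_, pm) = stage1, stage2, meta
    concordant = b1 * b2 > 0  # same direction of effect in both stages
    return {
        "stage1": p1 < gw,
        "stage2": p2 < 0.05 and concordant,
        "meta": pm < gw and concordant,
    }
```

With two stages, the pooled estimate is pulled toward the more precise stage. Here that is the European GWAS, whose standard errors are much smaller.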
To determine whether our chromosome 19 variants remain significantly associated with AD and with NECTIN2 expression after adjusting for the two APOE SNPs, we downloaded the AD GWAS [30] and MetaBrain eQTL [37] summary statistics and performed conditional analyses using GCTA [73].
We also used our internal pQTLs for proteins assayed on SomaScan 7K to annotate and interpret our findings. The UCSC genome browser session (hg19) containing the processed ATAC-seq, ChIP-seq, and PLAC-seq datasets for each brain cell type was used to annotate the brain cell type-specific enhancer-promoter interaction map [34].

Mendelian randomization
To investigate the role of CSF sTREM2 in AD risk, we conducted Mendelian randomization (MR) analyses. MR has been widely used to determine causal relationships between modifiable exposures and disease. We used two-sample MR, implemented in the R package TwoSampleMR [75] version 0.5.5, to test whether CSF sTREM2 is causal for AD risk.
The latest AD GWAS results [8] were used for the disease outcome. First, non-palindromic SNPs with association P < 5 × 10⁻⁸ for CSF sTREM2 were selected, and independent genetic variants were identified by clumping with clump_kb = 10000 and clump_r2 = 0.1. These variants were used as instruments and were extracted from the AD GWAS results (Table S17). After data harmonization, MR-Egger, weighted median, inverse variance weighted (IVW), simple mode, and weighted mode methods were applied to estimate the causal effect of CSF sTREM2 on AD risk. Additionally, we performed MR pleiotropy residual sum and outlier (MR-PRESSO) tests. We used the following strategy to select the most appropriate MR test: 1) if global horizontal pleiotropy was detected by MR-PRESSO, the outlier SNPs were removed from the instrument list and the outlier-corrected P from MR-PRESSO was used; 2) if there was no evidence of global horizontal pleiotropy but the MR-Egger intercept was significant, the MR-Egger test was used; 3) if neither global horizontal pleiotropy nor the MR-Egger intercept was significant, the IVW meta-analysis, which aggregates all single-SNP causal effects, was used. P < 0.05 in MR analyses was considered significant.
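The IVW estimator and the method-selection rule can be made concrete with a minimal sketch, under these assumptions:

- Inputs are harmonized per-SNP effects on the exposure (`bx`, CSF sTREM2) and on the outcome (`by`, AD), with outcome standard errors `se_y`.
- IVW is shown in its fixed-effect, first-order-weighted form. TwoSampleMR and other implementations may report a random-effects standard error instead.
- The function names are hypothetical.
- MR-PRESSO and MR-Egger are not reimplemented here; their P values are taken as inputs.

```python
import math

def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW estimate: sum(bx*by/se_y^2) / sum(bx^2/se_y^2)."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    beta = num / den
    se = math.sqrt(1.0 / den)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided, normal approximation
    return beta, se, p

def choose_mr_method(presso_global_p, egger_intercept_p, alpha=0.05):
    """Selection rule: PRESSO if global pleiotropy, else Egger if intercept, else IVW."""
    if presso_global_p < alpha:
        return "MR-PRESSO (outlier-corrected)"
    if egger_intercept_p < alpha:
        return "MR-Egger"
    return "IVW"
```

The IVW estimate equals a weighted average of the per-SNP Wald ratios `by / bx`, each weighted by `bx**2 / se_y**2`. Instruments with strong effects on sTREM2 and precise AD estimates therefore dominate.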

Polygenic risk score (PRS) analysis
To calculate the PRS for AD, we downloaded the summary statistics of the largest GWAS of AD [8]. PRSice-2 was then used to calculate the PRS [77]. First, PRSice-2 used the "C+T" (clumping and thresholding) method to select AD risk variants: clumping retains the independent SNPs with the smallest P within a 250-kb window at LD r² < 0.1, and P-value thresholds such as 5 × 10⁻⁸, 5 × 10⁻⁵, 0.05, and 0.5 are then applied. The AD risk variants in our analyses had P < 5 × 10⁻⁸. Second, using the effects of these AD risk variants as weights, the PRS for AD was calculated as the weighted sum of risk alleles for each sample. Finally, multivariable linear regression was used to assess the association between this PRS and CSF sTREM2, accounting for age at CSF draw, sex, genotype platform/cohort, and 10 principal components.
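The clumping-and-thresholding steps and the weighted sum can be sketched as below. This illustrates the general C+T idea and is not PRSice-2 itself. Assumptions:

- The function names and SNP record fields are hypothetical.
- LD (`r2`) is supplied by the caller, for example from a reference panel.
- Clumping is greedy by ascending P within 250 kb.
- A missing dosage raises an error rather than being imputed.

```python
def clump_and_threshold(snps, r2, p_threshold=5e-8, window_bp=250_000, r2_max=0.1):
    """Greedy C+T: walk SNPs from smallest P, keep a SNP unless it is in LD
    (same chromosome, within window_bp, r2 >= r2_max) with an already-kept SNP.

    Stopping at p_threshold gives the same result as clumping everything and then
    thresholding, because a SNP above the threshold can only clump weaker SNPs.

    snps: list of dicts with keys "id", "chrom", "pos", "p", "beta".
    r2:   callable (id_a, id_b) -> LD r^2.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        if snp["p"] >= p_threshold:
            break  # remaining SNPs are even less significant
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window_bp
            and r2(k["id"], snp["id"]) >= r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept

def polygenic_score(dosages, kept):
    """Weighted sum of effect-allele dosages; dosages: {snp_id: 0..2}."""
    return sum(s["beta"] * dosages[s["id"]] for s in kept)
```

Using a single threshold of 5 × 10⁻⁸ mirrors the AD risk variants used in the analysis above. PRSice-2 additionally scans several thresholds.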

Tissue specificity of identified genetic loci
To determine whether the identified genetic loci are CSF specific, we examined the association of these loci with sTREM2 in plasma based on 35,559 Icelanders (Table S4) [78].

Sensitivity analyses of impacts of disease status
Because of the large heterogeneity across instruments (or platforms) used to measure CSF Aβ42 and pTau levels, we used the Mclust function of the "mclust" R package (version 5.4.6), which fits Gaussian mixture models, to classify samples as biomarker negative (A-T-) or positive (A+T+) based on CSF amyloid beta 42 (Aβ42) and phosphorylated tau-181 (pTau) levels [62,79-83]. For each cohort, the platforms and cut-off values (z-score and raw value in pg/mL) obtained previously [79], along with reported values (if any), are listed in Table S20. Briefly, in Knight ADRC, CSF Aβ42 was assayed with INNOTEST (Fujirebio US, Inc, Malvern, PA), and pTau was measured using the LumiPulse G platform (Fujirebio US, Inc, Malvern, PA). A total of 948 subjects were included in the classification. A z-score cutoff of -0.33 was obtained for Aβ42, corresponding to a raw value of 527 pg/mL; samples below 527 pg/mL were considered Aβ42-positive. A z-score cutoff of 0.73 was obtained for pTau, corresponding to a raw value of 58.9 pg/mL; samples above 58.9 pg/mL were considered pTau-positive.
In ADNI, CSF Aβ42 was assayed with xMAP (Millipore-Sigma, Burlington, MA), and pTau was measured using Elecsys (F. Hoffmann-La Roche Ltd, Switzerland). Based on 749 subjects, a z-score cutoff of 0.60 was identified for Aβ42, corresponding to a raw value of 196 pg/mL; samples below 196 pg/mL were considered Aβ42-positive. For pTau, a z-score cutoff of 0.197 was identified, corresponding to a raw value of 27.8 pg/mL; samples above 27.8 pg/mL were considered pTau-positive.
In ACE, both CSF Aβ42 and pTau were measured using LumiPulse for 632 samples. A z-score cutoff of 0.468 was identified for Aβ42, corresponding to a raw Aβ42 value of 856 pg/mL; samples below 856 pg/mL were considered Aβ42-positive. A z-score cutoff of -0.018 was identified for pTau, corresponding to a raw value of 68 pg/mL; samples above 68 pg/mL were considered pTau-positive.
In Barcelona-1, both CSF Aβ42 and pTau were measured using ELISA for 231 samples. A z-score cutoff of 1.04 was identified for Aβ42, corresponding to a raw Aβ42 value of 1325 pg/mL; samples below 1325 pg/mL were considered Aβ42-positive. A z-score cutoff of -0.163 was identified for pTau, corresponding to a raw value of 58 pg/mL; samples above 58 pg/mL were considered pTau-positive.
In DIAN, LumiPulse was used for both Aβ42 and pTau. For Aβ42, in 478 samples, a z-score cutoff of -0.198 was identified, corresponding to a raw Aβ42 value of 517 pg/mL; samples below 517 pg/mL were considered Aβ42-positive. For pTau, samples with a value greater than 51.8 pg/mL were considered pTau-positive.
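The mixture-model thresholding can be illustrated with a two-component, one-dimensional Gaussian mixture fitted by expectation-maximization. This sketch is not the mclust procedure, which also performs model selection over variance structures. Assumptions:

- The function names are hypothetical.
- The two components have unequal variances.
- The cut-off is the point where the two weighted component densities are equal.
- The conversion back to pg/mL assumes the z-scores were computed with the sample mean and standard deviation of the raw values.

```python
import math
import statistics

def _pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(xs, iters=200):
    """EM for a 1-D two-component Gaussian mixture; component 0 starts on the lower half."""
    ordered = sorted(xs)
    half = len(ordered) // 2
    mu = [statistics.mean(ordered[:half]), statistics.mean(ordered[half:])]
    sd = [statistics.pstdev(xs)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1
        resp = []
        for x in xs:
            d0, d1 = w[0] * _pdf(x, mu[0], sd[0]), w[1] * _pdf(x, mu[1], sd[1])
            resp.append(d1 / (d0 + d1) if d0 + d1 > 0 else float(x > (mu[0] + mu[1]) / 2))
        # M-step: update mixing weights, means, and standard deviations
        n1 = sum(resp)
        n0 = len(xs) - n1
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        sd = [max(1e-6, math.sqrt(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0)),
              max(1e-6, math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1))]
        w = [n0 / len(xs), n1 / len(xs)]
    return w, mu, sd

def mixture_cutoff(w, mu, sd, steps=100):
    """Bisection for the point between the means where the weighted densities are equal."""
    lo, hi = mu[0], mu[1]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if w[0] * _pdf(mid, mu[0], sd[0]) > w[1] * _pdf(mid, mu[1], sd[1]):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def z_to_raw(z_cutoff, raw_values):
    """Convert a z-score cut-off back to the raw scale (assumes sample mean and SD)."""
    return statistics.mean(raw_values) + z_cutoff * statistics.stdev(raw_values)

def at_status(abeta42, ptau, abeta42_cut, ptau_cut):
    """Low Abeta42 means amyloid-positive; high pTau means tau-positive."""
    a = "A+" if abeta42 < abeta42_cut else "A-"
    t = "T+" if ptau > ptau_cut else "T-"
    return a + t
```

For example, with the Knight ADRC cut-offs quoted above, `at_status(500, 60.0, 527, 58.9)` returns `"A+T+"`.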
To examine the impact of disease status on our signals, we performed the following sensitivity analyses:

PheWAS analyses of four identified genetic loci
For the four sentinel variants, we performed protein-wide association analyses for 7,027 aptamers using an additive linear regression model in PLINK v2.0 [69], with age at CSF draw, sex, genotype platform/cohort, and 10 principal components as covariates. Significant associations were defined as P < 5 × 10⁻⁸. Additional pathway enrichment analyses were conducted using Enrichr (https://maayanlab.cloud/Enrichr/), with the genes assayed on SomaScan 7K as the background.

Isolation, culture and differentiation of peripheral blood mononuclear cells (PBMCs)
PBMCs were isolated immediately after blood draw via density gradient centrifugation using Ficoll-Paque

Microglia cell lines
In addition to PBMC-derived macrophages, we performed experiments using the immortalized human microglial clone 3 cell line, HMC3 (ATCC® CRL-3304). This cell line has been used in multiple studies to investigate intracellular TREM2 biology, but not sTREM2 derived from HMC3 cells [84-86]. Although we successfully knocked down our genes of interest in the HMC3 cell line by transient transfection with siRNAs, we were unable to detect any sTREM2 in the conditioned medium, even after the medium was concentrated prior to ELISA experiments (data not shown). Hence, the HMC3 cell line proved unsuitable for studying the dynamics of sTREM2 because of the extremely low levels of sTREM2 in the conditioned medium.) for NECTIN2 knockdown experiments. However, although we used four NECTIN2 shRNAs (TL316689VA, TL316689VB, TL316689VC, and TL316689VD) at MOIs of 1, 2, and 5, we were unable to achieve a consistent and robust decrease in NECTIN2 protein levels. The cDNAs and shRNAs were in the pLenti-C-Myc-DDK-P2A-Puro and pGFP-C-shLenti plasmids, respectively. All lentiviruses were obtained from OriGene Technologies (Rockville, MD, USA).

Enzyme-linked immunosorbent assay (ELISA)
Conditioned medium was harvested from PBMC-derived macrophages on DIV 9 and centrifuged at 2000 × g for 10 min at 4 °C to remove cells and membrane debris. The conditioned medium was then transferred into fresh tubes and stored at -80 °C until used for ELISA. Human TREM2 ELISA kits (EH464RB, Thermo Scientific) were used to measure soluble TREM2 levels in the conditioned medium. All samples were run in duplicate in each assay, and each batch was normalized to its respective controls.
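The duplicate averaging and per-batch normalization to controls can be sketched as follows. The function name, the record layout, and the `"control"` condition label are hypothetical. The sketch assumes each well's duplicate readings are averaged first. Every well is then expressed relative to the mean of the control wells in the same batch, so a value of 1.14 would correspond to a 14% increase over control.

```python
import statistics
from collections import defaultdict

def normalize_to_controls(wells, control_label="control"):
    """wells: list of (batch, condition, duplicate_readings) for individual wells.

    Returns {(batch, condition): [well values as a fraction of that batch's control mean]}.
    """
    per_well = defaultdict(list)
    for batch, condition, readings in wells:
        # Average the duplicate ELISA readings of one well
        per_well[(batch, condition)].append(statistics.mean(readings))

    normalized = {}
    for (batch, condition), values in per_well.items():
        control_mean = statistics.mean(per_well[(batch, control_label)])
        normalized[(batch, condition)] = [v / control_mean for v in values]
    return normalized
```

Normalizing within each batch removes between-batch differences in absolute sTREM2 levels before results from independent batches are compared or pooled.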
Quantifications of intracellular NECTIN2 protein levels. The cells were transduced with NECTIN2 shRNAs at an MOI of 1 in 6 independent batches (A, B, C, D, E and F) and at an MOI of 2 in 4 independent batches (H, I, J and K). Each batch included 3-4 wells transduced with control shRNA and 3 wells transduced with target NECTIN2 shRNAs B and C.

Additional file 2.
Table S1. Association results of CSF sTREM2 by eight cohorts.
Table S2. Association results of CSF sTREM2 in Joint (with VMAP), Meta (with VMAP), and Meta (No VMAP).
Table S3. Genetic regulation of the four index variants based on six GWAS analyses.
Table S4. Association of sTREM2 in CSF and plasma at four loci.
Table S5. Summary of association results in European (EUR) and non-European (nonEUR) individuals (P < 5 × 10⁻⁸ in EUR).
Table S6. Characteristics of samples by cohort for non-European participants.
Table S7. Linkage disequilibrium matrix among 8 chromosome 11 SNPs.
Table S8. Interaction of rs583791 and rs6591561 for CSF sTREM2.
Table S9. Interaction of rs583791 and rs6591561 for AD.
Table S10. Association of chromosome 11 SNPs with AD endophenotypes and AD risk.
Table S11. Colocalization analyses with AD GWAS.
Table S12. Colocalization analyses with eQTLGen cis-eQTLs.
Table S13. Colocalization analyses with GTEx cis-eQTLs.
Table S14. Colocalization analyses with MetaBrain cis-eQTLs.
Table S15. Association of two variants in the NECTIN2 locus with expression of the five genes.
Table S16. Association results of rs11666329 and rs57537848 with AD and NECTIN2 cortex expression before and after conditioning on APOE.
Table S17. Instrument SNPs used in two-sample MR analyses.
Table S18. Association of four sentinel variants with other CSF proteins.
Table S19. Comparison of cohorts used in our previous paper (Deming et al. 2019) and the current paper.
Table S20. Z-score cut-offs and their corresponding raw values determined using Gaussian mixture models.
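Table S20 mentions Z-score cut-offs determined with Gaussian mixture models, but the fitting details are not given in this section. The sketch below assumes a two-component, one-dimensional mixture fitted by expectation-maximization, with the cut-off placed where the two weighted component densities cross between the component means, and then back-transformed to the raw scale. The function names, the initialization, and the crossing rule are assumptions, not the authors' procedure.

```python
import numpy as np


def fit_two_component_gmm(x, n_iter=500, tol=1e-10):
    """Fit a 1-D two-component Gaussian mixture by expectation-maximization."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75])      # crude initial means
    sd = np.full(2, x.std())             # shared initial spread
    w = np.array([0.5, 0.5])             # equal initial weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and responsibilities.
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        ll = np.log(total).sum()         # log-likelihood under the previous parameters
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sd


def gmm_cutoff(w, mu, sd, n_grid=100001):
    """First point between the two means where the upper component dominates.

    Assumes the crossing lies between the means (true for well-separated
    components); the shared 1/sqrt(2*pi) factor is dropped since it cancels.
    """
    lo, hi = np.argsort(mu)
    grid = np.linspace(mu[lo], mu[hi], n_grid)

    def weighted_pdf(k):
        return w[k] * np.exp(-0.5 * ((grid - mu[k]) / sd[k]) ** 2) / sd[k]

    upper_dominates = weighted_pdf(hi) >= weighted_pdf(lo)
    return grid[np.argmax(upper_dominates)]


def z_to_raw(z, raw_mean, raw_sd):
    """Back-transform a Z-score cut-off to the raw measurement scale."""
    return z * raw_sd + raw_mean
```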

Fig. 2
Fig. 2 Association results of CSF sTREM2 at chromosome 11. A LocusZoom plots at chromosome 11 in individuals of European ancestry (EURs) for the sentinel SNP rs72918674 and for the secondary signal rs10897026 after conditioning on the sentinel SNP. The x-axis depicts genomic coordinates on chromosome 11 and the y-axis denotes the negative log10-transformed P value for each genetic variant. B Linkage disequilibrium (LD) heatmap of chromosome 11 SNPs in EURs and non-EURs. C Forest plots of effect size estimates by cohort for rs72918674 (MS4A6A intron), rs10897026 (MS4A4A intron), rs583791 (MS4A6A, p.A112T) and rs6591561 (MS4A4A, p.M178V). Heterogeneity P is 0.36 for rs72918674, 0.41 for rs583791, 0.77 for rs10897026, and 0.79 for rs6591561. D Summary of association results for two independent SNPs and six missense variants in the MS4A gene region in EURs. SNP denotes single-nucleotide polymorphism and rsID denotes reference SNP cluster ID, according to dbSNP build 155. Genes are annotated based on Ensembl Variant Effect Predictor (VEP) release 106. Annotation classifies each SNP as Top Hit, Secondary, or an amino acid change. N is the sample size in the GWAS of European (EUR) samples. β in EURs and P in EURs are the effect estimate and the two-sided raw P value in EUR samples; β in non-EURs and P in non-EURs are the corresponding values in non-European (non-EUR) samples. E Effect of epistasis between rs583791 (MS4A6A, p.A112T) and rs6591561 (MS4A4A, p.M178V) on CSF sTREM2 levels and on the log odds ratio of Alzheimer's disease (AD). The x-axis is the dosage of rs583791 (MS4A6A, p.A112T), coded as the number of copies of the C allele. The y-axes are the CSF sTREM2 Z-score and the log AD odds ratio. Colors indicate the dosage of rs6591561 (MS4A4A, p.M178V), coded as the number of copies of the G allele. The effect allele is T for rs72918674, C for rs583791, G for rs6591561, and C for rs10897026. The association of the first variant, rs583791, is much stronger in individuals with the CC genotype of the second variant, rs6591561 (steep black line), than in those with the TT genotype.
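Panel E and Tables S8-S9 describe a SNP-by-SNP interaction. For the continuous sTREM2 outcome this is usually tested by adding a product term to the linear model, y ~ g1 + g2 + g1×g2, where the per-allele slope of g1 within a given g2 dosage is β(g1) + β(g1×g2) × dosage(g2). The sketch below assumes that formulation (OLS with a normal-approximation P value); it is not the authors' code, the function names are hypothetical, and the AD outcome would instead need logistic regression, which is not shown.

```python
import math

import numpy as np


def fit_interaction(g1, g2, y, covariates=None):
    """OLS fit of y ~ g1 + g2 + g1:g2 (+ covariates).

    Returns (coef, se) ordered as [intercept, g1, g2, g1:g2, covariates...].
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y), g1, g2, g1 * g2]   # g1 * g2 is the epistasis term
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float))
    X = np.column_stack(cols)
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (y.size - rank)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


def interaction_p(coef, se):
    """Two-sided normal-approximation P value for the g1:g2 coefficient."""
    return math.erfc(abs(coef[3] / se[3]) / math.sqrt(2.0))


def g1_slope_given_g2(coef, g2_dosage):
    """Per-allele effect of g1 among individuals with a given g2 dosage (0, 1, 2)."""
    return coef[1] + coef[3] * g2_dosage
```

Here g1 would be the C-allele dosage of rs583791 and g2 the G-allele dosage of rs6591561, matching the coding in the caption; a larger slope at one g2 dosage corresponds to the steeper line in panel E.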

Fig. 3
Fig. 3 Association results of CSF sTREM2 at chromosome 6. A LocusZoom plots at chromosome 6 in individuals of European ancestry (EURs): the top plot shows the sentinel SNP rs12664332 and the bottom plot shows the secondary signal rs142232675 after conditioning on the sentinel SNP. The x-axis depicts genomic coordinates on chromosome 6 and the y-axis denotes the negative log10-transformed P value for each genetic variant. B Forest plots of effect size estimates by cohort for rs142232675 p.D87N, rs75932628 p.R47H, and rs143332484 p.R62H. Heterogeneity P is 0.57 for rs75932628, 0.63 for rs142232675, and 0.10 for rs143332484. C Violin plots of CSF sTREM2 Z-score residuals by genotype of rs142232675 p.D87N, rs75932628 p.R47H, and rs143332484 p.R62H.
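The heterogeneity P values reported with the forest plots are conventionally obtained from Cochran's Q in an inverse-variance fixed-effect meta-analysis across cohorts, with Q compared against a χ² distribution on (number of cohorts - 1) degrees of freedom. The sketch below assumes that standard approach; the captions do not state which software or test was used, and the function names are hypothetical. The χ² survival function is written out in closed form for integer degrees of freedom so the example needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, s = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        s += term
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * s


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    p_het = chi2_sf(q, df) if df > 0 else float("nan")
    return beta, se, q, p_het
```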

Fig. 4
Fig. 4 Association results of CSF sTREM2 at chromosome 3 and in vitro functional validation using PBMC-derived macrophages. A Forest plots of effect size estimates by cohort for rs73823326. The effect allele is T for rs73823326. Heterogeneity P is 0.75 for rs73823326. B Violin plots of CSF sTREM2 Z-score residuals by genotype of rs73823326. C UCSC genome browser visualization of microglia- and neuron-specific assay for transposase-accessible chromatin with sequencing (ATAC-seq), H3K27ac chromatin immunoprecipitation followed by sequencing (ChIP-seq), H3K4me3 ChIP-seq, and proximity ligation-assisted ChIP-seq (PLAC-seq) loops at the chr 3 RBMS3-TGFBR2 locus. The chromatin loop linking the TGFBR2 promoter to an active gene-regulatory region close to rs73823314 (LD r² = 1 with rs73823326; P = 2.54 × 10⁻⁸) is specific to microglia. D Quantification of intracellular TGFBR2 (left panel), TREM2 (middle panel) and extracellular sTREM2 (right panel) protein levels in PBMC-derived macrophages upon TGFBR2 overexpression. E Quantification of intracellular RBMS3 (left panel), TREM2 (middle panel) and extracellular sTREM2 (right panel) protein levels in PBMC-derived macrophages upon RBMS3 overexpression. n = 15 from 4 independent experiments. F Quantification of intracellular TGFBR2 (left panel), TREM2 (middle panel) and extracellular sTREM2 (right panel) protein levels in PBMC-derived macrophages upon TGFBR2 knockdown. n = 9 from 3 independent experiments. ns: not significant, ** p < 0.01, **** p < 0.0001. Results are shown as mean ± SEM.

Fig. 5
Fig. 5 Association results of CSF sTREM2 at chromosome 19. A LocusZoom plots at chromosome 19 before and after conditioning on apolipoprotein E (APOE) haplotype. Linkage disequilibrium (LD) estimates were calculated from our data. The x-axis depicts genomic coordinates on chromosome 19 and the y-axis denotes the negative log10-transformed P value for each genetic variant. B LD heatmap of chromosome 19 SNPs and two APOE SNPs. C Forest plots of effect size estimates by cohort for rs11666329. The effect allele is G for rs11666329. Heterogeneity P is 0.75 for rs11666329. D Association results of rs11666329 for CSF sTREM2, CSF Aβ42, CSF pTAU, CSF NECTIN2, cortex NECTIN2, microglia NECTIN2, blood NECTIN2, AD age at onset, and AD risk. E-G NECTIN2 was overexpressed in PBMC-derived macrophages to validate the genetic findings: quantification of E intracellular NECTIN2, F TREM2, and G extracellular sTREM2 protein levels. n = 15 from 4 independent experiments. ns: not significant, * p < 0.05, **** p < 0.0001.

Fig. 6
Fig. 6 Mendelian randomization (MR) analyses for CSF sTREM2 on AD risk. A Scatter plot of SNP effects on the outcome against SNP effects on the exposure. The lines represent the causal estimates from five methods: MR Egger, inverse variance weighted (IVW), simple mode, weighted mode, and weighted median. B Results of MR analyses for the different MR models. C Box plots of CSF sTREM2 Z-score by four cohorts and AT status.

macrophages were transduced at DIV 3 after medium change by adding the lentivirus to the cell culture medium. A multiplicity of infection of 1 (MOI 1) was used for all lentiviruses. The following lentiviruses were used: TGF beta Receptor II (NM_001024847) Human Tagged ORF Clone Lentiviral Particle (RC223209L3V), RBMS3 (NM_001003793) Human Tagged ORF Clone Lentiviral Particle (RC211441L3V), Nectin 2 (NECTIN2) (NM_002856) Human Tagged ORF Clone Lentiviral Particle (RC200286L3V), Lentiviral Control Particles (PS100092V), TGF beta Receptor II Human shRNA Lentiviral Particle (shRNAs TL308851VB and TL308851VD; Locus ID 7048, TL308851V), and Lentiviral shRNA Control Particles (TR30021V). In addition, we used Nectin 2 (NECTIN2) Human shRNA Lentiviral Particle (Locus ID 5819, TL316689V
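Two of the five MR estimators named in the Fig. 6 caption have simple closed forms. The sketch below shows a fixed-effect IVW estimate and MR-Egger point estimates (weighted regression of SNP-outcome on SNP-exposure effects with an intercept, after orienting each instrument so its exposure effect is positive). It is a minimal illustration, not the software used in the study: the weighted median and mode estimators and the Egger standard errors are omitted, many packages report a multiplicative random-effects IVW instead, and the function names are hypothetical.

```python
import numpy as np


def mr_ivw(bx, by, se_by):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    bx: SNP effects on the exposure (e.g. CSF sTREM2)
    by: SNP effects on the outcome (e.g. AD log odds ratio)
    se_by: standard errors of by
    """
    bx, by, se_by = (np.asarray(a, dtype=float) for a in (bx, by, se_by))
    w = bx ** 2 / se_by ** 2
    beta = np.sum(bx * by / se_by ** 2) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return beta, se


def mr_egger(bx, by, se_by):
    """MR-Egger point estimates (intercept, slope) by weighted least squares."""
    bx, by, se_by = (np.asarray(a, dtype=float) for a in (bx, by, se_by))
    # Orient every instrument so that its exposure effect is positive.
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    X = np.column_stack([np.ones_like(bx), bx])
    W = 1.0 / se_by ** 2
    XtW = X.T * W                        # weight each observation
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by)
    return intercept, slope
```

A non-zero Egger intercept is read as evidence of directional pleiotropy, while the slope is the pleiotropy-adjusted causal estimate.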

Fig. S12.
Dot plots and circular plots of proteins associated with rs72918674 at chromosome 11 and rs11666329 at chromosome 19.
Fig. S13. Scatter plots of CSF sTREM2 measured using MSD vs. SomaScan in Knight ADRC.
Fig. S14. Box plots of CSF sTREM2 Z-score by ethnicity and sex.

Table 1
Characteristics of samples by cohort for EUR participants

Table S6 includes the characteristics of these non-European individuals.
Alzheimer's Disease Research Center (ADRC) (https://med.stanford.edu/adrc.html), one of thirty-one ADRCs, aims to translate research advances into improved diagnosis and care for people with AD and related disorders. Its ultimate goals are to prevent and cure AD. A total of 46 (27.22%) dementia cases and 123 (72.78%) controls were included in the discovery stage, and 7 (23.33%) dementia cases and 23 (76.67%) controls were included in the multi-ethnic fine mapping.