Replication of EPHA1 and CD33 associations with late-onset Alzheimer's disease: a multi-centre case-control study

Background: A recently published genome-wide association study (GWAS) of late-onset Alzheimer's disease (LOAD) revealed genome-wide significant association of variants in or near MS4A4A, CD2AP, EPHA1 and CD33. Meta-analyses of this and a previously published GWAS revealed significant association at ABCA7 and MS4A, independent evidence for association of CD2AP, CD33 and EPHA1, and an opposing yet significant association of a variant near ARID5B. In this study, we genotyped five variants (in or near CD2AP, EPHA1, ARID5B, and CD33) in a large (2,634 LOAD, 4,201 controls), independent dataset comprising six case-control series from the USA and Europe. We performed meta-analyses of the association of these variants with LOAD and tested for association using logistic regression adjusted for age-at-diagnosis, gender, and APOE ε4 dosage. Results: We found no significant evidence of series heterogeneity. Associations with LOAD were successfully replicated for EPHA1 (rs11767557; OR = 0.87, p = 5 × 10⁻⁴) and CD33 (rs3865444; OR = 0.92, p = 0.049), with odds ratios comparable to those previously reported. Although the two ARID5B variants (rs2588969 and rs4948288) showed significant association with LOAD in meta-analysis of our dataset (p = 0.046 and 0.008, respectively), the associations did not survive adjustment for covariates (p = 0.30 and 0.11, respectively). We had insufficient evidence in our data to support the association of the CD2AP variant (rs9349407, p = 0.56). Conclusions: Our data overwhelmingly support the association of EPHA1 and CD33 variants with LOAD risk: adding our data to the previously reported results (total n > 42,000) increased the strength of evidence for these variants, yielding combined p-values of 2.1 × 10⁻¹⁵ (EPHA1) and 1.8 × 10⁻¹³ (CD33).

Two recently published companion studies by Hollingworth et al. [20] and Naj et al. [17] performed meta-analysis of two large GWAS datasets (n > 75,000). Besides APOE, CLU, PICALM, and CR1, the meta-analyses revealed association at ABCA7 (p = 5 × 10⁻²¹), MS4A6A (p = 1.2 × 10⁻¹⁶), MS4A4E (p = 1.1 × 10⁻¹⁰), EPHA1 (p = 6 × 10⁻¹⁰), CD2AP (p = 8.6 × 10⁻⁹) and CD33 (p = 1.6 × 10⁻⁹). In addition, the two datasets revealed opposing associations (Naj et al. OR = 0.93, p = 0.001; Hollingworth et al. OR = 1.06, p = 0.03) of the variant near ARID5B (rs2588969) with LOAD, suggesting potential heterogeneity at this locus. In this study, we genotyped the variants identified at the CD2AP, EPHA1, and CD33 loci in our independent dataset comprising six case-control series (n = 6,835). To assess the opposing associations at the ARID5B locus, we also genotyped the two ARID5B variants included in the Hollingworth et al. study. Genotypes from our follow-up case-control series (Mayo2) for variants in ABCA7, MS4A6A and MS4A4E were included in Stage 3 of the Hollingworth et al. study, so we have not included these three variants here. We performed meta-analyses of five variants (at the CD2AP, EPHA1, ARID5B and CD33 loci) in our six case-control series, which showed no significant series heterogeneity. Furthermore, we performed logistic regression analysis of our pooled series, adjusting for covariates. Finally, we used Fisher's combined test to evaluate the significance of the association of these five variants in our data combined with the data in the Hollingworth et al. and Naj et al. studies.

Results
We genotyped five variants (CD2AP: rs9349407; EPHA1: rs11767557; ARID5B: rs2588969 and rs4948288; CD33: rs3865444) in our independent follow-up case-control series (Mayo2), which comprises three North American and three European Caucasian series. Detailed information about these samples is shown in Table 1 and genotype counts are shown in Table 2. Samples used in this study do not overlap with those included in the Naj et al. study and have not been included in any of the published LOAD GWAS. The Mayo2 dataset included in the Hollingworth et al. publication contained genotypes only for ABCA7, MS4A6A and MS4A4E.
Meta-analyses of allelic association in the six Mayo2 series were performed using a DerSimonian-Laird random effects model (Figure 1). As shown in Figure 1c and 1d, we also observed significant association for both ARID5B variants (rs2588969, OR = 1.08, p = 0.046; rs4948288, OR = 1.11, p = 0.008), with ORs comparable to those reported by Hollingworth et al. (OR = 1.06 and 1.07, respectively) and in the opposing direction to that reported by Naj et al. for rs2588969 (Stage 1+2 OR = 0.93, p = 7.7 × 10⁻⁴). As shown in Figure 1a and 1e, we did not observe significant association for CD2AP (OR = 0.98, p = 0.76) or CD33 (OR = 0.96, p = 0.32) in our meta-analyses. Breslow-Day tests provided no significant evidence that the ORs for any of these variants were heterogeneous among our series (all p > 0.25), as shown in Figure 1. The variant with the most heterogeneity was CD2AP (rs9349407), for which the estimated percentage of variation due to heterogeneity across studies (I²) was 25.1% (95% CI 0%-70%), suggesting the presence of some heterogeneity for that variant.
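The pooling step described above can be illustrated with a minimal Python sketch. It assumes the standard textbook form of the DerSimonian-Laird estimator applied to per-series allelic log odds ratios with Woolf standard errors; it is not the StatsDirect implementation used in this study. The function names and the genotype counts are hypothetical and are not taken from Table 2. The I² value is derived here from Cochran's Q, which is a different statistic from the Breslow-Day test reported in the text.

```python
import math


def allelic_log_or(case_genotypes, control_genotypes):
    """Allelic log odds ratio (minor vs. major allele) and its Woolf standard error.

    Each argument is a tuple (n_major_homozygote, n_heterozygote, n_minor_homozygote).
    """
    def allele_counts(genotypes):
        major_hom, het, minor_hom = genotypes
        return 2 * minor_hom + het, 2 * major_hom + het  # (minor, major)

    a, b = allele_counts(case_genotypes)     # case minor, case major
    c, d = allele_counts(control_genotypes)  # control minor, control major
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def dersimonian_laird(effects):
    """Pool (log_or, se) pairs with the DerSimonian-Laird random-effects estimator."""
    y = [effect for effect, _ in effects]
    w = [1 / se ** 2 for _, se in effects]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-series variance
    w_star = [1 / (se ** 2 + tau2) for _, se in effects]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % variation due to heterogeneity
    p = math.erfc(abs(pooled / se_pooled) / math.sqrt(2))  # two-sided normal p-value
    return {
        "or": math.exp(pooled),
        "ci95": (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled)),
        "p": p,
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical (cases, controls) genotype counts for three series; illustrative only.
    series = [
        ((400, 180, 20), (600, 350, 50)),
        ((250, 130, 20), (300, 170, 30)),
        ((500, 260, 40), (700, 420, 80)),
    ]
    result = dersimonian_laird([allelic_log_or(ca, co) for ca, co in series])
    print(result)
```

When the series agree closely, the estimated between-series variance is zero and the random-effects result coincides with the fixed-effect (inverse-variance) estimate.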
Table 1. The number of LOAD patients (AD) and controls (CON), mean age-at-diagnosis, percentage that are female and percentage that possess at least one copy of the APOE ε4 allele are given for each individual series. Mean age is given as age at diagnosis/entry with the standard deviation (SD) from the mean in parentheses.

To adjust for important covariates, we included age-at-diagnosis/entry, sex and APOE ε4 dosage in logistic regression analyses of all five variants in each of the six Mayo2 series; in our analysis of all Mayo2 series combined, series was included as an additional covariate. Table 3 shows the results for the six Mayo2 series combined (Mayo follow-up) as well as for each of the six individual Mayo2 series. For the purpose of comparison, we have also included in Table 3 the published GWAS results for the same variants. Adjustment for covariates revealed comparable ORs to those obtained in the meta-analyses, with improved p-values for the EPHA1 (OR = 0.87, p = 5 × 10⁻⁴), CD33 (OR = 0.92, p = 0.049) and CD2AP (OR = 0.97, p = 0.56) loci. However, the associations of the ARID5B variants were no longer significant following adjustment for covariates (rs2588969: OR = 1.05, p = 0.30; rs4948288: OR = 1.07, p = 0.11), suggesting that these associations may be dependent upon the series, age-at-diagnosis/entry, sex and/or APOE ε4 dosage of the individual. To estimate the overall association of these five variants in our data combined with the previously published associations, we used Fisher's method to combine the p-values for all associations (Table 3; Mayo2/ADGC/Hollingworth). We found that adding our data to those previously reported increased the strength of evidence for all variants as LOAD risk modifiers (CD2AP: p = 6.5 × 10⁻¹¹, EPHA1: p = 2.1 × 10⁻¹⁵, ARID5B rs2588969: p = 2.3 × 10⁻⁹, ARID5B rs4948288: p = 4.0 × 10⁻⁴, CD33: p = 1.8 × 10⁻¹³).
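Fisher's method for combining p-values can be sketched in a few lines of Python. This is a minimal illustration of the general technique, assuming independent studies; the function name is hypothetical, the input p-values are illustrative rather than the per-study values in Table 3, and the code is not claimed to reproduce the combined p-values reported above.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom. For even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Illustrative p-values from three hypothetical independent studies.
    print(fisher_combined_p([0.04, 0.002, 0.0005]))
```

Fisher's method uses only the p-values, so it is blind to the direction of effect. This matters for loci such as ARID5B, where the contributing studies reported odds ratios in opposing directions.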

Discussion
We report here successful replication of the association of two variants with LOAD in a large (n = 6,835), independent case-control study: rs11767557, which is located 3 kb upstream of EPHA1 (p = 5 × 10⁻⁴), and rs3865444, which is located 373 bp upstream of CD33 (p = 0.049). Although our meta-analyses showed successful replication of the association of the ARID5B variants rs2588969 (OR = 1.08, p = 0.046) and rs4948288 (OR = 1.11, p = 0.008), with a direction of association consistent with that reported by Hollingworth et al. (OR = 1.06 and 1.07, respectively), the associations did not survive adjustment for age-at-diagnosis/entry, sex and APOE ε4 dosage (p = 0.30 and 0.11, respectively). This covariate-dependent association could explain the opposing associations reported by Naj et al. in their discovery (OR = 0.88) and replication (OR = 1.05) datasets for rs2588969, the only ARID5B variant they tested. Therefore, while the estimated p-values for association of the ARID5B variants in all datasets combined were highly significant (rs2588969: p = 2.3 × 10⁻⁹; rs4948288: p = 4.0 × 10⁻⁴), these associations should be interpreted with caution, taking into account the age-at-diagnosis/entry, sex and APOE ε4 dosage of the populations. Finally, although the estimated p-value for association of rs9349407 (located in intron 1 of CD2AP) in all datasets was 6.5 × 10⁻¹¹, there was no evidence for association of this variant in our dataset alone (OR = 0.97, p = 0.56).
Our Mayo2 collection of case-control series provided a total of 2,634 LOAD cases and 4,201 controls. Combining across series to perform global tests of significance for additive genotypic trend tests gave us 80% power to detect ORs ranging from 1.17 (or 0.855) for variants with a minor allele frequency (MAF) of 0.2 to 1.13 (or 0.883) for variants with a MAF of 0.45 in controls. The study provided approximately 60% power to detect the OR of 1.11 that we report for CD2AP (MAF = 0.27).
Case-control studies such as this are not designed to ascertain whether the variants associated with LOAD risk are themselves functional, but they can identify a linkage disequilibrium (LD) block within which a truly functional variant may reside. Our results indicate that the EPHA1 and CD33 variants are excellent candidates for targeted deep sequencing or high-density genotyping to define each locus further, followed by functional studies of nearby genes to elucidate the mechanism behind these associations. With the exception of rs9349407, which lies within intron 1 of CD2AP, all of these variants lie within intergenic regions; for the reader's convenience, we have thus far referred only to the nearest gene for each variant. This by no means signifies that these variants (or the functional variants in LD with them) are assumed to affect the expression or function of the nearest gene; they may instead affect other nearby genes. Until it is known which gene underlies these associations, all nearby genes should be included in follow-up functional investigation (all genes that reside within 100 kb of these variants are listed in Additional file 1, Table S1).

Conclusions
Taken together with our previous publications [5,18,20,21], we have now performed follow-up association studies of 25 of the top GWAS-identified candidate LOAD genes and successfully replicated the association of eleven variants (in or near ABCA7, BIN1, CD33, CLU, CR1, EPHA1, GAB2, LOC651924, MS4A6A/4E and PICALM), eight of which are currently ranked in the top ten (after APOE) on AlzGene. This recent success in replicating genetic association highlights the utility of multiple, large case-control follow-up studies for confirming the novel associations reported by large GWAS, thereby establishing these loci as good candidates for functional follow-up studies.

Ethics statement
Approval was obtained from the ethics committee or institutional review board of each institution responsible for the ascertainment and collection of samples. Written informed consent was obtained from all individuals who participated in this study.

Case-control subjects
The Mayo2 case-control series consisted of Caucasian subjects from the United States ascertained at the Mayo Clinic Jacksonville, Mayo Clinic Rochester, or through the Mayo Clinic Brain Bank. Additional Caucasian subjects from Europe were obtained from Norway [22], Poland [23], and from six research institutes in the United Kingdom that are part of the Alzheimer's Research UK (ARUK) Network. Although the ARUK samples used in this follow-up do not overlap with

Statistical Analyses
Meta-analysis of allelic association and Breslow-Day tests were performed using StatsDirect v2.5.8 software.
Meta-analyses were performed using the results from each individual case-control series. Summary ORs and 95% CI were calculated using the DerSimonian and Laird (1986) random-effects model [24]. Breslow-Day tests were used to test for heterogeneity between populations. PLINK software [25]

Additional material
Additional file 1: Table S1. Genes located within 100 kb of the five variants tested in this study.