Replication of progressive supranuclear palsy genome-wide association study identifies SLCO1A2 and DUSP10 as new susceptibility loci

Background: Progressive supranuclear palsy (PSP) is a parkinsonian neurodegenerative tauopathy affecting brain regions involved in motor function, including the basal ganglia, diencephalon and brainstem. While PSP is largely considered to be a sporadic disorder, cases with suspected familial inheritance have been identified and the common MAPT H1 haplotype is a major genetic risk factor. Due to the relatively low prevalence of PSP, large sample sizes can be difficult to achieve, and this has limited the ability to detect true genetic risk factors at the genome-wide statistical threshold for significance in GWAS data. With this in mind, in this study we genotyped the genetic variants that displayed the strongest degree of association with PSP (P < 1E-4) in the previous GWAS in a new cohort of 533 pathologically-confirmed PSP cases and 1172 controls, and performed a combined analysis with the previous GWAS data.
Results: Our findings validate the known association of loci at MAPT, MOBP, EIF2AK3 and STX6 with risk of PSP, and uncover novel associations with SLCO1A2 (rs11568563) and DUSP10 (rs6687758) variants, both of which were classified as non-significant in the original GWAS.
Conclusions: Resolving the genetic architecture of PSP will provide mechanistic insights and nominate candidate genes and pathways for future therapeutic intervention strategies.
Electronic supplementary material: The online version of this article (10.1186/s13024-018-0267-3) contains supplementary material, which is available to authorized users.


Background
Progressive supranuclear palsy (PSP) is a parkinsonian neurodegenerative disorder that presents with predominant 4R tauopathy in the basal ganglia, diencephalon and brainstem, with associated neuronal loss and fibrillary gliosis [1,2]. Although PSP is largely considered to be a sporadic disorder, cases with suspected familial inheritance and cases carrying pathogenic mutations have been reported; e.g. mutations in the MAPT gene, encoding the tau protein, have been associated with PSP phenotypes [3]. In addition, the common MAPT H1 haplotype is established as the major genetic risk locus for PSP [4-6].
The only unbiased genome-wide association study (GWAS) of PSP to date was performed in a total cohort of 2165 PSP patients and 6807 controls [7]. The discovery-replication design confirmed the MAPT locus as the most strongly associated genetic risk factor (OR = 5.46; P = 1.5E-116). The study also identified three novel loci associated with disease susceptibility: MOBP (OR = 0.72; P = 1.0E-16), STX6 (OR = 0.79; P = 2.3E-10) and EIF2AK3 (OR = 0.75; P = 3.2E-13). Follow-up studies have attempted to draw more specific associations of these variants with the risk of PSP. Sequencing of the coding regions of the GWAS-implicated genes in 84 PSP cases was mainly negative, with the exception of a rare, predicted damaging STX6 p.C236G mutation that remains of unknown relevance [8]; in addition, the EIF2AK3 haplotype B, known to be in linkage disequilibrium (LD) with rs7571971, has been associated with the risk of PSP [9].
Due to the relatively low prevalence of PSP, large sample sizes can be difficult to achieve, and this can result in a GWAS having less than desirable power to detect biologically meaningful associations at the genome-wide statistical significance threshold. Thus, maximizing sample size (the number of PSP patients in particular) is imperative in order to reduce the likelihood of obtaining false-negative findings for true genetic risk factors for PSP, and meta-analytic studies are an effective way to accomplish this. Therefore, in the current study we have included a new cohort of 533 pathologically-confirmed cases and 1172 controls, genotyped the top variants identified in the original GWAS (P < 1E-4), and performed a combined analysis with the original GWAS data in order to attempt to confirm the previously reported genes and also identify additional candidates.

Study sample
This study included 533 pathology-confirmed PSP patients, all confirmed negative for MAPT mutations. These patients donated their brains to the Mayo Clinic brain bank for neurodegenerative disorders. It should be noted that the Mayo Clinic brain bank receives cases from across the United States and thus may house a small number of cases that overlap with longitudinal clinical studies. Neuropathologic diagnosis was rendered by a single neuropathologist (DWD) and followed published criteria for PSP [10]. Clinical and demographic information was collected from available medical records. Study controls were 1172 clinical volunteers, approximately age-matched (±5 years) and gender-matched to the cases (~1:2 case-control ratio), who were observed within the Department of Neurology, Mayo Clinic, not to have a neurodegenerative or neurological condition. Additionally, the control population included samples from 106 pathologically defined control subjects who did not have significant neuropathology suggestive of disease and who had a Braak stage < 3. All PSP patients and controls were unrelated and of European ancestry, as determined from the self-reported ethnicity extracted from medical records. A thorough manual review of the new PSP cases was performed to include only cases that were not part of the original PSP GWAS. Study subjects were recruited through protocols approved by the Mayo Clinic institutional review board.

SNP genotyping
Follow-up SNPs were selected from the publicly available PSP GWAS results (https://www.niagads.org/datasets/ng00045) according to the following criteria: (1) include only the European ancestry analysis results because, as previously mentioned, the new pathology-confirmed PSP cohort is exclusively composed of Caucasian samples; (2) include all unique loci with an arbitrary cut-off of P < 1E-4 in the joint analysis (N = 31); and (3) since our new PSP cohort consists entirely of pathology-confirmed cases, we considered it to be comparable to the stage 1 GWAS cohort and therefore found it reasonable to include additional unique loci with P > 1E-4 in the joint analysis but with P < 1E-4 in the stage 1 analysis (N = 9). To select only unique chromosomal regions, the list of variants was carefully examined to identify genomic regions by grouping SNPs that were closely located and in linkage disequilibrium (LD). LD data were obtained from HaploReg v4 (http://archive.broadinstitute.org/mammals/haploreg/haploreg.php) and the 1000 Genomes Project (Utah Residents (CEPH) with Northern and Western European Ancestry). When a region was represented by more than one SNP, we selected the one with the highest association score. For MAPT, we also included rs242557 to account for the association of PSP risk with the 'H1c' haplotype.
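The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors ran: the `GwasHit` record, the `select_follow_up` function, the locus labels and the demo rsIDs are all hypothetical, loci are assumed to have been defined beforehand (by proximity and LD), and "highest association score" is read here as the smallest P-value in the analysis that qualified the locus.

```python
from dataclasses import dataclass

P_CUTOFF = 1e-4  # cut-off stated in the text


@dataclass
class GwasHit:
    snp: str         # rsID (demo values below are made up)
    locus: str       # label of a pre-defined group of nearby SNPs in LD
    p_joint: float   # P-value in the joint (stage 1 + stage 2) analysis
    p_stage1: float  # P-value in the stage 1 analysis


def select_follow_up(hits, cutoff=P_CUTOFF):
    """Return {locus: best SNP} for loci passing either P-value criterion."""
    selected = {}
    # Criterion 2: loci with P < cutoff in the joint analysis;
    # keep the SNP with the smallest joint P-value per locus.
    for h in hits:
        if h.p_joint < cutoff:
            cur = selected.get(h.locus)
            if cur is None or h.p_joint < cur.p_joint:
                selected[h.locus] = h
    joint_loci = set(selected)
    # Criterion 3: additional loci that miss the joint cut-off but have
    # P < cutoff in stage 1; keep the smallest stage 1 P-value per locus.
    for h in hits:
        if h.locus in joint_loci or h.p_stage1 >= cutoff:
            continue
        cur = selected.get(h.locus)
        if cur is None or h.p_stage1 < cur.p_stage1:
            selected[h.locus] = h
    return selected


demo = [
    GwasHit("rs_demo_1", "locusA", 2e-9, 1e-6),
    GwasHit("rs_demo_2", "locusA", 5e-7, 3e-5),
    GwasHit("rs_demo_3", "locusB", 3e-4, 6e-5),  # stage 1 only
    GwasHit("rs_demo_4", "locusC", 2e-3, 4e-3),  # fails both criteria
]
for locus, hit in sorted(select_follow_up(demo).items()):
    print(locus, hit.snp)
```

Running criterion 2 before criterion 3 keeps a locus from being represented twice when different SNPs in it pass different cut-offs.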
In total, 37 SNPs representing 36 different loci were genotyped in the new pathology-confirmed PSP cohort and controls (Table 1). To guard against genotyping errors, for each GWAS SNP we selected one additional SNP in high LD with it (a proxy) and genotyped both in parallel on the MassArray genotyping platform. The rs242557 variant tagging the MAPT H1c subhaplotype was added to the study; it is routinely genotyped by TaqMan assay in our laboratory. The rs11532787 variant did not fit the iPlex design and was genotyped by Sanger sequencing. MassArray assays were run using the iPlex Gold chemistry (Agena Bioscience, San Diego, CA).

Statistical analysis
For the analysis of the new PSP patients and controls included in the current study, associations between SNPs and risk of PSP were evaluated using logistic regression models adjusted for age and sex in PLINK version 1.07. To be consistent with the original GWAS, the presented odds ratios (ORs) and 95% confidence intervals (CIs) are based on the additive effect of each additional major allele (only the controls were considered in determining the major allele). For genotype quality control, individuals with excessive missingness (missing 80% or more of the 37 genotyped SNPs) were excluded; call rates for all SNPs were > 95%, and SNPs with Hardy-Weinberg equilibrium (HWE) P-values < 0.001 in controls were excluded from further analyses. A meta-analysis was performed using fixed-effects models in METAL [11] by combining the genotypes of the selected 37 SNPs from our new series of 533 PSP patients and 1172 controls with the genotypes of the samples previously included in the PSP GWAS [7]. Only samples of European ancestry were included, and METAL used the combined stage 1 and stage 2 values for the meta-analysis. The standard error approach was employed, weighting each study's beta coefficient by its standard error. The threshold for genome-wide significance in our meta-analysis was P ≤ 5E-8; all statistical tests were two-sided.
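As an illustration of the fixed-effects combination described above, the following Python sketch pools per-study log odds ratios with inverse-variance weights, which is how a standard-error-based scheme is usually formulated. It is not METAL's code, and the function names (`fixed_effects_meta`, `genome_wide_significant`) and the input values are hypothetical; only the 5E-8 threshold and the two-sided test come from the text.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling of per-study effect sizes.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled SE, two-sided P-value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal length")
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal P-value; erfc stays accurate for very large |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


def genome_wide_significant(p, threshold=5e-8):
    """Apply the genome-wide threshold used in the text (P <= 5E-8)."""
    return p <= threshold


# Hypothetical per-study inputs (log OR, SE), for illustration only.
studies = [(math.log(0.70), 0.07), (math.log(0.60), 0.15)]
beta, se, p = fixed_effects_meta([b for b, _ in studies],
                                 [s for _, s in studies])
print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}")
print("genome-wide significant:", genome_wide_significant(p))
```

Because the weights are inversely proportional to the squared standard errors, the larger and more precise study dominates the pooled estimate, and the pooled standard error is always smaller than that of any single study.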

Functional annotation
A possible cis-effect of each significant SNP as an eQTL (expression quantitative trait locus) was studied by querying the publicly available GTEx, Braineac, Regulome and PhenGen databases for expression and regulatory information associated with the SNP and with all markers in LD with it at r2 > 0.8 (GWAS signal set) in HaploReg and 1000 Genomes. Circular representation of the genomic information associated with the SNPs was done using the "circlize" package for R [12].
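For readers unfamiliar with the r2 > 0.8 criterion that defines a GWAS signal set, the sketch below computes r2 between two biallelic markers from haplotype and allele frequencies and keeps the markers above the threshold. In the study itself the LD values were taken from HaploReg and 1000 Genomes rather than computed; the function names (`ld_r2`, `signal_set`) and all frequencies here are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B;
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


def signal_set(candidates, threshold=0.8):
    """Names of candidate markers whose r^2 with the index SNP exceeds
    the threshold. candidates maps name -> (p_ab, p_a, p_b), where allele A
    belongs to the index SNP and allele B to the candidate marker."""
    return sorted(name for name, freqs in candidates.items()
                  if ld_r2(*freqs) > threshold)


# Made-up frequencies for three candidate markers near an index SNP.
demo = {
    "rs_demo_proxy": (0.070, 0.075, 0.072),   # nearly always co-inherited
    "rs_demo_weak": (0.020, 0.075, 0.150),    # weak LD
    "rs_demo_indep": (0.015, 0.075, 0.200),   # independent (D = 0)
}
print(signal_set(demo))
```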

Results
Ten subjects were excluded for low call rates, leaving 526 cases and 1167 controls. In total, our combined analysis of 37 variants included 934 clinical and 1764 pathology-confirmed PSP patients (2698 PSP patients in total) and 8019 controls [7]. Characteristics of this new PSP cohort are shown in Table 1. For the 37 genetic variants included in this study, associations are shown in Table 2 for the original GWAS, our independent patient-control series from the current study, and the meta-analysis of the original GWAS and our new data. A summary and graphic representation of the meta-analysis results are presented in Fig. 1. Eight variants were found to be associated with PSP at genome-wide significance (P < 5E-8). The six most significant variants were in MAPT (rs8070723, the H1-H2 SNP; and rs242557 conditioned on rs8070723), MOBP (rs1768208), IRF4 (rs12203592), EIF2AK3 (rs7571971), and STX6 (rs1411478). Of note, despite displaying a genome-wide significant association in the original GWAS, the association at IRF4 was not emphasized in that study due to a potential age-related bias. The association of IRF4 rs12203592 with PSP was also genome-wide significant in our combined analysis; however, this effect was driven solely by the initial GWAS, and our newly genotyped samples did not contribute to the association (OR = 1.09, P = 0.40). Thus, age may be influencing the allelic frequency of rs12203592 in the control population, and further study is required to resolve the association of this variant with PSP. Interestingly, our combined analysis identified two novel genome-wide significant associations, the first for SLCO1A2 rs11568563 (OR = 0.67, P = 5.3E-10) and the second for DUSP10 rs6687758 (OR = 0.8, P = 1.14E-8); both of these variants displayed associations with PSP in the previous GWAS that did not quite meet genome-wide significance.
Importantly, associations for these two variants replicated in the independent patient-control series utilized in the current study (SLCO1A2: OR = 0.60, P = 0.0004; DUSP10: OR = 0.80, P = 0.017), where effect sizes and patient-control allele frequencies were very similar to those from stages 1 and 2 of the GWAS.
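To make the reported effect sizes easier to compare, the following sketch recovers an approximate log odds ratio, standard error and 95% CI from an OR and its two-sided P-value. It assumes the P-value comes from a Wald test (z = beta / SE), which is an assumption of this illustration and may not match the covariate-adjusted models exactly; the helper names (`beta_se_from_or_p`, `ci95`) are hypothetical. The two input pairs are the replication-series values quoted above.

```python
import math
from statistics import NormalDist


def beta_se_from_or_p(odds_ratio, p_value):
    """Approximate (beta, SE) from an OR and a two-sided Wald P-value."""
    if odds_ratio <= 0.0 or not 0.0 < p_value < 1.0:
        raise ValueError("need OR > 0 and 0 < P < 1")
    beta = math.log(odds_ratio)
    if beta == 0.0:
        raise ValueError("SE cannot be recovered when OR equals 1")
    # |z| such that 2 * (1 - Phi(|z|)) = P. Using the lower tail avoids
    # rounding 1 - P/2 to exactly 1.0 when P is very small.
    z_abs = -NormalDist().inv_cdf(p_value / 2.0)
    return beta, abs(beta) / z_abs


def ci95(beta, se):
    """95% confidence interval for the odds ratio."""
    z = NormalDist().inv_cdf(0.975)
    return math.exp(beta - z * se), math.exp(beta + z * se)


# Replication-series values quoted in the text.
for gene, odds_ratio, p in [("SLCO1A2", 0.60, 0.0004),
                            ("DUSP10", 0.80, 0.017)]:
    beta, se = beta_se_from_or_p(odds_ratio, p)
    low, high = ci95(beta, se)
    print(f"{gene}: beta = {beta:.3f}, SE = {se:.3f}, "
          f"95% CI = ({low:.2f}, {high:.2f})")
```

The recovered beta and SE are the quantities an inverse-variance meta-analysis takes as input, so the same conversion can be used to check a combined estimate against published summary statistics.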
The top SNP in the SLCO1A2 region (rs11568563) encodes a non-synonymous p.E172D change, which may impact SLCO1A2 expression and/or function. To determine whether SLCO1A2 rs11568563 is associated with abnormal expression of SLCO1A2 or neighboring genes, we queried available gene expression databases for associated expression patterns and expanded this search to include all markers in high LD with this variant (rs11568563 GWAS signal set, Table 3). The rs11568563 GWAS signal set was found to include four additional intronic SLCO1A2 variants (rs145667214, rs7966334, rs74651308 and rs79424089) and one intergenic variant close to the SLCO1A2 3'UTR (rs188509290) (Additional file 1: Figure S1A), all of which were found to have a modest, non-significant effect on the regional brain expression of 17 genes (Additional file 1: Figure S1B). However, none of these six SNPs was found to have a significant eQTL effect or a significant regulatory function on these genes. We did observe that five of these genes were highly expressed in the brain, with three of them being mainly expressed in the brain (SLCO1A2, C12orf39 and SLCO1C1) and in regions affected in PSP (including the substantia nigra, caudate, putamen, and nucleus accumbens). A similar approach was taken to study the possible functional impact of rs6687758, the top SNP in our second newly identified genome-wide associated region. This variant is located in an ~723 Kb intergenic region between DUSP10 (~250 Kb upstream) and TRT-TGT2-1 (~473 Kb downstream) that contains 6 uncharacterized sequences and 3 pseudogenes. The rs6687758 variant is an eQTL for the expression of three long intergenic non-coding RNAs (lincRNAs): RP11-815M8.1 (P = 1.2E-15 in lung and 2.5E-5 in blood), RP11-400N13.2 (P = 3.3E-13 in lung) and LINC01655 (P = 2.7E-8 in lung), but the expression of these sequences is negligible in the brain.
The rs6687758 GWAS signal set included 50 variants (Table 4 and Additional file 2: Figure S2A), none of which is located directly in a well-characterized coding gene (Additional file 2: Figure S2B). This GWAS signal set was found to be associated with differential brain expression of eight genes (C1orf140, LOC100287182, DUSP10, HHIPL2, MIA3, AIDA, C1orf58, and FAM177B). Additionally, the effect of rs6687758 may be mediated by other variants in LD with it.

Discussion
This study confirms the known association of loci at MAPT, MOBP, EIF2AK3 and STX6 with the risk of PSP and reveals novel associations with SLCO1A2 and the intergenic rs6687758 SNP. Additional analyses were performed on the associated variants and their GWAS signal sets (variants in high LD with the GWAS variants) to gather functional evidence that could explain the observed associations. We consider that the association of SLCO1A2 rs11568563 with PSP is likely to be mediated by the non-synonymous p.E172D change that this variant induces. The p.E172D change affects a relatively conserved region (phastCons = 1, PhyloP = 4.143) of the 4th transmembrane domain of the organic anion-transporting polypeptide SLCO1A2, and the change is predicted to be probably damaging by PolyPhen and deleterious by SIFT [13]. Solute carrier organic anion transporters (SLCOs), also known as organic anion-transporting polypeptides (OATPs), facilitate the uptake of drugs in specific organs and therefore influence the absorption, distribution and elimination of drugs, xenobiotics, hormones and toxins. SLCO1A2 is the most important SLCO in the human brain, and the expression information collected from GTEx and Braineac confirms that it is highly expressed in the brain and in brain regions that are targets for tauopathy. Zhou et al. had shown previously that the rs11568563 minor allele is associated with low expression of SLCO1A2 in the brain [6]. Specifically, the p.E172D change has been found to reduce the transport of known SLCO substrates [14]; this reduction is independent of protein expression or glycosylation and seems to be the result of altered SLCO1A2 cell surface trafficking and final localization to the plasma membrane.
Recently, a variant in SLCO1A2 (rs73069071) was associated with cortical Aβ deposition in Alzheimer's disease (AD)-related cognitive impairment and temporal lobe atrophy, and the authors proposed that this variant may be a modifier of the effect of Aβ deposition on AD-related neurodegeneration [15]. This variant is not in LD with rs11568563, but both SNPs are relatively close to each other in the SLCO1A2 locus (139,057 bp apart): rs73069071 is located in intron 2 and rs11568563 in exon 7 (NM_134431). Additionally, rs73069071 maps to an intronic region of the islet amyloid polypeptide gene, IAPP, which has previously been implicated in AD etiology [16]. The proximity of IAPP to SLCO1A2 is due to the IAPP genomic sequence being encoded on the complementary strand, spanning from SLCO1A2 intron 1 to intron 2. Studies suggest that IAPP expression is under the influence of rs11568563. Recent studies have shown that IAPP is an important regulator of apoptosis and autophagy [17], both of which are pathways linked to neurodegeneration. Since PSP is associated not with Aβ deposition but with tauopathy, it is possible that SLCO1A2 has a more general role in neurodegeneration and tau aggregation and that variation in SLCO1A2 and/or IAPP underlies differential proteinopathies.
The effect of the intergenic rs6687758 variant is also unclear, as the nearest gene, DUSP10, is located ~250 Kb away. In favor of its effect on DUSP10, a separate GWAS for colorectal cancer associated rs6687758 with DUSP10 [18,19]. rs6687758 is an eQTL for three lincRNAs (RP11-815M8.1, RP11-400N13.2 and LINC01655) and has been predicted to act as an enhancer. However, little is known about the function of these specific lincRNAs in the brain. Furthermore, the GWAS signal set associated with rs6687758 has 50 more variants, mostly localized in a long intergenic region and with suggestive cis regulation of the neighboring genes DUSP10, HHIPL2 and FAM177B. If involved, DUSP10 may influence the accumulation of hyperphosphorylated tau, gliosis and synaptic/cognitive deficits through the uncontrolled hyperactivation of p38 and JNK kinases. However, a specific role of p38 and JNK in PSP will need to be elucidated, since these MAPK pathways function in general cell signaling and are dysregulated in several neurodegenerative conditions.
Overall, our strategy of expanding the PSP population, contrasting it with age- and gender-matched controls, and performing a meta-analysis was successful in detecting genetic associations with PSP with higher precision; however, this study cannot rule out that other variants that did not reach genome-wide significance are associated with PSP. Indeed, five of these variants had P-values lower than 1E-5 and replicated the direction of association observed in the GWAS (Table 2). Two of these SNPs (rs197971 and rs2107272) had even lower P-values than in the GWAS (2.7E-6 vs 6.7E-6 and 3E-6 vs 3.6E-5, respectively), but further studies in larger case-control populations will be required to support the association of these non-genome-wide significant variants. Although all subjects were self-reported Caucasian, our study lacked genome-wide population control markers and focused only on the 37 SNPs with the highest association in the previous PSP GWAS; it therefore cannot rule out population stratification influencing the observed results, nor can it explore the role of novel loci undetected by the original PSP GWAS.

Conclusions
In conclusion, we have performed a meta-analysis adding a new PSP cohort to the previous GWAS population, which confirmed that the top GWAS variants retain a significant association with PSP and identified two novel associations, with SLCO1A2 and with the intergenic rs6687758 variant. Further studies are needed to understand the role of the newly associated variants in PSP, including the effect of SLCO1A2 variation on blood-brain barrier (BBB) function in PSP and the cis and trans regulatory effects of GWAS variants on gene networks associated with tauopathy.

Additional files
Additional file 1: Figure S1. SLCO1A2 rs11568563 GWAS signal set. LD Manhattan plot for rs11568563 in the 1000G phase3:CEU as visualized using Ensembl (A). Genomic location of genes neighboring the rs11568563 GWAS signal set as visualized in the UCSC Genome Browser (B), with customized tracks from top to bottom: R2 plot for rs11568563, proxy variants in strong LD (r2 > 0.8) with rs11568563, UCSC genes found to have differential brain expression and their associated GTEx RNA-seq gene expression (brain expression in yellow). (TIF 2794 kb)
Additional file 2: Figure S2. rs6687758 GWAS signal set. LD Manhattan plot for rs6687758 in the 1000G phase3:CEU as visualized using Ensembl (A). Genomic location of genes, predicted coding sequences and pseudogenes neighboring the rs6687758 GWAS signal set as visualized in the UCSC Genome Browser (B), with customized tracks from top to bottom: R2 plot for rs6687758, proxy variants in strong LD (r2 > 0.8) with rs6687758, UCSC genes located in this region with the genes found to have differential brain expression highlighted in yellow, and GTEx RNA-seq gene expression (brain expression in yellow). (TIF 2231 kb)