
Table 2 Summary of studies on genetics of human gene expression: analyses, results, conclusions.

From: Gene expression endophenotypes: a novel approach for gene discovery in Alzheimer's disease

Reference

Reference ID

Analytic Approach

Results

Conclusion

Yan et al., 2002

[28]

Comparison of relative allelic expression levels within the same cellular sample.

Significant differences in allele-specific expression observed for 6 of 13 genes. Mendelian inheritance detected for expression levels, inherited together with genetic markers.

Gene expression levels can be used to detect genetics of disease susceptibility.

Schadt et al., 2003

[29]

eQTL (expression quantitative trait locus) linkage analysis.

Differential expression detected for 7,861 of 23,574 (>33%) genes in the parental strains and ≥10% of genes in the F2 strain. 9-16% of genes have eQTLs with LOD scores > 4.3. Gene expression profiling identified three distinct expression patterns for 280 genes that distinguish mice at the lower 25th percentile of an obesity trait (fat-pad mass = FPM) and two groups at the upper 25th percentile of the FPM trait. These 280 genes were enriched for eQTLs. Linkage analysis of the obesity trait focused on groups with distinct expression patterns improved the signal.

Gene expression levels can be used to identify more refined disease sub-groups, genes and pathways that are implicated in the disease phenotype. These have implications in understanding genetics of complex diseases and drug discovery aimed at more homogeneous sub-groups of distinct expression patterns.

  

eQTL (expression quantitative trait locus) linkage analysis.

18,805 (77%) genes with differential expression. Of these, 6,481 genes with ≥ 1 eQTL with LOD score > 3.0. Total of 7,322 eQTLs. Interactions detected in <10% of eQTL.

 
  

Variance components analysis to test heritability.

2,726 genes with differential expression (11%). Of those 29% have a detectable heritability.

 

Cheung et al., 2003

[30]

Utilized 3-4 replicate measurements per person. Calculated variance ratio of each gene expression by dividing the variance of expression levels among subjects by that within subjects (using replicates).

50% of genes on the arrays are expressed in the LCLs. 813 genes with valid observations had variance ratios of 0.4-64. Five genes evaluated in a larger group showed the highest variance among unrelated individuals, then siblings, then monozygotic twins (10 pairs).

There is natural variation in gene expression levels, which is at least in part genetically determined. Genetic differences among individuals may account for variations in gene expression and suggest underlying heritability.

Morley et al., 2004

[31]

Variance of expression levels detected from 94 unrelated subjects. 3554 expression phenotypes with greater variation between subjects than within subjects (replicates) were used for further analysis. Genome-wide linkage analysis conducted for these phenotypes in 14 CEPH families.

Found 984 expression phenotypes with pointwise linkage p < 0.05 genome-wide, more than the 178 false positives expected by chance alone. 142 phenotypes have pointwise p < 0.001, which exceeds chance (3.5 false positives expected). Of the top 142, 27 have a cis- (within 5 Mb) and 110 have a trans-regulator. Of the 984, 164 have multiple regulators (152 with multiple trans and 12 with both cis and trans). Some linkage regions, called "hotspots", are linked to multiple expression phenotypes. Genes that map to one hotspot have expression levels with higher than expected correlations. Some of these genes have close physical locations. Some cis-SNPs show differential allelic expression.

Genetic factors that influence variation in human gene expression can act in cis (within 5 Mb) or trans. There are transcriptional "hotspots", which may contain "master regulators" of multiple genes. Mapping genetic factors that influence gene expression could help with the understanding of human biology and disease.

Monks et al., 2004

[32]

Variance components analysis to test heritability and eQTL analysis. Comparison of biological pathways within GO and KEGG, using genes clustered by genetic correlations (GC) and Pearson's correlations (PC). Correction for multiple testing with Bonferroni and false discovery rate (FDR).

2,430 of 23,499 genes differentially expressed in ≥50% of children. Of these, 762 were heritable with FDR of 0.05 and median heritability of 0.34. These genes were enriched for immunity pathways. 22 genes have significant eQTLs at genome-wide level. Did not detect "hotspots". Analysis of 574 genes by GC and PC showed that both clusterings have similar pathway coherence for GO, but GC has better pathway coherence for KEGG pathways.

Genetic factors influence gene expression in LCL. Important to test other tissues. Random samples may not have transcriptional "hotspots". Gene expression genetics may identify novel biological pathways.

Cheung et al., 2005

[33]

Follow-up association study for the significant linkage findings from Morley et al. (2004). Linear regression association for 374 expression levels with prior evidence of cis-linkage, using SNPs near linkage peaks (± 50 kb). Expression GWAS (eGWAS) for 27 top cis-linkage phenotypes, using >770,000 SNPs. Comparison of prior linkage and association results.

65 of 374 expression levels have ≥1 SNP that associates at p < 0.001, 12 with p < 1E-10 and 133 with p < 0.01. Same proportions of associations found for the 5', 3' and genic regions. 14 of the top 27 cis-linkage regions showed genome-wide associations. 12 of those were cis only, 1 was cis+trans and 1 was trans only. One gene with strong cis-linkage and association was subjected to two functional assays, which confirmed the presence of a functional variant that influenced gene expression by modifying the strength of RNA polymerase II binding.

Strong linkage predicts strong association for expression levels. eSNPs are not enriched at the 5' or 3' end. eGWAS is feasible and may lead to genetic determinants of expression phenotypes.

Stranger et al., 2005

[34]

Analyzed 374 of 630 genes with expression signals above background and most variable. Linear regression association for these genes (688 probes in total). Three methods for multiple-test correction: Bonferroni, FDR and permutations.

Good concordance between the 3 multiple-test correction methods. For 10-40 of 374 genes, cis-SNPs (within 1 Mb of the genomic midpoint of the gene) are detected at genome-wide level by ≥1 statistical method. Only 3 trans hits were observed, and these are more likely to be false positives.

eGWAS can identify variants with regulatory activity.

Stranger et al., 2007 (Nature Genetics)

[35]

Linear regression association analysis in 4 ethnic populations. Heritability estimates in Caucasian and Yoruba trios. Tested for significance by 10,000 permutations and FDR. Candidate trans-SNP analysis (SNPs with cis-effects, non-synonymous, splicing, microRNA SNPs).

10% (4,829) and 13% (6,482) of all probes analyzed have heritability >0.2 in CEU and YRI trios, respectively, with 958 overlapping genes. 154 CEU and 217 YRI genes have heritability >0.5, with overlap of 9 genes. 831 genes with significant cis association in at least 1 population; 310 in at least 2; 62 in all four. Most detected genes have heritability estimates above 0.2. Pooling populations captures some additional genes with smaller effect size. Most cis-associations are in genic and immediate intergenic regions. 108 genes with significant trans association in ≥1 population, 16 genes in ≥2 and 5 in all 4 populations. Most trans-SNPs also have cis-effects. The CEU population had the most divergent expression profile from other populations, likely due to age of cell lines. Sixty cell lines were measured on 2 different arrays and showed high correlations in overlapping results, suggesting transcript measurements are stable across different experiments, measurement times and platforms.

There is a substantial number of heritable expression traits detectable in a small population (30 trios), but also substantial non-genetic variation. Substantial overlap between different ethnic groups for significant eSNPs. Ethnic differences could in part be due to differences in SNP frequencies. Most eSNPs act in cis.

Stranger et al., 2007 (Science)

[36]

Linear regression association analysis in 4 ethnic populations. Cis-SNPs defined as within 1 Mb of the midpoint of the probe and cis-CNVs as within 2 Mb. Permutation-based p values ≤ 0.001 deemed significant. Overlap with Nature Genetics study unclear.

Of 14,072 genes, 888 have ≥1 SNP with significant association in ≥1 ethnic group, 331 of which were significant in ≥2 ethnic groups and 67 of which in all 4 populations. Of 14,072 genes, 238 have ≥1 CGH clone with significant association in ≥1 ethnic group, 28 of which were significant in ≥2 ethnic groups and 5 of which in all 4 populations. Not all CGH clones have detectable CNVs. 1,322 CNV clones detected. 99 genes associate with ≥1 CNV clone in ≥1 ethnic group, 34 of which in ≥2 ethnic groups and 7 in all 4 populations. Most CNV associations (87%) cannot be detected by SNPs.

Both SNP and CNV associations replicate across ethnic groups. CNVs appear to exert their effects by disrupting both regulatory and genic regions. Survey of structural variants in addition to SNPs is important in eGWAS.

Dixon et al., 2007

[37]

Variance components analysis to test heritability and eQTL analysis on the subset of transcripts with heritabilities > 0.3. Multiple testing corrections by FDR.

No significant differences between asthmatics and non-asthmatics (unchallenged cells). 15,084 transcripts (28%) = 6,660 genes have heritabilities > 0.3. Traits with higher heritability have SNPs that explain a bigger percentage of their heritability and therefore also have a larger LOD score of association (on average, the peak SNP explains 18.2% of heritability). SNP interactions could explain transcript levels not explained by single SNPs. Trans effects are weaker than cis. Highly heritable traits are enriched for chaperonins, heat shock proteins, cell cycle progression, RNA processing, DNA repair and immune response.

Joint analysis of disease GWAS and eGWAS identified potential candidate genes for asthma (ORMDL3), Crohn's disease (PTGER4), NIDDM (PHACS), thalassemia (HBS1L). eGWAS is a useful approach to detect disease SNPs with a functional role.

Goring et al., 2007

[38]

Variance components analysis to test heritability and eQTL analysis. Cis-QTL = multipoint lod score at the location nearest the underlying structural gene. Trans-QTL = located on a different chromosome than its transcript. Multiple testing correction by FDR. Tested HDL-C concentrations for correlations with cis-regulated transcripts to identify cis-SNPs that also influence HDL-C.

16,678 transcripts (84.9%) were heritable with median heritability estimate of 22.5%. RefSeq transcripts have higher heritability estimates than non-RefSeq transcripts. At an FDR of 5%, identified 1,345 cis-regulated transcripts (6.8%) with median effect size of 24.6%. More significant cis- than trans-QTLs. Strongest QTLs tend to be cis. Identified a functional cis-SNP in VNN1 that associates with its expression and HDL-C levels.

Lymphocytes may provide more accurate representation of natural gene expression state than lymphoblasts, though there is overlap. Cis-regulation more stable across studies, tissues and stronger. No evidence of master regulators in this study.

Emilsson et al., 2008

[39]

Correlations between obesity traits and blood and adipose tissue expression levels. Variance components analysis to test heritability and eQTL analysis. Linear regression association analysis. Multiple testing correction by FDR. Generated connectivity matrix of genes with high correlation of expression in adipose tissue, compared human and mouse data, identified GO categories enriched for co-regulated genes.

Adipose tissue expression levels (63-72%) correlate better with obesity traits than do blood expression levels (3-9%). 55% of blood and 75% of adipose tissue transcripts are significantly heritable, with average heritability of 30%. 2,529 (12%) significant cis-eQTLs in blood, and 1,489 (7%) in adipose tissue. >50% of significant adipose tissue cis-eQTLs also significant in blood. Traits with higher heritability have greater reproducibility. Far fewer significant trans-eQTLs. No evidence of master regulators. 2,714 (12.9%) significant cis-SNPs in blood and 3,364 (16%) in adipose tissue. Identified genes that are correlated in both human and mouse adipose tissue and enriched in macrophage activation pathways. cis-eSNPs for the expression traits in this network also influence obesity traits.

Significant overlap in genetic factors underlying gene expression in two different tissue types, but expression levels from clinically relevant tissue correlate better with clinical phenotypes. Expression correlation networks combined with cis-eSNPs could potentially identify genes/pathways underlying complex clinical phenotypes.

Schadt et al., 2008

[40]

Linear regression association analysis. Cis eQTL defined as being within 1 Mb of the transcription start or stop site of the gene. Multiple testing corrected by Bonferroni or FDR approaches. Compared significant eQTL results to those from published disease GWAS for Type 1 diabetes and coronary artery disease.

At Bonferroni adjusted p < 0.05, 1,350 expression traits (1,273 genes); at FDR <10%, 3,210 traits (3,043 genes) identified to have at least one significant cis eSNP, which explain 2-90% expression variation. Of the blood and adipose expression traits present on the liver expression microarrays, 30% had cis eQTLs that overlapped with the 3,210 significant liver cis eQTLs. Trans eQTLs significant at Bonferroni p < 0.05 were 242 traits (236 genes), and at FDR <10% were 491 traits (474 genes). Identified SORT1 and CELSR2 as candidate genes for coronary artery disease and LDL cholesterol levels, and RPS26 for Type 1 diabetes.

Evidence of common genetic control between tissues as well as tissue-specific genetic control of expression. Significant trans eQTLs are only a fraction (15%) of cis eQTLs. Increasing sample size has a bigger impact on power than increasing genetic coverage by SNPs. Cis eQTLs combined with expression networks in humans and rodents and known biological pathways (such as KEGG) may help identify disease-susceptibility genes in regions of LD. Not all disease-SNPs will be eSNPs, but strong expression association for disease-SNPs provides additional confidence for the candidates.

Myers et al., 2007

[41]

Linear regression association analysis. Cis eSNPs defined as being within the gene or within 1 Mb of its 3' or 5' end. Multiple testing corrected by permutation approaches.

58% of the transcriptome has expression in ≥5% of control brains. Of these, 21% correlate with a cis or trans eSNP. 433 significant cis eSNPs (99 transcripts), and 16,701 significant trans eSNPs (2,876 transcripts). Enrichment of significant cis vs. trans associations maximized within ~70 kb of transcripts. MAPT cis eSNPs with alleles on the major haplotype (H1) are associated with higher transcript levels. Few results in common with lymphoblast eGWAS.

Evidence for genetic control of human brain gene expression. Brain eSNPs may be used in conjunction with disease-SNPs for neurologic or psychiatric illnesses to identify functional variants.

Webster et al., 2009

[42]

Linear regression association analysis. Cis eSNPs defined as being within the gene or within 1 Mb of its 3' or 5' end. Analyzed cases and controls both separately and jointly. In the combined analysis, tested for diagnosis effects on expression by comparing a model with diagnosis only vs. one with diagnosis, SNP and diagnosis × SNP interaction. Multiple testing corrected by permutation approaches. Network analysis was done on the transcripts with a significant eQTL (p ≤ 0.01) and on those without a significant eQTL that were differentially expressed between ADs and controls.

58% of the transcriptome has expression in ≥5% of AD brains. Hybridization date and APOE had the strongest influence and post-mortem interval the least influence on brain expression levels. 1,829 significant cis eSNPs in the combined sample and 656 trans eSNPs. 27% of all eQTLs have a significant interaction term with diagnosis. 37% of cis eSNPs that interact with diagnosis overlap with those found in just the control brains (Myers et al., 2007). 18% overlap between previous report and cis+trans effects without diagnosis interaction and 9% overlap for those with interaction. Effect sizes are larger for eSNPs closer to transcription start sites. Identified clusters of transcripts that are enriched for certain ontology groups and that contain "hub" genes with expression levels that correlate with many other transcripts.

Transcriptome measurements in disease-relevant tissue are important. Brain transcriptome appears to be unique. eQTLs may be used as biomarkers for classifying preclinical subgroups. eQTL approach may help distinguish true disease risk variants. Using tissue from subjects with disease may be needed to capture most eSNPs that have disease interactions, though significant eSNPs without disease interactions and some with disease interactions can be identified in control and disease tissue equally well.
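
Two quantities in the table lend themselves to a short worked illustration: the variance ratio described for Cheung et al., 2003 (variance among subjects divided by variance within subjects, using replicates) and the chance expectation of false positives quoted for Morley et al., 2004. The Python sketch below is only an illustration under stated assumptions, not the procedure used in either study. The function names `variance_ratio` and `expected_false_positives` are hypothetical. The among-subject variance is assumed to be the sample variance of per-subject means, and the within-subject variance is assumed to be the average of per-subject replicate variances. The expected count is assumed to be simply the number of phenotypes tested times the pointwise threshold.

```python
from statistics import mean, variance


def variance_ratio(replicates_by_subject):
    """Among-subject variance divided by within-subject variance.

    Assumption (not taken from the source): among-subject variance is the
    sample variance of per-subject means, and within-subject variance is
    the mean of the per-subject sample variances of the replicates.
    """
    subject_means = [mean(reps) for reps in replicates_by_subject]
    among = variance(subject_means)
    within = mean(variance(reps) for reps in replicates_by_subject)
    return among / within


def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass a pointwise threshold by chance."""
    return n_tests * alpha


# Toy data (made up for illustration): 3 subjects, 3 replicates each.
toy = [[1.0, 1.2, 0.8], [2.0, 2.2, 1.8], [3.0, 3.2, 2.8]]
ratio = variance_ratio(toy)

# 3,554 expression phenotypes at the two pointwise thresholds in the table.
fp_05 = expected_false_positives(3554, 0.05)
fp_001 = expected_false_positives(3554, 0.001)
```

With 3,554 phenotypes, the products come to roughly 178 at p < 0.05 and roughly 3.5 at p < 0.001, which is consistent with the chance expectations quoted in the Morley et al. row. A ratio well above 1 in the toy data indicates more variation among subjects than among replicates of the same subject.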