Independent component analysis of Alzheimer's DNA microarray gene expression data
- Wei Kong†1, 2,
- Xiaoyang Mou†2,
- Qingzhong Liu3,
- Zhongxue Chen4,
- Charles R Vanderburg5,
- Jack T Rogers6 and
- Xudong Huang2, 6
© Kong et al; licensee BioMed Central Ltd. 2009
- Received: 26 September 2008
- Accepted: 28 January 2009
- Published: 28 January 2009
Gene microarray technology is an effective tool for investigating the simultaneous activity of multiple cellular pathways across hundreds to thousands of genes. However, the large volumes of data generated by DNA microarrays are usually complex, noisy, and high-dimensional, and analyses are often hindered by low statistical power, which makes the data difficult to exploit. Two kinds of unsupervised analysis methods have been used to address these problems: principal component analysis (PCA) and independent component analysis (ICA). PCA projects the data into a new space spanned by mutually orthonormal principal components. However, the mutual-orthogonality constraint and the reliance on second-order statistics in PCA algorithms may not be appropriate for the biological systems studied, because extracting and characterizing the most informative features of biological signals requires higher-order statistics.
ICA is an unsupervised algorithm that can extract higher-order statistical structure from data, and it has been applied to the analysis of DNA microarray gene expression data. We applied the FastICA method to DNA microarray gene expression data from Alzheimer's disease (AD) hippocampal tissue samples, followed by gene clustering. Experimental results showed that ICA can improve the clustering of AD samples and identify significant genes. More than 50 significant genes with high expression levels in severe AD were extracted, representing immunity-related proteins, metal-related proteins, membrane proteins, lipoproteins, neuropeptides, cytoskeleton proteins, cellular binding proteins, and ribosomal proteins. Within the same categories, our method also found 37 significant genes with low expression levels. Notably, some oncogenes and phosphorylation-related proteins were expressed at low levels. Compared with PCA and support vector machine recursive feature elimination (SVM-RFE), both widely used in microarray data analysis, ICA identified more AD-related genes. Furthermore, we validated and identified many genes that are associated with AD pathogenesis.
We demonstrated that ICA exploits higher-order statistics to represent gene expression profiles as linear combinations of elementary expression patterns, which can guide the construction of potential AD-related pathogenic pathways. Our computational results also indicated that the ICA model outperformed the PCA and SVM-RFE methods. This report shows that ICA, as a microarray data analysis tool, can help to elucidate the molecular taxonomy of AD and other multifactorial and polygenic complex diseases.
- Independent Component Analysis
- Principal Component Analysis Method
- Microarray Gene Expression Data
- Independent Component Analysis Method
- Support Vector Machine Recursive Feature Elimination
Since microarray technology can determine the expression levels of thousands of genes from a single array of chemical sensors, it has become a popular gene expression screening tool in the molecular investigation of various diseases. This technology allows for two main types of descriptive analyses: firstly, the identification of genes that may be responsible for a clinicopathological feature or phenotype, and secondly, the genomic classification of tissue.
The ultimate goal is to improve clinical outcomes by adapting therapy to the molecular characteristics of a human disease, such as a tumor [1, 2]. Various methods have been developed to accomplish these tasks. However, most methods consider only individual genes, which makes the results difficult for biologists to interpret because of the large number of genes, their complex inter-gene dependencies, and the high co-linearity among gene expression profiles.
Therefore, to understand the coordinated effects of multiple genes, researchers need to extract the underlying features from the multi-variable dataset and thereby reduce the dimensionality and redundancy inherent in the measured data. To extract these features effectively, however, any microarray analysis must address the noise that array systems and imperfect experimental design introduce. Additionally, to discover functional modules involved in gene regulatory or signaling pathways, powerful mathematical and computational methods are needed for modeling and analyzing the microarray data of interest.
Two kinds of unsupervised analysis methods for microarray data, principal component analysis (PCA) and independent component analysis (ICA), have been used to accomplish these tasks. PCA projects the data into a new space spanned by the principal components. Each successive principal component is selected to be orthonormal to the previous ones and to capture the maximum information that is not already present in the previous components. The mutual-orthogonality constraint implied by classical PCA methods, however, may not be suitable for biological systems, whose underlying components are more plausibly statistically independent and need not be orthogonal. ICA is therefore well suited to biological data, because it assumes that the gene expression data generated by DNA microarray technology are a linear combination of independent components that have specific biological interpretations. Another advantage of ICA is that it requires neither training data nor a priori knowledge of the parameters governing how the underlying components are filtered and mixed.
Hori et al. in 2001 [3, 4] and Liebermeister in 2002 showed that the ICA model can effectively classify gene expression data into biologically meaningful groups and relate them to distinct biological processes. ICA has since been widely used in DNA microarray data analysis for feature extraction, clustering, and classification in the analysis of gene regulation. Most of the published literature on ICA for microarray data concerns the yeast cell cycle [6–8] and cancer datasets, including ovarian cancer, breast cancer [10–13], endometrial cancer, colon and prostate cancer [15, 16], and acute myeloid leukemia.
Although the exact causes of AD are not fully understood, DNA microarray techniques have been applied to AD-related gene profiling. However, to our knowledge, the application of ICA to AD-related DNA microarray data has not been reported before. Because ICA can both identify gene expression patterns and group genes into expression classes, which may provide much greater insight into biological function and relevance, we employed ICA to uncover biologically meaningful patterns in AD microarray gene expression data. Herein, we present a new computational approach to reveal AD-related molecular taxonomy and to identify AD pathogenesis-related genes.
To perform ICA application in AD gene expression data analysis, we used a dataset from GEO DataSets deposited by Blalock et al that featured hippocampal gene expression from control and AD samples . The hippocampal specimens were obtained through the Brain Bank of the Alzheimer's Disease Research Center at the University of Kentucky. The human GeneChips (HG-U133A) of Affymetrix and Microarray Suite 5 were used in the microarray data collection. The procedures for total RNA isolation, labeling, and microarray were described in  and .
We excluded samples with significant noise and chose 8 control and 5 severe AD samples for ICA, each containing 22283 gene expression values. In addition, because microarray data often include "unregulated" genes, whose expression profiles carry little information, we filtered out such genes before applying ICA. The final dataset used for ICA comprised 13 samples (8 control and 5 severe AD) with 3617 genes per sample.
ICA Decomposed AD Microarray Data into Biological Processes
The main modeling hypothesis underlying the application of ICA to gene expression data analysis is that the gene expression level is determined by a linear combination of biological processes, many of which may up-regulate or down-regulate gene expression. It is assumed that these biological processes correspond to activation or inhibition of single pathways or a network of highly correlated pathways, and that each of these pathways only affects a relatively small percentage of all genes. Because of the statistical independence assumption inherent in the ICA inference process, we would expect ICA components to map closer to pathways.
ICA Improved Gene Clustering Results of AD Samples
Sample clustering by matrix A
ICA essentially seeks a new representation of the observed expression profile matrix X, with the columns of matrix A representing the new basis vectors (latent variables). Each row of A contains the weights with which the gene signatures contribute to the corresponding observed expression profile.
Sample clustering by reconstructed data
For the original data (Figure 3A), some control and severe AD samples clustered together, and the highest hierarchical split did not separate the two classes as would be expected. For the data reconstructed by either PCA or ICA, the clustering results were greatly improved (Figure 3B and 3C). In the PCA method, the first 10 components, which had the largest variances, were selected to reconstruct the data; they captured most of the information in the original data (the cumulative contribution of their eigenvalues exceeded 95.5%), whereas the remaining lower-variance components were treated as noise and removed. The ICA method extracted 13 gene signatures (rows of matrix S) that were mutually statistically independent and were taken to represent underlying biological processes. Each independent component was as sparse as possible: only a few relevant genes were significantly affected, leaving the majority of genes relatively unaffected. ICA filtering was achieved by setting to 0 the entries of each gene signature whose values fell below a threshold. The reconstructed data then gave a clearer clustering result than the original data, discriminating control from severe AD samples.
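The two reconstruction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of the absolute value when thresholding, the threshold value, and the layout (rows are samples, columns are genes) are all assumptions, since the text does not specify them.

```python
import numpy as np

def threshold_and_reconstruct(A, S, threshold):
    """Zero the small-magnitude entries of each gene signature (row of S)
    and rebuild the data as A @ S_sparse.
    Assumption: 'less than the threshold' is applied to absolute values."""
    S_sparse = np.where(np.abs(S) < threshold, 0.0, S)
    return A @ S_sparse, S_sparse

def pca_reconstruct(X, var_fraction=0.955):
    """Rebuild X (samples x genes, an assumed layout) from the leading
    principal components whose cumulative eigenvalue share reaches var_fraction."""
    mean = X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, var_fraction)) + 1  # number of PCs kept
    return (U[:, :k] * s[:k]) @ Vt[:k] + mean, k

# Toy inputs (hypothetical values, chosen only to exercise the functions).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
S = np.array([[0.1, -2.0, 0.05], [1.5, -0.2, 0.0]])
X_ica, S_sparse = threshold_and_reconstruct(A, S, threshold=0.5)

X_lowrank = np.outer([1.0, 2.0, 4.0], [1.0, -1.0, 0.5, 2.0])
X_pca, k = pca_reconstruct(X_lowrank)
```

Either reconstructed matrix could then be passed to a hierarchical clustering routine in place of the original data.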
ICA Identified Significant Genes for AD
Selected genes up-regulated in severe AD
adhesion molecule with Ig-like domain 2
B-cell translocation gene 1, anti-proliferative
CD44 molecule (Indian blood group)
CDC42 effector protein (Rho GTPase binding) 4
interferon-induced transmembrane protein 1 (9–27)
interferon-induced transmembrane protein 2 (1–8D)
interferon regulatory factor 7
interferon-induced protein 44-like
interleukin 4 receptor
interleukin-1 receptor-associated kinase 1
nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha
calcium/calmodulin-dependent protein kinase (CaM kinase) II beta
calmodulin 1 (phosphorylase kinase, delta)
capping protein (actin filament) muscle Z-line, alpha 2
chromogranin B (secretogranin 1)
lactotransferrin/similar to lactotransferrin
myelin basic protein
secretagogin, EF-hand calcium binding protein
solute carrier family 24(sodium/potassium/calcium exchanger), member 3
solute carrier family 7, (cationic amino acid transporter, y+ system) member 11
zinc family member 1 (odd-paired homolog, Drosophila)
zinc finger and BTB domain containing 20
zinc finger protein 500
zinc finger protein 580
zinc finger protein 652
zinc finger protein 710
ribosomal protein S26/similar to 40S ribosomal protein S26
sorbin and SH3 domain containing 3
collagen, type XXI, alpha 1
C-terminal binding protein 1
capping protein (actin filament) muscle Z-line, alpha 2
filamin A, alpha (actin binding protein 280)
apolipoprotein C-II/apolipoprotein C-IV
ATP-binding cassette, sub-family A (ABC1), member 1
glutamate decarboxylase 2 (pancreatic islets and brain, 65 kDa)
low density lipoprotein receptor adaptor protein 1
AE binding protein 1
transporter 1, ATP-binding cassette, sub-family B (MDR/TAP)
ubiquitin associated protein 2-like
major histocompatibility complex, class II, DR beta 4
thyrotropin-releasing hormone degrading enzyme
Transmembrane protein 92
serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3
cyclin-dependent kinase inhibitor 1C (p57, Kip2)
glutathione S-transferase M5
secreted protein, acidic, cysteine-rich (osteonectin)
Selected genes down-regulated in severe AD
CD22 molecule/myelin associated glycoprotein
calcium-binding protein 1
calcium channel, voltage-dependent, gamma subunit 3
calcium/calmodulin-dependent protein kinase (CaM kinase) II beta
calcium/calmodulin-dependent protein kinase IG
capping protein (actin filament) muscle Z-line, beta
met proto-oncogene (hepatocyte growth factor receptor)
zinc finger protein 365
transferrin receptor (p90, CD71)
amyloid beta (A4) precursor-like protein 2
cytochrome P450, family 26, subfamily B, polypeptide 1
neurofilament, heavy polypeptide 200 kDa
neurofilament, light polypeptide 68 kDa
neurotrophic tyrosine kinase, receptor, type 2
serpin peptidase inhibitor, clade I (neuroserpin), member 1
oligodendrocyte lineage transcription factor 2
chondroitin sulfate proteoglycan 5 (neuroglycan C)
chromosome 1 open reading frame 115
chromosome 20 open reading frame 149
chromosome 9 open reading frame 16
heterogeneous nuclear ribonucleoprotein A3 pseudogene 1/heterogeneous nuclear ribonucleoprotein A3
SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 4
ATP-binding cassette, sub-family A (ABC1), member 2
ATPase, H+ transporting, lysosomal 16 kDa, V0 subunit c
ATPase type 13A2
breast carcinoma amplified sequence 1
calcium-binding protein 1
regulating synaptic membrane exocytosis 3
proprotein convertase subtilisin/kexin type 1
regulating synaptic membrane exocytosis 2
glutamate receptor, ionotropic, N-methyl D-aspartate 1
myelin basic protein
myelin-associated oligodendrocyte basic protein
phosphoinositide-binding protein PIP3-E
phospholipase D family, member 3
protein tyrosine phosphatase, receptor type, T
eukaryotic translation initiation factor 5A
ISG15 ubiquitin-like modifier
regulator of calcineurin 2
regulator of G-protein signaling 4
steroid-5-alpha-reductase, alpha polypeptide 1 (3-oxo-5 alpha-steroid delta 4-dehydrogenase alpha 1)
Significant genes found by ICA
Even though the immune system tends to work less effectively in older adults than in younger ones, the elderly are prone to neuroinflammation. In fact, even though recent studies have indicated that certain aspects of the inflammatory response may have therapeutic potential [22–24], neuroinflammation is commonly believed to be a culprit in AD pathogenesis. This robust inflammatory response is associated with the extracellular deposition of amyloid β-protein (Aβ); together they are characteristic pathological features of AD. Consistent with the strong link between neuroinflammation and AD, we found that many inflammation-related genes were highly expressed, such as AMIGO2, BTG1, CD24, CD44, CDC42EP4, IFITM1, IFITM2, IRF7, IFI44L, IL4R, IRAK1, and NFKBIA, as Table 1 shows.
B-cell translocation gene 1 (BTG1) is a member of an anti-proliferative gene family that regulates cell growth and differentiation. BTG1 may participate in the activation-induced cell death of microglia by lowering the threshold for apoptosis; it increases the sensitivity of microglia to the apoptogenic action of an autocrine cytotoxic mediator.
CD24 is a cell adhesion molecule and cell surface glycoprotein that is expressed on both immune cells and cells of the CNS. Previous studies showed that CD24 is required for the induction of experimental autoimmune encephalomyelitis (EAE), an experimental model of the human disease multiple sclerosis (MS). The development of EAE requires CD24 expression on both T cells and non-T host cells in the CNS.
CD44 is a multifunctional cell surface glycoprotein that serves as a receptor for hyaluronic acid, collagen types I and VI, and mucosal vascular addressin. The localization of CD44 has been investigated immunohistochemically in postmortem human brain tissue from control subjects and patients with AD. Morphologically diverse CD44-positive astrocytes were observed in the cerebral cortex of both normal subjects and patients with AD, and in the AD brain the number of CD44-positive astrocytes increased dramatically. Therefore, CD44 may be an important adhesion molecule for these astrocytic processes [28, 29].
CD22 (Table 2) is a regulatory molecule that prevents over-activation of the immune system and the development of autoimmune diseases. Our results showed that CD22 was down-regulated, which suggests excessive inflammation in AD.
Rho GTPases such as Cdc42 are among the targets of Aβ-induced neurodegeneration in AD pathology; they mediate changes in actin cytoskeletal dynamics. The Rho family of small GTPases (Rho, Rac and Cdc42) regulates F-actin polymerization, acting as molecular switches that cycle between an inactive GDP-bound state and an active GTP-bound state. Rac1 and Cdc42 promote polymerization at the leading edge, orchestrating the formation of lamellipodia and membrane ruffles, as well as peripheral actin microspikes and filopodia [32, 33]. RhoA acts as an antagonist, promoting retraction of the leading edge and assembly of stress fibers.
Our ICA results showed high expression in severe AD of NFKBIA, which encodes the NF-κB inhibitor alpha (see Table 1). NF-κB plays a key role in regulating the immune response to infection. Consistent with this role, incorrect regulation of NF-κB has been linked to cancer, inflammatory and autoimmune diseases, septic shock, viral infection, and improper immune development. NF-κB has also been implicated in synaptic plasticity and memory. NF-κB activation provides a potential link between inflammation and hyperplasia.
Table 1 also shows that many genes encoding metal-related proteins were up-regulated in severe AD, including CAMK2B, CALM1, CAPZA2, CHGB, LOC728320/LTF, MPPE1, MT1F, MT1M, SCGN, ZIC1, ZBTB20, ZNF500, ZNF580, ZNF652, ZNF710, SLC24A3 and SLC7A11. Previous studies showed that metal ion metabolism is closely associated with AD. For example, changes in Ca2+ homeostasis, such as those occurring after Aβ addition, may influence several physiological responses that contribute to neuronal imbalance. Ca2+/calmodulin-dependent protein kinase II (CaMKII) is a holoenzyme composed of 12 monomers, primarily α and β subunits in neurons. Autophosphorylation of CaMKIIα at Thr286 is required for normal spatial memory and place-cell representation, presumably through the triggering of its calcium-independent kinase activity. Ca2+ influx through the N-methyl-D-aspartate (NMDA) type glutamate receptor leads to activation and postsynaptic accumulation of CaMKII. The NR1 and NR2B subunits of the NMDA receptor serve as high-affinity CaMKII docking sites in dendritic spines upon CaMKII autophosphorylation. Research [37, 38] showed a reduction of NR1 and phosphorylated CaMKII levels in the frontal cortex and hippocampus of AD brains. On the other hand, the genes encoding the Ca2+-related proteins CABP1, CACNG3, CAMK2B, CAMK1G and CAPZB (Table 2) were expressed at low levels in severe AD. Some primary neuron-specific transcriptional regulators that may be involved in mediating early neural development are also zinc finger-based.
Brown et al. found that the level of neurofilament gene expression appears to directly control axonal diameter, which in turn controls how fast electrical signals travel down the axon. The genes selected by ICA (Table 2), APLP2, CYP26B1, NEFH, NPY, NTRK2, SERPINI1, OLIG2 and NRSN2, indicated that expression of the neurofilament family is low in cases presenting clinically with severe AD.
To maintain cellular homeostasis, all cells must continually synthesize new proteins. Ribosomes (polyribosomes) are specialized complexes composed of nucleic acids and proteins that are responsible for mediating all protein synthesis. Impairments in protein synthesis occur in the earliest stages of AD. They occur in affected cortical regions but not in the cerebellum, and they appear to be mediated by alterations in both ribosomal nucleic acids and the polyribosomal complex itself, which suggests a novel role for altered protein synthesis as a potential mediator of AD pathogenesis. As Table 2 shows, the genes in the ribosomal protein category, CSPG5, C1orf115, C20orf149, C9orf16, and HNRPA2/HNRPA3P1, were down-regulated in severe AD.
Changes in cytoskeleton protein expression contribute to disease formation, and actin filament-based structures have been identified as important players in the complex pathology of AD and related dementias. A direct interaction between Tau and actin has been shown [41, 42]; actin may be a critical mediator of Tau-induced neurotoxicity in AD and related disorders. Such abnormalities also appeared in our ICA results for cytoskeleton proteins: genes such as COL21A1, CTBP1, CAPZA2 and FLNA were up-regulated (Table 1), whereas genes such as ACTB and SMARCA4 were down-regulated (Table 2).
APOE, which has three alleles (APOE ε2, APOE ε3 and APOE ε4), encodes a protein that helps to carry cholesterol and fat in the blood. APOE ε4 is regarded as the best known genetic risk factor for late-onset sporadic AD [43–47]. Aberrant cholesterol metabolism has been implicated in AD and other neurological disorders. Oxysterols and other cholesterol oxidation products are effective ligands of liver X receptor (LXR) nuclear receptors and major regulators of genes subserving cholesterol homeostasis. LXRs act as molecular sensors of cellular cholesterol concentrations and effectors of tissue cholesterol reduction. Following their interaction with oxysterols, activated LXRs induce the expression of ATP-binding cassette, sub-family A, member 1 (ABCA1), a pivotal modulator of cholesterol efflux. The relative solubility of oxysterols facilitates lipid flux among brain compartments and egress across the blood-brain barrier. The high expression levels of APOC2/APOC4, APOE and ABCA1 can be seen in Table 1.
In addition, ICA found that some significant genes encoding lipoproteins, binding proteins, membrane proteins and other proteins were up-regulated in severe AD (Table 1), such as GAD2, LDLRAP1, AEBP1, TAP1, UBAP2L, HLA-DRB4, TRHDE, TMEM92, SPARC and SERPINA3, and that some significant genes were down-regulated, such as CABP1, RIMS3, PCSK1, RIMS2, GRIN1, MBP, MOBP, PIP3-E, PLD3, PTPRT, EIF5A, ISG15, RCAN2, RGS4 and SRD5A1 (Table 2). In particular, some oncogenes, such as ABCA2, ATP6V0C, ATP13A2 and BCAS1, had low expression levels in severe AD (Table 2).
Significant genes found by PCA
To compare PCA with ICA, the PCA method for finding differentially expressed genes proposed by Jonnalagadda in 2008 was applied to the same AD microarray data. First, we modeled the control microarray data (where the samples are the variables and the gene expression measurements are the observations) using PCA and represented the expression profile of each gene as a linear combination of the dominant principal components (PCs). Then, the severe AD microarray data were projected onto the resulting PCA model, and the scores were extracted. The 100 genes whose scores differed most between the control and severe AD data were selected for further biological analysis.
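The score-comparison procedure just described can be sketched roughly as below. This is an illustrative reading rather than the published method: the function name, the number of retained PCs, and the use of Euclidean distance between score vectors are assumptions. The sketch also assumes that the control and disease matrices have the same number of columns, because the text does not say how the 8 control and 5 severe AD samples were matched.

```python
import numpy as np

def pca_score_ranking(control, disease, n_components=2, n_top=100):
    """Rank genes by how far their PCA scores move between two conditions.

    control, disease: (genes x samples) arrays with equal column counts
    (an assumption of this sketch). Genes are observations, samples are variables.
    """
    mean = control.mean(axis=0, keepdims=True)
    # PCA model fitted on the control data only.
    _, _, Vt = np.linalg.svd(control - mean, full_matrices=False)
    loadings = Vt[:n_components].T                 # (samples x n_components)
    scores_control = (control - mean) @ loadings
    scores_disease = (disease - mean) @ loadings   # projection onto the control model
    shift = np.linalg.norm(scores_disease - scores_control, axis=1)
    return np.argsort(-shift)[:n_top], shift

# Synthetic example: only genes 3 and 7 differ between the two conditions.
rng = np.random.default_rng(0)
control = rng.normal(size=(50, 4))
disease = control.copy()
disease[[3, 7]] += 5.0
top_genes, shift = pca_score_ranking(control, disease, n_components=2, n_top=2)
```

On the real data, `n_top=100` would correspond to the 100 genes carried forward for biological analysis.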
PCA also extracted some significant genes in the immunoreaction, metal protein, membrane protein, lipoprotein, neuropeptide, cytoskeleton protein, binding protein, ribosomal protein and phosphorylation-related protein categories, but fewer than ICA. Among immunity-related proteins, PCA found only two significant genes, BCL6 and CD24, both highly expressed in severe AD. Among metal-related proteins, PCA found many up-regulated genes of the metallothionein family, such as MT1P2, MT1E, MT1F, MT1G, MT1H/MT1P2, MT1X and MT2A, as well as CAMK2A, CALM1, the zinc finger gene ZBTB20, LDHA and LOC643287/PTMA. GPRC5B was the only membrane protein gene found. In the lipoprotein category, APOE was extracted as an important gene. For neuropeptides, NGFRAP1 and PPIA were extracted. ATP1B1 was the only phosphorylation-related protein found. Many down-regulated cytoskeleton protein genes were extracted by the PCA method, such as B2M, COL5A2, CSRP1, COX6A1, MAP1A, SPARC, TUBA1B, TUBA1C, TUBB, TUBB2A and TUBB2C. PCA also found many ribosomal proteins, LOC653737/LOC728501/LOC729402/LOC731567/RPL21, RPL29, RPL30, LOC342994/LOC651249/LOC729536/RPL34, RPL35, RPL4, RPL9, RPS10 and RPS11, that were all down-regulated in severe AD, except for RPL13, which was up-regulated.
Significant genes found by SVM-RFE
The Support Vector Machine Recursive Feature Elimination (SVM-RFE) method, which compares the weights of the support vectors in a sequential backward elimination manner, is widely used in microarray data analysis. In our experiments, to track the variation in gene expression associated with the development of AD, and hence to analyze biologically the genes that are significant as AD develops, the control data were treated as group 1 and the AD case data at the first stage were placed in group 2. SVM-RFE was used to compare groups 1 and 2 and identify the significant genes. Next, the AD case data at the second stage (moderate) were treated as group 2, and the significant genes were again extracted by comparing the gene expression data of the two groups. Finally, group 2 consisted of the AD case data at the third stage (severe), and the significant genes were profiled by comparing groups 1 and 2.
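A single two-group SVM-RFE pass of the kind described above might look like the following sketch. It is self-contained for illustration only: the linear SVM is trained with plain subgradient descent as a stand-in for a proper SVM solver, features are ranked by squared weight, and the hyperparameters and the fraction of features dropped per round are hypothetical choices, not values from the text.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_epochs=500, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    y must hold labels in {-1, +1}. A stand-in for a real SVM solver."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        active = y * (X @ w + b) < 1               # samples violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_keep, drop_fraction=0.5):
    """Backward elimination: retrain, then drop the features with the
    smallest squared weights until n_keep features remain."""
    remaining = np.arange(X.shape[1])
    while len(remaining) > n_keep:
        w, _ = train_linear_svm(X[:, remaining], y)
        n_drop = max(1, min(int(len(remaining) * drop_fraction),
                            len(remaining) - n_keep))
        order = np.argsort(w**2)                   # least important first
        remaining = np.sort(remaining[order[n_drop:]])
    return remaining

# Synthetic two-group example: features 2 and 5 carry the class signal.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 20)                     # group 1 vs group 2
X = rng.normal(size=(40, 20))
X[:, 2] += 3 * y
X[:, 5] -= 3 * y
selected = svm_rfe(X, y, n_keep=2)
```

Repeating the call with group 2 set in turn to each disease stage would mirror the staged comparison in the text.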
The SVM-RFE method found significant genes in immunoreactions such as CD44, CD74, CDC42EP4, CDK2AP1, MAL, PTMA, among which CD74 and MAL were not found by the ICA and PCA methods. Many metal metabolism-related proteins were also selected by the SVM-RFE method: MT1F, MT1H/MT1P2, MT1M, MT1X, VEZF1, ZBTB20, ZNF91, ZDHHC11, ZHX3. APOC1, USP34 and SPARC were extracted in lipoprotein, neuropeptide and secreted protein, respectively. In cytoskeleton protein, COL21A1, FGFR3, ITGB4, TPPP3, GSN, GFAP, MFAP3 were found up-regulated in severe AD. In ribosomal protein, the SVM-RFE method extracted many high expression genes like: RPL10, RPL13, RPL13A, RPL5, RPS4X, LOC387867, and two low expression genes: RNASE1 and RPL4. In the category of membrane protein, TMEM123 and LAMP2 were extracted as important genes. In phosphorylation-related protein, PIP4K2A, PDE4C, PEA15, PTPN11, PTPRK, ATP8B1, ANP32B, ABCA1, CNP were found up-regulated in severe AD. The SVM-RFE method also selected some significant oncogenes, TPT1, GLTSCR2, GUSBP1, GDF1/LASS1 that were highly expressed, as opposed to MCAM that was found to have low expression levels in severe AD.
In summary, to the best of our knowledge, this work is the first attempt to explore the power of ICA for analyzing AD-related microarray gene expression data. By validating and identifying known and novel genes in AD-related pathogenesis, it confirms the added value of ICA over the PCA and SVM-RFE methods. Our results further indicate that ICA can give researchers the ability to extract potentially disease-related genes from microarray gene expression data, and thus to delineate relevant molecular pathways of disease pathogenesis. Hence, ICA can help to elucidate the molecular taxonomy of AD and enable better experimental design to further validate and identify potential biomarkers and therapeutic targets of AD.
And it can also be rewritten in the vector format as:
X = AS    (2)
In some documents, an m × n matrix X is used to denote m genes under n samples, and its transpose, X^T, is used in the ICA model: X^T = AS. In that notation, X^T denotes the same n × m matrix that is written as X in the ICA model here.
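For orientation, Eq. (2) can be written out entry by entry. The block below assumes the convention of this paragraph (n samples as rows, m genes as columns) and as many components as samples, which matches the 13 gene signatures extracted from the 13 samples and 3617 genes used here.

```latex
x_{ij} = \sum_{k=1}^{n} a_{ik}\, s_{kj},
\qquad i = 1,\dots,n, \quad j = 1,\dots,m,
\qquad
X \in \mathbb{R}^{n \times m},\;
A \in \mathbb{R}^{n \times n},\;
S \in \mathbb{R}^{n \times m},
\qquad n = 13,\; m = 3617 .
```

Row k of S is the k-th gene signature, and a_{ik} is the weight with which that signature contributes to the expression profile of sample i.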
The gene expression data provided by microarray technology are considered a linear combination of independent components that have specific biological interpretations. Lee and Batzoglou, and Schachtner et al., gave detailed analyses of the matrices S and A. The n-th row of matrix A contains the weights with which the gene signatures, each defined over the m genes, contribute to the n-th observed expression profile. Hence the assignment of the observed expression profiles to different classes also applies to the rows of A. Each column of A can be associated with one specific expression mode. As an example with two classes, suppose that one of the independent expression modes, s_n, is characteristic of a putative cellular gene regulation process. It should contribute substantially to the experiments of one class, whereas its contribution to the experiments of the other class should be smaller, or vice versa. Since the n-th column of A contains the weights with which s_n contributes to all observations, this column should show large or small entries according to the class labels. After such characteristic latent variables have been obtained, the corresponding elementary modes can be identified to yield useful information for classification. Also, the distribution of gene expression levels generally features a small number of significantly overexpressed or underexpressed genes that form biologically coherent groups and may be interpreted in terms of regulatory pathways [3–5, 10, 51].
To obtain S and A, the demixing model can be expressed as
Y = WX    (3)
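One way to estimate W in Eq. (3) is the FastICA fixed-point iteration. The sketch below is a compact NumPy version written for illustration, not the authors' code: the tanh contrast function, symmetric decorrelation, iteration limits, and all function and variable names are assumptions. It returns the demixing matrix W, the estimated components Y = WX (computed on row-centred data), and the mixing matrix A as the inverse of W.

```python
import numpy as np

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

def fastica(X, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh contrast (assumed settings).
    X: (n_samples x n_genes); each row is one observed expression profile."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)          # centre each row
    d, E = np.linalg.eigh(Xc @ Xc.T / m)
    K = (E / np.sqrt(d)).T                          # whitening matrix
    Z = K @ Xc                                      # whitened data, covariance = I
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for every row w.
        W_new = G @ Z.T / m - (1.0 - G**2).mean(axis=1)[:, None] * W
        W_new = sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    W_full = W @ K                                  # demixing matrix of Eq. (3)
    Y = W_full @ Xc                                 # estimated components
    A = np.linalg.inv(W_full)                       # estimated mixing matrix
    return A, Y, W_full

# Synthetic check: mix three non-Gaussian sources and try to recover them.
rng = np.random.default_rng(1)
S_true = np.vstack([rng.laplace(size=5000),
                    rng.uniform(-1, 1, size=5000),
                    rng.laplace(size=5000)])
A_true = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.6], [0.4, 0.2, 1.0]])
X = A_true @ S_true
A_est, Y, W = fastica(X)
```

ICA recovers components only up to order, sign and scale, so the rows of Y should be compared with the true sources by absolute correlation rather than entry by entry.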
WK would like to express her gratitude for the support from the Research Foundation of Shanghai Municipal Education Commission (No. 06FZ012 and No. 2008098), the National Natural Science Foundation of China (No. 60801060), and the Radiology Department of Brigham and Women's Hospital (BWH). XH is supported by grants from the NIA/NIH (5R21AG028850), the Alzheimer's Association (IIRG-07-60397), and research funds from the BWH Radiology Department. We thank Ms. Kimberly Lawson of the BWH Radiology Department for her extremely helpful comments and editing of our manuscript.
- Saidi SA, Holland CM, Kreil DP, MacKay D, Charnock-Jones DS: Independent component analysis of microarray data in the study of endometrial cancer. Oncogene. 2004, 23 (39): 6677-6683.View ArticlePubMedGoogle Scholar
- Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002, 8: 816-824.PubMedGoogle Scholar
- Hori G, Inoue M, Nishimura S, Nakahara H: Blind gene classification based on ICA of microarray data. 3rd International Conference on Independent Component Analysis and Signal Separation. Proc ICA2001. 2001, San Diego, USA, 3: 332-336.Google Scholar
- Hori G, Inoue M, Nishimura S, Nakahara H: Blind gene classification – an application of a signal separation method. Genome Informatics Workshop, GIW2001. Tokyo, Japan. 2001, 255-256.Google Scholar
- Liebermeister W: Linear modes of gene expression determined by independent component analysis. Bioinformatics. 2002, 18 (1): 51-60.View ArticlePubMedGoogle Scholar
- Liao XJ, Dasgupta N, Lin SM, Carin L: ICA and PLS modeling for functional analysis and drug sensitivity for DNA microarray signals. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP. 2002, IV: 3880-3883.Google Scholar
- Suri RE: Application of independent component analysis to microarray data. International Conference on Integration of Knowledge Intensive Multi-Agent Systems. 2003, 375-378.Google Scholar
- Lu XG, Lin YP, Yue W, Wang HJ, Zhou SW: ICA based supervised gene classification of Microarray data in yeast functional genome. Eighth International Conference on High-Performance Computing in Asia-Pacific Region, Beijing, China. 2005, 633-638.Google Scholar
- Martoglio AM, Miskin JW, Smith SK, Mackay DJC: A decomposition model to track gene expression signatures: preview on observer-independent classification of ovarian cancer. Bioinformatics. 2002, 18 (12): 1617-1624.View ArticlePubMedGoogle Scholar
- Chinappetta P, Roubaud MC, Torrésani B: Blind source separation and the analysis of microarray data. Journal of Computational Biology. 2004, 11 (6): 1090-1109.View ArticleGoogle Scholar
- Berger JA, Mitra SK, Edgren H: Studying DNA microarray data using independent component analysis. First International Symposium on Control, Communications and Signal Processing. 2004, 747-750.Google Scholar
- Journée M, Teschendorff AE, Absil PA, Tavaré S, Sepulchre R: Geometric optimization methods for the analysis of gene expression data. BMC Plant Biol. 2006, 14: 6-27.Google Scholar
- Teschendorff AE, Journee M, Absil PA, Sepulchre R, Caldas C: Elucidating the altered transcriptional programs in breast cancer using independent component analysis. PLoS Comput Biol. 2007, 3 (8): e161-PubMed CentralView ArticlePubMedGoogle Scholar
- Saidi SA, Holland CM, Kreil DP, MacKay D, Charnock-Jones DS: Independent component analysis of microarray data in the study of endometrial cancer. Oncogene. 2004, 23 (39): 6677-6683.View ArticlePubMedGoogle Scholar
- Zhu L, Tang C: Microarray sample clustering using independent component analysis. Proceedings of the 2006 IEEE/SMC International Conference on System of Systems Engineering, Los Angeles, CA, USA. 2006, 112-117.Google Scholar
- Zhang XW, Yap YL, Wei D, Chen F, Danchin A: Molecular diagnosis of human cancer type by gene expression profiles and independent component analysis. European journal of Huaman Genetics. 2005, 13: 1303-1311.View ArticleGoogle Scholar
- Frigyesi A, Veerla S, Lindgren D, Hoglund M: Independent component analysis reveals new and biologically significant structures in micro array data. BMC Bioinformatics. 2006, 7: 290-301.PubMed CentralView ArticlePubMedGoogle Scholar
- Blalock EM, Geddes JW, Chen KC, Porter NM, Markesbery WR, Landfield PW: Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. PNAS. 2004, 101 (7): 2173-2178.PubMed CentralView ArticlePubMedGoogle Scholar
- Blalock EM, Chen KC, Sharrow K, Herman JP, Porter NM, Foster TC, Landfield PW: Gene microarrays in hippocampal aging: statistical profiling identifies novel processes correlated with cognitive impairment. J Neurosci. 2003, 23 (9): 3807-19.PubMedGoogle Scholar
- Hyvärinen A, Oja E: A fast fixed-point algorithm for independent component analysis. Neural Computation. 1997, 9 (7): 1483-1492.View ArticleGoogle Scholar
- Himberg J, Hyvärinen A, Esposito F: Validating the independent components of neuroimaging time-series via clustering and visualization. NeuroImage. 2004, 22: 1214-1222.View ArticlePubMedGoogle Scholar
- Stacie CW, Bruce AY: Inflammation and Alzheimer disease: The good, the bad, and the ugly. Nature Medicine. 2001, 7: 527-528.View ArticleGoogle Scholar
- Iwai K, Hirata K, Ishida T, Takeuchi S, Hirase T, Rikitake Y, Kojima Y, Inoue N, Kawashima S, Yokoyama M: An anti-proliferative gene BTG1 regulates angiogenesis in vitro. Biochem Biophys Res Commun. 2004, 316 (3): 628-635.View ArticlePubMedGoogle Scholar
- Meyer-Luehmann M, Spires-Jones TL, Prada C, Garcia-Alloza M: Rapid appearance and local toxicity of amyloid-beta plaques in a mouse model of Alzheimer's disease. Nature. 2008, 451: 720-725.PubMed CentralView ArticlePubMedGoogle Scholar
- Tony WC, Carol L, Fengrong Y, Gui-Qiu Y, Michelle R, Lisa MC, Eliezer M, Lennart M: TGF-β1 promotes microglial amyloid-β clearance and reduces plaque burden in transgenic mice. Nature Medicine. 2001, 7: 612-618.View ArticleGoogle Scholar
- Akiyama H, Tooyama I, Kawamata T, Ikeda K, McGeer PL: Morphological diversities of CD44 positive astrocytes in the cerebral cortex of normal subjects and patients with Alzheimer's disease. Brain Res. 1993, 632 (1–2): 249-259.View ArticlePubMedGoogle Scholar
- Liu JQ, Carl JW, Joshi PS, RayChaudhury A, Pu XA, Shi FD, Bai XF: CD24 on the resident cells of the central nervous system enhances experimental autoimmune encephalomyelitis. J Immunol. 2007, 178 (10): 6227-6235.View ArticlePubMedGoogle Scholar
- Vogel H, Butcher EC, Picker LJ: H-CAM expression in the human nervous system: evidence for a role in diverse glial interactions. J Neurocytol. 1992, 21 (5): 363-373.View ArticlePubMedGoogle Scholar
- Kalaria RN, Pax AB: Increased collagen content of cerebral microvessels in Alzheimer's disease. Brain Res. 1995, 705 (1–2): 349-52.View ArticlePubMedGoogle Scholar
- Bishop AL, Hall A: Rho GTPases and their effector proteins. Biochem. 2000, 348: 241-255.View ArticleGoogle Scholar
- Ridley AJ, Paterson HF, Johnston CL, Diekmann D, Hall A: The small GTP-binding protein rac regulates growth factor-induced membrane ruffling. Cell. 1992, 70 (3): 401-410.View ArticlePubMedGoogle Scholar
- Kozma R, Ahmed S, Best A, Lim L: The Ras-related protein Cdc42Hs and bradykinin promote formation of peripheral actin microspikes and filopodia in Swiss 3T3 fibroblasts. Mol Cell Biol. 1995, 15: 1942-1952.PubMed CentralView ArticlePubMedGoogle Scholar
- Nobes CD, Hall A: Rho, rac, and cdc42 GTPases regulate the assembly of multimolecular focal complexes associated with actin stress fibers, lamellipodia, and filopodia. Cell. 1995, 81 (1): 53-62.View ArticlePubMedGoogle Scholar
- Schmitz AA, Govek EE, Bottner B, Van AL: Rho GTPases: signaling, migration, and invasion. Exp Cell Res. 2000, 261: 1-12.View ArticlePubMedGoogle Scholar
- Soderling TR, Chang BH, Brickey DA: Cellular signaling through multifunctional Ca2+/calmodulin-dependent protein kinase II. J Biol Chem. 2001, 276: 3719-3722.View ArticlePubMedGoogle Scholar
- Giese KP, Fedorov NB, Filipkowski RK, Silva AJ: Autophosphorylation at Thr286 of the alpha calcium-calmodulin kinase II in LTP and learning. Science. 1998, 279: 870-873.View ArticlePubMedGoogle Scholar
- Amada N, Aihara K, Ravid R, Horie M: Reduction of NR1 and phosphorylated Ca2+/calmodulin-dependent protein kinase II levels in Alzheimer's disease. Neuroreport. 2005, 16 (16): 1809-1813.View ArticlePubMedGoogle Scholar
- Cheung KH, Shineman D, Müller M, Cárdenas C, Mei L, Yang J, Tomita T, Iwatsubo T, Lee VM, Foskett JK: Mechanism of Ca2 Disruption in Alzheimer's Disease by Presenilin Regulation of InsP3 Receptor Channel Gating. Neuron. 2008, 58: 871-883.PubMed CentralView ArticlePubMedGoogle Scholar
- Brown A, Wang L, Jung P: Stochastic simulation of neurofilament transport in axons: the "stop-and-go" hypothesis. Mol Biol Cell. 2005, 16 (9): 4243-55.PubMed CentralView ArticlePubMedGoogle Scholar
- Ding Q, Markesbery WR, Chen Q, Li F, Keller JN: Ribosome dysfunction is an early event in Alzheimer's disease. J Neurosci. 2005, 25 (40): 9171-9175.View ArticlePubMedGoogle Scholar
- Gallo G: Tau is actin up in Alzheimer's disease. Nat Cell Biol. 2007, 9 (2): 133-134.View ArticlePubMedGoogle Scholar
- Fulga TA, Elson-Schwab I, Khurana V, Steinhilb ML, Spires TL, Hyman BT, Feany MB: Abnormal bundling and accumulation of F-actin mediates tau-induced neuronal degeneration in vivo. Nat Cell Biol. 2007, 9 (2): 139-48.View ArticlePubMedGoogle Scholar
- Cosentino S, Scarmeas N, Helzner E, Glymour MM, Brandt J, Albert M, Blacker D, Stern Y: APOE epsilon 4 allele predicts faster cognitive decline in mild Alzheimer disease. Neurology. 2008, 70 (19 Pt 2): 1842-1849.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhong N, Scearce-Levie K, Ramaswamy G, Weisgraber KH: Apolipoprotein E4 domain interaction: synaptic and cognitive deficits in mice. Alzheimers Dement. 2008, 4 (3): 179-92.PubMed CentralView ArticlePubMedGoogle Scholar
- Percy M, Moalem S, Garcia A, Somerville MJ, Hicks M, Andrews D, Azad A: Involvement of ApoE E4 and H63D in sporadic Alzheimer's disease in a folate-supplemented Ontario population. J Alzheimers Dis. 2008, 14 (1): 69-84.PubMedGoogle Scholar
- Related A, Haasl RJ, Ahmadi MR, Meethal SV, Gleason CE, Johnson SC, Asthana S, Bowen RL, Atwood CS: A luteinizing hormone receptor intronic variant is significantly associated with decreased risk of Alzheimer's disease in males carrying an apolipoprotein E epsilon4 allele. BMC Med Genet. 2008, 9: 37-49.Google Scholar
- Bekris LM, Millard SP, Galloway NM, Vuletic S, Albers JJ, Li G, Galasko DR, DeCarli C, Farlow MR, Clark CM, Quinn JF, Kaye JA: Multiple SNPs within and surrounding the apolipoprotein E gene influence cerebrospinal fluid apolipoprotein E protein levels. J Alzheimers Dis. 2008, 13 (3): 255-266.PubMed CentralPubMedGoogle Scholar
- Jacob V, Hyman MS: Oxysterols, cholesterol homeostasis, and Alzheimer disease. Journal of Neurochemistry. 2007, 102: 1727-1737.View ArticleGoogle Scholar
- Jonnalagadda S, Srinivasan R: Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data. BMC Bioinformatics. 2008, 9: 267-282.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee SI, Batzoglou S: Application of independent component analysis to microarrays. Genome Biology. 2003, 4 (11): R76.1-R76.21.View ArticleGoogle Scholar
- Schachtner R, Lutter D, Theis FJ, Lang EW, Schmitz G, Tomé AM, Gómez Vilda P: How to extract marker genes from microarray data sets. Proceedings of the 29th Annual International conference of the IEEE EMBS. Cité Internatinale, Lyon, France. 2007, 4215-4218.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.