


DNA Research Advance Access published online on October 3, 2009

DNA Research, doi:10.1093/dnares/dsp019

© The Author 2009. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach

Shivakumar Keerthikumar1,2,3, Sahely Bhadra4, Kumaran Kandasamy1,2,5, Rajesh Raju1,2,3, Y. L. Ramachandra2, Chiranjib Bhattacharyya4, Kohsuke Imai8, Osamu Ohara6,7, Sujatha Mohan1,3 and Akhilesh Pandey1,5,*

1 Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
2 Department of Biotechnology and Bioinformatics, Kuvempu University, Jnanasahyadri, Shimoga 577 451, India
3 Research Unit for Immunoinformatics, Research Center for Allergy and Immunology, RIKEN Yokohama Institute, Kanagawa 230-0045, Japan
4 Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India
5 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N. Broadway, BRB Room 527, Baltimore, MD 21205, USA
6 Laboratory for Immunogenomics, Research Center for Allergy and Immunology, RIKEN, Yokohama Institute, Kanagawa 230-0045, Japan
7 Department of Human Genome Technology, Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
8 Department of Medical Informatics, National Defense Medical College, Saitama 359-8513, Japan

Received 23 July 2009; accepted 5 September 2009.


    Abstract
Screening for and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources catalogue molecular alterations in known PID genes along with their associated clinical and immunological phenotypes; however, these resources do not help identify candidate PID genes. We recently developed a platform designated Resource of Asian Primary Immunodeficiency Diseases (RAPID), which hosts information on molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiles of all known PID genes. Using this resource as a discovery tool, we developed an algorithm for predicting candidate PID genes. Using a support vector machine learning approach with 69 binary features and a training data set of 148 known PID genes and 3162 non-PID genes, we predicted 1442 candidate PID genes. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining predicted genes represent attractive candidates for testing in patients whose disease cannot be ascribed to any of the known PID genes.

Key words: RAPID; SVM; HPRD; Human Proteinpedia; NetPath


    1. Introduction
Primary immunodeficiency diseases (PIDs) are a genetically heterogeneous group of disorders that affect distinct components of the innate and adaptive immune system, such as neutrophils, macrophages, dendritic cells, natural killer cells and T and B lymphocytes. The study of these diseases has provided essential insights into the functioning of the immune system. More than 120 genes have been identified whose abnormalities account for more than 150 distinct forms of PID [1]. PIDs are challenging for both researchers and clinicians: they represent natural models of immunopathology that can otherwise usually be studied effectively only in animal models, and they manifest with a wide range of clinical symptoms, from susceptibility to infections and allergies to autoimmune and inflammatory diseases. The genetic defects that cause PIDs can affect the expression and function of proteins involved in a range of biological processes, such as immune development, effector-cell function, signaling cascades and maintenance of immune homeostasis [2].

Because genes and proteins rarely work in isolation, genes whose products interact physically or functionally with those of known PID genes could also be PID genes. We recently developed a database of PID genes designated Resource of Asian Primary Immunodeficiency Diseases (RAPID), which contains information on genes and proteins involved in PIDs along with relevant information on protein–protein interactions, mouse knockout studies and microarray gene expression profiles in various cells and organs of the immune system. These features of PID genes, together with their involvement in immune signaling pathways, were used as binary input features for predicting additional candidate PID genes with a support vector machine (SVM) learning approach.

SVM is a powerful machine learning technique widely used in computational biology, for example in microarray data analysis [3–8], protein secondary structure prediction [9], prediction of human signal peptide cleavage sites [10], translation initiation site recognition in DNA [11], protein fold recognition [12,13], prediction of protein–protein interactions [14], prediction of protein subcellular localization [15–18] and peptide identification from mass spectrometry data [19].

SVM is a learning algorithm that generates a classifier from positively and negatively labeled training examples [20]. It maps the input training samples into a (possibly high-dimensional) feature space and seeks a hyperplane in that space that separates the two classes with the largest possible margin, i.e. the largest distance to the nearest training points. If the training set is not linearly separable, SVM finds a hyperplane that optimizes a trade-off between correct classification of the training examples and a large margin [20].
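In symbols, the trade-off described above is the standard soft-margin primal problem for a linear SVM. Here $\mathbf{x}_i \in \{0,1\}^{69}$ would be a gene's binary feature vector and $y_i = +1$ (PID) or $-1$ (non-PID); the constant $C$, which weights training errors against margin width, is not reported in this article, so any particular value is an assumption.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{s.t.}\quad y_{i}\bigl(\mathbf{w}^{\top}\mathbf{x}_{i} + b\bigr) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\quad i = 1,\dots,n,
\qquad f(\mathbf{x}) = \operatorname{sign}\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr).
```

Minimizing $\lVert\mathbf{w}\rVert^{2}$ maximizes the margin $2/\lVert\mathbf{w}\rVert$ between the two supporting hyperplanes, while each slack variable $\xi_i$ measures how far training example $i$ falls on the wrong side of its margin.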

To distinguish PID from non-PID genes, we solved this optimization problem and obtained a linear classifier (Fig. 1). To assess how well the classifier generalizes, we report its leave-one-out (LOO) error on the training data set. All known PID genes described in the literature were used as the positive data set. Genes for the negative data set were selected from the Mouse Genome Informatics (MGI) database on the criterion that mutations in the mouse orthologs do not result in immune or hematopoietic system phenotypes. We trained the SVM with 69 features (Supplementary Table S1) for both PID genes (positive data set) and genes not reported to be associated with PIDs (negative data set). The trained SVM was then used to predict candidate PID genes by classifying all human genes not used in training (the test data set).
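The training-and-prediction workflow of Fig. 1 can be sketched in a few lines, assuming each gene is a 0/1 feature vector labeled +1 (PID) or −1 (non-PID). This sketch swaps in the Pegasos stochastic sub-gradient solver for SVMlight's optimizer: it minimizes a regularized hinge-loss objective of the same kind but will not reproduce SVMlight's exact weights, and the function names, `lam`, `epochs` and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[int]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Fit a linear soft-margin SVM with Pegasos stochastic sub-gradient descent.

    X holds 0/1 feature vectors, y holds +1 (PID) or -1 (non-PID) labels.
    The bias is learned by appending a constant 1 to each vector, so it is
    regularized along with the weights (a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def predict(w: Sequence[float], b: float, x: Sequence[int]) -> int:
    """Classify one gene: +1 if it falls on the positive side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Toy data: feature 0 alone separates the hypothetical PID (+1) and non-PID (-1) genes.
X_train = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 1]]
y_train = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X_train, y_train)
print([predict(w, b, x) for x in X_train])
```

In the setting described here, `X_train` would hold the 69-dimensional vectors of the 148 PID and 3162 non-PID training genes, and `predict` would then be applied to each test gene.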


Figure 1. A schematic of the SVM training strategy.

 

    2. Materials and methods
2.1. Initial platform
RAPID, which is available on the web at http://rapid.rcai.riken.jp/ [21], was used as the source of information about PID genes. RAPID hosts information on the sequences and on the mRNA and protein expression of genes reported to be involved in PID patients. The main objective of this database is to provide detailed information on genes and proteins involved in PIDs, along with relevant information on protein–protein interactions, mouse knockout studies and microarray gene expression profiles in various organs and cells of the immune system.

2.2. Features used for training the data sets
PIDs are characterized by defects in immune system function that lead to increased susceptibility to infections. Although rare, these disorders cover a wide spectrum of defects, including antibody deficiencies, cellular immune deficiencies, combined immune deficiencies, phagocytic defects, and complement and other innate immunity defects. On the basis of these observations for all known PID genes, we selected 69 features (Supplementary Table S1) that are relevant both to the development, maintenance and normal functioning of the immune and hematopoietic systems and to the molecular pathophysiology of PID-causing genes. These features fall broadly into five groups: signaling pathways from the NetPath and KEGG [22–24] databases; microarray gene expression profiles from the RefDIC database [25]; site of expression from HPRD [26] and Human Proteinpedia [27]; immune/hematopoietic phenotypes from MGI [28,29]; and interaction with PID proteins from HPRD.
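The last of these feature groups, interaction with PID proteins, can be pictured as a simple lookup over an interaction network. The sketch below assumes the interactions are available as undirected pairs of gene symbols (for example, as exported from HPRD) and marks a gene 1 if any interaction partner is a known PID gene; the rule, function name and gene names are hypothetical, since the exact definition used for this feature is not stated here.

```python
from typing import Dict, Iterable, Set, Tuple


def interacts_with_pid(genes: Iterable[str],
                       interactions: Iterable[Tuple[str, str]],
                       known_pid: Set[str]) -> Dict[str, int]:
    """Return 1 for genes with at least one interaction partner in known_pid, else 0."""
    partners: Dict[str, Set[str]] = {}
    for a, b in interactions:  # treat each pair as an undirected edge
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {g: int(bool(partners.get(g, set()) & known_pid)) for g in genes}


# Hypothetical genes and interactions, for illustration only.
edges = [("GENE_A", "PID_1"), ("GENE_B", "GENE_C")]
print(interacts_with_pid(["GENE_A", "GENE_B", "GENE_C"], edges, {"PID_1"}))
# {'GENE_A': 1, 'GENE_B': 0, 'GENE_C': 0}
```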

2.3. Data sets
To train the SVM, two types of data sets were generated: the positive data set consists of all known PID genes, whereas the negative data set contains genes for which no immune or hematopoietic system abnormalities have been described in mice carrying knockouts, knock-ins or spontaneous mutations of the mouse orthologs, as reported in the MGI database [30]. On the basis of these criteria, the positive data set contained 148 PID genes and the negative data set 3162 genes. The test data set contains 36 677 genes encoded by the human genome. For every gene in the training and test data sets, each feature was scored as '1' if the gene has that feature and '0' if it does not. The trained SVM was then used to classify the genes of the unlabeled test data set, which consists of all human genes, as PID or non-PID genes (Fig. 2).
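The 0/1 feature scoring and the class labels described above map onto a small data-structure sketch. It assumes each gene's annotations are available as a set of feature names; the feature names, gene names and helper functions are hypothetical, and the check that no gene is in both classes is an added safeguard rather than a documented step.

```python
from typing import Dict, List, Set


def encode_genes(gene_features: Dict[str, Set[str]],
                 feature_names: List[str]) -> Dict[str, List[int]]:
    """Score each feature as 1 if the gene has it and 0 otherwise."""
    return {gene: [1 if name in present else 0 for name in feature_names]
            for gene, present in gene_features.items()}


def label_training_genes(positive: Set[str], negative: Set[str]) -> Dict[str, int]:
    """Assign +1 to known PID genes and -1 to negative-set genes."""
    overlap = positive & negative
    if overlap:
        raise ValueError(f"genes present in both classes: {sorted(overlap)}")
    labels = {gene: 1 for gene in positive}
    labels.update({gene: -1 for gene in negative})
    return labels


# Hypothetical feature names and annotations, for illustration only.
features = ["pathway:TCR_signaling", "expression:thymus", "mouse:immune_phenotype"]
annotations = {"GENE_A": {"expression:thymus"}, "GENE_B": set()}
print(encode_genes(annotations, features))
# {'GENE_A': [0, 1, 0], 'GENE_B': [0, 0, 0]}
```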


Figure 2. A schematic of the algorithm for prediction of candidate PID genes.

 
2.4. SVM implementation
We used SVMlight (http://svmlight.joachims.org/), an implementation of support vector machines in C, together with custom functions written in MATLAB (http://www.mathtools.net/MATLAB/) to calculate a confidence score for each predicted candidate PID gene. The absolute score, also called the confidence score, of a gene with binary feature vector $\mathbf{x}$ is its signed distance from the separating hyperplane $\mathbf{w}^{\top}\mathbf{x} + b = 0$ calculated by the SVM:

$$s(\mathbf{x}) = \frac{\mathbf{w}^{\top}\mathbf{x} + b}{\lVert\mathbf{w}\rVert}.$$

The score indicates how far a gene lies on the positive side of the hyperplane; in other words, the higher the score, the more likely the gene is to be a PID gene. Using this approach, we predicted 1442 candidate PID genes, namely those falling on the positive side of the hyperplane.
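A minimal sketch of the scoring and ranking step, assuming the hyperplane parameters `w` and `b` have already been obtained from a trained linear SVM. The division by ‖w‖ follows the distance interpretation above and is an assumption about the authors' MATLAB functions; because every gene shares the same ‖w‖, it changes the scale of the scores but not their ranking or which genes fall on the positive side. Names and toy values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confidence_score(w: Sequence[float], b: float, x: Sequence[int]) -> float:
    """Signed distance of feature vector x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm


def rank_candidates(w: Sequence[float], b: float,
                    test_vectors: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Keep genes on the positive side of the hyperplane, highest score first."""
    scored = [(gene, confidence_score(w, b, x)) for gene, x in test_vectors.items()]
    return sorted([(g, s) for g, s in scored if s > 0], key=lambda gs: gs[1], reverse=True)


# Hypothetical two-feature model and test genes.
w, b = [3.0, 4.0], -1.0
print(rank_candidates(w, b, {"GENE_A": [1, 0], "GENE_B": [1, 1], "GENE_C": [0, 0]}))
# [('GENE_B', 1.2), ('GENE_A', 0.4)]
```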

2.5. LOO error
To measure the LOO error, one gene is removed from the training set, the SVM is trained on the remaining genes, and the class label of the held-out gene is predicted. The error for that gene is 0 if it is classified correctly and 1 otherwise. This is repeated until every gene has been left out exactly once, and the LOO error of the data set is the average of the individual errors.
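The LOO procedure above can be sketched as follows. The classifier is passed in as a `fit` function that returns a predictor, so any trainer can be plugged in; the default is a compact Pegasos linear SVM used as a stand-in for SVMlight, not the software used in this study. The optional `leave_out` argument restricts the held-out genes to a subset, which mirrors the positive-only LOO variant reported in the Results. All names and toy data are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence

Vector = Sequence[int]
Predictor = Callable[[Vector], int]


def fit_linear_svm(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                   epochs: int = 200, seed: int = 0) -> Predictor:
    """Compact Pegasos linear SVM (stand-in for SVMlight); bias via a constant feature."""
    rng = random.Random(seed)
    data = [list(x) + [1] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * sum(a * c for a, c in zip(w, data[i])) < 1
            w = [(1 - eta * lam) * a for a in w]
            if violated:
                w = [a + eta * y[i] * c for a, c in zip(w, data[i])]
    return lambda x: 1 if sum(a * c for a, c in zip(w, list(x) + [1])) > 0 else -1


def leave_one_out_error(X: Sequence[Vector], y: Sequence[int],
                        fit: Callable[[List[Vector], List[int]], Predictor] = fit_linear_svm,
                        leave_out: Optional[Iterable[int]] = None) -> float:
    """Average 0/1 error over held-out genes; leave_out=None holds out every gene once."""
    indices = list(range(len(X))) if leave_out is None else list(leave_out)
    errors = 0
    for i in indices:
        X_rest = [x for j, x in enumerate(X) if j != i]
        y_rest = [label for j, label in enumerate(y) if j != i]
        predict = fit(X_rest, y_rest)  # retrain without gene i
        errors += int(predict(X[i]) != y[i])  # 0 if correct, 1 if misclassified
    return errors / len(indices)


# Hypothetical toy data: feature 0 marks the positive (PID) class.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1],
     [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
print(leave_one_out_error(X, y))
```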


    3. Results and discussion
Over 1500 Mendelian disorders whose molecular basis is unknown are catalogued in the Online Mendelian Inheritance in Man (OMIM) database [31]. Most disease-gene identification efforts rely on either linkage analysis or association studies [32,33]. Recently, a number of in silico approaches for identifying candidate disease genes have been developed that use information such as functional annotation, gene expression profiles, annotated sequence features, protein–protein interactions and pathways [34–39]. Several machine learning approaches have also been used to identify genes important for disease classification, and SVMs are often preferred because of their strong performance [40]. SVMs are well suited to high-dimensional, low-sample-size data sets and have performed well in a range of analyses, including text categorization, microarray expression data analysis and inference of functional annotation from protein sequence and structure data [3,4,41,42]. In this study, we trained an SVM with 69 features on positive (all known PID genes) and negative (genes whose mutation in mice, as recorded in MGI, does not affect the immune or hematopoietic systems) data sets.

Because the positive data set is small, we used the LOO error (described in Materials and methods) to estimate how well the algorithm generalizes. Each LOO data set contained all 148 PID genes from the positive data set together with 148 genes randomly selected from the negative data set. We generated 60 such balanced data sets and computed the LOO error for each; the average LOO error over the 60 data sets was ~8%. When only the PID (positive) genes were left out one at a time, with the same 296-gene training setting, the LOO error was ~15%.
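The balanced resampling described above can be sketched as follows, assuming gene identifiers are held in plain lists. The function names, placeholder gene names and seed are hypothetical, and `error_fn` stands in for whichever LOO error routine is used. Sampling the negatives without replacement within each data set keeps every data set at exactly 148 + 148 = 296 genes.

```python
import random
from typing import Callable, List, Sequence, Tuple


def balanced_datasets(positives: Sequence[str], negatives: Sequence[str],
                      n_repeats: int = 60, seed: int = 0) -> List[Tuple[List[str], List[int]]]:
    """Pair all positives with an equal-sized random sample of negatives, n_repeats times."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_repeats):
        sampled = rng.sample(list(negatives), len(positives))  # without replacement
        genes = list(positives) + sampled
        labels = [1] * len(positives) + [-1] * len(sampled)
        datasets.append((genes, labels))
    return datasets


def mean_error(datasets: List[Tuple[List[str], List[int]]],
               error_fn: Callable[[List[str], List[int]], float]) -> float:
    """Average an error estimate, such as an LOO error, over all balanced data sets."""
    return sum(error_fn(genes, labels) for genes, labels in datasets) / len(datasets)


# Placeholder gene names with the data set sizes used in this study.
demo = balanced_datasets([f"PID_{i}" for i in range(148)], [f"NEG_{i}" for i in range(3162)])
print(len(demo), len(demo[0][0]))  # 60 data sets of 296 genes each
```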

3.1. Sensitivity and specificity
The sensitivity and specificity of the classifier on these data sets were 0.85 and 0.98, respectively. On the basis of these results, we estimate that the trained classifier falsely predicts ~2% of non-PID genes to be PID genes (1 − specificity). We believe that the limited availability of comprehensive and accurate biological data restricts the prediction accuracy and performance of this algorithm; as more data accumulate about the human genome and proteome, we expect its performance to improve. The complete list of predicted candidate genes is provided in Supplementary Table S2 and is also available at the RAPID website http://rapid.rcai.riken.jp/. All 69 features of the predicted candidate PID genes can also be downloaded from the RAPID website.
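The ~2% figure follows directly from the specificity: the false-positive rate is 1 − specificity = 1 − 0.98 = 0.02. The sketch below computes both rates from true and predicted labels; the label vectors are made-up counts chosen only to reproduce 0.85 and 0.98, not the study's data, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); labels are +1 (PID) or -1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up counts chosen to reproduce the reported rates: 17/20 and 49/50.
y_true = [1] * 20 + [-1] * 50
y_pred = [1] * 17 + [-1] * 3 + [-1] * 49 + [1] * 1
sens, spec = sensitivity_specificity(y_true, y_pred)
false_positive_rate = 1 - spec  # fraction of non-PID genes predicted as PID
print(round(sens, 2), round(spec, 2), round(false_positive_rate, 2))
```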

3.2. Evaluation studies
We could evaluate our predictions only to a limited extent, because only a few studies describing novel PID genes that were not included in our original list have been published. These experimental studies have confirmed six genes from our predicted list as true PID genes: myeloid differentiation factor 88 (MYD88), the catalytic subunit of DNA-dependent serine/threonine protein kinase (PRKDC), glucose-6-phosphatase catalytic subunit 3 (G6PC3) [43–45], IL-2-inducible T-cell kinase (ITK), coronin, actin binding protein 1A (CORO1A) and interleukin-1 receptor antagonist (IL1RN) [46–49]. MyD88 is a key downstream adaptor protein in the IL-1 receptor and Toll-like receptor signaling pathways involved in the inflammatory response and host defense. MyD88 has also been implicated in tumorigenesis in models of hepatocellular carcinoma and familial adenomatous polyposis, in negative regulation of TLR3 signaling and in PKCε activation [50]. Patients with MyD88 deficiency have been reported to be susceptible to pyogenic bacterial infections, including invasive pneumococcal disease [45]. A PRKDC defect was first reported in a patient with radiosensitive T-B- SCID, in whom it inhibits Artemis activation and non-homologous end-joining [44]. Mutations in G6PC3 have been observed in patients with a severe congenital neutropenia syndrome whose cells show increased susceptibility to apoptosis and who have disturbances in cardiac or urogenital development [43]. A novel PID, ITK deficiency, has been described in patients with fatal immune dysregulation following EBV infection; the homozygous mutation identified in the SH2 domain of ITK destabilizes the protein and results in an absence of NKT cells [47]. A mutation in CORO1A has been identified in a patient with T cell-deficient, B cell-sufficient, NK cell-sufficient severe combined immunodeficiency and reduced T-cell function, a phenotype similar to that previously demonstrated in Coro1a knockout mice [49]. Deficiency of the IL-1 receptor antagonist, an autosomal recessive autoinflammatory disease, was first reported in children presenting with multifocal osteomyelitis, periostitis, pustulosis, thrombosis and respiratory insufficiency caused by homozygous deletion of the IL1RN gene [46,48]. Functional analysis of these mutants confirmed reduced or absent mRNA and protein expression, leading to cytokine abnormalities.

Two recent independent reports [51,52] have identified and prioritized candidate disease genes, in general and specifically for primary immunodeficiencies, by integrating functional annotations from Gene Ontology with protein interaction network data compiled from BIND [53], BioGRID [54] and HPRD [26]. The latter study identified 24 candidate genes likely to be involved in PID, over 80% of which are also among the candidates predicted by our SVM analysis; this agreement between independent approaches supports the usefulness of our method.
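The "over 80%" overlap with the 24 candidates from the PID-specific study [52] can be made concrete with a little arithmetic: since 19/24 ≈ 79% and 20/24 ≈ 83%, at least 20 of those 24 genes must appear among the SVM predictions. The sketch below computes an overlap fraction and this minimum count; the function names and gene sets are hypothetical placeholders.

```python
from typing import Set


def fraction_recovered(reference: Set[str], predicted: Set[str]) -> float:
    """Fraction of reference candidate genes that also appear in the predicted list."""
    return len(reference & predicted) / len(reference)


def min_genes_exceeding(total: int, percent: int) -> int:
    """Smallest whole number of genes that is strictly more than percent% of total."""
    # Integer arithmetic avoids floating-point error: we need 100 * k > percent * total.
    return percent * total // 100 + 1


print(min_genes_exceeding(24, 80))  # "over 80%" of 24 genes
```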

We have also summarized genome-wide association studies and other related studies reporting associations between newly identified candidate PID genes and immunological disorders (Table 1). Because the list of candidate PID genes is still large, integrating data from such high-throughput studies could allow further prioritization of genes for confirmation in PID patients in whom the causative gene has not yet been identified. We hope that such integrated approaches will help PID physicians and researchers gain insights into the pathophysiology of these diseases more quickly, which could in turn improve the diagnosis or treatment of PIDs.


Table 1. A list of predicted PID genes whose association with immunological disorders has been reported recently

 
3.3. Availability
The list of predicted PID genes is available as Supplementary Table S2 and at the RAPID website http://rapid.rcai.riken.jp/.


    Supplementary Data
Supplementary data are available at www.dnaresearch.oxfordjournals.org.


    Funding
We thank the Department of Biotechnology of the Government of India for research support to the Institute of Bioinformatics, Bangalore.


    Acknowledgements
The authors thank Shigeaki Nonoyama, Hirokazu Kanegane, Toshio Miyawaki, Koichi Oshima and Atsushi Hijikata for their valuable input and suggestions.


    Footnotes
 
* To whom correspondence should be addressed. Tel. +1 410-502-6662. Fax. +1 410-502-7544. E-mail: pandey@jhmi.edu

Edited by Minoru Ko


    References

  1. Geha R.S., Notarangelo L.D., Casanova J.L., et al. Primary immunodeficiency diseases: an update from the International Union of Immunological Societies Primary Immunodeficiency Diseases Classification Committee. J. Allergy Clin. Immunol. (2007) 120:776–94.
  2. Marodi L., Notarangelo L.D. Immunological and genetic bases of new primary immunodeficiencies. Nat. Rev. Immunol. (2007) 7:851–61.
  3. Brown M.P., Grundy W.N., Lin D., et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA (2000) 97:262–7.
  4. Furey T.S., Cristianini N., Duffy N., Bednarski D.W., Schummer M., Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics (2000) 16:906–14.
  5. Pirooznia M., Yang J.Y., Yang M.Q., Deng Y. A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics (2008) 9(Suppl 1):S13.
  6. Wang L., Zhu J., Zou H. Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics (2008) 24:412–9.
  7. Wang Y., Tetko I.V., Hall M.A., et al. Gene selection from microarray data for cancer classification—a machine learning approach. Comput. Biol. Chem. (2005) 29:37–46.
  8. Yeang C.H., Ramaswamy S., Tamayo P., et al. Molecular classification of multiple tumor types. Bioinformatics (2001) 17(Suppl 1):S316–22.
  9. Hua S., Sun Z. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J. Mol. Biol. (2001) 308:397–407.
  10. Jagla B., Schuchhardt J. Adaptive encoding neural networks for the recognition of human signal peptide cleavage sites. Bioinformatics (2000) 16:245–50.
  11. Zien A., Ratsch G., Mika S., Scholkopf B., Lengauer T., Muller K.R. Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics (2000) 16:799–807.
  12. Cai Y.D., Liu X.J., Xu X., Zhou G.P. Support vector machines for predicting protein structural class. BMC Bioinformatics (2001) 2:3.
  13. Ding C.H., Dubchak I. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics (2001) 17:349–58.
  14. Bock J.R., Gough D.A. Predicting protein–protein interactions from primary structure. Bioinformatics (2001) 17:455–60.
  15. Bhasin M., Raghava G.P. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res. (2004) 32:W414–9.
  16. Garg A., Bhasin M., Raghava G.P. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J. Biol. Chem. (2005) 280:14427–32.
  17. Hua S., Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics (2001) 17:721–8.
  18. Shi J.Y., Zhang S.W., Pan Q., Cheng Y.M., Xie J. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids (2007) 33:69–74.
  19. Anderson D.C., Li W., Payan D.G., Noble W.S. A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. (2003) 2:137–46.
  20. Park K.J., Gromiha M.M., Horton P., Suwa M. Discrimination of outer membrane proteins using support vector machines. Bioinformatics (2005) 21:4223–9.
  21. Keerthikumar S., Raju R., Kandasamy K., et al. RAPID: resource of Asian primary immunodeficiency diseases. Nucleic Acids Res. (2009) 37:D863–7.
  22. Kanehisa M., Araki M., Goto S., et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. (2008) 36:D480–4.
  23. Kanehisa M., Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. (2000) 28:27–30.
  24. Kanehisa M., Goto S., Hattori M., et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. (2006) 34:D354–7.
  25. Hijikata A., Kitamura H., Kimura Y., et al. Construction of an open-access database that integrates cross-reference information from the transcriptome and proteome of immune cells. Bioinformatics (2007) 23:2934–41.
  26. Keshava Prasad T.S., Goel R., Kandasamy K., et al. Human protein reference database—2009 update. Nucleic Acids Res. (2009) 37:D767–72.
  27. Kandasamy K., Keerthikumar S., Goel R., et al. Human Proteinpedia: a unified discovery resource for proteomics research. Nucleic Acids Res. (2009) 37:D773–81.
  28. Blake J.A., Bult C.J., Eppig J.T., Kadin J.A., Richardson J.E. The mouse genome database genotypes:phenotypes. Nucleic Acids Res. (2009) 37:D712–9.
  29. Bult C.J., Eppig J.T., Kadin J.A., Richardson J.E., Blake J.A. The mouse genome database (MGD): mouse biology and model systems. Nucleic Acids Res. (2008) 36:D724–8.
  30. Eppig J.T., Blake J.A., Bult C.J., Kadin J.A., Richardson J.E. The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Res. (2007) 35:D630–7.
  31. Amberger J., Bocchini C.A., Scott A.F., Hamosh A. McKusick's online Mendelian inheritance in man (OMIM). Nucleic Acids Res. (2009) 37:D793–6.
  32. Botstein D., Risch N. Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat. Genet. (2003) 33(Suppl):228–37.
  33. Glazier A.M., Nadeau J.H., Aitman T.J. Finding genes that underlie complex traits. Science (2002) 298:2345–9.
  34. Freudenberg J., Propping P. A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics (2002) 18(Suppl 2):S110–5.
  35. Huang D., Chow T.W. Identifying the biologically relevant gene categories based on gene expression and biological data: an example on prostate cancer. Bioinformatics (2007) 23:1503–10.
  36. Kohler S., Bauer S., Horn D., Robinson P.N. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. (2008) 82:949–58.
  37. Perez-Iratxeta C., Bork P., Andrade M.A. Association of genes to genetically inherited diseases using data mining. Nat. Genet. (2002) 31:316–9.
  38. Segal E., Wang H., Koller D. Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics (2003) 19(Suppl 1):i264–71.
  39. Wang K., Li M., Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am. J. Hum. Genet. (2007) 81:1278–83.
  40. Zhang H.H., Ahn J., Lin X., Park C. Gene selection using support vector machines with non-convex penalty. Bioinformatics (2006) 22:88–95.
  41. Lewis D.P., Jebara T., Noble W.S. Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics (2006) 22:2753–60.
  42. Radivojac P., Peng K., Clark W.T., et al. An integrated approach to inferring gene-disease associations in humans. Proteins (2008) 72:1030–7.
  43. Boztug K., Appaswamy G., Ashikov A., et al. A syndrome with congenital neutropenia and mutations in G6PC3. N. Engl. J. Med. (2009) 360:32–43.
  44. van der Burg M., Ijspeert H., Verkaik N.S., et al. A DNA-PKcs mutation in a radiosensitive T-B- SCID patient inhibits Artemis activation and nonhomologous end-joining. J. Clin. Invest. (2009) 119:91–8.
  45. von Bernuth H., Picard C., Jin Z., et al. Pyogenic bacterial infections in humans with MyD88 deficiency. Science (2008) 321:691–6.
  46. Aksentijevich I., Masters S.L., Ferguson P.J., et al. An autoinflammatory disease with deficiency of the interleukin-1-receptor antagonist. N. Engl. J. Med. (2009) 360:2426–37.
  47. Huck K., Feyen O., Niehues T., et al. Girls homozygous for an IL-2-inducible T cell kinase mutation that leads to protein deficiency develop fatal EBV-associated lymphoproliferation. J. Clin. Invest. (2009) 119:1350–8.
  48. Reddy S., Jia S., Geoffrey R., et al. An autoinflammatory disease due to homozygous deletion of the IL1RN locus. N. Engl. J. Med. (2009) 360:2438–44.
  49. Shiow L.R., Roadcap D.W., Paris K., et al. The actin regulator coronin 1A is mutant in a thymic egress-deficient mouse strain and in a patient with severe combined immunodeficiency. Nat. Immunol. (2008) 9:1307–15.
  50. Kenny E.F., O'Neill L.A. Signalling adaptors used by toll-like receptors: an update. Cytokine (2008) 43:342–9.
  51. Chen J., Aronow B.J., Jegga A.G. Disease candidate gene identification and prioritization using protein interaction networks. BMC Bioinformatics (2009) 10:73.
  52. Ortutay C., Vihinen M. Identification of candidate disease genes by integrating Gene Ontologies and protein-interaction networks: case study of primary immunodeficiencies. Nucleic Acids Res. (2009) 37:622–8.
  53. Bader G.D., Betel D., Hogue C.W. BIND: the biomolecular interaction network database. Nucleic Acids Res. (2003) 31:248–50.
  54. Breitkreutz B.J., Stark C., Reguly T., et al. The BioGRID interaction database: 2008 update. Nucleic Acids Res. (2008) 36:D637–40.
  55. Harley J.B., Alarcon-Riquelme M.E., Criswell L.A., et al. Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat. Genet. (2008) 40:204–10.
  56. Nath S.K., Han S., Kim-Howard X., et al. A nonsynonymous functional variant in integrin-alpha(M) (encoded by ITGAM) is associated with systemic lupus erythematosus. Nat. Genet. (2008) 40:152–4.
  57. Kozyrev S.V., Abelson A.K., Wojcik J., et al. Functional variants in the B-cell gene BANK1 are associated with systemic lupus erythematosus. Nat. Genet. (2008) 40:211–6.
  58. Goyette P., Lefebvre C., Ng A., et al. Gene-centric association mapping of chromosome 3p implicates MST1 in IBD pathogenesis. Mucosal Immunol. (2008) 1:131–8.
  59. Johnson A.D., O'Donnell C.J. An open access database of genome-wide association results. BMC Med. Genet. (2009) 10:6.
  60. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature (2007) 447:661–78.
  61. Todd J.A., Walker N.M., Cooper J.D., et al. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat. Genet. (2007) 39:857–64.
  62. Plenge R.M., Cotsapas C., Davies L., et al. Two independent alleles at 6q23 associated with risk of rheumatoid arthritis. Nat. Genet. (2007) 39:1477–82.
  63. Remmers E.F., Plenge R.M., Lee A.T., et al. STAT4 and the risk of rheumatoid arthritis and systemic lupus erythematosus. N. Engl. J. Med. (2007) 357:977–86.
  64. Graham D.S., Graham R.R., Manku H., et al. Polymorphism at the TNF superfamily gene TNFSF4 confers susceptibility to systemic lupus erythematosus. Nat. Genet. (2008) 40:83–9.
  65. Ueda H., Howson J.M., Esposito L., et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature (2003) 423:506–11.
  66. Ikegami H., Awata T., Kawasaki E., et al. The association of CTLA4 polymorphism with type 1 diabetes is concentrated in patients complicated with autoimmune thyroid disease: a multicenter collaborative study in Japan. J. Clin. Endocrinol. Metab. (2006) 91:1087–92.
