DNA Research Advance Access originally published online on October 3, 2009
DNA Research 2009 16(6):345-351; doi:10.1093/dnares/dsp019
Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach
1 Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
2 Department of Biotechnology and Bioinformatics, Kuvempu University, Jnanasahyadri, Shimoga 577 451, India
3 Research Unit for Immunoinformatics, Research Center for Allergy and Immunology, RIKEN Yokohama Institute, Kanagawa 230-0045, Japan
4 Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India
5 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N. Broadway, BRB Room 527, Baltimore, MD 21205, USA
6 Laboratory for Immunogenomics, Research Center for Allergy and Immunology, RIKEN, Yokohama Institute, Kanagawa 230-0045, Japan
7 Department of Human Genome Technology, Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
8 Department of Medical Informatics, National Defense Medical College, Saitama 359-8513, Japan
Received 23 July 2009; accepted 5 September 2009.
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian Primary Immunodeficiency Diseases (RAPID), which hosts information pertaining to molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. With a support vector machine learning approach, we predicted 1442 candidate PID genes from a training data set of 148 known PID genes and 3162 non-PID genes, each described by 69 binary features. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
Key words: RAPID; SVM; HPRD; Human Proteinpedia; NetPath
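The classification setup summarized in the abstract (binary feature vectors, a small positive class and a much larger negative class) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the kernel, software or the features used, so the example swaps in a linear support vector machine trained by plain stochastic sub-gradient descent on the hinge loss. The data are synthetic, and the names make_toy_genes, train_linear_svm, N_INFORMATIVE and the class weighting are all hypothetical choices made for illustration. Only the counts 69, 148 and 3162 come from the text.

```python
import random

N_FEATURES = 69                # number of binary features (from the abstract)
N_PID, N_NON_PID = 148, 3162   # training set sizes (from the abstract)
N_INFORMATIVE = 10             # hypothetical: features enriched in toy positives


def make_toy_genes(n_pos, n_neg, seed=0):
    """Return (active_feature_indices, label) pairs; synthetic, not real annotations."""
    rng = random.Random(seed)
    genes = []
    for label, count in ((+1, n_pos), (-1, n_neg)):
        p_informative = 0.8 if label == 1 else 0.1
        for _ in range(count):
            active = [j for j in range(N_FEATURES)
                      if rng.random() < (p_informative if j < N_INFORMATIVE else 0.3)]
            genes.append((active, label))
    return genes


def score(w, b, active):
    """Decision value w.x + b for a binary vector given by its active indices."""
    return sum(w[j] for j in active) + b


def train_linear_svm(genes, lam=0.01, eta=0.001, epochs=5, seed=0):
    """Stochastic sub-gradient descent on a class-weighted, L2-regularised hinge loss."""
    rng = random.Random(seed)
    n_pos = sum(1 for _, y in genes if y == 1)
    pos_weight = (len(genes) - n_pos) / n_pos   # assumption: reweight the rare class
    w, b = [0.0] * N_FEATURES, 0.0
    order = list(genes)
    for _ in range(epochs):
        rng.shuffle(order)
        for active, y in order:
            violated = y * score(w, b, active) < 1.0
            w = [wj * (1.0 - eta * lam) for wj in w]   # gradient step on the L2 term
            if violated:                                # hinge term is active
                step = eta * (pos_weight if y == 1 else 1.0) * y
                for j in active:
                    w[j] += step
                b += step
    return w, b


def balanced_accuracy(w, b, genes):
    """Mean of sensitivity and specificity, which is less misleading under imbalance."""
    hits = {1: 0, -1: 0}
    totals = {1: 0, -1: 0}
    for active, y in genes:
        totals[y] += 1
        predicted = 1 if score(w, b, active) > 0 else -1
        hits[y] += predicted == y
    return 0.5 * (hits[1] / totals[1] + hits[-1] / totals[-1])


train = make_toy_genes(N_PID, N_NON_PID, seed=0)
w, b = train_linear_svm(train)
print("balanced accuracy on training data:", round(balanced_accuracy(w, b, train), 3))

# Rank unlabelled toy genes by decision value; positive scores count as candidates.
unlabelled = [active for active, _ in make_toy_genes(20, 200, seed=1)]
ranked = sorted(unlabelled, key=lambda active: score(w, b, active), reverse=True)
n_candidates = sum(score(w, b, active) > 0 for active in ranked)
print("predicted candidates:", n_candidates, "of", len(ranked))
```

Because non-PID genes outnumber known PID genes by roughly 21 to 1, the sketch weights errors on the positive class more heavily and reports balanced accuracy rather than raw accuracy. A classifier that labels every gene as non-PID would already reach about 95% raw accuracy. Whether the published method used a comparable weighting is not stated here.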
* To whom correspondence should be addressed. Tel. +1 410-502-6662. Fax. +1 410-502-7544. E-mail: pandey@jhmi.edu