DNA Research Advance Access published online on October 3, 2009
DNA Research, doi:10.1093/dnares/dsp019
Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach
1 Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
2 Department of Biotechnology and Bioinformatics, Kuvempu University, Jnanasahyadri, Shimoga 577 451, India
3 Research Unit for Immunoinformatics, Research Center for Allergy and Immunology, RIKEN Yokohama Institute, Kanagawa 230-0045, Japan
4 Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India
5 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N. Broadway, BRB Room 527, Baltimore, MD 21205, USA
6 Laboratory for Immunogenomics, Research Center for Allergy and Immunology, RIKEN, Yokohama Institute, Kanagawa 230-0045, Japan
7 Department of Human Genome Technology, Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
8 Department of Medical Informatics, National Defense Medical College, Saitama 359-8513, Japan
Received 23 July 2009; accepted 5 September 2009.
Abstract
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian Primary Immunodeficiency Diseases (RAPID), which hosts information pertaining to molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we predicted 1442 candidate PID genes from 69 binary features, with 148 known PID genes and 3162 non-PID genes as the training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
Key words: RAPID; SVM; HPRD; Human Proteinpedia; NetPath
1. Introduction
Primary immunodeficiency diseases (PIDs) are a genetically heterogeneous group of disorders that affect distinct components of the innate and adaptive immune system, such as neutrophils, macrophages, dendritic cells, natural killer cells and T and B lymphocytes. The study of these diseases has provided essential insights into the functioning of our immune system. More than 120 distinct genes have been identified, whose abnormalities account for more than 150 distinct forms of PID.1
Because genes and proteins rarely work in isolation, genes that directly or functionally interact with known PID genes could also represent additional PID genes. We have recently developed a database of PID genes designated Resource of Asian Primary Immunodeficiency Diseases (RAPID), which contains information pertaining to genes and proteins involved in PIDs along with other relevant information about protein–protein interactions, mouse knockout studies and microarray gene expression profiles in various cells and organs of the immune system. These significant features of PID genes, including their involvement in immune signaling pathways, were used as input binary features for the prediction of additional candidate PID genes using a support vector machine (SVM) learning approach.
SVM is a powerful machine learning technique widely used in computational biology, for example in microarray data analysis,3–8 protein secondary structure prediction,9 prediction of human signal peptide cleavage sites,10 translational initiation site recognition in DNA,11 protein fold recognition,12,13 prediction of protein–protein interactions,14 prediction of protein sub-cellular localization,15–18 and peptide identification from mass-spectrometry derived data.19 SVM is a learning algorithm that can be used to generate a classifier from a set of positively and negatively labeled training data.20 It learns the classifier by mapping the input training samples into a possibly high-dimensional feature space and seeking a hyperplane in this space that separates the two types of examples with the largest possible margin, i.e. the largest distance to the nearest points. If the training set is not linearly separable, SVM finds a hyperplane that optimizes a trade-off between good classification and a large margin.20
To separate PID from non-PID genes, we solved the above problem and obtained a linear classifier (Fig. 1). To demonstrate generalization of the resulting classifier, we report the leave-one-out (LOO) error for the training data set. In this approach, we used all the known PID genes that have been described in the literature as the positive data set. The gene list for the negative data set was selected from the Mouse Genome Informatics (MGI) database based on the criterion that mutations in mice do not result in either immune or hematopoietic system phenotypes. We trained the SVM with 69 features (Supplementary Table S1) for both PID genes (positive data set) and genes that were not reported to be associated with PIDs (negative data set). The trained SVM was then used to predict candidate PID genes, with all human genes (except those used in the training data sets) as the test data set.
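For reference, the trade-off between good classification and a large margin is usually written as the standard soft-margin optimization problem. The notation below is the common textbook form and is an assumption here, since the paper does not state its exact formulation or parameter values: x_i is the binary feature vector of gene i, y_i is its label (+1 for PID, −1 for non-PID), ξ_i is the slack allowed for gene i, and C is the trade-off parameter.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n
```

Under this formulation, a linear classifier assigns a gene with feature vector x to the PID class when w^T x + b > 0 and to the non-PID class otherwise.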
2. Materials and methods
2.1. Initial platform
The initial platform for this work was RAPID, which is available as a web resource at http://rapid.rcai.riken.jp/.21
2.2. Features used for training the data sets
PIDs are characterized by essential defects in the functions of the immune system, leading to increased susceptibility to infections. Although rare, these disorders cover a wide spectrum of defects, including antibody deficiencies, cellular immune deficiencies, combined immune deficiencies, phagocytic defects, and complement and other innate immunity defects. On the basis of these observations for all the known PID genes, we selected 69 features (Supplementary Table S1) that are important not only for the development, maintenance and normal functioning of the immune/hematopoietic systems but also for understanding the molecular pathophysiology of PID-causing genes. These features can be broadly classified as signaling pathways from the NetPath and KEGG22–24 databases, microarray gene expression profiles from the RefDIC25 database, site of expression from HPRD26 and Human Proteinpedia,27 immune/hematopoietic phenotypes from MGI28,29 and interaction with PID genes from HPRD.
2.3. Data sets
To train the SVM, two types of data sets were generated. The positive data set consisted of all the known PID genes, whereas the negative data set contained genes for which no immune/hematopoietic system abnormalities were described in mouse knockouts, knockins or spontaneous mutations reported for the mouse orthologs in the MGI database.30 On the basis of these criteria, 148 PID genes were in the positive data set and 3162 genes were in the negative data set. The test data set contained 36 677 genes encoded by the human genome. For each feature, genes in both the training and test data sets were assigned a binary score of 1 or 0 based on their presence or absence in that feature, respectively. The trained SVM was used to classify genes as PID or non-PID in the unlabeled test data set, which consisted of all human genes (Fig. 2).
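The binary scoring described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the gene names and the three feature sets are hypothetical placeholders, and the output line follows the sparse text format read by SVMlight (target value first, then `index:value` pairs with 1-based feature indices), the tool named in the next subsection.

```python
from typing import List, Set


def encode_gene(gene: str, feature_sets: List[Set[str]]) -> List[int]:
    """Return one 0/1 value per feature: 1 if the gene is listed under that feature."""
    return [1 if gene in members else 0 for members in feature_sets]


def to_svmlight_line(label: int, vector: List[int]) -> str:
    """Format a vector as an SVMlight example line: '<target> <index>:<value> ...'.

    Feature indices start at 1; zero-valued features are omitted (sparse format).
    Use label +1 / -1 for training examples and 0 for unlabeled test examples.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(vector, start=1) if v != 0]
    target = f"{label:+d}" if label else "0"
    return " ".join([target, *pairs])


# Hypothetical feature sets (the paper uses 69; see Supplementary Table S1).
feature_sets = [
    {"GENE_A", "GENE_B"},  # e.g. member of a signaling pathway
    {"GENE_B"},            # e.g. expressed in an immune cell type
    {"GENE_A", "GENE_C"},  # e.g. interacts with a known PID gene
]

vector = encode_gene("GENE_A", feature_sets)
print(vector)                        # [1, 0, 1]
print(to_svmlight_line(+1, vector))  # +1 1:1 3:1
```

Omitting zero-valued features keeps the files small when most genes are absent from most features, which is typical for this kind of annotation data.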
2.4. SVM implementation
We used SVMlight (http://svmlight.joachims.org/), an implementation of support vector machines in C, together with customized functions written in MATLAB (http://www.mathtools.net/MATLAB/) to calculate a confidence score for each predicted candidate PID gene. The absolute score, also known as the confidence score, can be defined as
2.5. LOO error
LOO error measurement involves removing one gene from the training set, training the SVM on the remaining genes and then predicting the class label of the gene that was left out. If the gene is classified correctly, the error is recorded as zero; otherwise it is recorded as one. This process is repeated until every gene has been left out exactly once, and the LOO error of the data set is the average of the individual errors.
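The procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-centroid trainer is a simple stand-in so that the example runs without external libraries (it is not an SVM and not the authors' classifier), and the six toy examples are invented.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[int]
Example = Tuple[Vector, int]
# A trainer takes labeled examples and returns a prediction function.
Trainer = Callable[[List[Example]], Callable[[Vector], int]]


def loo_error(data: List[Example], train: Trainer) -> float:
    """Average 0/1 error over all leave-one-out splits."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]   # training set without example i
        predict = train(rest)            # retrain from scratch on the rest
        errors += int(predict(x) != y)   # 0 if correct, 1 if wrong
    return errors / len(data)


def train_centroid(examples: List[Example]) -> Callable[[Vector], int]:
    """Stand-in classifier (NOT an SVM): predict the class with the nearer centroid.

    Assumes each class keeps at least one example after one is left out.
    """
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in examples if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos, neg = centroid(+1), centroid(-1)

    def predict(x: Vector) -> int:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return +1 if d_pos <= d_neg else -1

    return predict


# Toy binary feature vectors (invented): three positives, three negatives.
data = [
    ((1, 1, 0), +1), ((1, 0, 1), +1), ((1, 1, 1), +1),
    ((0, 0, 0), -1), ((0, 1, 0), -1), ((0, 0, 1), -1),
]
print(loo_error(data, train_centroid))  # 0.0
```

Because `loo_error` only needs a function that trains on a list of examples and returns a predictor, the stand-in could be swapped for a wrapper around any SVM implementation without changing the loop.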
3. Results and discussion
Over 1500 Mendelian disorders whose molecular basis is unknown are catalogued in the Online Mendelian Inheritance in Man (OMIM) database.31
As the number of genes in the positive data set is small, the LOO error was calculated to show generalization of the algorithm. LOO error is explained in detail under the Materials and methods section. For this, we used a data set containing the 148 PID genes from the positive data set along with 148 genes that were randomly selected from the negative data set. This process was repeated, and the LOO error was calculated from 60 such data sets. The average LOO error over the 60 data sets was 8%. The LOO error obtained by leaving out only the PID (positive) genes one by one (with the same setting of 296 data points per data set) was 15%.
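The repeated balanced sampling can be sketched as below. The gene identifiers are placeholders, the fixed seed is only there to make the sketch reproducible, and drawing negatives without replacement within each data set is an assumption, since the text does not specify the sampling scheme. The `error_fn` argument stands for whatever per-data-set error is being averaged, such as the LOO error.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def balanced_subsets(positives: Sequence[str], negatives: Sequence[str],
                     n_repeats: int, seed: int = 0) -> List[List[str]]:
    """Build n_repeats gene lists, each holding every positive gene plus an
    equally sized random sample (without replacement) of the negative genes."""
    rng = random.Random(seed)
    return [list(positives) + rng.sample(list(negatives), len(positives))
            for _ in range(n_repeats)]


def average_error(subsets: List[List[str]],
                  error_fn: Callable[[List[str]], float]) -> float:
    """Average a per-data-set error (e.g. the LOO error) over all subsets."""
    return mean(error_fn(subset) for subset in subsets)


# Placeholder identifiers matching the sizes reported in the text.
positives = [f"PID_{i}" for i in range(148)]
negatives = [f"NEG_{i}" for i in range(3162)]

subsets = balanced_subsets(positives, negatives, n_repeats=60)
print(len(subsets), len(subsets[0]))  # 60 296
```

Balancing each data set at 148 positives and 148 negatives keeps the error estimate from being dominated by the much larger negative class.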
3.1. Sensitivity and specificity
The sensitivity and specificity obtained on the data sets were 0.85 and 0.98, respectively. On the basis of these results, we conclude that the proportion of genes falsely predicted to be PID genes by the trained classifier is 2%. We believe that the availability of comprehensive and accurate biological data is a limitation that restricts the prediction accuracy and performance of this algorithm. As more data accumulate about the human genome and proteome, we expect the performance of this algorithm to improve further. The complete list of predicted candidate genes is provided in Supplementary Table S2 and is also available at the RAPID website http://rapid.rcai.riken.jp/. All 69 features of the predicted candidate PID genes can also be downloaded from the RAPID website.
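As a reminder of how these quantities relate, the short sketch below computes sensitivity, specificity and the false-positive rate from confusion-matrix counts. The counts are hypothetical, chosen only so that the rates match the reported 0.85 and 0.98; they are not the paper's actual confusion matrix.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true PID genes that the classifier labels as PID."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-PID genes that the classifier labels as non-PID."""
    return tn / (tn + fp)


def false_positive_rate(tn: int, fp: int) -> float:
    """Fraction of non-PID genes wrongly labeled as PID (1 - specificity)."""
    return fp / (tn + fp)


# Hypothetical counts per 100 positives and 100 negatives.
tp, fn = 85, 15
tn, fp = 98, 2

print(sensitivity(tp, fn))          # 0.85
print(specificity(tn, fp))          # 0.98
print(false_positive_rate(tn, fp))  # 0.02
```

The 2% figure in the text corresponds to the false-positive rate, and a sensitivity of 0.85 corresponds to the 15% LOO error observed when only positive genes are left out.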
3.2. Evaluation studies
We were able to evaluate our predictions in a limited fashion because a few studies have been published describing novel PID genes that were not included in our original list of PID genes. These experimental studies have confirmed six of the genes in our predicted list as true PID genes: myeloid differentiation factor-88 (MYD88), catalytic subunit of DNA-dependent serine/threonine protein kinase (PRKDC), glucose-6-phosphatase, catalytic subunit 3 (G6PC3),43–45 IL2-inducible T-cell kinase (ITK), coronin, actin binding protein 1A (CORO1A) and interleukin 1 receptor antagonist (IL1RN).46–49 MyD88 is a key downstream adaptor protein in the IL1 receptor complex and toll-like receptor signaling pathways involved in inflammatory response and host defense. In addition, MyD88 is involved in tumorigenesis in models of hepatocarcinoma and familial associated polyposis, in negative regulation of TLR3 signaling and in PKC epsilon activation.50 Patients with MyD88 deficiency are reported to be susceptible to pyogenic bacterial infections, including invasive pneumococcal disease.45 A defect in PRKDC has been reported for the first time in a radiosensitive T-B- SCID patient; it results in inhibition of Artemis activation and non-homologous end-joining.44 Mutations in the G6PC3 gene have been reported among patients with severe congenital neutropenia syndrome, who were also shown to be susceptible to increased apoptosis that leads to disturbances in cardiac or urogenital development.43 A novel PID, IL-2-inducible T-cell kinase (ITK) deficiency, has been observed in patients with fatal immune dysregulation following EBV infection; a homozygous mutation identified in the SH2 domain of the ITK gene resulted in protein destabilization and absence of NKT cells.47 A patient with T cell-deficient, B cell-sufficient and NK cell-sufficient severe combined immunodeficiency has been identified with a mutation in the CORO1A gene along with reduced T-cell function, similar to the phenotypes demonstrated earlier in Coro1a knock-out mice.49 Deficiency of the IL1-receptor antagonist, an autosomal recessive autoinflammatory disease, has been reported for the first time in children who presented with clinical phenotypes of multifocal osteomyelitis, periostitis, pustulosis, thrombosis and respiratory insufficiency due to homozygous deletion of the IL1RN gene.46,48 Further, functional analysis of these mutants confirmed diminished or absent mRNA and protein expression, leading to cytokine abnormalities.
There are two recent independent reports51,52 on the identification and prioritization of candidate disease genes, in general as well as specific to primary immunodeficiencies, that integrate functional annotations from Gene Ontology with protein interaction network data sets compiled from BIND,53 BioGRID54 and HPRD.26 In the latter study, 24 candidate genes likely to be involved in PID were identified using these parameters. Over 80% of these genes are already listed as candidates in our SVM analysis, paving the way for successful implementation of this approach in the future.
We have also summarized reports of genome-wide association studies and other related studies for newly identified candidate PID genes and the associated immunological disorders (Table 1). Because the candidate PID gene list is still large, integrating data from high-throughput studies in this way would allow further prioritization of genes for confirmation in patients with PID where the exact gene is not yet identified. We hope that such integrated approaches will help PID physicians and researchers gain insights into the pathophysiology of these diseases at a faster pace, which could be translated into improved diagnosis and/or treatment of PIDs.
3.3. Availability
The list of predicted PID genes is available as Supplementary Table S2 and at the RAPID website http://rapid.rcai.riken.jp/.
Supplementary Data
Supplementary data are available at www.dnaresearch.oxfordjournals.org.
Funding
We thank the Department of Biotechnology of the Government of India for research support to the Institute of Bioinformatics, Bangalore.
Acknowledgements
The authors thank Shigeaki Nonoyama, Hirokazu Kanegane, Toshio Miyawaki, Koichi Oshima and Atsushi Hijikata for their valuable input and suggestions.
Footnotes
* To whom correspondence should be addressed. Tel. +1 410-502-6662. Fax. +1 410-502-7544. E-mail: pandey@jhmi.edu
References
1. Geha R.S., Notarangelo L.D., Casanova J.L., et al. Primary immunodeficiency diseases: an update from the International Union of Immunological Societies Primary Immunodeficiency Diseases Classification Committee. J. Allergy Clin. Immunol. (2007) 120:776–94.
2. Marodi L., Notarangelo L.D. Immunological and genetic bases of new primary immunodeficiencies. Nat. Rev. Immunol. (2007) 7:851–61.
3. Brown M.P., Grundy W.N., Lin D., et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA (2000) 97:262–7.
4. Furey T.S., Cristianini N., Duffy N., Bednarski D.W., Schummer M., Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics (2000) 16:906–14.
5. Pirooznia M., Yang J.Y., Yang M.Q., Deng Y. A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics (2008) 9(Suppl 1):S13.
6. Wang L., Zhu J., Zou H. Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics (2008) 24:412–9.
7. Wang Y., Tetko I.V., Hall M.A., et al. Gene selection from microarray data for cancer classification—a machine learning approach. Comput. Biol. Chem. (2005) 29:37–46.
8. Yeang C.H., Ramaswamy S., Tamayo P., et al. Molecular classification of multiple tumor types. Bioinformatics (2001) 17(Suppl 1):S316–22.
9. Hua S., Sun Z. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J. Mol. Biol. (2001) 308:397–407.
10. Jagla B., Schuchhardt J. Adaptive encoding neural networks for the recognition of human signal peptide cleavage sites. Bioinformatics (2000) 16:245–50.
11. Zien A., Ratsch G., Mika S., Scholkopf B., Lengauer T., Muller K.R. Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics (2000) 16:799–807.
12. Cai Y.D., Liu X.J., Xu X., Zhou G.P. Support vector machines for predicting protein structural class. BMC Bioinformatics (2001) 2:3.
13. Ding C.H., Dubchak I. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics (2001) 17:349–58.
14. Bock J.R., Gough D.A. Predicting protein–protein interactions from primary structure. Bioinformatics (2001) 17:455–60.
15. Bhasin M., Raghava G.P. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res. (2004) 32:W414–9.
16. Garg A., Bhasin M., Raghava G.P. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J. Biol. Chem. (2005) 280:14427–32.
17. Hua S., Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics (2001) 17:721–8.
18. Shi J.Y., Zhang S.W., Pan Q., Cheng Y.M., Xie J. Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids (2007) 33:69–74.
19. Anderson D.C., Li W., Payan D.G., Noble W.S. A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. (2003) 2:137–46.
20. Park K.J., Gromiha M.M., Horton P., Suwa M. Discrimination of outer membrane proteins using support vector machines. Bioinformatics (2005) 21:4223–9.
21. Keerthikumar S., Raju R., Kandasamy K., et al. RAPID: resource of Asian primary immunodeficiency diseases. Nucleic Acids Res. (2009) 37:D863–7.
22. Kanehisa M., Araki M., Goto S., et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. (2008) 36:D480–4.
23. Kanehisa M., Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. (2000) 28:27–30.
24. Kanehisa M., Goto S., Hattori M., et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. (2006) 34:D354–7.
25. Hijikata A., Kitamura H., Kimura Y., et al. Construction of an open-access database that integrates cross-reference information from the transcriptome and proteome of immune cells. Bioinformatics (2007) 23:2934–41.
26. Keshava Prasad T.S., Goel R., Kandasamy K., et al. Human protein reference database—2009 update. Nucleic Acids Res. (2009) 37:D767–72.
27. Kandasamy K., Keerthikumar S., Goel R., et al. Human Proteinpedia: a unified discovery resource for proteomics research. Nucleic Acids Res. (2009) 37:D773–81.
28. Blake J.A., Bult C.J., Eppig J.T., Kadin J.A., Richardson J.E. The mouse genome database genotypes:phenotypes. Nucleic Acids Res. (2009) 37:D712–9.
29. Bult C.J., Eppig J.T., Kadin J.A., Richardson J.E., Blake J.A. The mouse genome database (MGD): mouse biology and model systems. Nucleic Acids Res. (2008) 36:D724–8.
30. Eppig J.T., Blake J.A., Bult C.J., Kadin J.A., Richardson J.E. The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Res. (2007) 35:D630–7.
31. Amberger J., Bocchini C.A., Scott A.F., Hamosh A. McKusick's online Mendelian inheritance in man (OMIM). Nucleic Acids Res. (2009) 37:D793–6.
32. Botstein D., Risch N. Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat. Genet. (2003) 33(Suppl):228–37.
33. Glazier A.M., Nadeau J.H., Aitman T.J. Finding genes that underlie complex traits. Science (2002) 298:2345–9.
34. Freudenberg J., Propping P. A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics (2002) 18(Suppl 2):S110–5.
35. Huang D., Chow T.W. Identifying the biologically relevant gene categories based on gene expression and biological data: an example on prostate cancer. Bioinformatics (2007) 23:1503–10.
36. Kohler S., Bauer S., Horn D., Robinson P.N. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. (2008) 82:949–58.
37. Perez-Iratxeta C., Bork P., Andrade M.A. Association of genes to genetically inherited diseases using data mining. Nat. Genet. (2002) 31:316–9.
38. Segal E., Wang H., Koller D. Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics (2003) 19(Suppl 1):i264–71.
39. Wang K., Li M., Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am. J. Hum. Genet. (2007) 81:1278–83.
40. Zhang H.H., Ahn J., Lin X., Park C. Gene selection using support vector machines with non-convex penalty. Bioinformatics (2006) 22:88–95.
41. Lewis D.P., Jebara T., Noble W.S. Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics (2006) 22:2753–60.
42. Radivojac P., Peng K., Clark W.T., et al. An integrated approach to inferring gene-disease associations in humans. Proteins (2008) 72:1030–7.
43. Boztug K., Appaswamy G., Ashikov A., et al. A syndrome with congenital neutropenia and mutations in G6PC3. N. Engl. J. Med. (2009) 360:32–43.
44. van der Burg M., Ijspeert H., Verkaik N.S., et al. A DNA-PKcs mutation in a radiosensitive T-B- SCID patient inhibits Artemis activation and nonhomologous end-joining. J. Clin. Invest. (2009) 119:91–8.
45. von Bernuth H., Picard C., Jin Z., et al. Pyogenic bacterial infections in humans with MyD88 deficiency. Science (2008) 321:691–6.
46. Aksentijevich I., Masters S.L., Ferguson P.J., et al. An autoinflammatory disease with deficiency of the interleukin-1-receptor antagonist. N. Engl. J. Med. (2009) 360:2426–37.
47. Huck K., Feyen O., Niehues T., et al. Girls homozygous for an IL-2-inducible T cell kinase mutation that leads to protein deficiency develop fatal EBV-associated lymphoproliferation. J. Clin. Invest. (2009) 119:1350–8.
48. Reddy S., Jia S., Geoffrey R., et al. An autoinflammatory disease due to homozygous deletion of the IL1RN locus. N. Engl. J. Med. (2009) 360:2438–44.
49. Shiow L.R., Roadcap D.W., Paris K., et al. The actin regulator coronin 1A is mutant in a thymic egress-deficient mouse strain and in a patient with severe combined immunodeficiency. Nat. Immunol. (2008) 9:1307–15.
50. Kenny E.F., O'Neill L.A. Signalling adaptors used by toll-like receptors: an update. Cytokine (2008) 43:342–9.
51. Chen J., Aronow B.J., Jegga A.G. Disease candidate gene identification and prioritization using protein interaction networks. BMC Bioinformatics (2009) 10:73.
52. Ortutay C., Vihinen M. Identification of candidate disease genes by integrating Gene Ontologies and protein-interaction networks: case study of primary immunodeficiencies. Nucleic Acids Res. (2009) 37:622–8.
53. Bader G.D., Betel D., Hogue C.W. BIND: the biomolecular interaction network database. Nucleic Acids Res. (2003) 31:248–50.
54. Breitkreutz B.J., Stark C., Reguly T., et al. The BioGRID interaction database: 2008 update. Nucleic Acids Res. (2008) 36:D637–40.
55. Harley J.B., Alarcon-Riquelme M.E., Criswell L.A., et al. Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat. Genet. (2008) 40:204–10.
56. Nath S.K., Han S., Kim-Howard X., et al. A nonsynonymous functional variant in integrin-alpha(M) (encoded by ITGAM) is associated with systemic lupus erythematosus. Nat. Genet. (2008) 40:152–4.
57. Kozyrev S.V., Abelson A.K., Wojcik J., et al. Functional variants in the B-cell gene BANK1 are associated with systemic lupus erythematosus. Nat. Genet. (2008) 40:211–6.
58. Goyette P., Lefebvre C., Ng A., et al. Gene-centric association mapping of chromosome 3p implicates MST1 in IBD pathogenesis. Mucosal Immunol. (2008) 1:131–8.
59. Johnson A.D., O'Donnell C.J. An open access database of genome-wide association results. BMC Med. Genet. (2009) 10:6.
60. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature (2007) 447:661–78.
61. Todd J.A., Walker N.M., Cooper J.D., et al. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat. Genet. (2007) 39:857–64.
62. Plenge R.M., Cotsapas C., Davies L., et al. Two independent alleles at 6q23 associated with risk of rheumatoid arthritis. Nat. Genet. (2007) 39:1477–82.
63. Remmers E.F., Plenge R.M., Lee A.T., et al. STAT4 and the risk of rheumatoid arthritis and systemic lupus erythematosus. N. Engl. J. Med. (2007) 357:977–86.
64. Graham D.S., Graham R.R., Manku H., et al. Polymorphism at the TNF superfamily gene TNFSF4 confers susceptibility to systemic lupus erythematosus. Nat. Genet. (2008) 40:83–9.
65. Ueda H., Howson J.M., Esposito L., et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature (2003) 423:506–11.
66. Ikegami H., Awata T., Kawasaki E., et al. The association of CTLA4 polymorphism with type 1 diabetes is concentrated in patients complicated with autoimmune thyroid disease: a multicenter collaborative study in Japan. J. Clin. Endocrinol. Metab. (2006) 91:1087–92.

