DNA Research Advance Access originally published online on October 17, 2006
DNA Research 2006 13(4):135-140; doi:10.1093/dnares/dsl007
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrowOA All Versions of this Article:
13/4/135    most recent
dsl007v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (2)
Right arrow Request Permissions
Google Scholar
Right arrow Articles by Takahashi, N.
Right arrow Articles by Nakashima, H.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Takahashi, N.
Right arrow Articles by Nakashima, H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Kazusa DNA Research Institute
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


Short Communication

Negative Correlation of G+C Content at Silent Substitution Sites Between Orthologous Human and Mouse Protein-Coding Sequences

Naoki Takahashi and Hiroshi Nakashima*

Department of Clinical Laboratory Science, Graduate Course of Medical Science and Technology Division of Health Science, Kanazawa University, 5-11-80 Kodatsuno, Kanazawa 920-0942, Japan

Received 31 January 2006; revised 22 August 2006


    Abstract
We conducted a genome-wide analysis of variation in guanine plus cytosine (G+C) content at the third codon position at silent substitution sites of orthologous human and mouse protein-coding nucleotide sequences. Alignments of 3776 human protein-coding DNA sequences with mouse orthologs having >50 synonymous codons were analyzed, and nucleotide substitutions were counted in gap-free regions of the alignments. The G+C contents at silent sites in these gene pairs showed a strong negative correlation (r = –0.93). Some gene pairs differed markedly in G+C content at the third codon position at silent substitution sites. For example, human thymine-DNA glycosylase was A+T-rich at silent substitution sites, whereas the orthologous mouse sequence was G+C-rich at the corresponding sites. Conversely, human matrix metalloproteinase 23B was G+C-rich at silent substitution sites, whereas the mouse ortholog was A+T-rich. We discuss possible implications of this significant negative correlation of G+C content at silent sites.

Key words: G+C content variation; human–mouse orthologs; nucleotide substitutions in synonymous codons


The availability of complete mammalian genome sequences [1–4] provides an opportunity to characterize nucleotide substitution patterns among mammalian genomes by comparative sequence analysis [5–11]. The G+C content of bacterial genomes varies among species from 25% to 75%, but it is relatively homogeneous among genes within a given bacterial genome [12,13]. In contrast, the G+C content of genes within a mammalian genome varies considerably, because mammalian genomes are mosaics of long DNA segments (hundreds of kilobases or longer) known as isochores [14]. G+C-poor isochores can have G+C contents as low as 35%, while G+C-rich isochores can reach 60%, and the genes within a given isochore are fairly homogeneous in G+C content. Some homologous mammalian genes that occupy different chromosomal positions have been reported to differ considerably in base composition and codon usage [14–16]. The human α- and β-globin genes illustrate this position-dependent variation: the α-globin gene cluster occupies a G+C-rich region, whereas the β-globin gene resides in a G+C-poor region [14].

In comparing alignments of orthologous human and mouse sequences, we noticed that silent substitution sites at the third codon position were biased toward either G+C or A+T nucleotides. For example, human thymine-DNA glycosylase was A+T-rich at silent substitution sites, while the orthologous mouse sequence was G+C-rich at the corresponding sites. Conversely, human matrix metalloproteinase 23B was G+C-rich at silent substitution sites, whereas the mouse ortholog was A+T-rich at those sites. Because complete human and mouse genome sequences are now available, we conducted a genome-wide comparative analysis of G+C content variation at silent substitution sites in orthologous human and mouse sequences.


    1. Correlation of G+C Content at the Third Codon Position at Silent Substitution Sites
The G+C content at the third codon position of synonymous codons (that is, sites where a substitution does not change the encoded amino acid) was determined for both human and mouse sequences. For simplicity, only synonymous codon pairs with an identical nucleotide at the first codon position were considered. Fig. 1 plots the G+C contents at the third codon position at silent substitution sites for the 3776 human–mouse gene pairs and shows a strong negative correlation (r = –0.93). The plot reveals marked differences in G+C content at corresponding silent sites in many human and mouse orthologs.
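The site-selection rule can be illustrated with a short Python sketch. This is not the authors' code. Its definition of a silent substitution site is an assumption drawn from the text: an aligned codon pair that encodes the same amino acid (stop codons excluded), has an identical first nucleotide and differs at the third position. Inputs are assumed to be gap-free, in-frame aligned coding sequences. `silent_site_gc3` and `pearson_r` are hypothetical names.

```python
import math

# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def silent_site_gc3(human: str, mouse: str) -> tuple[float, float, int]:
    """Return (human G+C fraction, mouse G+C fraction, number of sites)
    at third positions of silent substitution sites.

    Assumed site definition: both codons encode the same amino acid (stops
    excluded), share the first nucleotide, and differ at the third position.
    """
    if len(human) != len(mouse) or len(human) % 3 != 0:
        raise ValueError("expected equal-length, in-frame aligned sequences")
    n = gc_human = gc_mouse = 0
    for p in range(0, len(human), 3):
        ch, cm = human[p:p + 3].upper(), mouse[p:p + 3].upper()
        aa = CODON_TABLE.get(ch)
        if aa is None or aa == "*" or aa != CODON_TABLE.get(cm):
            continue  # ambiguous base, stop codon, or amino acid change
        if ch[0] != cm[0] or ch[2] == cm[2]:
            continue  # different first base, or no third-position substitution
        n += 1
        gc_human += ch[2] in "GC"
        gc_mouse += cm[2] in "GC"
    if n == 0:
        return math.nan, math.nan, 0
    return gc_human / n, gc_mouse / n, n


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Computed for each ortholog pair, the human and mouse fractions give one point in a plot like Fig. 1, and `pearson_r` over all points gives the correlation coefficient. It is also an assumption that the paper's ">50 synonymous codons" threshold corresponds to the site count `n` returned here.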


Figure 1
Figure 1. G+C content at silent sites in 3776 human protein-coding sequences versus G+C content at the corresponding silent sites in their mouse orthologs. Human and mouse cDNA sequences were obtained from Reference Sequence (RefSeq) Release 11 of the U.S. National Center for Biotechnology Information (ftp://ftp.ncbi.nih.gov/refseq/), and protein-coding nucleotide sequences were extracted according to the feature tables. Translation yielded the amino acid sequences of 28 893 human and 25 298 mouse cDNAs. Orthologs were identified by the two-directional (reciprocal) best-hit approach using BLASTP [17]. A pair was retained if the two proteins shared >30% amino acid identity over three-fourths of their total length. To avoid bias, proteins showing >30% sequence identity with other proteins of the same species were excluded. Gap-free alignment regions longer than 100 amino acid residues, together with the corresponding DNA sequences, were analyzed. Applying these criteria yielded the 3776 pairwise alignments of human and mouse sequences with >50 synonymous codons that were analyzed in this study.
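The ortholog-selection step in the caption can be sketched as a reciprocal (two-directional) best-hit filter. This is a hypothetical sketch, not the authors' pipeline. It assumes that BLASTP results have already been parsed into `Hit` records, that "best" means highest bit score, and that the three-fourths criterion applies to the query length. `Hit`, `best_hits`, `ids_with_paralogs` and `reciprocal_best_hits` are illustrative names.

```python
from typing import AbstractSet, NamedTuple


class Hit(NamedTuple):
    """One BLASTP hit, assumed already parsed from tabular output."""
    query: str
    subject: str
    identity: float  # percent amino acid identity of the alignment
    aln_len: int     # aligned length in residues
    query_len: int   # full length of the query protein
    bitscore: float


def best_hits(hits: list[Hit]) -> dict[str, Hit]:
    """Highest-scoring non-self hit for each query protein."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.query == h.subject:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def ids_with_paralogs(self_hits: list[Hit], min_identity: float = 30.0) -> set[str]:
    """Proteins with a >min_identity% hit to another protein of the same species."""
    return {h.query for h in self_hits
            if h.query != h.subject and h.identity > min_identity}


def reciprocal_best_hits(human_vs_mouse: list[Hit], mouse_vs_human: list[Hit],
                         excluded: AbstractSet[str] = frozenset(),
                         min_identity: float = 30.0,
                         min_coverage: float = 0.75) -> list[tuple[str, str]]:
    """Human-mouse pairs that are each other's best hit and pass the filters."""
    hm = best_hits(human_vs_mouse)
    mh = best_hits(mouse_vs_human)
    pairs = []
    for human_id, hit in sorted(hm.items()):
        back = mh.get(hit.subject)
        if back is None or back.subject != human_id:
            continue  # best hit in one direction only
        if human_id in excluded or hit.subject in excluded:
            continue  # protein has a close paralog in its own genome
        if hit.identity > min_identity and hit.aln_len >= min_coverage * hit.query_len:
            pairs.append((human_id, hit.subject))
    return pairs
```

Requiring the best hit in both directions discards one-sided matches. One example is a human protein whose best mouse hit itself prefers a different human protein.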

 
In 2-fold degenerate codons, the interchangeable third-position nucleotides are either two purines (A/G) or two pyrimidines (C/T). Every silent substitution at these sites is therefore a transition that changes G+C content. In 4-fold degenerate codons, however, not every silent substitution changes G+C content, because A↔T and G↔C transversions leave it unchanged. To examine the correlation further, G+C content at silent substitution sites was determined separately for the eight sets of 4-fold degenerate codons (Ala: GCN, Arg: CGN, Gly: GGN, Leu: CUN, Pro: CCN, Ser: UCN, Thr: ACN and Val: GUN) in human and mouse sequences. For the 2084 orthologous pairs having >50 4-fold degenerate codons, G+C contents at 4-fold degenerate silent sites also showed a strong negative correlation (r = –0.82; data not shown).
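The 2-fold versus 4-fold argument can be checked directly against the standard genetic code. The Python sketch below uses hypothetical helper names and excludes stop codons. It groups codons by their first two bases and amino acid, then confirms that every 2-fold family pairs two purines or two pyrimidines, so each silent change at such a site alters G+C. It also recovers the eight 4-fold families listed above and shows which 4-fold exchanges leave G+C unchanged.

```python
from itertools import combinations

# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def third_position_families() -> dict[tuple[str, str], str]:
    """Map (first two bases, amino acid) to the third-position bases encoding it."""
    families: dict[tuple[str, str], str] = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are not part of any synonymous family here
            key = (codon[:2], aa)
            families[key] = families.get(key, "") + codon[2]
    return families


def changes_gc(b1: str, b2: str) -> bool:
    """True if replacing b1 with b2 at one site changes G+C content."""
    return (b1 in "GC") != (b2 in "GC")


families = third_position_families()
fourfold = {prefix for (prefix, _), thirds in families.items() if len(thirds) == 4}
twofold_always_changes_gc = all(
    changes_gc(*thirds) for thirds in families.values() if len(thirds) == 2
)
# Third-position exchanges that keep G+C content unchanged.
gc_neutral_pairs = sorted(
    {"".join(sorted(pair)) for pair in combinations(BASES, 2) if not changes_gc(*pair)}
)
print(sorted(fourfold))           # ['AC', 'CC', 'CG', 'CT', 'GC', 'GG', 'GT', 'TC']
print(twofold_always_changes_gc)  # True
print(gc_neutral_pairs)           # ['AT', 'CG']
```

Ile (ATT/ATC/ATA) is 3-fold degenerate, so it falls outside both checks. The checks consider only the 2-fold and 4-fold families named in the text.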


    2. Classification of Orthologous Sequences According to G+C Content at the Third Codon Position at Silent Substitution Sites
Human and mouse orthologous sequence pairs were divided into three groups according to G+C content at the third codon position at synonymous substitution sites. In group (a), the human gene had a much lower G+C content than its mouse ortholog. In group (b), the human gene had a much higher G+C content than its mouse ortholog. In group (c), the human and mouse genes had similar G+C contents. The number of genes in each group depends on the cut-off used for classification. With a cut-off of 30% lower G+C content in the human gene than in the mouse gene, 25.4% (960/3776) of the orthologous sequence pairs were classified in group (a). With a cut-off of 30% higher G+C content in the human gene than in the mouse gene, group (b) contained 17.2% (648/3776) of the pairs. Group (c) contained the remaining 57.4% (2168/3776) of pairs, whose human–mouse differences in G+C content fell between –30 and +30%. Table 1 lists the proteins encoded by 10 well-characterized genes in each of the three groups. Most of the gene pairs within groups (a), (b) and (c) had different chromosomal locations, but some genes within the same group shared a chromosomal location. For example, in group (a), human potassium channel tetramerization domain-containing protein 3 and human ribosomal protein S6 kinase are both located on human chromosome 1q41, and both corresponding mouse genes are on mouse chromosome 1H6. In group (b), human agrin, human calcium-binding protein Cab45 precursor and human matrix metalloproteinase 23B are all located on human chromosome 1p36.33, and the corresponding mouse genes are on mouse chromosome 4E2. These findings suggest that G+C content at some silent substitution sites might be determined by chromosomal location, which is consistent with the report by Bernardi et al. [14]
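The grouping rule can be sketched as follows. Two points are assumptions, because the text does not settle them. First, the 30% cut-off is read as a difference in percentage points (human minus mouse). Second, pairs lying exactly at ±30 are assigned to groups (a) and (b). `classify_pair` and `group_shares` are hypothetical names.

```python
from collections import Counter


def classify_pair(gc_human: float, gc_mouse: float, cutoff: float = 30.0) -> str:
    """Assign an ortholog pair to group 'a', 'b' or 'c'.

    gc_human and gc_mouse are G+C percentages at silent substitution sites.
    Assumption: cutoff is a difference in percentage points (human - mouse).
    """
    diff = gc_human - gc_mouse
    if diff <= -cutoff:
        return "a"  # human much more A+T-rich than mouse
    if diff >= cutoff:
        return "b"  # human much more G+C-rich than mouse
    return "c"      # similar G+C content


def group_shares(pairs: list[tuple[float, float]], cutoff: float = 30.0) -> dict[str, float]:
    """Percentage of (human, mouse) pairs falling in each group."""
    counts = Counter(classify_pair(h, m, cutoff) for h, m in pairs)
    return {g: 100.0 * counts[g] / len(pairs) for g in "abc"}
```

As a check on the reported counts, 960/3776, 648/3776 and 2168/3776 round to 25.4%, 17.2% and 57.4%, and 960 + 648 + 2168 = 3776.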


Table 1. Variations in G+C content at the third codon position in synonymous substitution sites (G+C III) for protein-coding genes

 

    3. Frequencies of Substitution
Portions of the alignments for thymine-DNA glycosylase (human: NM_003211.3; mouse: NM_011561.1) and matrix metalloproteinase 23B (human: NM_006983.1; mouse: NM_011985.1) are shown in Fig. 2a and b, respectively. In Fig. 2, A and T nucleotides at silent substitution sites are shown in red, G and C nucleotides at silent substitution sites in blue, and all other nucleotides in yellow. The encoded amino acids are shown alongside the DNA sequences.


Figure 2
Figure 2. Alignments of human and mouse sequences of (a) thymine-DNA glycosylase and (b) matrix metalloproteinase 23B.

 
Nucleotide substitutions were observed at 14% (738 506/5 401 758) of all nucleotide positions in the 3776 pairs of orthologous human and mouse genes. Of these substitutions, 18% occurred at the first codon position, 12% at the second and 70% at the third. Silent substitutions in synonymous codons with an identical nucleotide at the first codon position accounted for 58% (425 945/738 506) of all substitutions. Transitions (purine–purine and pyrimidine–pyrimidine substitutions) accounted for 66.1% of all substitutions, and transversions (purine–pyrimidine and pyrimidine–purine substitutions) for 33.9%. Among silent substitutions, transitions accounted for 72.1% and transversions for 27.9%. Transitions predominate at silent sites because a transition at the third codon position is almost always synonymous, whereas many third-position transversions change the encoded amino acid.
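The tallies above (substitutions by codon position, and transitions versus transversions) can be sketched in Python. This is an illustrative sketch, not the authors' code. It assumes gap-free, in-frame aligned sequences of equal length containing only A, C, G and T. Restricting the loop to synonymous codon pairs would give the silent-substitution subset. `is_transition` and `substitution_summary` are hypothetical names.

```python
from collections import Counter

PURINES = frozenset("AG")


def is_transition(b1: str, b2: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine change."""
    return b1 != b2 and (b1 in PURINES) == (b2 in PURINES)


def substitution_summary(human: str, mouse: str) -> tuple[Counter, int, int]:
    """Count mismatches by codon position (1-3) and classify each one as a
    transition or a transversion. Assumes gap-free, in-frame alignments."""
    by_position: Counter = Counter()
    transitions = transversions = 0
    for i, (a, b) in enumerate(zip(human.upper(), mouse.upper())):
        if a == b:
            continue
        by_position[i % 3 + 1] += 1  # codon position 1, 2 or 3
        if is_transition(a, b):
            transitions += 1
        else:
            transversions += 1
    return by_position, transitions, transversions
```

The reported proportions are consistent with the raw counts: 738 506/5 401 758 ≈ 13.7% (about 14%) and 425 945/738 506 ≈ 57.7% (about 58%).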

When a silent substitution was observed at an alignment site, it was assumed to have occurred once, in either the human or the mouse branch, since the divergence of the two lineages. Silent substitutions at synonymous codon sites were estimated, and substitutions were considered from the human sequence. Let f(A), f(T), f(C) and f(G) be the frequencies of the four nucleotides at the third codon position at silent substitution sites in the human sequence, so that f(A) + f(T) + f(C) + f(G) = 1. Let α and β be the transition and transversion substitution rates per year per site, and let T be the divergence time between human and mouse. The nucleotide substitution frequencies at silent sites from human to mouse are then as shown in Table 2. Let X denote the substitution frequency at G or C nucleotides in human silent sites, and Y the frequency of substitutions that produce G or C in mouse silent sites:

$$X = 2T(\alpha + 2\beta)\,\bigl(f(G) + f(C)\bigr)$$

$$Y = 2T(\alpha + \beta)\,\bigl(f(A) + f(T)\bigr) + 2T\beta\,\bigl(f(G) + f(C)\bigr) = 2T(\alpha + \beta) - 2T\alpha\,\bigl(f(G) + f(C)\bigr)$$

where the second form of Y uses f(A) + f(T) = 1 − (f(G) + f(C)). Eliminating f(G) + f(C) gives the relationship between X and Y:

$$Y = 2T(\alpha + \beta) - \frac{\alpha}{\alpha + 2\beta}\,X$$

Because α and β are positive, the slope −α/(α + 2β) is negative. Y therefore decreases when X increases and increases when X decreases. This relationship indicates a negative correlation in the variation of G+C content at silent sites between the two DNA sequences.
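The derivation can be checked numerically. The sketch below assumes that Table 2 follows a two-rate (Kimura-style) model, in which each transition occurs at rate α and each of the two possible transversions at rate β. This model reproduces the expressions for X and Y given in the text, but it is an assumption, since the table itself is not reproduced here. The check confirms that Y falls linearly as X rises, with slope −α/(α + 2β). `rate` and `x_and_y` are hypothetical helpers, and divergence time is written `t` to avoid confusion with the nucleotide T.

```python
import math

PURINES = frozenset("AG")


def rate(src: str, dst: str, alpha: float, beta: float) -> float:
    """Per-year, per-site rate of src -> dst under the assumed two-rate model:
    alpha for a transition, beta for each transversion."""
    if src == dst:
        return 0.0
    return alpha if (src in PURINES) == (dst in PURINES) else beta


def x_and_y(freqs: dict[str, float], alpha: float, beta: float, t: float) -> tuple[float, float]:
    """X: substitutions at human G/C silent sites (to any other base).
    Y: substitutions that yield G or C in mouse.
    Both are accumulated over the two branches, i.e. a total time of 2t."""
    x = sum(2 * t * freqs[s] * rate(s, d, alpha, beta) for s in "GC" for d in "ACGT")
    y = sum(2 * t * freqs[s] * rate(s, d, alpha, beta) for s in "ACGT" for d in "GC")
    return x, y


alpha, beta, t = 2.0, 1.0, 0.5  # arbitrary illustrative values
for gc in (0.3, 0.5, 0.7):
    freqs = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    x, y = x_and_y(freqs, alpha, beta, t)
    closed_form = 2 * t * (alpha + beta) - alpha / (alpha + 2 * beta) * x
    print(f"GC={gc:.1f}  X={x:.3f}  Y={y:.3f}  matches closed form: {math.isclose(y, closed_form)}")
```

Any positive α and β give a negative slope. Under this model, the sign of the relationship therefore does not depend on the particular rate values chosen.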


Table 2. Rates of nucleotide substitution

 

    4. The Implications of Substitutions at Silent Sites
Because substitutions at silent sites do not change amino acids, they are not expected to affect proteins and are commonly regarded as evolutionarily neutral. However, silent substitutions do alter codon usage. Grantham reported that different organisms use synonymous codons differently [18], and Ikemura found a strong positive correlation between codon usage and tRNA content in unicellular organisms [19]. Codon-choice patterns of genes often differ greatly among multicellular eukaryotes, and codon usage in mammals is known to have dramatic effects on translation rate [20]. The differences in codon usage we observed between human and mouse orthologs therefore raise the possibility that their expression patterns differ.

Evidence indicates that genes with a high G+C content at the third codon position are usually surrounded by long G+C-rich genomic sequences, whereas those with a low third-position G+C content are usually surrounded by long A+T-rich sequences [19,21]. Human–mouse genome comparisons have revealed a large number of rearrangements of conserved syntenic segments [2,22]. Because human and mouse genomes show large variations in G+C content (e.g. isochores) [14], such rearrangements might produce large deviations in G+C content between human and mouse genes by changing their surrounding sequences. The gene pairs classified into groups (a) and (b), which showed large variations in G+C content at silent substitution sites, are considered to be products of rearrangements of syntenic segments. Genes located in the same syntenic segment showed similar G+C content variation. Gene rearrangements could therefore cause large variations in G+C content at silent substitution sites and might lead to a significant negative correlation.

Nucleotide substitutions between human and mouse sequences have accumulated since the two lineages diverged from a common ancestor. These substitutions are generally assumed to have occurred independently in the two species, so there would seem to be no connection between silent nucleotide substitutions in humans and mice. The results of this study raise the question of whether correlated substitutions at silent sites have an evolutionary function. Further study is needed to address this issue.


    Acknowledgements
We are grateful to an anonymous referee and an editor for constructive suggestions regarding the negative correlation.


    Footnotes
 
*To whom correspondence should be addressed. Tel. +81-76-265-2582, Fax. +81-76-234-4360, E-mail: naka{at}kenroku.kanazawa-u.ac.jp

Communicated by Hiroyuki Toh


    References

  1. International Human Genome Sequencing Consortium. 2001, Initial sequencing and analysis of the human genome, Nature, 409, 860–921.
  2. Mouse Genome Sequencing Consortium. 2002, Initial sequencing and comparative analysis of the mouse genome, Nature, 420, 520–562.
  3. Rat Genome Sequencing Project Consortium. 2004, Genome sequence of the Brown Norway Rat yields insights into mammalian evolution, Nature, 428, 493–521.
  4. The Chimpanzee Sequencing and Analysis Consortium. 2005, Initial sequence of the chimpanzee genome and comparison with the human genome, Nature, 437, 69–87.
  5. Castresana, J. 2002, Genes on human chromosome 19 show extreme divergence from the mouse orthologs and a high GC content, Nucleic Acids Res., 30, 1751–1756.
  6. Hardison, R. C., Roskin, K. M., Yang, S., et al. 2003, Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution, Genome Res., 13, 13–26.
  7. Yang, S., Smit, A. F., Schwartz, S., et al. 2004, Patterns of insertions and their covariation with substitutions in the rat, mouse, and human genomes, Genome Res., 14, 517–527.
  8. Cooper, G. M., Brudno, M., Stone, E. A., Dubchak, I., Batzoglou, S., Sidow, A. 2004, Characterization of evolutionary rates and constraints in three mammalian genomes, Genome Res., 14, 539–548.
  9. Zhang, Z. and Gerstein, M. 2003, Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes, Nucleic Acids Res., 31, 5338–5348.
  10. Subramanian, S. and Kumar, S. 2003, Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes, Genome Res., 13, 838–844.
  11. Clark, A. G., Glanowski, S., Nielsen, R., et al. 2003, Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios, Science, 302, 1960–1963.
  12. Muto, A. and Osawa, S. 1987, The guanine and cytosine content of genomic DNA and bacterial evolution, Proc. Natl Acad. Sci. USA, 84, 166–169.
  13. Lawrence, J. G. and Ochman, H. 1997, Amelioration of bacterial genomes: rates of change and exchange, J. Mol. Evol., 44, 383–397.
  14. Bernardi, G., Olofsson, B., Filipski, J., et al. 1985, The mosaic genome of warm-blooded vertebrates, Science, 228, 953–958.
  15. Nadeau, J. H. and Taylor, B. A. 1984, Lengths of chromosomal segments conserved since divergence of man and mouse, Proc. Natl Acad. Sci. USA, 81, 814–818.
  16. Mouchiroud, D. and Gautier, C. 1988, High codon-usage changes in mammalian genes, Mol. Biol. Evol., 5, 192–194.
  17. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403–410.
  18. Grantham, R. 1980, Workings of the genetic code, Trends Biochem. Sci., 5, 327–331.
  19. Ikemura, T. 1985, Codon usage and tRNA content in unicellular and multicellular organisms, Mol. Biol. Evol., 2, 13–34.
  20. Zolotukhin, S., Potter, M., Hauswirth, W. W., Guy, J., Muzyczka, N. 1996, A "humanized" green fluorescent protein cDNA adapted for high-level expression in mammalian cells, J. Virol., 70, 4646–4654.
  21. Ikemura, T. and Wada, K. 1991, Evident diversity of codon usage patterns of human genes with respect to chromosome banding patterns and chromosome numbers; relation between nucleotide sequence data and cytogenetic data, Nucleic Acids Res., 19, 4333–4339.
  22. Bourque, G., Pevzner, P. A., Tesler, G. 2004, Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes, Genome Res., 14, 507–516.
