DNA Research Advance Access originally published online on September 30, 2008
DNA Research 2008 15(6):347-356; doi:10.1093/dnares/dsn023

© The Author 2008. Kazusa DNA Research Institute
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

Differential Selective Constraints Shaping Codon Usage Pattern of Housekeeping and Tissue-specific Homologous Genes of Rice and Arabidopsis

Pamela Mukhopadhyay1, Surajit Basak2 and Tapash Chandra Ghosh1,*

1 Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
2 Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, P- 33, CIT Scheme XM, Kolkata 700 010, India

Received 17 March 2008; accepted 2 September 2008.


    Abstract
 
Intra-genomic variation between housekeeping and tissue-specific genes has long been a subject of interest in higher eukaryotes. To date, however, no such investigation has been carried out in plants. The availability of whole-genome expression data for both rice and Arabidopsis has made it possible to examine the evolutionary forces shaping codon usage patterns in housekeeping and tissue-specific genes in plants. In the present work, we used 4065 rice–Arabidopsis homologous gene pairs to study the evolutionary forces responsible for codon usage divergence between housekeeping and tissue-specific genes. In both rice and Arabidopsis, it is mutational bias that regulates error minimization in highly expressed genes of both the housekeeping and tissue-specific classes. Our results show that, in comparison to tissue-specific genes, housekeeping genes are under stronger selective constraint in plants. However, among tissue-specific genes, lowly expressed genes are under stronger selective constraint than highly expressed genes. We demonstrate that constraint acting on mRNA secondary structure is responsible for modulating codon usage variation in rice tissue-specific genes. Thus, different evolutionary forces must underlie the evolution of synonymous codon usage in highly expressed housekeeping and tissue-specific genes in rice and Arabidopsis.

Key words: error minimization; housekeeping; mRNA folding energy; synonymous rates; tissue specific; tRNA copy number


    1. Introduction
 
The completed genome sequences of rice [1] (Oryza sativa) and Arabidopsis [2] (Arabidopsis thaliana) constitute a valuable resource for comparative genomic analysis, as they represent the two major evolutionary lineages within the angiosperms: the monocotyledons and the dicotyledons. The codon usage patterns of rice and Arabidopsis genes have diverged since the split between dicots and monocots ~200 million years (My) ago, with an increase in the GC content of some rice genes [3,4]. The large-scale variation in DNA base composition caused by this increase in GC revealed two gene classes, GC-rich and GC-poor, in monocots but not in dicots [5–8]. Codon usage variation in monocots is thought to be determined mainly by the spatial arrangement of genomic G + C content, i.e. an isochore structure similar to that of mammals [9]. The biased gene distribution in the rice genome raises a question about how tissue-specific and widely expressed genes are distributed with respect to the GC level of the isochores. Several studies indicated that the distribution of widely expressed genes in human is not correlated with the GC levels of isochores [10–13]. However, Lercher et al. [14] reported a strong correlation between gene expression breadth and GC content in human, suggesting that there might be selective pressure favoring the concentration of housekeeping genes in GC-rich isochores. Evolutionary studies on housekeeping and tissue-specific genes in mammalian genomes have recently attracted growing interest [15–18]. Working on the codon usage of tissue-specific genes in human, Plotkin et al. [19] reported a significant difference in synonymous codon usage between genes specifically expressed in different human tissues. Their results suggest that selective constraint acts on synonymous codon usage to optimize translation by adapting to the pool of tRNAs available in each tissue for tissue-specific genes in human [19]. However, Semon et al. [20], analyzing 2126 human tissue-specific genes expressed in 18 libraries, found no evidence for tissue-specific adaptation of synonymous codon usage in human.

However, all previous studies on housekeeping and tissue-specific genes have been carried out on the human genome. Rice, whose base composition is heterogeneous like that of human, has not been explored to date. The rice–Arabidopsis pair is a well-known model for studying codon usage divergence in plants [4,21]. The availability of whole-genome expression data for both rice and Arabidopsis has made it possible to examine the evolutionary forces shaping codon usage in housekeeping and tissue-specific genes of these two plants. In the present study, we trace the pattern of evolutionary forces shaping codon usage in both housekeeping and tissue-specific genes of rice and Arabidopsis and discuss the contrasting selective constraints affecting the evolution of these two sets of genes.


    2. Materials and methods
 
2.1. Sequence data
The genomes of rice and Arabidopsis were downloaded from RiceGAAS (Rice Genome Automated Annotation System; ftp://ftp.dna.affrc.go.jp/pub/RiceGAAS/current/) and The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org/), respectively. All sequences with <100 codons were excluded from our data set. Genes containing internal stop codons were also removed, leaving a data set of 18 658 rice genes for further analysis.
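The two sequence filters described above can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names are hypothetical, the standard nuclear stop codons are assumed, and it is an assumption that a terminal stop codon is not counted towards the 100-codon minimum.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def split_codons(cds):
    """Split a coding sequence into complete codons; a trailing partial codon is dropped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def passes_filters(cds, min_codons=100):
    """Return True if the CDS has at least min_codons codons and no internal stop codon."""
    codons = split_codons(cds)
    # Assumption: a terminal stop codon is allowed and is not counted.
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) < min_codons:
        return False
    return not any(codon in STOP_CODONS for codon in codons)
```

For example, `passes_filters(seq)` could be applied to every annotated coding sequence before the homology search.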

Homologous genes between the rice and Arabidopsis genomes were identified using gapped BLASTP searches with an expectation (E-value) cut-off of 10.0 × 10⁻⁶ [22]. Pairs of coding sequences with at least 30% amino acid positives and an overlap of at least 80% of their length were retained for the analysis. The maximum gap size allowed between a pair of sequences was 5%. Owing to the presence of many multi-copy genes in both Arabidopsis and rice, some sequences from one species showed high levels of sequence similarity with more than one sequence from the other species. In those cases, the sequence pair with the higher degree of sequence similarity was retained [23]. We also eliminated pseudogenes and mitochondrial proteins from the homologous gene set. The final data set consists of 4065 homologous gene pairs (Supplementary Table S1 contains the rice–Arabidopsis homologous gene pairs).

2.2. Expression profile
The public domain MPSS (massively parallel signature sequencing) expression data for rice [24] (http://mpss.udel.edu/rice/) and Arabidopsis [25] (http://mpss.udel.edu/at/) provide a more accurate estimate of gene transcript levels and are easily accessible [25]. The expression level of a gene in a single library is estimated by counting the number of individual 17-base signature sequences representing that gene [26]. It should be noted that the current MPSS data set for rice is based on the TIGR rice genome annotation. We retrieved the expression levels of individual rice genes with RiceGAAS IDs using the Rice MPSS "Query by Sequence" tool, which extracts all possible tags from a sequence and compares them against the MPSS database. The expression level of a gene expressed in different libraries was estimated as its average expression value over all libraries considered (Supplementary Tables S2 and S3 contain library information). We sorted the expression values in ascending order and divided them into five groups, each containing 20% of the population [26]. Individual genes were assigned an expression rank from 1 (low expression) to 5 (high expression) according to increasing average expression level.
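The ranking step can be illustrated with a short sketch. It assumes that genes are ranked on their average expression across libraries and that ties are broken by input order; the function names and the input layout (a dictionary mapping each gene to its per-library values) are hypothetical.

```python
def mean_expression(per_library):
    """Average the expression values of each gene over all libraries considered."""
    return {gene: sum(values) / len(values) for gene, values in per_library.items()}


def expression_ranks(mean_by_gene, n_groups=5):
    """Assign ranks 1 (lowest) to n_groups (highest), each group holding ~1/n_groups of the genes."""
    ordered = sorted(mean_by_gene, key=mean_by_gene.get)  # ascending expression
    n = len(ordered)
    return {gene: (index * n_groups) // n + 1 for index, gene in enumerate(ordered)}
```

With five groups, rank 1 corresponds to the lowest 20% of genes and rank 5 to the highest 20%.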

Tissue specificity of a gene was measured using the tissue specificity index τ [27,28]. The τ of gene i is defined by

$$\tau = \frac{\sum_{j=1}^{n_H} \left( 1 - \dfrac{\log S_H(i,j)}{\log S_H(i,\max)} \right)}{n_H - 1}$$

where nH is the number of tissues examined, SH(i, j) is the expression of gene i in tissue j and SH(i, max) is the highest expression of gene i across the nH tissues. The τ value ranges from 0 to 1, with higher values indicating greater variation in expression level across tissues, i.e. higher tissue specificity. If a gene is expressed in only one tissue, τ approaches 1. In contrast, if a gene is equally expressed in all tissues, τ = 0.
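A minimal sketch of the index is given below. It assumes the commonly used log-ratio form of τ, in which each tissue contributes 1 − log S(i, j)/log S(i, max) and the sum is divided by nH − 1. The `floor` parameter (expression values below 1 are raised to 1 so that the logarithm is defined) and the handling of genes with no expression above the floor are assumptions of this sketch.

```python
import math


def tau(expression, floor=1.0):
    """Tissue specificity index for one gene, given its expression in each tissue."""
    levels = [max(value, floor) for value in expression]  # avoid log(0)
    n_tissues = len(levels)
    if n_tissues < 2:
        raise ValueError("tau needs at least two tissues")
    log_max = math.log(max(levels))
    if log_max == 0.0:
        # No tissue exceeds the floor: treated here as uniform expression (assumption).
        return 0.0
    total = sum(1.0 - math.log(level) / log_max for level in levels)
    return total / (n_tissues - 1)
```

Under these assumptions a gene expressed at the same level in every tissue gives τ = 0, and a gene detected in a single tissue gives τ = 1.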

We assigned housekeeping and tissue-specific genes by sorting our data set (4065 rice–Arabidopsis homologous genes) by increasing τ value and taking the genes in the extreme 20% of the population at each end. Using these criteria, we obtained 787 housekeeping and 770 tissue-specific genes. All analyses were performed using these 787 housekeeping and 770 tissue-specific rice genes together with their counterparts in Arabidopsis (Supplementary Tables S4 and S5 contain the rice–Arabidopsis housekeeping and tissue-specific homologous gene pairs).
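The selection of the two extreme classes can be sketched as below. The function name and the input (a dictionary of τ values per gene) are hypothetical. Note that 20% of 4065 genes is 813, whereas the reported class sizes are 787 and 770, so this sketch presumably omits some additional filtering that is not described in this section.

```python
def split_by_tau(tau_by_gene, fraction=0.2):
    """Return (housekeeping, tissue_specific): the lowest and highest tau fractions."""
    ordered = sorted(tau_by_gene, key=tau_by_gene.get)  # ascending tau
    k = int(len(ordered) * fraction)
    if k == 0:
        return [], []
    housekeeping = ordered[:k]      # broadly expressed genes, low tau
    tissue_specific = ordered[-k:]  # narrowly expressed genes, high tau
    return housekeeping, tissue_specific
```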

2.3. Sequence analysis
Pair-wise synonymous (Ks) and non-synonymous (Ka) distances between the homologous genes of rice and Arabidopsis were calculated using the method of Yang and Nielsen [29].

Genetic robustness at the codon level was measured using CUB, available at http://users.ox.ac.uk/~zool0643/codon/CUB.html [30]. Following this method, proposed by Archetti, we measured the dissimilarity (D_AA/AA*) between the original (AA) and mutant (AA*) amino acid for each synonymous codon, based on McLachlan's matrix of chemical similarity [31]. The dissimilarity for a single amino acid (AA) is given by D_AA/AA* = ω_AA/AA – ω_AA/AA*, where ω_AA/AA is the similarity of the amino acid AA to itself and ω_AA/AA* is the similarity of AA to the mutant amino acid AA* obtained after an error at one of the positions of the original codon. Since ω_AA/AA > ω_AA/AA* for every amino acid, D_AA/AA* is always positive, and since there are three possible mutants for each of the three codon positions, there are nine possible measures of D_AA/AA* for each codon, corresponding to the nine possible mutant codons. Their mean is taken as the mean distance (MD) between the original codon and its possible mutants. To calculate the degree of error minimization of a coding sequence, the correlation between the MD values and the corresponding relative synonymous codon usage (RSCU) values is calculated for each synonymous family. If N is the number of degenerate synonymous codon families for which the correlation is calculated and R is the sum of the correlations, the degree of error minimization is measured by RN = R/N (RN ranges between –1 and +1). RN measures genetic robustness under the assumption that all amino acids are weighted equally, irrespective of their frequency in the protein. If each correlation is weighted (multiplied) by the frequency of the corresponding amino acid, the measure is denoted wRN. Since MD is a measure of dissimilarity, the lower the value of RN and wRN, the higher the degree of error minimization.
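The RN and wRN calculation can be sketched as follows. This is not the CUB program: the MD values must be supplied by the caller (they depend on McLachlan's matrix, which is not reproduced here), the per-family correlation is assumed to be a Pearson correlation, and wRN is assumed to be normalised as a frequency-weighted mean of the family correlations. Families in which the correlation is undefined (no variation in MD or in usage) are skipped. All function names are hypothetical.

```python
import math


def rscu(codon_counts, family):
    """Relative synonymous codon usage: observed count / count expected under equal usage."""
    total = sum(codon_counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    expected = total / len(family)
    return {codon: codon_counts.get(codon, 0) / expected for codon in family}


def pearson(xs, ys):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def error_minimization(codon_counts, families, md, weighted=False):
    """RN (weighted=False) or wRN (weighted=True) for one coding sequence.

    families: amino acid -> list of its synonymous codons
    md:       codon -> mean distance to its nine single-error mutants
    """
    terms = []  # (correlation, weight) for each informative family
    for family in families.values():
        if len(family) < 2:
            continue  # non-degenerate amino acids carry no information
        usage = rscu(codon_counts, family)
        r = pearson([md[c] for c in family], [usage[c] for c in family])
        if r is None:
            continue
        # Weight by amino acid count (proportional to its frequency) for wRN.
        weight = sum(codon_counts.get(c, 0) for c in family) if weighted else 1.0
        terms.append((r, weight))
    total_weight = sum(weight for _, weight in terms)
    if total_weight == 0:
        raise ValueError("no informative synonymous family")
    return sum(r * weight for r, weight in terms) / total_weight
```

A strongly negative value means that the frequently used codons of each family are those with the lowest MD, i.e. a high degree of error minimization.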

The Zipfold program, available at http://dinamelt.bioinfo.rpi.edu/zipfold.php, was used to predict the free folding energy of each native mRNA sequence.

The tRNA gene copy numbers needed to determine the major codons [32] for each amino acid in rice were taken from Xiyin et al. [33], and the tRNA copy numbers for Arabidopsis were taken from http://lowelab.ucsc.edu/GtRNAdb/Athal/.

The Student’s t-test was used to evaluate the significance of all the pair-wise differences. The statistical tests were performed using the SPSS (13.0) package.


    3. Results and discussion
 
3.1. Influence of expression level in modulating synonymous substitution rates for both housekeeping and tissue-specific genes in rice
Analysis of synonymous substitution rates (Ks) between rice and Arabidopsis homologous gene pairs for the housekeeping and tissue-specific classes reveals that housekeeping genes are under stronger selective constraint, as seen from their significantly lower (P < 0.001) average synonymous substitution rate (Ks = 3.27) compared with tissue-specific genes (Ks = 3.45). A similar trend in evolutionary rates has been observed in earlier studies on mammalian genomes [15–17]. It has already been demonstrated that both housekeeping and tissue-specific genes comprise highly and lowly expressed genes [18]. To investigate the influence of expression level on the synonymous substitution rates of housekeeping and tissue-specific genes in rice, we measured synonymous substitution rates for highly and lowly expressed genes of both classes (Table 1). Table 1 shows that the synonymous substitution rate of highly expressed housekeeping genes (Ks = 3.12) is significantly (P < 0.001) lower than that of highly expressed tissue-specific genes (Ks = 3.74). In contrast, there is no significant difference in average synonymous substitution rate between lowly expressed housekeeping genes (Ks = 3.34) and lowly expressed tissue-specific genes (Ks = 3.41) (Table 1). These results imply that, in the rice genome, the selective constraint shaping synonymous codon usage of highly expressed genes varies depending on whether they are housekeeping or tissue-specific genes. The non-significant difference in average synonymous substitution rate between lowly expressed housekeeping and tissue-specific genes suggests that lowly expressed genes have been conserved during the divergence of rice and Arabidopsis. However, comparing synonymous substitution rates between highly and lowly expressed genes within each class reveals an unusual trend.

In housekeeping genes (Table 1), we observed a significantly (P < 0.05) lower synonymous substitution rate in highly expressed genes (Ks = 3.12; number of genes = 209) than in lowly expressed genes (Ks = 3.34; number of genes = 203). Interestingly, in tissue-specific genes of rice (Table 1), the average synonymous substitution rate was significantly (P < 0.005) lower in lowly expressed genes (Ks = 3.41; number of genes = 512) than in highly expressed genes (Ks = 3.74; number of genes = 99). Previous studies have shown that the synonymous substitution rate between Escherichia coli and Salmonella typhimurium is lower in highly expressed than in weakly expressed genes, and it has been suggested that this is due to stronger selection for translational efficiency in highly expressed genes [34]. More recently, Drummond et al. [35], working on yeast, demonstrated that expression level governs the rate of synonymous substitution and of protein sequence evolution. For rice tissue-specific genes, our data suggest that high expression does not necessarily lead to lower synonymous substitution rates than low expression. This prompted us to explore the relationship between expression level and translational selection for both housekeeping and tissue-specific genes in plants. Possibly, some other selective force determines the synonymous substitution rate of highly expressed tissue-specific genes in rice.



 
Table 1. Average values of synonymous substitution rates for housekeeping and tissue-specific classes of genes in highly expressed (HEG) and lowly expressed genes (LEG) of rice

 
3.2. Co-adaptation of synonymous codon usage with the tRNA pool of housekeeping and tissue-specific homologous genes in rice and in Arabidopsis
In an attempt to investigate the nature of the selective constraint shaping synonymous codon usage of housekeeping and tissue-specific genes, we analyzed preferred codons in both gene classes of rice (Table 2) and Arabidopsis (Table 3). Preferred codons are those that generally correspond to the most abundant tRNA species, and they provide fitness benefits to highly expressed genes by enhancing translational efficiency [36]. The co-adaptation of tRNA content and codon usage for the optimal translation of highly expressed genes is well known in Caenorhabditis elegans [37]. To test for translational selection in the rice and Arabidopsis genomes, we identified those codons in both the housekeeping and tissue-specific gene classes whose RSCU values are significantly higher in highly expressed genes than in lowly expressed genes. We then investigated the correspondence between codon preferences in highly expressed genes and tRNA gene copy number in both rice and Arabidopsis. We obtained 10 preferred codons in both the housekeeping and tissue-specific gene classes in rice (Table 2). We also applied the revised wobble rules for eukaryotic genomes to estimate preferred codons in highly expressed housekeeping and tissue-specific genes [38]. These rules assume that GNN tRNAs pair with both C-ending and U-ending codons, whereas ANN tRNA genes are modified to inosine and decode both U-ending and G-ending codons. Following the revised wobble rules, we observed 14 preferred codons in rice housekeeping genes. Similarly, in rice tissue-specific genes, 16 preferred codons correspond to the most abundant tRNA copy number. Our results indicate that translational selection driven by tRNA copy number to optimize synonymous codon usage of highly expressed genes influences housekeeping and tissue-specific genes in rice equally, which is not consistent with the unexpected lowering (Table 1) of synonymous substitution rates in lowly expressed tissue-specific genes.

The same analysis was performed in Arabidopsis. In housekeeping genes, 10 codons correspond to the most abundant tRNA copy number, whereas in tissue-specific genes only five codons show a perfect match with the most abundant tRNA copy number (Table 3). After applying the revised wobble rules for eukaryotic genomes [38], we obtained 17 codons in housekeeping genes that correspond to the most abundant tRNA copy number, whereas in the tissue-specific class we observed only eight such preferred codons (Table 3). Therefore, in Arabidopsis, translational selection driven by tRNA copy number to optimize synonymous codon usage of highly expressed genes has a greater influence on housekeeping genes.



 
Table 2. RSCU values of highly expressed (HEG) and lowly expressed (LEG) housekeeping and tissue-specific genes in rice

 



 
Table 3. RSCU values of highly expressed (HEG) and lowly expressed (LEG) housekeeping and tissue-specific genes in Arabidopsis

 
3.3. Selective constraint acting on mRNA secondary structure is responsible for regulating synonymous substitution rates in rice tissue-specific genes
It has already been demonstrated that there is selection for local RNA secondary structures in coding regions and that this nucleic acid structure resembles the folding profiles of the encoded proteins [39]. Further, it has been observed in E. coli that a decrease in the stability of mRNA structure contributes to an increase in mRNA expression [40], suggesting a possible relationship between synonymous codon usage and constraints on mRNA secondary structure that in turn regulate gene expression levels. In rice, a significant increase (P < 0.005) in average mRNA folding energy was observed only in highly expressed tissue-specific genes, whereas there is no significant difference in mRNA folding energy between highly and lowly expressed housekeeping genes. To determine whether selection acts on mRNA secondary structure to modulate the synonymous substitution rates of tissue-specific genes, we performed a correlation analysis between the synonymous substitution rate of each gene and its corresponding mRNA folding energy. A significant positive correlation (Rs = 0.307, P < 0.001) indicates that constraints on mRNA secondary structure influence synonymous substitution rates in the tissue-specific class of genes in rice. Thus, constraints acting on mRNA secondary structure modulate synonymous substitution rates in rice tissue-specific genes.
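The correlation analysis can be illustrated with a small, dependency-free sketch. It assumes that Rs denotes Spearman's rank correlation, computed as the Pearson correlation of average ranks (so that ties are handled); the function names are hypothetical and the significance test is not included.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation between two equally long sequences."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    sxx = sum((a - mean_x) ** 2 for a in rank_x)
    syy = sum((b - mean_y) ** 2 for b in rank_y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold the Ks value of each gene and `ys` its predicted mRNA folding energy.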

3.4. Mutational bias regulates error minimization in both rice and Arabidopsis homologous set
It is clear from our results that the selective constraint shaping synonymous codon usage differs between highly expressed housekeeping and highly expressed tissue-specific genes. It is therefore of interest to explore the evolutionary forces acting on synonymous codon usage to optimize the error minimization capacity of highly expressed housekeeping and tissue-specific genes in both plants. The genetic code evolved in such a way that it minimizes errors due to mutation and mistranslation. The theory of error minimization for the evolution of the genetic code postulates that codons are arranged in a way that reduces errors [41,42]. Synonymous codons thus differ in their capacity to minimize the effects of errors due to mutation or mistranslation. In Drosophila melanogaster, the degree of error minimization is correlated with the degree of codon usage bias [43]. Later, it was reported that the codon usage pattern of highly expressed genes in E. coli has been selected in such a way that mistranslation has the minimum possible effect on the structure and function of the encoded proteins. Furthermore, according to Najafabadi et al. [44], the frequencies of codons in highly expressed genes that correspond to the most abundant tRNA copy number may have been under selection pressure for error minimization. For the rice genome, we calculated the error minimization capacity (wRn) of housekeeping and tissue-specific genes. We observed significantly lower wRn (P < 0.001) for housekeeping genes (wRn = –0.3322) than for tissue-specific genes (wRn = –0.2458). This result indicates stronger selective constraint on the codon usage of housekeeping genes to achieve a greater degree of error minimization. We then compared wRn between highly and lowly expressed genes of the housekeeping and tissue-specific categories of the rice genome (Table 4).

We observed significantly (P < 0.001) greater error minimizing capacity for highly expressed housekeeping genes than for lowly expressed housekeeping genes. Surprisingly, in tissue-specific genes of rice, we observed no significant difference in error minimization between highly and lowly expressed genes. Thus, selection on codon usage for error minimization has had hardly any role in distinguishing highly from lowly expressed tissue-specific genes. Our observations for housekeeping genes are consistent with previous findings that highly expressed genes have a strong preference for codons that minimize the effect of errors caused by mutation and mistranslation [30,44–47]. We performed the same analysis for Arabidopsis genes and observed that highly expressed genes in both the housekeeping and tissue-specific categories have significantly (P < 0.001) greater error minimizing capacity than lowly expressed genes (Table 5). Therefore, selection acting on synonymous codon usage to optimize the error minimization capacity of highly expressed genes influences housekeeping and tissue-specific homologous genes of Arabidopsis equally. It is noteworthy, however, that there is no significant difference in error minimizing capacity between highly expressed housekeeping and highly expressed tissue-specific Arabidopsis genes. This discrepancy between translational selection driven by tRNA copy number and genetic robustness in both plants indicates that, unlike in E. coli [44,45], the error minimizing capacity of highly expressed genes does not depend on selection based on tRNA abundance in either rice or Arabidopsis. It is reasonable to infer from our results that the frequencies of codons in highly expressed genes that correspond to the most abundant tRNA copy number may not be under selection pressure for error minimization.



 
Table 4. Average error minimization values (wRn) of housekeeping and tissue-specific classes of genes in highly expressed (HEG) and lowly expressed genes (LEG) of rice

 



 
Table 5. Average error minimization values (wRn) of housekeeping and tissue-specific classes of genes in highly expressed (HEG) and lowly expressed genes (LEG) of Arabidopsis

 
However, according to Archetti [43], if genetic robustness is correlated with GC composition, then mutational bias is a reason behind the observed pattern of error minimization. To investigate whether the observed pattern of error minimization in rice and Arabidopsis is due to mutational bias, we measured the GC3 level of highly and lowly expressed homologous genes of the housekeeping and tissue-specific classes in rice and Arabidopsis. A significant difference in average GC3 level (P < 0.001) was observed between highly and lowly expressed genes of both housekeeping and tissue-specific homologous genes of Arabidopsis (Table 6). Correlation analysis was then performed between GC content and error minimization capacity of both housekeeping and tissue-specific genes of Arabidopsis. A significant negative correlation was observed between error minimization capacity and GC content for both housekeeping (Rs = –0.541, P < 0.001) and tissue-specific genes (Rs = –0.499, P < 0.001) in Arabidopsis (Supplementary Tables S6–S9 contain the Arabidopsis housekeeping and tissue-specific homologous genes and their corresponding GC3 and error minimization values). In rice, however, there is no significant difference in GC3 between highly and lowly expressed tissue-specific genes (Table 7). Rather, we observed a significant difference in average GC3 level only between highly and lowly expressed housekeeping genes (Table 7). There is a significant (P < 0.001) increase in GC content in highly expressed housekeeping genes of the rice genome; consistent with this, we found that the synonymous substitution rate of GC-rich rice housekeeping genes (Ks = 2.54) is significantly (P < 0.001) lower than that of GC-poor housekeeping genes (Ks = 3.63). In addition, the synonymous substitution rate (Ks) is negatively correlated (Rs = –0.216, P < 0.01) with GC content at the third codon position in the housekeeping set of genes in rice. These results suggest that the increase in GC in highly expressed housekeeping genes is under selection to optimize synonymous substitution rates.
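GC3, the GC content at third codon positions, can be computed with a sketch like the following. It assumes that every complete codon of the coding sequence is counted; some definitions exclude Met, Trp and stop codons, and it is not stated here which definition was used. The function name is hypothetical.

```python
def gc3(cds):
    """Fraction of G or C among the third positions of all complete codons."""
    cds = cds.upper()
    third_positions = cds[2::3]  # indices 2, 5, 8, ... exist only for complete codons
    if not third_positions:
        raise ValueError("sequence contains no complete codon")
    gc = sum(1 for base in third_positions if base in "GC")
    return gc / len(third_positions)
```

For example, `gc3("ATGGCCAAATTT")` inspects the third positions G, C, A and T and returns 0.5.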



 
Table 6. Average GC3 values for housekeeping and tissue-specific classes of genes in highly expressed (HEG) and lowly expressed genes (LEG) of Arabidopsis

 



 
Table 7. Average GC3 values for housekeeping and tissue-specific classes of genes in highly expressed (HEG) and lowly expressed genes (LEG) of rice

 
Correlation analysis was also performed between GC content and error minimization capacity of housekeeping genes in rice. A significant negative correlation (Rs = –0.606, P < 0.001) was observed between error minimization capacity and GC content of housekeeping genes in rice. These results lead us to conclude that, in plants, it is mutational bias that regulates error minimization in highly expressed genes.

3.5. Conclusion
In this work, we studied how selective constraint shapes synonymous codon usage of housekeeping and tissue-specific homologous genes in both rice and Arabidopsis. We observed a difference in codon usage pattern between housekeeping and tissue-specific genes in both rice and Arabidopsis. Although previous studies on Drosophila and rodents favor a selectionist model for error minimization at the protein level [30], we demonstrated that mutational bias is responsible for the observed pattern of error minimization. We argue that error minimization at the protein level has taken a different course since the divergence of plants and animals. Moreover, our results show that housekeeping genes are under stronger selective constraint than tissue-specific genes. Translational selection driven by tRNA copy number is responsible for optimizing codon usage variation in housekeeping genes. In tissue-specific genes, by contrast, selection acting on mRNA secondary structural stability has a greater influence on codon usage variation. Lavner and Kotlar [48] argued that selection may act on codon bias to reduce elongation rate by favoring non-optimal codons in lowly expressed genes. In the present study, the influence of mRNA secondary structural stability on codon usage variation of tissue-specific genes might be a consequence of favoring non-optimal codons in lowly expressed tissue-specific genes. Thus, our study suggests that the two sets of genes in rice and Arabidopsis (housekeeping and tissue-specific) have evolved under contrasting evolutionary constraints.


    Supplementary Data
 
Supplementary data are available online at www.dnaresearch.oxfordjournals.org.


    Funding
 
The authors thank the Department of Biotechnology, Government of India, for financial support.


    Acknowledgements
 
The authors are also thankful to Dr Nakai Kenta and two anonymous reviewers for their constructive comments, which helped to improve the manuscript.


    Footnotes
 
* To whom correspondence should be addressed. Fax. +91 33-2355-3886. E-mail: tapash{at}boseinst.ernet.in

Edited by Kenta Nakai


    References
 

  1. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature (2005) 436:793–800.
  2. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature (2000) 408:796–815.
  3. Bernardi G. Structural and Evolutionary Genomics: Natural Selection in Genome Evolution (2004) Amsterdam, The Netherlands: Elsevier.
  4. Wang H. C., Hickey D. A. Rapid divergence of codon usage patterns within the rice genome. BMC Evol. Biol. (2007) 7:1–10.
  5. Montero L. M., Salinas J., Matassi G., Bernardi G. Gene distribution and isochore organization in the nuclear genome of plant. Nucleic Acids Res. (1990) 18:1859–1867.
  6. Carels N., Bernardi G. Two classes of genes in plants. Genetics (2000) 154:1819–1825.
  7. Guo X., Bao J., Fan L. Evidence of selectively driven codon usage in rice: implications for GC content evolution of Gramineae genes. FEBS Lett. (2007) 581:1015–1021.
  8. Wong G. K., Wang J., Tao L., et al. Compositional gradients in Gramineae genes. Genome Res. (2002) 12:851–856.
  9. Sharp P. M., Averof M., Lloyd A. T., Matassi G., Peden J. F. DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. (1995) 349:241–247.
  10. Ponger L., Duret L., Mouchiroud D. Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res. (2001) 11:1854–1860.
  11. D’Onofrio G. Expression patterns and gene distribution in the human genome. Gene (2002) 300:155–160.
  12. Vinogradov A. E. Isochores and tissue-specificity. Nucleic Acids Res. (2003) 31:5212–5220.
  13. Arhondakis S., Auletta F., Torelli G., D’Onofrio G. Base composition and expression level of human genes. Gene (2004) 325:165–169.
  14. Lercher M. J., Urrutia A. O., Pavlicek A., Hurst L. D. A unification of mosaic structures in the human genome. Hum. Mol. Genet. (2003) 12:2411–2415.
  15. Duret L., Mouchiroud D. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. (2000) 17:68–74.
  16. Hastings K. E. Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J. Mol. Evol. (1996) 42:631–640.
  17. Hughes A. L., Hughes M. K. Self peptides bound by HLA class I molecules are derived from highly conserved regions of a set of evolutionary conserved proteins. Immunogenetics (1995) 41:257–262.
  18. Zhang L., Li W. H. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol. (2004) 21:236–239.
  19. Plotkin J. B., Robins H., Levine A. J. Tissue-specific codon usage and the expression of human genes. Proc. Natl. Acad. Sci. USA (2004) 101:12588–12591.
  20. Semon M., Lobry J. R., Duret L. No evidence for tissue-specific adaptation of synonymous codon usage in humans. Mol. Biol. Evol. (2006) 23:523–529.
  21. Mukhopadhyay P., Basak S., Ghosh T. C. Nature of selective constraints on synonymous codon usage of rice differs in GC-poor and GC-rich genes. Gene (2007) 400:71–81.
  22. Altschul S. F., Madden T. L., Schaffer A. A., et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
  23. Banerjee T., Gupta S. K., Ghosh T. C. Compositional transitions between Oryza sativa and Arabidopsis thaliana genes linked to the functional change of encoded proteins. Plant Sci. (2006) 170:267–273.
  24. Nakano M., Nobuta K., Vemaraju K., Tej S. S., Skogen J. W., Meyers B. C. Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res. (2006) 34:D731–D735.
  25. Meyers B. C., Tej S. S., Vu T. H., et al. The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res. (2004) 14:1641–1653.
  26. Ren X. -Y., Vorst O., Fiers M. W. E. J., Stiekema W. J., Nap P. In plants, highly expressed genes are the least compact. Trends Genet. (2006) 22:528–532.
  27. Liao B. Y., Zhang J. Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol. Biol. Evol. (2006) 23:1119–1128.
  28. Yanai I., Benjamin H., Shmoish M., et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics (2005) 21:650–659.
  29. Yang Z., Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. (2000) 17:32–43.
  30. Archetti M. Selection on codon usage for error minimization at the protein level. J. Mol. Evol. (2004) 59:400–415.
  31. McLachlan A. D. Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c 551. J. Mol. Biol. (1971) 61:409–424.
  32. Kotlar D., Lavner Y. The action of selection on codon bias in the human genome is related to frequency, complexity, and chronology of amino acids. BMC Genom. (2006) 7:67.
  33. Xiyin W., Xiaoli S., Bailin H. The transfer RNA genes in Oryza sativa L. ssp. Indica. Sciences in China Series C (2002) 45:504–511.
  34. Berg O. G., Martelius M. Synonymous substitution-rate constants in Escherichia coli and Salmonella typhimurium and their relationship to gene expression and selection pressure. J. Mol. Evol. (1995) 41:449–456.
  35. Drummond D. A., Raval A., Wilke C. O. A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. (2006) 23:327–337.
  36. Ikemura T. Transfer RNA in protein synthesis. Hatfield D. L., Lee B. J., Pirtle R. M., eds. (1992) Boca Raton, FL: CRC. 87–111.
  37. Duret L. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. (2000) 16:287–289.
  38. Percudani R. Restricted wobble rules for eukaryotic genome. Trends Genet. (2001) 17:133–135.
  39. Biro J. C. Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases. Theor. Biol. Med. Model (2006) 3:28.
  40. Jia M., Li Y. The relationship among gene expression, folding free energy and codon usage bias in Escherichia coli. FEBS Lett. (2005) 579:5333–5337.
  41. Woese C. R. On the evolution of the genetic code. Proc. Natl. Acad. Sci. USA (1965) 54:1546–1552.
  42. Epstein C. J. Role of the amino-acid ‘code’ and of selection for conformation in the evolution of proteins. Nature (1966) 210:25–28.
  43. Archetti M. Genetic robustness and selection at the protein level for synonymous codons. J. Evol. Biol. (2006) 19:353–365.
  44. Najafabadi H. S., Goodarzi H., Torabi N. Optimality of codon usage in Escherichia coli due to load minimization. J. Theor. Biol. (2005) 237:203–209.
  45. Najafabadi H. S., Lehmann J., Omidi M. Error minimization explains the codon usage of highly expressed genes in Escherichia coli. Gene (2007) 387:150–155.
  46. Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics (1991) 129:897–907.
  47. Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics (1994) 136:927–935.
  48. Lavner Y., Kotlar D. Codon bias as a factor in regulating expression via translation rate in the human genome. Gene (2005) 345:127–138.
