Characterization of Mononucleotide Repeats in Sequenced Prokaryotic Genomes
Laboratorium voor Microbiologie, Universiteit Gent K.L. Ledeganckstraat 35, B-9000 Gent, Belgium
Received 17 January 2005; revised 7 July 2005
Abstract
The increasing availability of prokaryotic genome sequences has shown that simple sequence repeats (SSRs) are widespread in prokaryotes and that there is extensive variation in their length, number and distribution. Considering their potential importance in generating genomic diversity, we determined the distribution of a specific group of SSRs, mononucleotide repeats of 5–13 nt, in 157 sequenced prokaryotic genomes. The data obtained in the present study show that (i) a large number of mononucleotide SSRs is present in all prokaryotic genomes investigated, (ii) shorter repeats are much more abundant than longer repeats, and (iii) in the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions, although we identified several organisms in which mononucleotide SSRs are not excluded from the coding regions. We also observed that some genomes contain more mononucleotide SSRs than expected, while others contain significantly fewer. Bacterial genomes that contain far fewer mononucleotide SSRs than expected are generally larger and more GC-rich, while bacterial genomes that contain far more mononucleotide SSRs than expected are generally smaller and more AT-rich. Finally, we also noted that genomes containing a high fraction of horizontally transferred genes have a lower mononucleotide SSR density and that A and T are generally overrepresented in mononucleotide SSRs.
Key words: mononucleotide repeats; simple sequence repeats; comparative genomics; genome evolution
1. Introduction
Repetitive DNA consists of homopolymeric tracts of single nucleotides or of multimeric repeat classes present in small or large numbers. These repeats can be either homogeneous (i.e. built from identical units) or heterogeneous (i.e. built from mixed units) [1].

Several functional roles have been proposed for SSRs. First, they are thought to be involved in various mechanisms regulating gene expression [1, 2, 4]. Variation in the number of repeat units is thought to alter the spacing between structurally important domains (such as the −35 and −10 promoter regions) and can thereby affect promoter strength [2]. Differences in DNA bending caused by unusual repeat structures can also affect promoter strength [7]. Another example is the blocking of DNA replication elongation by transcription through SSRs [8]. Second, the loss or gain of repeat units can affect the integrity of open reading frames (ORFs) and can result in phase variation, in which shifting into and out of frame (relative to the start of translation) switches the associated gene products on or off [9]. The observation of a high number of short close repeats in 296 Escherichia coli genes related to repair, recombination and physiological adaptation to different stresses suggests that these repeats may reflect a strategy to cope with stress [10]. In addition, rearrangements between tandemly repeated sequences are a major source of genomic change, as they can result in deletion or duplication of the DNA flanked by the repeated sequences [11], and genomes with a higher repeat density have been shown to undergo more rearrangements, leading to an accelerated loss of gene order [12]. Finally, regions bordering SSR loci have been shown to be more susceptible to mutagenic events [13].
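To make the phase-variation mechanism concrete, the following minimal Python sketch uses a hypothetical open reading frame (the sequence is invented, not taken from any genome in this study) in which a poly(G) tract precedes a fixed downstream sequence. Changing the tract length by one nucleotide shifts every downstream codon into a different frame. The helper names (`toy_orf`, `first_stop_codon`, `DOWNSTREAM`) are illustrative only, and the stop codons are those of the standard bacterial code.

```python
from typing import Optional

# Stop codons of the standard bacterial code (NCBI translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical downstream sequence (not from any real gene): it has no stop
# codon in frame 0 until the final TAA, but an early stop in both shifted frames.
DOWNSTREAM = "CTAGCTGAC" + "GCC" * 10 + "TAA"


def toy_orf(tract_length: int) -> str:
    """Hypothetical ORF: ATG start codon, a poly(G) tract, then DOWNSTREAM."""
    return "ATG" + "G" * tract_length + DOWNSTREAM


def first_stop_codon(orf: str) -> Optional[int]:
    """Codon index (0 = start codon) of the first in-frame stop, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


for n in (8, 9, 10):
    print(f"poly(G) length {n}: first stop at codon {first_stop_codon(toy_orf(n))}")
```

With a 9-nt tract the toy ORF is read through to its final stop at codon 17; a slip to 8 or 10 nt exposes a premature stop at codon 4 or 6, which would truncate the product (the "off" state).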
It has been known for over 30 years that eukaryotic genomes contain a significant fraction of repeated sequences [14–17], and SSRs have also been known to exist in prokaryotes for quite some time [3, 18–21]. Early analysis [22] showed that nearly all SSRs were considerably overrepresented in genomic sequence fragments compared with their randomized counterparts; it also appeared that they were most frequent in eukaryotes and rarer in prokaryotes (probably owing to a more economical use of DNA, i.e. a higher coding density, in the latter). This was later confirmed by more in-depth analyses [20, 21, 23].

The increasing availability of prokaryotic genome sequences has shown that SSRs are also widespread in prokaryotes and that there is extensive variation in their length, number and distribution, for example in Escherichia coli, Shigella sp. and Ralstonia solanacearum [20, 21, 24–26].

However, to date no study has systematically addressed and compared the distribution of SSRs in a large number of bacterial and/or archaeal genomes. Considering the potential importance of SSRs in generating genomic diversity and regulating gene expression, we determined the distribution of a specific group of SSRs, mononucleotide repeats (i.e. sequences consisting of a single repeated nucleotide) of 5–13 nt, in 157 sequenced prokaryotic genomes.
2. Material and Methods
2.1. DNA sequences
We included the sequences of 157 prokaryotic genomes in the present study. These included 19 archaeal and 138 bacterial genomes (Tables 1 and 2). Most were downloaded from the GenBank database, but we also included a few unpublished genomes that were produced by the Pathogen Sequencing Group at the Sanger Institute (http://www.sanger.ac.uk/Projects/Microbes/) or the University of Oklahoma (http://www.genome.ou.edu/gono.html).
2.2. Determination of the number of mononucleotide SSRs
We used the software developed by Gur-Arie et al. [24].
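The Gur-Arie et al. software is not described further here, so the following is only a minimal Python sketch of one way mononucleotide SSRs of 5–13 nt could be counted, not a reimplementation of that tool. It assumes that each maximal run of one base is counted once at its full length, that runs longer than 13 nt are excluded (they are treated separately in section 3.5), and that the sequence is linear, so a run spanning the origin of a circular chromosome would be split in two. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(seq: str):
    """Yield (start, base, length) for every maximal run of one repeated base."""
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        yield pos, base, length
        pos += length


def count_mononucleotide_ssrs(seq: str, min_len: int = 5, max_len: int = 13) -> Counter:
    """Tally maximal A/C/G/T runs by (base, length) within [min_len, max_len]."""
    counts = Counter()
    for _, base, length in homopolymer_runs(seq):
        # Runs of ambiguous bases (e.g. N) are ignored; a run longer than
        # max_len is skipped rather than split into shorter runs.
        if base in "ACGT" and min_len <= length <= max_len:
            counts[(base, length)] += 1
    return counts


example = "A" * 5 + "C" + "G" + "T" * 7 + "C" * 4 + "G" * 6 + "A" * 14
print(count_mononucleotide_ssrs(example))
```

In the example, the 4-nt C run falls below the threshold and the 14-nt A run above it, so three SSRs are counted: (A, 5), (T, 7) and (G, 6).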
2.3. Calculation of the expected number of mononucleotide SSRs
The expected number of homo-oligomer tracts of $t$ bases in a sequence of length $N$ was calculated with the equation given by De Wachter [27], which sums over the four bases ($i$ = 1 to 4), with $p_i$ being the frequency of each base in the sequence. Considering the large number of genomes involved in this study, we did not perform permutation tests (i.e. we did not count the number of mononucleotide SSRs in permuted genomes in order to assess statistical significance) but assessed deviations from the expected number of mononucleotide SSRs by representation plots [6].
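De Wachter's equation is not reproduced here. As an assumption, the sketch below uses one standard expression for the expected number of maximal runs of exactly $t$ identical bases in a random linear sequence of length $N$ with independent base frequencies $p_i$, which may differ in form from De Wachter's: $E(t) = \sum_{i=1}^{4} p_i^{t}\left[2(1-p_i) + (N-t-1)(1-p_i)^{2}\right]$ for $t < N$. The first term counts runs touching either end of the sequence (one flanking base must differ), the second counts interior runs (both flanking bases must differ). For a circular chromosome every run is interior, giving $N\sum_i p_i^{t}(1-p_i)^{2}$; for genome-sized $N$ the two are practically identical. The helper names are hypothetical; `observed_to_expected` computes the ratio $R = O/E$ defined in section 2.4, and `brute_force_expected` checks the closed form exhaustively on short sequences.

```python
from itertools import groupby, product


def expected_runs(t: int, n: int, freqs: dict) -> float:
    """Expected number of maximal runs of exactly t identical bases in a random,
    linear sequence of length n with independent bases drawn with the
    frequencies in `freqs` (values should sum to 1)."""
    if t < 1 or t > n:
        return 0.0
    total = 0.0
    for p in freqs.values():
        if t == n:
            total += p ** n  # the whole sequence is one run
        else:
            # 2 runs touching a sequence end (one flanking base must differ)
            # + (n - t - 1) interior starts (both flanking bases must differ)
            total += p ** t * (2 * (1 - p) + (n - t - 1) * (1 - p) ** 2)
    return total


def expected_total(n: int, freqs: dict, min_len: int = 5, max_len: int = 13) -> float:
    """Expected number of mononucleotide SSRs with length in [min_len, max_len]."""
    return sum(expected_runs(t, n, freqs) for t in range(min_len, max_len + 1))


def observed_to_expected(observed: int, n: int, freqs: dict,
                         min_len: int = 5, max_len: int = 13) -> float:
    """R = O / E for one genome."""
    return observed / expected_total(n, freqs, min_len, max_len)


def brute_force_expected(t: int, n: int, alphabet: str = "ACGT") -> float:
    """Average number of maximal runs of length t over all equiprobable
    sequences of length n; only feasible for small n."""
    hits = 0
    for seq in product(alphabet, repeat=n):
        hits += sum(1 for _key, grp in groupby(seq) if sum(1 for _ in grp) == t)
    return hits / len(alphabet) ** n


uniform = {b: 0.25 for b in "ACGT"}
print(expected_runs(2, 6, uniform), brute_force_expected(2, 6))
```

Both calls print 0.796875: for $N = 6$ and $t = 2$ with equal base frequencies, the closed form and the exhaustive average over all 4096 sequences agree.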
2.4. Statistical analysis
All statistical analyses were carried out using SPSS 11.0.1 software (SPSS Inc.). Data were compared by one-way analysis of variance (ANOVA), with Levene's test used to check homogeneity of variance. Where appropriate, t-tests for paired samples were used. The correlation coefficients calculated included the Pearson correlation coefficient (r) and the non-parametric Spearman's (ρ) and Kendall's (τ) correlation coefficients. Whenever appropriate, we also calculated partial correlation coefficients (i.e. the correlation that remains between two variables after removing the correlation that is due to their mutual association with another variable). We calculated the correlation between the number of mononucleotide SSRs, the mononucleotide SSR density, the ratio (R) between the observed (O) and expected (E) number of mononucleotide repeats, and a number of other genome features: genome size, GC content, coding density, fraction of the genome located in duplicated segments and fraction of the genome located in horizontally transferred segments. Genome size, GC content and coding density were determined using Artemis 4.0 [28]. The number of putatively duplicated or horizontally transferred genes for each genome was taken from data published by Gevers et al. [29] and Nakamura et al. [30], respectively.
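SPSS was used for the actual analyses. Purely as an illustration of what a first-order partial correlation measures, the sketch below computes it from the three pairwise Pearson coefficients with the textbook formula $r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The data are hypothetical: x and y each equal z plus a deviation that is uncorrelated with z and with the other deviation.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical data: x and y share a common dependence on z and nothing else.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]  # z + [1, -1, -1, 1]
y = [0, 5, 0, 5]  # z + [-1, 3, -3, 1]
print(pearson(x, y), partial_corr(x, y, z))
```

Here the raw correlation between x and y is about 0.33, but it vanishes (a partial correlation of essentially zero) once their shared dependence on z is removed; this is the kind of confounding, such as that between genome size and GC content, that partial correlations are meant to strip out.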
3. Results
3.1. Number and density of mononucleotide SSRs in prokaryotic genomes
An overview of the total number of mononucleotide SSRs observed and the repeat density (expressed as the percentage of the genome occupied by mononucleotide repeats) for archaeal and bacterial genomes is shown in Tables 1 and 2, respectively. For archaeal genomes, the number of mononucleotide SSRs with a repeat size ≥5 nt ranged from 3797 (Methanopyrus kandleri) to 19 856 (Methanosarcina acetivorans). For bacterial genomes, the number of mononucleotide SSRs with a repeat size ≥5 nt ranged from 2761 (Bifidobacterium longum) to 58 065 (Leptospira interrogans). The mononucleotide SSR density ranged from 0.0114% (Halobacterium sp.) to 0.0741% (Methanococcus maripaludis) for archaea, and from 0.0043% (Rhodopseudomonas palustris) to 0.1241% (Wigglesworthia glossinidia) for bacteria. There were no significant differences between archaea and bacteria with regard to the number of SSRs (P = 0.211) or SSR density (P = 0.592).
3.2. Length distribution of mononucleotide SSRs
The distribution of mononucleotide SSRs over the different length categories was approximately the same in all prokaryotic genomes, with an almost linear or slightly sigmoid relationship between the log10 of the number of repeats in a particular class and the repeat length of that class (Fig. 1). Deviations from this general pattern were most obvious for longer repeats, but this is most likely due to the small number of these longer repeats (data not shown). Complete tables with the exact observed and expected numbers of mononucleotide SSRs for each class are available as Supplementary Data (Table S1).
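One simple way to summarise the near-linear relationship described above is an ordinary least-squares line through log10(count) versus repeat length. The text does not state that such a fit was performed, so this is an illustrative assumption, and the function name is hypothetical. The counts below are invented and exactly log-linear; real data would show the slight sigmoid curvature mentioned above, and length classes with zero counts have to be dropped before taking logarithms.

```python
import math


def loglinear_fit(lengths, counts):
    """Least-squares fit of log10(count) = a + b * length; returns (a, b).
    Length classes with a count of zero must be removed beforehand."""
    ys = [math.log10(c) for c in counts]
    n = len(lengths)
    mx = sum(lengths) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical counts that fall tenfold for every 2-nt increase in repeat length.
lengths = list(range(5, 14))
counts = [10 ** (6 - 0.5 * k) for k in lengths]
intercept, slope = loglinear_fit(lengths, counts)
print(f"log10(count) = {intercept:.2f} + ({slope:.2f}) * length")
```

The fitted slope $b$ translates into a constant fold-change of $10^{b}$ in the number of repeats per additional nucleotide; for these hypothetical counts $b = -0.5$, i.e. roughly a 3.2-fold decrease per nucleotide.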
3.3. Distribution of mononucleotide SSRs over coding and non-coding regions
No clear general trend can be observed in the distribution of mononucleotide SSRs over coding and non-coding regions in archaeal (Fig. S1, available in Supplementary Data) and bacterial (Fig. 2) genomes. Nevertheless, in the majority of the genomes there appears to be a tendency to exclude longer mononucleotide SSRs from coding regions. Notable exceptions include the ε-Proteobacteria (including Helicobacter pylori and Campylobacter jejuni), Haemophilus ducreyi, Neisseria meningitidis and Synechocystis sp., in which mononucleotide SSRs are almost exclusively located in coding regions, irrespective of length. The opposite is true for the genome of the β-proteobacterium Chromobacterium violaceum, in which mononucleotide SSRs are almost exclusively located in non-coding regions, irrespective of repeat length (Fig. 2).
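The coding/non-coding assignment compared above could be implemented along the following lines. This is a minimal Python illustration under stated assumptions, not the procedure actually used: genes are given as 0-based, half-open intervals (as could be parsed from an annotation), strand is ignored, an SSR counts as coding only if all of its bases lie within a CDS, and an SSR straddling a gene boundary counts as non-coding. The function names and the toy genome are hypothetical.

```python
from collections import Counter


def coding_mask(genome_length: int, genes) -> bytearray:
    """1 at every position covered by a CDS; genes are 0-based, half-open
    (start, end) intervals on either strand, and may overlap."""
    mask = bytearray(genome_length)
    for start, end in genes:
        mask[start:end] = b"\x01" * (end - start)
    return mask


def classify_ssrs(ssrs, mask: bytearray) -> Counter:
    """Tally SSRs given as (start, length) by (length, is_coding). An SSR is
    coding only if every base lies inside a CDS."""
    tally = Counter()
    for start, length in ssrs:
        tally[(length, all(mask[start:start + length]))] += 1
    return tally


# Hypothetical 100-bp genome with two genes and four SSRs.
mask = coding_mask(100, [(10, 40), (60, 90)])
print(classify_ssrs([(12, 5), (38, 5), (45, 6), (85, 5)], mask))
```

Tallying by (length, is_coding) yields, for each length class, the split between coding and non-coding SSRs; in the toy genome the SSR at position 38 crosses the end of the first gene and is therefore counted as non-coding.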
3.4. Comparison between observed and expected number of mononucleotide repeats
When we compared the total number of mononucleotide repeats observed with the total number expected based on the equation proposed by De Wachter [27]
3.5. Presence of longer mononucleotide SSRs in prokaryotic genomes
Sixty of the prokaryotic genomes investigated (5 archaeal and 55 bacterial genomes) contained mononucleotide repeats >13 nt (Table S2, available in Supplementary Data). The total number of mononucleotide SSRs >13 nt per genome ranged from 1 (24 genomes) to 24 (Helicobacter hepaticus). The largest mononucleotide SSRs were found in the genome of Thermoanaerobacter tengcongensis, which contains a poly(A) mononucleotide SSR of 54 nt and one of 58 nt. In only 18 genomes were one or more of these longer mononucleotide SSRs located in coding regions (Table S2). The maximum number of longer mononucleotide SSRs located in coding regions was five (in Helicobacter hepaticus and Helicobacter pylori 26695). The longest mononucleotide SSR located in a coding region is the 22 nt poly(G) repeat in the Pirellula sp. genome. There was no significant difference in the number of long mononucleotide repeats between archaea and bacteria (P = 0.207).
3.6. Correlation between number of mononucleotide SSRs and other genome parameters
Pearson correlation coefficients between the number of mononucleotide SSRs in a genome, repeat density, the ratio of observed to expected number of mononucleotide SSRs (R), genome size and GC content are shown in Table S3 (available in Supplementary Data). Similar trends were observed when we calculated the non-parametric Spearman and Kendall correlations (data not shown). From these data, it appears that the number of mononucleotide SSRs is weakly correlated with genome size and more strongly, but negatively, correlated with GC content. The repeat density and R are both negatively correlated with genome size and GC content. Graphical representations of the relationships between several of the variables investigated are shown in Fig. 4. However, genome size and GC content are themselves correlated, as larger genomes tend to be more GC-rich and smaller genomes more AT-rich (see also Table S3) [31]. For that reason, we calculated partial correlations between a number of variables, removing the influence of the relationship between genome size and GC content. The number of mononucleotide SSRs observed correlated well with genome size (r² = 0.6741, P < 0.001) and GC content (r² = 0.7301, P < 0.001). The repeat density correlated very well with GC content (r² = 0.7712, P < 0.001), while the relationship between repeat density and genome size was less pronounced (r² = 0.2260, P = 0.005). The correlation between R and GC content was also high (r² = 0.5761, P < 0.001), but no significant correlation was detected between R and genome size after correcting for the confounding effect of GC content. These data show that (i) larger genomes contain more mononucleotide SSRs than smaller genomes with the same GC content, (ii) larger genomes have a lower mononucleotide SSR density than smaller genomes with the same GC content, (iii) genomes with a high GC content contain fewer mononucleotide SSRs (and thus have a lower mononucleotide SSR density) than similar-sized genomes with a low GC content, and (iv) genomes with a high GC content have a lower R value (i.e. contain fewer mononucleotide SSRs relative to the expected number) than similar-sized genomes with a low GC content. Also, the higher the mononucleotide SSR density, the larger the R value, i.e. the higher the observed total number of mononucleotide SSRs relative to the expected number.
We also investigated the correlation of both the observed and the expected mononucleotide SSR density with GC content. As can be seen from Fig. S2 (available in Supplementary Data), there is a strong relationship between the GC content and the expected repeat density. This can be explained by the fact that, as the GC content shifts away from 50%, the four-letter nucleotide alphabet effectively shifts towards a two-letter alphabet, thereby increasing the probability of repeated sequences [32].
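The two-letter-alphabet argument can be checked numerically with a short Python sketch. It assumes independent bases with $p_G = p_C = \mathrm{GC}/2$ and $p_A = p_T = (1-\mathrm{GC})/2$, ignores sequence-end effects (negligible for genome-sized sequences), and uses the fraction of positions covered by maximal runs of 5–13 nt as the density measure. That normalisation is an assumption and need not match the exact scale of the densities reported in section 3.1, but the qualitative behaviour does not depend on it. The function name is hypothetical.

```python
def expected_density_percent(gc: float, min_len: int = 5, max_len: int = 13) -> float:
    """Expected percentage of positions lying in maximal mononucleotide runs of
    min_len..max_len nt in a long random sequence with GC fraction `gc`."""
    freqs = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]  # G, C, A, T
    covered = 0.0
    for p in freqs:
        for t in range(min_len, max_len + 1):
            # Interior-run approximation: a maximal run of exactly t bases starts
            # at a given position with probability p**t * (1 - p)**2 and covers t bases.
            covered += t * p ** t * (1 - p) ** 2
    return 100.0 * covered


for gc in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"GC = {gc:.0%}: expected density = {expected_density_percent(gc):.3f}%")
```

Under these assumptions the expected density is lowest at 50% GC and rises symmetrically as the GC content moves towards either extreme.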
In addition, we calculated correlations between the number and density of mononucleotide SSRs and R on the one hand, and the fraction of the genome located in duplicated segments and in horizontally transferred segments on the other. Following correction for genome size (as the fraction of duplicated genes in a genome correlates positively with genome size [29]), none of these three variables was correlated with the fraction of duplicated genes in the genome. However, we found a significant negative relationship between the fraction of the genome localized in putatively horizontally transferred genes and SSR density (r² = 0.592, P < 0.001). This correlation remained highly significant (r² = 0.4130, P < 0.001) after correcting for the confounding effect of genome size (larger genomes tend to contain a higher fraction of putatively horizontally transferred genes [30]) (Fig. 5).
3.7. Correlation between the mean GC content of mononucleotide SSRs, the mean GC content of the genome and repeat density
The relationship between genomic GC content and the mean GC content of the mononucleotide SSRs in a given genome is shown in Fig. S3 (available in Supplementary Data). This relationship is best described by a least-squares fit with the equation $\mathrm{GC}_{\mathrm{SSR}} = 2\times10^{-7}\,(\mathrm{GC}_{\mathrm{genome}})^{4.6365}$ ($r^2 = 0.899$). Our data clearly show a strong overrepresentation of A and T in mononucleotide SSRs in AT-rich genomes (i.e. genomes with a GC content <50 mol%), whereas G and C are overrepresented only in a few GC-rich genomes (GC content >65 mol%). The relationship between repeat density and the mean GC content of mononucleotide SSRs in a given genome is also shown in Fig. S3. This relationship is best described by a least-squares fit with the equation $\mathrm{GC}_{\mathrm{SSR}} = 4890.7\,(\text{repeat density})^{-1.5515}$ ($r^2 = 0.695$), indicating that the GC content of mononucleotide SSRs is highest when the repeat density is lowest.
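The fitting procedure behind these curves is not specified. A common choice, assumed in the Python sketch below, is ordinary least squares on log-transformed values, which turns $y = a x^{b}$ into the straight line $\ln y = \ln a + b \ln x$; because it minimises relative rather than absolute errors, its coefficients and r² need not coincide with those of a direct non-linear fit. The function names are hypothetical; `gc_ssr_from_genome` simply evaluates the first reported relation, taking both GC values in mol%.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on ln(y) versus ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


def gc_ssr_from_genome(gc_genome: float) -> float:
    """First fitted relation reported above; both GC values in mol%."""
    return 2e-7 * gc_genome ** 4.6365


for gc in (30, 50, 70):
    print(gc, round(gc_ssr_from_genome(gc), 1))
```

For genomic GC contents of 30, 50 and 70 mol% the first relation gives roughly 1.4, 15 and 72 mol%, showing how steeply SSR GC content rises in GC-rich genomes. Extrapolating outside the range of genomes studied quickly becomes meaningless: at 80 mol% the fitted curve already exceeds 100 mol%.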
4. Discussion
In the present study, we investigated the presence of mononucleotide SSRs in sequenced prokaryotic genomes. The data show that (i) a large number of mononucleotide SSRs is present in all prokaryotic genomes investigated (both bacterial and archaeal), (ii) shorter repeats are much more abundant than longer repeats, and (iii) in the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions (although there are several exceptions to this general rule). Bacterial genomes that contain far fewer mononucleotide SSRs than expected (i.e. R < 0.5) are generally larger (mean genome size = 5 505 212 bp) and more GC-rich (mean GC content = 66.4 mol%), whereas bacterial genomes that contain far more mononucleotide SSRs than expected (i.e. R > 2.0) are generally smaller (mean genome size = 2 254 190 bp) and more AT-rich (mean GC content = 39.3 mol%); for comparison, the mean genome size for all bacterial genomes is 3 336 435 bp and the mean GC content is 46.6 mol%. We also noted that genomes containing a high fraction of horizontally transferred genes generally have a lower mononucleotide SSR density.
It is now generally accepted that SSRs, through a variety of molecular processes, are involved in generating genomic and phenotypic diversity in eukaryotes as well as in prokaryotes. Our analyses performed on a large number of prokaryotic genomes clearly indicate that, due to the presence of this large number of mononucleotide SSRs, prokaryotes have an enormous potential for generating this genomic and phenotypic diversity.
Longer mononucleotide SSRs have more opportunity to undergo slipped-strand mispairing and are therefore expected to be more variable in length than shorter mononucleotide SSRs. This could help explain why longer repeats are overrepresented in non-coding regions of the genome: in coding regions such repeats would cause frameshift and nonsense mutations, so selection has ample opportunity to act against them. The observation that in some genomes (including those of the ε-Proteobacteria Campylobacter jejuni, Helicobacter pylori, Helicobacter hepaticus and Wolinella succinogenes, and those of Haemophilus ducreyi, Neisseria meningitidis and Synechocystis sp.) longer mononucleotide SSRs are not (or only to a lesser extent) excluded from coding regions suggests that they may play an important role in phase variation [33], as this process has been observed in these organisms [9, 34–36].
Interestingly, we also observed that the repeat density was significantly higher (one-way ANOVA, P < 0.001) in organisms with an intracellular or strictly parasitic lifestyle (including Brucella spp., Treponema spp., Chlamydia spp., Mycoplasma spp., Bartonella spp., Mesoplasma florum, Phytoplasma asteris, Coxiella burnetii, Wigglesworthia glossinidia, Buchnera aphidicola and Blochmannia floridanus) (see Table 2). This seems to contradict the consistent genome reduction observed in most of these organisms (e.g. there are no repeated sequences >200 bp in members of the genus Buchnera [37]). For at least some of these organisms (e.g. Buchnera species), this can probably be attributed to the loss of genes involved in DNA repair [38, 39], as the loss of DNA polymerase proofreading activity or of post-replication mismatch repair has been shown to greatly enhance the rate of SSR tract alterations [3, 40, 41]. However, other intracellular bacteria do retain proofreading activity, so other (as yet unidentified) factors are most probably involved as well.
Besides their likely involvement in phase variation, variation in mononucleotide SSR length at or near regulatory elements can have a significant influence on gene expression by affecting the binding of regulatory proteins, the distance between regulatory elements, the bendability, coiling, packaging and phasing of the DNA, or the formation of unusual DNA structures [24, 42–44]. In contrast to phase variation, these processes can affect gene expression in a more subtle, quantitative way [42, 45].
Computer simulations performed by King and co-workers [46, 47] have shown that the mutational properties of SSRs could be exploited by evolving populations. The mutation rate in populations under selection pressure (e.g. imposed by gradually changing environmental conditions) can increase as an indirect consequence of selection. When mutations are suitably constrained, frequent mutations allow the population to adapt efficiently. In this light, it has been hypothesized that SSRs have become frequent because selection has favoured their intrinsic instability, which allows them to act as adjustable tuning knobs that facilitate evolutionary change [46].
The large differences between the expected and observed numbers of mononucleotide SSRs could reflect the presence of alternative mechanisms for generating diversity in genomes with a low number of mononucleotide SSRs. The significant negative correlation between the fraction of the genome localized in putatively horizontally transferred genes and mononucleotide SSR density is consistent with this hypothesis.
We note that the number of putatively horizontally transferred genes for each genome was taken from data published by Nakamura et al. [30], who identified such genes with a Bayesian method based on differences in nucleotide composition. While this method has several advantages over more direct phylogenetic methods, it may preferentially detect recently transferred genes; anciently transferred genes may be indistinguishable because their nucleotide composition converges with that of the recipient genome under mutational pressure (a process called amelioration [48]). It will be interesting to see whether the relationship between the fraction of the genome localized in putatively horizontally transferred genes and SSR density holds up when more ancient horizontal gene transfer events are taken into account.
The overrepresentation of A and T in mononucleotide SSRs in the majority of prokaryotic genomes might be explained by the fact that slipped-strand mispairing is more likely for poly(A) or poly(T) repeats, as strand separation is energetically more favourable for these tracts than for poly(C) or poly(G) repeats. It is thus easier to generate variability with poly(A) and poly(T) mononucleotide SSRs than with poly(C) or poly(G) repeats. This is supported by the observation that the loss of DNA polymerase proofreading activity or of post-replication mismatch repair specifically enhances the rate of poly(A) tract alterations [49].
However, our data also clearly indicate that the GC content of mononucleotide SSRs is highest when the repeat density is lowest. This suggests that there could be other reasons for the strong overrepresentation of poly(A) and poly(T) mononucleotide SSRs. It has been suggested that the higher energy cost of G and C compared with A and T/U could be the reason for the high variation seen in genomic G+C content [31, 50]. Indeed, the synthesis of GTP requires an additional NAD compared with AMP, while the synthesis of CTP from UTP requires an additional ATP molecule [50]. In addition, owing to its central role in metabolism, ATP is abundantly present in the cell. The observation that more GC-rich mononucleotide SSRs are predominantly found in genomes with a low mononucleotide SSR density suggests that the same underlying reason might be responsible for the marked differences observed in the G+C content of these mononucleotide SSRs, as it would be too costly to maintain many poly(G) and/or poly(C) SSRs in genomes with a high density of mononucleotide SSRs. This hypothesis is corroborated by the observation that the G+C content of mononucleotide SSRs of intracellular bacteria is extremely low (i.e. the mononucleotide SSRs of these organisms are, just like the rest of the genome [50], more AT-rich as a result of selection by competition for scarce resources). Another possible reason is the involvement of repeated sequences in the formation of non-canonical DNA structures, including triple-stranded H-DNA. These structures are more easily formed in GA-rich regions [51, 52] and could block transcription by RNA polymerase [53]. As such, the presence of many G (or A) mononucleotide SSRs in bacteria with a high GC content could result in a selective disadvantage.
Supplementary Data
Supplementary Data are available online at http://users.ugent.be/~tcoenye/index_files/Page444.htm and at http://dnaresearch.oxfordjournals.org or can be requested from Tom Coenye.
Acknowledgements
T.C. and P.V. are indebted to the Fund for Scientific Research–Flanders (Belgium) for a position as postdoctoral fellow and for research grants, respectively. We thank the anonymous reviewers for helpful comments that greatly improved our manuscript.
Footnotes
*To whom correspondence should be addressed. Tel. +32-9-2645128, Fax. +32-9-2645092, Email: Tom.Coenye{at}UGent.be
Communicated by Naotaka Ogasawara
References
1. van Belkum, A., Scherer, S., van Alphen, L., Verbrugh, H. 1998, Short-sequence DNA repeats in prokaryotic genomes, Microbiol. Mol. Biol. Rev., 62, 275–293.
2. Yeramian, E. and Buc, H. 1999, Tandem repeats in complete bacterial genome sequences: sequence and structural analyses for comparative studies, Res. Microbiol., 150, 745–754.
3. Levinson, G. and Gutman, G. A. 1987, Slipped-strand mispairing: a major mechanism for DNA sequence evolution, Mol. Biol. Evol., 4, 203–221.
4. van Belkum, A., van Leeuwen, W., Scherer, S., Verbrugh, H. 1999, Occurrence and structure–function relationship of pentameric short sequence repeats in microbial genomes, Res. Microbiol., 150, 617–626.
5. Zhu, Y., Strassmann, J. E., Queller, D. C. 2000, Insertions, deletions, and the origin of microsatellites, Genet. Res., 76, 227–236.
6. Dieringer, D. and Schlötterer, C. 2003, Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species, Genome Res., 13, 2242–2251.
7. Perez-Martin, J., Rojo, F., de Lorenzo, V. 1994, Promoters responsive to DNA bending: a common theme in prokaryotic gene expression, Microbiol. Rev., 58, 268–290.
8. Krasilnikova, M. M., Samadashwily, G. M., Krasilnikov, A. S., Mirkin, S. M. 1998, Transcription through a simple DNA repeat blocks replication elongation, EMBO J., 17, 5095–5102.
9. Henderson, I. R., Owen, P., Nataro, J. P. 1999, Molecular switches: the ON and OFF of bacterial phase variation, Mol. Microbiol., 33, 919–932.
10. Rocha, E. P. C., Matic, I., Taddei, F. 2002, Over-representation of repeats in stress response genes: a strategy to increase versatility under stressful conditions?, Nucleic Acids Res., 30, 1886–1894.
11. Bzymek, M. and Lovett, S. T. 2001, Instability of repetitive DNA sequences: the role of replication in multiple mechanisms, Proc. Natl Acad. Sci. USA, 98, 8319–8325.
12. Rocha, E. P. C. 2003, An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction, Genome Res., 13, 1123–1132.
13. Orti, G., Pearse, D. E., Avise, J. C. 1997, Phylogenetic assignment of length variation at a microsatellite locus, Proc. Natl Acad. Sci. USA, 94, 10745–10749.
14. Sivolap, Y. M. and Bonner, J. 1971, Association of chromosomal RNA with repetitive DNA, Proc. Natl Acad. Sci. USA, 68, 387–389.
15. Locker, J., Rabinowitz, M., Getz, G. S. 1974, Tandem inverted repeats in mitochondrial DNA of petite mutants of Saccharomyces cerevisiae, Proc. Natl Acad. Sci. USA, 71, 1366–1370.
16. Hamada, H., Petrino, M. G., Kakunaga, T. 1982, A novel repeated element with Z-DNA-forming potential is widely found in evolutionarily diverse eukaryotic genomes, Proc. Natl Acad. Sci. USA, 79, 6465–6469.
17. Katti, M. V., Ranjekar, P. K., Gupta, V. S. 2001, Differential distribution of simple sequence repeats in eukaryotic genome sequences, Mol. Biol. Evol., 18, 1161–1167.
18. Tautz, D. 1989, Hypervariability of simple sequences as a general source for polymorphic DNA markers, Nucleic Acids Res., 17, 6463–6471.
19. Tautz, D. and Schlötterer, C. 1994, Simple sequences, Curr. Opin. Genet. Dev., 4, 832–837.
20. Cox, R. and Mirkin, S. 1997, Characteristic enrichment of DNA repeats in different genomes, Proc. Natl Acad. Sci. USA, 94, 5237–5242.
21. Field, D. and Wills, C. 1998, Abundant microsatellite polymorphisms in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces, Proc. Natl Acad. Sci. USA, 95, 1647–1652.
22. Tautz, D., Trick, M., Dover, G. A. 1986, Cryptic simplicity is a major source of genetic variation, Nature, 322, 652–656.
23. Dechering, K. J., Cuelenaere, K., Konings, R. N. H., Leunissen, J. A. M. 1998, Distinct frequency-distributions of homopolymeric DNA tracts in different genomes, Nucleic Acids Res., 26, 4056–4062.
24. Gur-Arie, R., Cohen, C. J., Eitan, Y., Shelef, L., Hallerman, E. M., Kashi, Y. 2000, Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism, Genome Res., 10, 62–71.
25. Coenye, T. and Vandamme, P. 2003, Simple sequence repeats and compositional bias in the bipartite Ralstonia solanacearum GMI1000 genome, BMC Genomics, 4, 10.
26. Yang, J., Wang, J., Chen, L., Yu, J., Dong, J., Yao, Z., Shen, Y., Jin, Q., Chen, R. 2003, Identification and characterisation of simple sequence repeats in the genomes of Shigella species, Gene, 322, 85–92.
27. De Wachter, R. 1981, The number of repeats expected in random nucleic acid sequences and found in genes, J. Theor. Biol., 91, 71–98.
28. Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M. A., Barrell, B. 2000, Artemis: sequence visualisation and annotation, Bioinformatics, 16, 944–945.
29. Gevers, D., Vandepoele, K., Simillion, C., Van de Peer, Y. 2004, Gene duplication and biased functional retention of paralogs in bacterial genomes, Trends Microbiol., 12, 148–154.
30. Nakamura, Y., Itoh, T., Matsuda, H., Gojobori, T. 2004, Biased biological functions of horizontally transferred genes in prokaryotic genomes, Nature Genet., 36, 760–766.
31. Bentley, S. D. and Parkhill, J. 2004, Comparative genomic structure of prokaryotes, Annu. Rev. Genet., 38, 771–791.
32. Hallin, P. P., Coenye, T., Staerfeldt, H. H., Binnewies, T. T., Jarmer, H., Ussery, D. W. 2004, Genome update: correlation of bacterial genomic properties, Microbiology, 150, 3899–3903.
33. van der Woude, M. W. and Bäumler, A. J. 2004, Phase and antigenic variation in bacteria, Clin. Microbiol. Rev., 17, 581–611.
34. Saunders, N. J., Peden, J. F., Hood, D. W., Moxon, E. R. 1998, Simple sequence repeats in the Helicobacter pylori genome, Mol. Microbiol., 27, 1091–1098.
35. Salaun, L., Linz, B., Suerbaum, S., Saunders, N. J. 2004, The diversity within an expanded and redefined repertoire of phase-variable genes in Helicobacter pylori, Microbiology, 150, 817–830.
36. Linton, D., Karlyshev, A. V., Wren, B. W. 2001, Deciphering Campylobacter jejuni cell surface interactions from the genome sequence, Curr. Opin. Microbiol., 4, 35–40.
37. Frank, A. C., Amiri, H., Andersson, S. G. 2002, Genome deterioration: loss of repeated sequences and accumulation of junk DNA, Genetica, 115, 1–12.
38. Dybvig, K. and Voelker, L. L. 1996, Molecular biology of mycoplasmas, Annu. Rev. Microbiol., 50, 25–57.
39. Wernegreen, J. J. 2002, Genome evolution in bacterial endosymbionts of insects, Nature Rev. Genet., 3, 850–861.
40. Wells, R. D. 1996, Molecular basis of genetic instability of triplet repeats, J. Biol. Chem., 271, 2875–2878.
41. Tran, H. T., Keen, J. D., Kricker, M., Resnick, M. A., Gordenin, D. A. 1997, Hypermutability of homonucleotide runs in mismatch repair and DNA polymerase proofreading yeast mutants, Mol. Cell. Biol., 17, 2859–2865.
42. Kashi, Y., King, D., Soller, M. I. 1997, Simple sequence repeats as a source of quantitative genetic variation, Trends Genet., 13, 74–78.
43. Pedersen, A. G., Jensen, L. J., Brunak, S., Staerfeldt, H. H., Ussery, D. W. 2000, A DNA structural atlas for Escherichia coli, J. Mol. Biol., 299, 907–930.
44. Ussery, D. W., Soumpasis, D. M., Brunak, S., Staerfeldt, H. H., Worning, P., Krogh, A. 2002, Bias of purine stretches in sequenced chromosomes, Comp. Chem., 26, 531–541.
45. Pennisi, E. 1998, How the genome readies itself for evolution, Science, 281, 1131–1134.
46. King, D. G., Soller, M., Kashi, Y. 1997, Evolutionary tuning knob, Endeavour, 21, 36–40.
47. King, D. G. 2000, Indirect selection on the mutational landscape: an evolved role for the mutability of repetitive DNA?, Evolution 2000, Indiana University Conferences, Bloomington, IN.
48. Lawrence, J. G. and Ochman, H. 1997, Amelioration of bacterial genomes: rates of change and exchange, J. Mol. Evol., 44, 383–397.
49. Sia, E. A., Jinks-Robertson, S., Petes, T. D. 1997, Genetic control of microsatellite stability, Mutat. Res., 383, 61–70.
50. Rocha, E. P. and Danchin, A. 2002, Base composition bias might result from competition for metabolic resources, Trends Genet., 18, 291–294.
51. Schroth, G. P. and Ho, P. S. 1995, Occurrence of potential cruciform and H-DNA forming sequences in genomic DNA, Nucleic Acids Res., 23, 1977–1983.
52. Hoyne, P. R., Edwards, L. M., Viari, A., Maher, L. J. 2000, Searching genomes for sequences with the potential to form intrastrand triple helices, J. Mol. Biol., 302, 797–809.
53. Ashley, C. and Lee, J. S. 2000, A triplex-mediated knot between separated polypurine-polypyrimidine tracts in circular DNA blocks transcription by Escherichia coli RNA polymerase, DNA Cell Biol., 19, 235–241.