DNA Research Advance Access originally published online on October 27, 2009
DNA Research 2009 16(6):325-343; doi:10.1093/dnares/dsp021
A Study in Entire Chromosomes of Violations of the Intra-strand Parity of Complementary Nucleotides (Chargaff's Second Parity Rule)


1 Department of Mathematical Sciences, Tezpur University, Tezpur, Assam 784 028, India
2 Department of Computer Science and Engineering, Tezpur University, Tezpur, Assam 784 028, India
3 Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784 028, India
Received 11 June 2009; accepted 24 September 2009.
Abstract
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers discussing ISP have been published in the genomic era, scientists have yet to identify all the factors responsible for such a universal phenomenon in chromosomes. In the present work, we address the issue from a new perspective, through a feature parallel to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of a specific size, and the frequency distributions of the mono/oligonucleotides among the regions were compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides were statistically similar, a feature we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and chromosomes of a eukaryote were found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and the composition of forward-encoded sequences between the strands in chromosomes were found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
Key words: chromosome; nucleotide composition; Chargaff's second parity rule; intra-strand frequency distribution parity; DNA replication
Introduction
Chargaff's first parity rule based on the nucleotide composition of double-stranded DNA states that the complementary nucleotides have the same abundance values.1
Theoretically, in the absence of strand bias in mutation and selection, the base complementary relationship easily explains the presence of ISP in chromosomes.14,15 However, several lines of evidence now show that the two strands are not identical in terms of mutation/selection.16 This results in violation of ISP in sub-chromosomal regions. The longer the sub-chromosomal region, the smaller the violation of ISP observed.17 The mechanisms responsible for the violation fall under three categories.18
First, DNA replication: the leading strand (LeS) is found to be composed of more K nucleotides (G and T) than the complementary M nucleotides (A and C), and the reverse holds true for the lagging strand (LaS).19 This is because the LeS, which functions as the template for Okazaki fragment synthesis (i.e. as the template for the LaS), remains exposed as single-stranded DNA more than the LaS (which functions as the template for the LeS) during replication, resulting in higher deamination of the cytosine residues20,21 in the LeS (cytosine is deaminated 140 times faster in ssDNA than in dsDNA22). In addition, the influence of Okazaki fragments and the sliding DNA clamp proteins associated with the synthesis of the LaS creates functional asymmetry of the mismatch repair system on DNA.23
Second, transcription: genes are preferentially located in the LeS rather than in the LaS, which avoids head-on collisions between the replication and transcription machineries.24 During transcription, the non-template strand remains exposed as single-stranded DNA more than the template strand, which causes asymmetry in cytosine deamination between the strands.22 The transcription-coupled repair system also acts only upon the template strand and thereby contributes to the strand asymmetry.25 Third, translation: the usage of synonymous codons is influenced by the differential abundance of tRNA molecules, which results in differential abundance of complementary nucleotides at the third position of family box codons. This causes parity violation.14
In spite of these factors favoring violations of parity in chromosomes, ISP is observed in an entire chromosome because local violations in opposite directions cancel each other out.14 Evolutionary biologists are more interested in understanding the role of mutation and/or selection in the violation of ISP by analyzing the weakly selected or selectively neutral regions (third position of family box codons and non-coding regions) in chromosomes.14,26
Whether any specific feature(s) is/are associated with chromosomes exhibiting ISP is yet to be understood. Shioiri and Takahata27 studied ISP by determining the total AT skew (ATS) and GC skew (GCS) in the chromosomes of several bacteria. In their study, out of 36 bacterial chromosomes, Xylella fastidiosa exhibited the maximum ATS and GCS. They observed variable ATS/GCS among chromosomes of different strains of a species as well as among chromosomes within a bacterial cell. They also observed that ATS and GCS may differ from each other within a chromosome. Since they did not perform any statistical analysis of the skew, they did not discuss the significance of the variability observed among chromosomes. The usual statistical tool used to detect ISP in chromosomes is a correlation analysis of oligonucleotide abundance described by Prabhu.6 The study of ISP between complementary mononucleotides is important because it has been shown that oligonucleotide parity and mononucleotide parity are independent.8 Baisnée et al.8 studied parity in chromosomes by measuring the S1 index, which is defined as the sum of the absolute values of the differences between the frequencies of complementary oligonucleotides (n-mers, with n varying from 1 to 9). Neither of these methods measures the statistical significance of differences between the abundance values of a mono/oligonucleotide and its reverse complement. For example, if a chromosome carries significant similarity between the abundance values of A and T but a significant difference between the abundance values of G and C, the two cases will not be identified separately. Similarly, the above methods are unable to detect parity violations in chromosomes with respect to the abundance values of an individual oligonucleotide and its reverse complement. We have developed a methodology here that can independently study ISP between S nucleotides (any oligonucleotide and its reverse complement) as well as between W nucleotides using the abundance values of mononucleotides. We use the well-known Kolmogorov–Smirnov (KS) test to study the frequency distribution of the compositional abundance values of the mononucleotides in a chromosome sequence, which gives the statistical significance of the similarity between the distributions of complementary nucleotides. We call this feature intra-strand frequency distribution parity (ISFDP), and we have used it here to study the chromosomes of bacteria, archaea and eukaryotes.
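The windowing and KS comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study used Microsoft Excel and XL-Stat): the function names are hypothetical, the KS P-value uses a standard asymptotic approximation, and the sequence is assumed to contain at least one full window.

```python
import math
from bisect import bisect_right


def window_fractions(seq, base, window=1000):
    """Fraction of `base` in each non-overlapping window of `window` nucleotides."""
    seq = seq.upper()
    n_windows = len(seq) // window  # a trailing partial window is ignored
    return [seq[i * window:(i + 1) * window].count(base) / window
            for i in range(n_windows)]


def ks_two_sample(xs, ys):
    """Two-sample KS statistic D and an asymptotic approximation of its P-value."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical cumulative distributions.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
            for v in set(xs) | set(ys))
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The Kolmogorov tail probability is effectively 1 here, and the
        # alternating series below converges poorly for very small lam.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def isfdp_pvalues(seq, window=1000):
    """KS P-values for the W pair (A vs T) and the S pair (G vs C)."""
    frac = {b: window_fractions(seq, b, window) for b in "ACGT"}
    return {"W": ks_two_sample(frac["A"], frac["T"])[1],
            "S": ks_two_sample(frac["G"], frac["C"])[1]}
```

A small P-value for a pair would indicate that the two window-wise distributions differ, i.e. a violation of the parity for that pair; the thresholds separating strong from weak violation are not specified in this sketch.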
Materials and methods
Frequency distribution calculation
Chromosome sequences of different bacteria, archaea and eukaryotes (Tables 1
Angular replication asymmetry of the chromosomes was calculated using the information on ori (origin) and ter (termination) available on the websites (http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/origin/ and http://pbil.univ-lyon1.fr/software/Oriloc/oriloc.html). The chromosomal region from ori to ter was considered the leading region in the Watson strand (Ws) and the remaining portion of the chromosome the lagging region. For a circular chromosome, the angular replication asymmetry was calculated as the angular extent by which the leading region deviates from 180°.
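One way to read the angular asymmetry calculation is sketched below in Python. The function name, the use of base-pair coordinates and the choice to report an absolute deviation are assumptions made for illustration; the text only states that the deviation of the leading region from 180° is used.

```python
def angular_replication_asymmetry(ori, ter, length):
    """Deviation (in degrees) of the ori-to-ter arc from 180 on a circular chromosome.

    ori, ter: positions of the origin and terminus on the Watson strand (bp).
    length:   chromosome length (bp).
    """
    # Length of the leading region, wrapping around the end of the sequence
    # when the origin lies downstream of the terminus.
    leading = (ter - ori) % length
    # Convert to an angle and measure how far it is from a half circle
    # (absolute value is an assumption of this sketch).
    return abs(360.0 * leading / length - 180.0)
```

For example, a terminus exactly opposite the origin gives 0°, whereas a terminus a quarter of the way around the chromosome gives 90°.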
Proportionate distribution of forward- and reverse-encoded sequences in a DNA strand
Only coding sequences were downloaded from the DDBJ site. A continuous stretch of nucleotide sequence was made by concatenating all the sequences after removing the gene names. This resembled a DNA strand composed only of forward-encoded sequences, on which the frequency distribution analysis was done. In another approach, 50% of the above strand was reverse-complemented in silico and joined with the rest. This resembled a DNA strand composed of 50% forward-encoded and 50% reverse-encoded sequences. The frequency distribution study was carried out as described above.
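The construction of the two artificial strands can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; in particular, which half of the concatenated strand is reverse-complemented is not stated in the text, so the first half is used here as an assumption.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def forward_only_strand(coding_sequences):
    """Strand made only of forward-encoded sequences (all CDSs concatenated)."""
    return "".join(coding_sequences)


def half_reversed_strand(coding_sequences):
    """Strand with 50% forward-encoded and 50% reverse-encoded sequence.

    Assumption: the first half of the concatenated strand is the part that
    is reverse-complemented before being joined with the rest.
    """
    strand = "".join(coding_sequences)
    half = len(strand) // 2
    return reverse_complement(strand[:half]) + strand[half:]
```

Either strand can then be cut into fixed-size windows and analyzed in the same way as a chromosome sequence.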
Identification of leading and lagging regions
ATS and GCS analyses of the chromosome sequences were done as described earlier.21 These were used to identify the tentative leading and lagging portions in a DNA strand.
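As a rough illustration of how skew can point to leading and lagging portions, the Python sketch below computes a window-wise skew and reports where its sign switches. The skew definition (x − y)/(x + y), the window-based scan and the function names are assumptions for illustration; the procedure actually used is the one in the cited reference.

```python
def skew(seq, x, y):
    """Skew (x - y) / (x + y) of two bases, e.g. x='G', y='C' for GC skew."""
    seq = seq.upper()
    nx, ny = seq.count(x), seq.count(y)
    # A window containing neither base is given a skew of zero.
    return (nx - ny) / (nx + ny) if nx + ny else 0.0


def windowed_skew(seq, x, y, window):
    """Skew in consecutive non-overlapping windows (trailing remainder dropped)."""
    return [skew(seq[i:i + window], x, y)
            for i in range(0, len(seq) - window + 1, window)]


def sign_changes(values):
    """Indices of windows whose skew has the opposite sign to the previous window."""
    return [i for i in range(1, len(values)) if values[i - 1] * values[i] < 0]
```

On real chromosomes the window-wise skew is noisy, so a persistent change of sign, rather than any single switch, would be the tentative boundary between leading and lagging regions.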
Relative proportion of coding sequence distribution
This was calculated as the difference between the ORF numbers in the Ws (top strand) and the Crick strand (Cs: bottom strand), divided by the total number of ORFs. Gene orientation information was obtained from the website (http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi).
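The calculation can be written as a short Python sketch. The function name and the '+'/'-' strand labels are hypothetical, and taking the absolute value of the difference is an assumption, since the text does not state a sign convention.

```python
def relative_orf_difference(strands):
    """Relative difference in ORF numbers between the Ws and the Cs.

    strands: iterable of '+' (ORF on the Watson strand) or '-' (ORF on the
             Crick strand), one entry per ORF.
    """
    strands = list(strands)
    n_ws = sum(1 for s in strands if s == "+")
    n_cs = sum(1 for s in strands if s == "-")
    total = n_ws + n_cs
    if total == 0:
        raise ValueError("no ORFs with a '+' or '-' orientation were given")
    # Absolute difference is an assumption of this sketch.
    return abs(n_ws - n_cs) / total
```

For instance, 60 ORFs on the Ws and 40 on the Cs give a relative difference of 0.2.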
Results
ISFDP in chromosomes of bacteria
In this study, a total of 112 bacterial chromosomes were considered, covering different lineages of bacteria such as proteobacteria, cyanobacteria, firmicutes, actinobacteria etc. Samples from each group were taken randomly. The bacteria in the sample ranged in GC% from a minimum of 28% to a maximum of 75% and in chromosome size from 580 kb to 9105 kb. We studied the frequency distributions of the abundance values of mononucleotides in uniform sub-chromosomal segments of 1000 nucleotides. A collective analysis of the nucleotide abundance values from all the segments of a chromosome was done with smooth frequency distribution curves using Microsoft Excel, and the similarity of the distributions of two complementary nucleotides was tested using the KS test (XL-Stat; http://www.xlstat.com/en/download). Figure 1A(i), B(i), C(i), D(i) and E(i) shows the smooth frequency distribution curves of nucleotides in the chromosomes of Campylobacter jejuni RM1221 (30.31%), Escherichia coli K12 MG1655 (50.79%), Xanthomonas campestris pv. campestris (Xcc; 65.07%), X. fastidiosa 9a5c (52.68%) and X. fastidiosa Temecula (51.78%). Smooth curves of complementary nucleotides overlap with each other in the first three chromosomes, whereas those of non-complementary ones do not. In the fourth chromosome, none of the curves overlap with each other. In the E. coli chromosome [Fig. 1B(i)], all four smooth frequency curves are close to each other owing to the closeness of the abundance values of the nucleotides, whereas in the graphs of C. jejuni and Xcc, the smooth frequency curves of W (A and T) and S (G and C) nucleotides are distinctly separated because the GC% values of these chromosomes lie toward the two extremes. The distributions were studied by the KS test and the results for these chromosomes are shown in Fig. 1A(ii, iii), B(ii, iii), C(ii, iii), D(ii, iii) and E(ii, iii).
The graphs generated by the KS test suggest complete overlap between the complementary nucleotides in the chromosomes, except that of the X. fastidiosa strain, which is in concordance with the smooth frequency curves. The distributional similarity between the complementary nucleotides is called ISFDP. A total of 112 bacterial, 49 archaea and 18 eukaryotic chromosomes (Tables 1
Out of 112 bacterial chromosomes, 60 chromosomes exhibited ISFDP, 16 chromosomes exhibited violation between S nucleotides as well as between W nucleotides, 30 chromosomes exhibited violation only between S nucleotides and 7 chromosomes exhibited violation only between W nucleotides (Table 4). Chromosomes of Alkaliphilus oremlandii OhILAs (36.26%), Agrobacterium tumefaciens C58 (circular; 59.38%), Mycobacterium ulcerans Agy99 (65.47%), Staphylococcus epidermidis ATCC 12228 (32.1%) and X. fastidiosa 9a5c (52.68%) exhibited strong violations between S nucleotides as well as between W nucleotides. Chromosomes of the three Bacillus anthracis (35.35%) strains, Lactobacillus reuteri F275 (38.87%), Magnetococcus sp. MC-1 (54.17%), Mycobacterium leprae TN (57.8%), Rhizobium leguminosarum bv. viciae 3841 (61.09%) and Rickettsia bellii RML369-C (31.65%) exhibited strong violation between S nucleotides as well as weak violation between W nucleotides. Chromosomes of Coxiella burnetii Dugway 7E9-12 (42.44%) and Staphylococcus haemolyticus JCSC1435 (32.79%) exhibited weak violation between S nucleotides as well as between W nucleotides. Chromosome of Vibrio cholerae O395 (47.78%) exhibited strong violation of ISFDP only between W nucleotides. Similarly, there are six chromosomes where weak violations only between W nucleotides were observed. Chromosomes of Bacillus thuringiensis serovar konkukian 97-27 (34.41%), Bordetella parapertussis 12822 (68.1%), Bordetella pertussis Tohama 1 (67.72%), Haemophilus influenzae PittGG (38.01%), Helicobacter hepaticus ATCC 51449 (35.93%), Lactobacillus acidophilus NCFM (34.72%), Lactobacillus brevis ATCC 367 (46.22%), Nitrobacter winogradskyi Nb-255 (62.05%), Ralstonia solanacearum GMI1000 chromosome (67.04%), Rhizobium etli CFN 42 (61.27%), Thermotoga maritima MSB8 (46.25%) and Thermotoga petrophila RKU-1 (46.09%) exhibited strong violation only between S nucleotides. 
Similarly, 17 chromosomes exhibited weak violation only between S nucleotides. An interesting finding of this study is that violations of ISFDP within a chromosome with respect to S and W nucleotides may not be of similar magnitude. This study suggests that although ISFDP is commonly observed among chromosomes, its violation is not as rare as described earlier.13
Usually, different strains within a species are similar with respect to ISFDP. For example, the eight E. coli strains exhibited ISFDP between S nucleotides as well as between W nucleotides, and the three B. anthracis strains were similar in their ISFDP violation (strong violation between S nucleotides and weak violation between W nucleotides). However, variation among the strains of a bacterial species with respect to ISFDP was also observed. Of the two strains of C. burnetii, the Dugway 7E9-12 strain violated ISFDP, whereas the RSA 493 strain exhibited ISFDP. Out of the four H. influenzae strains, 86-028NP and PittEE exhibited violation of ISFDP, whereas PittGG and Rd KW20 exhibited strong and weak violations only between S nucleotides, respectively. Xylella fastidiosa 9a5c exhibited strong violation of ISFDP, whereas X. fastidiosa Temecula 1 exhibited weak violation of ISFDP only between S nucleotides. We call these intra-species ISFDP violations. Chromosomes of four species of the genus Mycobacterium differed greatly from each other with respect to ISFDP. The chromosome of Mycobacterium sp. KMS (68.44%) exhibited parity between S nucleotides as well as between W nucleotides, whereas the chromosome of M. ulcerans Agy99 (65.47%) exhibited strong violation of the parity between S nucleotides as well as between W nucleotides.
ISFDP in chromosomes of archaea and eukaryotes
Out of the 49 archaea chromosomes, 30 exhibited ISFDP, 6 exhibited violations between S nucleotides as well as between W nucleotides, 6 exhibited violations only between S nucleotides and 7 exhibited violations only between W nucleotides (Table 4). Chromosomes of Methanobrevibacter smithii ATCC 35061 (31.02%) and Staphylothermus marinus F1 (35.71%) exhibited strong violation of ISFDP between S nucleotides as well as between W nucleotides. Chromosomes of Metallosphaera sedula DSM 5348 (46.21%) and Pyrobaculum aerophilum IM2 (51.34%) exhibited strong violations between S nucleotides but weak violations between W nucleotides. Strong violation between W nucleotides and weak violation between S nucleotides were observed in chromosomes of Methanosarcina barkeri Fusaro (39.27%) and Nitrosopumilus maritimus SCM1 (31.15%). This suggests that in archaea, as in bacteria, the magnitude of parity violation between S nucleotides within a chromosome may differ from that between W nucleotides. Intra-species parity violation was also observed in archaea in the case of Methanococcus maripaludis. The C5 strain exhibited ISFDP violation between W nucleotides but parity between S nucleotides. The C6, C7 and S2 strains exhibited ISFDP between S nucleotides as well as between W nucleotides.
Out of the 18 eukaryotic chromosomes belonging to five species, 15 chromosomes exhibited ISFDP (Table 4). Strong violation of ISFDP only between S nucleotides was observed in Leishmania major Friedlin chromosome 01 (62.84%). Plasmodium falciparum 3D7 chromosome 05 exhibited weak violation of parity between S nucleotides as well as between W nucleotides, whereas chromosome 01 exhibited violation of parity only between W nucleotides. The other four chromosomes of P. falciparum exhibited parity between S nucleotides as well as between W nucleotides. Similarly, although the eight chromosomes of Saccharomyces cerevisiae exhibited parity between S nucleotides as well as between W nucleotides, the P-values for either S nucleotides or W nucleotides differed more than 10-fold among the chromosomes. This differential ISFDP violation observed among chromosomes of an organism suggests that there may not be any strict rule inside a cell to maintain ISFDP.
ISFDP between complementary oligonucleotides in chromosomes
ISP between the compositional abundance values of complementary oligonucleotides is well reported. We studied here the frequency distributions of complementary di- and trinucleotides in chromosomes as described for mononucleotides. The smooth curves of oligonucleotide frequencies are shown in Supplementary data. In Supplementary Fig. S1a and b, the frequency distributions of dinucleotides are shown for the E. coli K12 MG1655 and Pseudomonas entomophila L48 (64.16%) chromosomes. Among the 12 smooth frequency curves (the four palindromic dinucleotides were excluded), the curves of complementary dinucleotides overlap. In Fig. 2, although the abundance values of the aa, tt, tg and ca dinucleotides in the E. coli chromosome are close, only the distributions of complementary dinucleotides overlap, whereas those of non-complementary ones differ. The distributions of aa and tt have a higher standard deviation (values not shown) than those of tg and ca. Similarly, the distributions of the gg and cc dinucleotides have a higher standard deviation (values not shown) than those of the dinucleotides tc and ga, although the abundance values of the four dinucleotides are close to each other. The significance of the similarity was studied by the KS test, which suggested that the frequency distributions of complementary dinucleotides are statistically similar. Apart from this, dinucleotide distribution parity was studied in three more bacterial chromosomes, two archaea chromosomes and one eukaryotic chromosome (data not shown), and similar results were observed. In Supplementary Fig. S2i and ii, the distribution of 22 trinucleotides of the E. coli K12 MG1655 chromosome is shown. As with dinucleotides, overlap between the distributions of complementary trinucleotides is observed.
Distribution similarity between complementary trinucleotides was studied by the KS test for the 64 trinucleotides, which suggested that the distributions of complementary trinucleotides within a strand are similar. The same study was done in one more bacterial chromosome (data not shown) and similar results were obtained. Although we did not analyze the chromosomes of archaea and eukaryotes for trinucleotide distribution parity, it is expected to be present because these chromosomes exhibited ISFDP for mononucleotides as well as dinucleotides.
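The pairing of oligonucleotides with their reverse complements, and their window-wise frequencies, can be sketched in Python as follows. The function names are hypothetical, and counting overlapping occurrences within each window is an assumption, since the counting scheme is not described in the text.

```python
from itertools import product

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Reverse complement of an upper-case oligonucleotide."""
    return kmer.translate(_COMPLEMENT)[::-1]


def complementary_pairs(k):
    """Sorted list of (k-mer, reverse complement) pairs, palindromes excluded."""
    pairs = set()
    for kmer in map("".join, product("ACGT", repeat=k)):
        rc = reverse_complement(kmer)
        if kmer != rc:  # palindromic k-mers are their own reverse complement
            pairs.add(tuple(sorted((kmer, rc))))
    return sorted(pairs)


def window_kmer_fractions(seq, kmer, window=1000):
    """Fraction of k-mer positions occupied by `kmer` in each non-overlapping window.

    Overlapping occurrences are counted (an assumption of this sketch).
    """
    seq, k = seq.upper(), len(kmer)
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        positions = window - k + 1
        hits = sum(1 for i in range(positions) if w[i:i + k] == kmer)
        fractions.append(hits / positions)
    return fractions
```

For dinucleotides this yields six complementary pairs from the 12 non-palindromic dinucleotides; the two window-wise distributions of each pair can then be compared with a two-sample KS test.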
ISFDP weakly correlates with Chargaff's second parity
Comparison of ISFDP was done with the ATS/GCS in chromosomes to find out whether one can define the other. GCS was compared with ISFDP violation between S nucleotides and ATS was compared with ISFDP violation between W nucleotides. Among the bacterial chromosomes, maximum GCS was found in X. fastidiosa 9a5c with the value 0.0529. All of the 16 chromosomes with GCS ≥0.01 were found to violate ISFDP (14 strongly violated and 2 weakly violated). Out of the 18 chromosomes with GCS ≥0.005 but <0.01, 6 exhibited insignificant violation, 7 exhibited strong violation and 5 exhibited weak violation of ISFDP. Similarly, out of 56 chromosomes with GCS ≥0.001 but <0.005, 5 exhibited strong violation, 11 exhibited weak violation and 40 exhibited insignificant violation. Out of the 22 chromosomes with GCS <0.001, all exhibited insignificant violation except the B. thuringiensis Al Hakam chromosome (GCS value 0.00081), which exhibited weak violation of ISFDP. Maximum ATS was found in X. fastidiosa 9a5c with the value 0.04727. Out of the five chromosomes with ATS ≥0.01, four were found to violate ISFDP (two strongly violated and two weakly violated), whereas Mycoplasma hyopneumoniae J exhibited insignificant violation (with ATS 0.0102). Out of the 14 chromosomes with ATS ≥0.005 but <0.01, 6 exhibited insignificant violation, 3 exhibited strong violation and 5 exhibited weak violation of ISFDP. Out of the 67 chromosomes with ATS ≥0.001 but <0.005, 57 exhibited parity, 1 strongly violated and 9 violated weakly between the W nucleotides. All the 26 chromosomes with ATS ≤0.001 exhibited insignificant violation of ISFDP. These results suggest that chromosomes with high ATS/GCS (≥0.01) have a stronger propensity to violate ISFDP and chromosomes with low ATS/GCS (≤0.001) have a stronger propensity to exhibit ISFDP. However, chromosomes with intermediate ATS/GCS (≥0.001 and ≤0.01) may either exhibit or violate the parity. Correlation analysis was done between the P-values from the KS test between W nucleotides and ATS, as well as between the P-values from the KS test between S nucleotides and GCS. The r-values are –0.5572 and –0.4526 for W and S nucleotides, respectively. This suggests that the correlation between the two ISP features is weak. The correlation between ATS and GCS is 0.629, which suggests that parity violation between S nucleotides weakly correlates with parity violation between W nucleotides within a chromosome. Unlike the correlation between ATS and GCS, no correlation was found between the P-values (the KS test) of W nucleotides and those of S nucleotides, which supports the view that ISFDP and Chargaff's second parity are not the same.
In the case of the archaea chromosomes, the ISFDP analysis revealed results similar to those of the bacterial chromosomes. Maximum GCS with the value 0.03768 was found in the chromosome of M. smithii ATCC 35061 (31.02%), followed by the value 0.02726 in S. marinus F1 (35.71%); significant ISFDP violation was also observed in both. In the GCS interval 0.005 < GCS ≤ 0.01, there were eight chromosomes, of which five exhibited weak violation and three exhibited insignificant violation of ISFDP. Out of the 24 chromosomes in the interval 0.001 < GCS ≤ 0.005, 2 exhibited weak violation and 22 exhibited insignificant violation of ISFDP. These results suggest that chromosomes with high ATS/GCS (≥0.01) are most likely to violate ISFDP and chromosomes with low ATS/GCS (≤0.001) are most likely to exhibit ISFDP. However, chromosomes with intermediate ATS/GCS (≥0.001 and ≤0.01) may either exhibit or violate the parity. Pearson's correlation coefficient between ATS and GCS was found to be 0.707847, which is similar to that of the bacterial analysis. The r-values between ATS and the P-values of KS (W) as well as between GCS and the P-values of KS (S) were found to be –0.57495 and –0.47557, respectively, suggesting a weak correlation.
The chromosomes with asymmetric replication topography are more prone to ISFDP violation in bacteria
The bacterial chromosome is a single replicon. Owing to the bidirectional mode of replication, one part of a strand is synthesized as LeS whereas the other part is synthesized as LaS. In most chromosomes, the mutational strand asymmetry causes K nucleotides > M nucleotides in the LeS and the reverse (K nucleotides < M nucleotides) in the LaS. In an ideal case where the termination site is located symmetrically with respect to the origin of replication in a chromosome, the excess of K nucleotides in the LeS will be similar to the excess of M nucleotides in the LaS, and the two will therefore cancel each other so that the chromosome exhibits Chargaff's second parity. Potential replication origin and termination sites for different chromosomes based on ATS, GCS, coding sequence skew, nucleotide skew at the third position of codons and oligonucleotide skew in chromosomes have been reported,31,32 and have been reviewed in detail.33
Out of the 112 bacterial chromosomes analyzed in this study, information regarding the potential sites of the origin and termination is available for 56 chromosomes. ISFDP violation between S nucleotides was compared with the angular deviation of the termination site because G > C in the LeS is a more universal feature of chromosomes than T > A in the LeS. Of the 112 chromosomes, the maximum angular deviation of 71.28° is reported in B. pertussis Tohama 1. Out of the 14 chromosomes in which an angular deviation ≥20° was observed, 12 exhibited violation of ISFDP between S nucleotides. Pseudomonas putida F1 (61.86%) with 36.8° and C. burnetii RSA 493 (42.66%) with 31.14° angular deviations exhibited insignificant parity violation. Out of the 11 chromosomes with deviation ≥10° but <20°, 4 chromosomes exhibited ISFDP violation between S nucleotides. Out of the 30 strains with deviation ≥1.0° and ≤10°, 9 chromosomes exhibited parity violation between S nucleotides. In Chlamydophila abortus S263, with an angular deviation of only 0.569°, parity violation was observed only between S nucleotides. This study indicates that chromosomes with more asymmetric topography are more prone to violate the parity. However, chromosomes with symmetric replication topography were also observed to violate the parity.
The correlation coefficients between angular deviation and GCS and between angular deviation and ATS are 0.474 and 0.357, respectively, suggesting a weak correlation. The correlations between angular deviation and the P-value of S (the KS test between S nucleotides) and that of W (the KS test between W nucleotides) are –0.259 and –0.048, respectively. The angular deviation in X. fastidiosa 9a5c is 62.96°, whereas that in Temecula 1 is 6.44°. The difference in the magnitude of ISFDP violation between the strains might be attributed to the chromosome topography. A comparison of the four H. influenzae strains could not be done because the information is not available for all the strains. The Rd KW20 chromosome (which violated ISFDP) has an angular deviation of 46°, which might be an important factor in its violation of ISFDP. Archaea chromosomes, like eukaryotic chromosomes, have been reported to have multiple replication origins. Therefore, replication topography is not applicable to the study of ISFDP violations in these cases.
Composition of forward- and reverse-encoded sequences within DNA strands might influence the parity
Most of the regions in prokaryotic chromosomes are composed of coding sequences. The presence of both forward- and reverse-encoded sequences in bacterial chromosomes has been proposed to explain the observation of Chargaff's second parity in chromosomes.8,9 We therefore analyzed only the coding sequences in chromosomes of bacteria and archaea to study ISFDP as follows (Fig. 2): in Case I, a DNA strand is composed only of forward-encoded sequences, and in Case II, a DNA strand is composed of 50% forward-encoded and 50% reverse-encoded sequences. The result is shown for the E. coli chromosome (Fig. 3A and B). The smooth frequency curves of complementary nucleotides overlap in Fig. 3B, whereas in Fig. 3A they do not. The significance of these overlaps was studied by the KS test, which suggested that the distributions of complementary nucleotides are similar in Case II. Similar results were obtained from the analysis of several bacterial (10) and archaea (15) chromosomes.
A comparative analysis between the Ws and Cs in a chromosome with respect to their composition of forward-encoded sequences was done in X. fastidiosa as well as in H. influenzae. The relative differences in the compositional abundance values of forward-encoded sequences in the Ws and Cs of the X. fastidiosa 9a5c and X. fastidiosa Temecula 1 chromosomes are 0.078 and 0.015, respectively, which indicates that the proportion of forward- and reverse-encoded sequences in the 9a5c strain is more disproportionate than that in the Temecula 1 strain; this might be the reason for the stronger parity violation in the former. The relative differences in the compositional abundance values of forward-encoded sequences in the Ws and Cs of the H. influenzae 86-028NP (exhibits parity) and H. influenzae Rd KW20 (violates parity) chromosomes are 0.030 and 0.005, respectively, which suggests that the proportion of forward- and reverse-encoded sequences in the 86-028NP strain is more disproportionate than that in the Rd KW20 strain. This is in contrast to the result for X. fastidiosa, i.e. parity violation is observed in the strain (Rd KW20) with the more proportionate gene distribution between the Ws and Cs, whereas insignificant parity violation is observed in the chromosome with the disproportionate gene distribution between the strands. A quantitative estimation of the coding sequences in both strands of the chromosomes was done in a few other bacteria and archaea such as A. tumefaciens, B. subtilis, E. coli, M. smithii and S. marinus (Fig. 4). The maximum difference in ORF numbers between the Ws and Cs was found in S. marinus, in which parity violation was also observed. However, the relative difference in ORFs between the strands is greater in B. subtilis than in M. smithii; the former exhibited the parity whereas the latter violated it. Agrobacterium tumefaciens was shown to possess the minimum relative difference in ORF numbers between the strands but violates parity.
These results indicate that a more disproportionate composition of forward- and reverse-encoded sequences within a strand gives a greater propensity for parity violation. However, a proportionate composition of the sequences does not necessarily imply that parity is exhibited.
Discussion
We have described in this study a new ISP feature in chromosomes, which is found in bacteria, archaea and eukaryotes. The methodology used to study this parity gives the statistical significance of the similarity between the two distributions of complementary nucleotides/oligonucleotides. The basic qualitative feature of ISFDP does not change for a chromosome even when the segmentation is started at a randomly chosen point within the first 1000 nucleotides. In other words, sampling fluctuation does not affect the feature. The correlation between ISFDP and ISP is not strong, which is in accordance with the view that similarity in the total abundance values of two complementary nucleotides will not always yield similarity in their frequency distribution patterns. However, violation of ISP will definitely be accompanied by violation of ISFDP. Around 50% of the chromosomes in bacteria were found to exhibit ISFDP violations. Chromosomes of H. influenzae Rd KW20, M. tuberculosis F11, etc., which have been reported to exhibit ISP, were found to violate ISFDP.27
ISFDP violation was observed in all possible combinations in chromosomes: (i) violation of parity between S nucleotides as well as between W nucleotides; (ii) violation only between S nucleotides; and (iii) violation only between W nucleotides. The correlation between ATS and GCS is not strong, suggesting that parity violation between S nucleotides is not necessarily associated with parity violation between W nucleotides and vice versa. This can be called intra-chromosomal parity violation. ISFDP violations of different magnitudes were found among chromosomes of different strains belonging to a species, which can be referred to as intra-species parity violations. Examples are C. burnetii, H. influenzae and X. fastidiosa. These intra-chromosomal and intra-species violations suggest that there may not be any strict rule in cells to maintain ISFDP in chromosomes. Differential ISP among chromosomes within a species and between chromosomes within a bacterium has already been reported in Chlamydophila pneumoniae strains and Deinococcus radiodurans R1 chromosomes,27 respectively. However, these were not considered significant in that study owing to the lack of statistical proof. Oligonucleotide skew patterns have also been found to be variable among strains of Yersinia pestis. These intra-species variations in chromosomal features are interesting and call for in-depth analysis of the genome sequences, which might reveal the reasons for ISP/ISFDP violation in chromosomes and for the differences between the two parity features.
Enrichment of LeS with K nucleotides over M nucleotides, and vice versa in LaS, owing to mutational strand asymmetry is a general observation in chromosomes. Owing to bidirectional replication, GCS/ATS in LeS is cancelled by GCS/ATS in LaS, which results in the establishment of parity in chromosomes. The cancellation effect indirectly suggests that the compositional abundance values of the two complementary nucleotides are similar over the whole strand even though they differ within a sub-chromosomal region. This supports the observation here that chromosomes with higher GCS/ATS values violate ISFDP and chromosomes with lower GCS/ATS values exhibit the parity. However, chromosomes with intermediate GCS/ATS values are found both to exhibit and to violate parity, and this violation is independent of genome GC%. For example, Streptococcus mutans UA159, Rickettsia conorii Malish 7, C. jejuni subsp. jejuni 81116, Campylobacter concisus 13826, Lactococcus lactis subsp. cremoris MG1363 and Helicobacter pylori J99 (all AT-rich organisms) are chromosomes with GCS 0.005 that exhibit ISFDP between S nucleotides, whereas chromosomes of B. anthracis strains (AT rich) with similar GCS (>0.005) violate ISFDP between S nucleotides. So ISFDP in these chromosomes is an interesting aspect for future research.
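The cancellation argument can be made concrete with a toy calculation. The sketch below assumes the signed skew form (G − C)/(G + C), which may differ from the GCS definition used in the study, and uses invented sequences in which one half of a strand stands in for the LeS portion and the other half for the LaS portion.

```python
def skew(seq, x, y):
    """Signed skew (x - y) / (x + y) for two bases within one strand."""
    nx, ny = seq.count(x), seq.count(y)
    return (nx - ny) / (nx + ny) if nx + ny else 0.0


# Invented halves of one strand: G-rich where it is the leading strand,
# C-rich where it is the lagging strand.
les_part = "GGGC" * 100
las_part = "GCCC" * 100

symmetric = les_part + las_part               # equal-length replichores
asymmetric = "GGGC" * 150 + "GCCC" * 50       # unequal-length replichores

local = (skew(les_part, "G", "C"), skew(las_part, "G", "C"))
whole_symmetric = skew(symmetric, "G", "C")
whole_asymmetric = skew(asymmetric, "G", "C")
```

With equal halves the opposite local skews cancel over the whole strand, whereas unequal halves leave a residual skew, which is the intuition behind linking asymmetric replication topography to parity violation.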
In concordance with the view that bidirectional replication establishes parity in chromosomes, several chromosomes with highly asymmetric replication topography were found to violate ISFDP. The exceptions are the P. putida F1 and C. burnetii RSA 493 chromosomes, with angular deviations of 36° and 31°, respectively. Chromosomes of C. abortus S263 and Magnetospirillum magneticum AMB-1, with very small angular deviations of 0.57° and 2.14°, respectively, are found to violate ISFDP. This indicates that features apart from replication topography might contribute to the establishment of parity in chromosomes. Although a proportionate composition of forward-encoded sequences between the two strands was thought to be responsible for establishing parity after the analysis of artificially constructed chromosomes, several observations went against it. The extreme case is A. tumefaciens, where the composition is very proportionate but violations of ISFDP are strong. So the two factors that were assumed to play important roles in determining ISFDP violations, namely asymmetric replication topography and disproportionate composition of forward-encoded sequences between the strands, were found to be insufficient.
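For readers who want to relate origin and terminus coordinates to an angular deviation, here is a minimal sketch. It assumes that the deviation is measured as the departure of the origin-to-terminus arc from 180° on a circular chromosome; the exact definition used in the study is not restated in this section, so the function `angular_deviation` and its formula are hypothetical.

```python
def angular_deviation(origin, terminus, length):
    """Degrees by which the origin-to-terminus arc departs from 180 degrees."""
    arc = (terminus - origin) % length       # distance along the circle
    return abs(180.0 - 360.0 * arc / length)


# Invented coordinates on a 1000-nt circle (not data from the study).
perfectly_symmetric = angular_deviation(0, 500, 1000)
shifted_terminus = angular_deviation(0, 600, 1000)
```

Under this assumption a terminus placed exactly opposite the origin gives 0°, and any shift of the terminus away from that point increases the deviation, regardless of where the origin sits in the coordinate system.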
In spite of the different selection/mutation pressures on chromosomes, as exemplified by codon usage,34 replication topography,31 isochores35 and GCS/ATS,21 the tendency of chromosomes of all types toward maintaining ISFDP is interesting. Since ISFDP and ISP are both outcomes of the compositional abundance of nucleotides (mono/oligo), theories proposed for ISP might hold true for ISFDP. The Nussinov–Forsdyke hypothesis, that stem–loop potential has an adaptive advantage and is therefore an important factor driving the compositional symmetry (ISP) between complementary oligonucleotides,36,37 has been challenged recently by Chen and Zhao38 for human chromosomes. This indicates that the stem–loop (recombination) hypothesis might not be the only explanation for ISP in chromosomes. Baisnée et al.8 have argued that reverse complement symmetry does not result only from point mutation or from recombination, but from the combined effect of different mechanisms at different orders. Two independent reports have theoretically shown that multiple inversion events in chromosomes can establish ISP.10,39 Though this hypothesis looks fine theoretically, frequent inversion is unable to explain the universal observation of opposite GCS/ATS in LeS and LaS,40 the gene distribution asymmetry between the strands41 and the maintenance of gene order among different bacterial chromosomes.42 This hypothesis also does not describe any functional significance/advantage of the ISP/ISFDP feature, which is so widespread in chromosomes. Theoretically, it has also been argued that the mismatch error repair system is responsible for establishing Chargaff's second parity rule in chromosomes.13 However, the intra-chromosomal parity violation observed in eukaryotes (this study) goes against this hypothesis.
We think the important factor that determines ISP/ISFDP in chromosomes is bidirectional replication. This makes one part of a strand (Ws/Cs) the LeS and the other part the LaS. The strand mutational asymmetry and gene distribution asymmetry between LeS and LaS therefore cancel each other out within the strand, so that parity is exhibited. In the case of ssDNA/ssRNA viruses, gene distribution is restricted to one strand only, depending on which these are called either plus- or minus-strand viruses. The genome size is also not large (<10 kb) in these phages43,44 and, during replication, only one strand acts as the template on which the other strand is made. Most likely these features are responsible for the violation of parity in these genomes. The advantages of bidirectional replication in bacteria and archaea, where the nucleus is absent, are as follows: (i) quicker completion of replication than with the unidirectional mode of replication and (ii) the meeting of the two replication forks might send a signal to the cell that chromosome replication is complete. Symmetric replication topography will help to complete replication from the origin in less time than an asymmetric topography. So the selection pressure to maintain symmetric replication topography is likely to be greater in fast-growing bacteria than in slow-growing bacteria. This proposition is similar to the Selection Mutation Drift theory proposed for codon usage45 in bacteria. However, our study of ISFDP in Vibrio species (generation time 0.2–0.3 h; fast-growing) does not support this proposition, because their chromosomes violate ISFDP between W nucleotides. Moreover, comparison of generation time40 with asymmetry in replication topography of chromosomes32 shows no correlation (data not shown). More research on this aspect is needed to determine conclusively whether growth rate has any relation to parity establishment in chromosomes. In conclusion, our study has revealed an interesting aspect of ISP. Future research may reveal the reason for the presence of this parity in chromosomes.
| Supplementary data |
|---|
Supplementary data are available at www.dnaresearch.oxfordjournals.org.
| Acknowledgements |
|---|
A part of this work was presented as a poster by S.K.R. at the international conference ISMB2008 in Toronto, Canada. Support to S.K.R. from DST (India), INSA (India) and Tezpur University for attending this conference is gratefully acknowledged. We thank the Department of Biotechnology, Govt. of India, for awarding MSc student fellowships to A.K. and P.K.J. The authors thank J. R. Lobry for his critical comments on the work and are very grateful to D. R. Forsdyke (the reviewer of the manuscript) for his comments on the manuscript.
| Footnotes |
|---|
* To whom correspondence should be addressed. Tel. +91 3712-267007/008/009. Fax. +91 3712-267005/6. E-mail: suven{at}tezu.ernet.in
Present address: Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
Present address: MS-04/603, Kendriya Vihar, Sector 56, Gurgaon, Haryana 122011, India.
| References |
|---|
1. Chargaff E. Structure and function of nucleic acids as cell constituents. Fed. Proc. (1951) 10:654–9.
2. Forsdyke D.R., Mortimer J.R. Chargaff's legacy. Gene (2000) 261:127–37.
3. Watson J.D., Crick F.H.C. Molecular structure of nucleic acids: a structure for deoxyribonucleic acid. Nature (1953) 171:737–8.
4. Karkas J.D., Rudner R., Chargaff E. Separation of B. subtilis DNA into complementary strands, II. Template functions and composition as determined by transcription with RNA polymerases. Proc. Natl Acad. Sci. USA (1968) 60:915–20.
5. Rudner R., Karkas J.D., Chargaff E. Separation of microbial deoxyribonucleic acids into complementary strands. Proc. Natl Acad. Sci. USA (1969) 63:152–9.
6. Prabhu V.V. Symmetry observed in long nucleotide sequences. Nucleic Acids Res. (1993) 21:2797–800.
7. Qi D., Cuticchia A.J. Compositional symmetries in complete genomes. Bioinformatics (2001) 17:557–9.
8. Baisnée P.F., Hampson S., Baldi P. Why are complementary DNA strands symmetric? Bioinformatics (2002) 18:1021–33.
9. Verma S.K., Das D., Satapathy S.S., Buragohain A.K., Ray S.K. Compositional symmetry of DNA duplex in bacterial genomes. Curr. Sci. (2005) 89:374–84.
10. Albrecht-Buehler G. Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. Proc. Natl Acad. Sci. USA (2006) 103:17828–33.
11. Mitchell D., Bridge R. A test of Chargaff's second rule. Biochem. Biophys. Res. Commun. (2006) 340:1–5.
12. Nikolaou C., Almirantis Y. Deviation from Chargaff's second parity rule in organellar DNA: insights into the evolution of organellar genomes. Gene (2006) 381:34–41.
13. Deng B. Mismatch repair error implies Chargaff's second parity rule, arXiv:0704.2191v2 [q-bio.GN], 1–9. (2007) [http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.2191v2.pdf].
14. Sueoka N. Intra-strand parity rules of DNA base composition and usage biases of synonymous codons. J. Mol. Evol. (1995) 40:318–25.
15. Lobry J.R. Properties of a general model of DNA evolution under no-strand-bias conditions. J. Mol. Evol. (1995) 40:326–30.
16. Frank A.C., Lobry J.R. Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene (1999) 238:65–77.
17. Nikolaou C., Almirantis Y. A study on the correlation of nucleotide skews and the positioning of the origin of replication: different modes of replication in bacterial species. Nucleic Acids Res. (2005) 33:6816–22.
18. Sueoka N. Translation-coupled violation of parity rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene (1999) 238:53–8.
19. Rocha E.P.C., Danchin A., Viari A. Universal replication biases in bacteria. Mol. Microbiol. (1999) 32:11–6.
20. Lobry J.R., Sueoka N. Asymmetric directional mutation pressures in bacteria. Genome Biol. (2002) 3:research0058.1–0058.14.
21. Grigoriev A. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. (1998) 26:2286–90.
22. Francino M.P., Ochman H. Strand asymmetries in DNA evolution. Trends Genet. (1997) 13:240–5.
23. Johnson A., O'Donnell M. Cellular DNA replicases: components and dynamics at the replication fork. Annu. Rev. Biochem. (2005) 74:283–314.
24. Bell S.J., Forsdyke D.R. Accounting units in DNA. J. Theor. Biol. (1999) 197:51–61.
25. Francino M., Chao L., Riley M.A., Ochman H. Asymmetries generated by transcription-coupled repair in enterobacterial genes. Science (1996) 272:107–9.
26. McLean M.J., Wolfe K.H., Devine K.M. Base composition skews, replication, and orientation in 12 prokaryote genomes. J. Mol. Evol. (1998) 47:691–6.
27. Shioiri C., Takahata N. Skew of mononucleotide frequencies, relative abundance of dinucleotides and DNA strand asymmetry. J. Mol. Evol. (2001) 53:364–76.
28. Nikiforov A.M. Algorithm AS 288: Exact two-sample Smirnov test for arbitrary distributions. Appl. Stat. (1994) 43:265–84.
29. Kolmogorov A. Confidence limits for an unknown distribution function. Ann. Math. Stat. (1941) 12:461–3.
30. Smirnov N.V. On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bull. Moscow Univ. (1939) 2:3–14.
31. Frank A.C., Lobry J.R. Oriloc: prediction of replication boundaries in annotated bacterial chromosomes. Bioinformatics (2000) 16:560–1.
32. Worning P., Jensen L.J., Hallin P.F., Staerfeldt H-H., Ussery D.W. Origin of replication in circular prokaryotic chromosomes. Environ. Microbiol. (2006) 8:353–61.
33. Sernova N.V., Gelfand M.S. Identification of replication origins in prokaryotic genomes. Brief. Bioinform. (2008) 9:376–91.
34. Sharp P.M., Bailes E., Grocock R.J., Peden J.F., Sockett R.E. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. (2005) 33:1141–53.
35. Duret L., Eyre-Walker A., Galtier N. A new perspective on isochore evolution. Gene (2006) 385:71–4.
36. Nussinov R. Strong doublet preferences in nucleotide sequences and DNA geometry. J. Mol. Evol. (1984) 20:111–9.
37. Forsdyke D.R. A stem–loop kissing model for the initiation of recombination and the origin of introns. Mol. Biol. Evol. (1995) 12:949–58.
38. Chen L., Zhao H. Negative correlation between compositional symmetries and local recombination rates. Bioinformatics (2005) 21:3951–8.
39. Okamura K., Wei J., Scherer S.W. Evolutionary implications of inversions that have caused intra-strand parity in DNA. BMC Genomics (2007) 8:160.
40. Rocha E.P.C. The replication-related organization of bacterial genomes. Microbiology (2004) 150:1609–27.
41. Rocha E.P.C., Danchin A. Gene essentiality determines chromosome organization in bacteria. Nucleic Acids Res. (2003) 31:6570–7.
42. Rocha E.P.C. The organization of the bacterial genome. Annu. Rev. Genet. (2008) 42:211–33.
43. Adams M.J., Antoniw J.F. DPVweb: An open access internet resource on plant viruses and virus diseases. Outlooks on Pest Management (2005) 16:268–70.
44. Adams M.J., Antoniw J.F. DPVweb: a comprehensive database of plant and fungal virus genes and genomes. Nucleic Acids Res. (2006) 34:D382–5. http://www.dpvweb.net/.
45. Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics (1991) 129:897–907.