DNA Research Advance Access originally published online on March 11, 2008
DNA Research 2008 15(2):93-102; doi:10.1093/dnares/dsn001
Sequence Level Analysis of Recently Duplicated Regions in Soybean [Glycine max (L.) Merr.] Genome
1 Department of Plant Science, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-921, South Korea
2 National Institute of Crop Science, Suwon 441-857, South Korea
3 Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, South Korea
4 Corn Insect and Crop Genetics Research Unit, USDA-ARS, Iowa State University, Ames, IA 50011, USA
5 National Instrumentation Center for Environmental Management, Seoul National University, Seoul 151-921, South Korea
Received 15 November 2007; accepted 16 January 2008.
| Abstract |
|---|
A single recessive gene, rxp, on linkage group (LG) D2 controls bacterial leaf-pustule resistance in soybean. We identified two homoeologous contigs (GmA and GmA') composed of five bacterial artificial chromosomes (BACs) during the selection of BAC clones around the Rxp region. With the recombinant inbred line population from the cross of Pureunkong and Jinpumkong 2, single-nucleotide polymorphism and simple sequence repeat marker genotyping located GmA' on LG A1. On the basis of information in the Soybean Breeders Toolbox and our results, parts of LG A1 and LG D2 share duplicated regions. Alignment and annotation revealed that many homoeologous regions contained kinases and proteins related to signal transduction pathways. Interestingly, inserted sequences from GmA and GmA' had homology with transposase and integrase. Estimation of evolutionary events revealed that speciation of soybean from Medicago and the recent divergence of the two soybean homoeologous regions occurred 60 and 12 million years ago, respectively. The distribution of synonymous substitutions, Ks, between the soybean homologous regions showed a first secondary peak (mode Ks = 0.10–0.15) followed by two smaller bulges. Thus, diploidized paleopolyploidy of the soybean genome was again supported by our study.
Key words: BAC; divergence time; duplication; Ks; Rxp; soybean
| 1. Introduction |
|---|
Legumes have begun to draw much attention through recent genomic and phylogenetic studies.1
Most Papilionoids are diploids except Glycine. An ancient genome duplication occurred in Glycine, leading to 2n = 38, 40 or 78–80 depending on annual/perennials or geographic locations.6,7
Polyploidy has had an evolutionary impact on the structure of the soybean genome.8–10
Using restriction fragment-length polymorphism (RFLP) analysis with nine populations (Glycine max x G. soja and G. max x G. max) of the Glycine subgenus soja, it was shown that the soybean genome presents about 2.55 copies per digest. This suggests that an additional round of genome duplication might have occurred in at least one of the original genomes.8
Other studies have supported those observations. RFLP and simple sequence repeat (SSR) analyses showed that parts of linkage groups (LGs) B1/S, H, and F of soybean genome shared homoeologous regions.9
Other genetic mapping analysis suggested that extensive rearrangements and additional duplications were present in soybean genome.10
Also, high similarity in physical organization between soybean duplicated regions and a high percentage of microsynteny were shown by characterizing bacterial artificial chromosome (BAC) clones of soybean and other model plants.11,12
In addition, BACs containing FAD2 genes also contained a number of syntenic genes and were positioned on LG I and O, again indicating duplication of soybean genome.13
Fluorescence in situ hybridization of BACs visualized segmental duplications within the soybean genome.14
M. truncatula genome also presents segmental duplications identified by high-throughput genome sequencing.15
The processes of genome evolution and patterns of divergence can be studied by duplicate gene analysis.16
Because the full genome sequence of many plants is not yet available, ESTs provide resources for studying evolutionary events such as ancient bursts of gene duplications. Because the accumulation of synonymous substitutions occurs stochastically over time, the level of divergence (age of duplication) is estimated by nucleotide substitution in coding sequences.2,17
Putative genome duplication events were identified with large EST collections from eight plant species using synonymous substitution measurements (Ks) of duplicated genes.18
Soybean was estimated to have had two major genome duplication events at 15 million years ago (MYA) and 44 MYA. A genome duplication event also was observed in M. truncatula at 58 MYA. With different calibration, duplications also were observed in both soybean and M. truncatula.17
A multigene approach combined with a phylogenetic approach suggested soybean and Medicago shared a round of gene duplications, along with about 7000 other legume plants.19
Xanthomonas axonopodis pv. glycine (Xag) causes bacterial leaf pustule (BLP) in soybean that occurs in Korea and the southern United States, where hot and humid weather conditions are prevalent.20
Typically, small yellow-to-brown lesions with raised pustules form early in disease development and grow into large necrotic lesions, causing substantial yield losses through premature defoliation.21–24
Twenty consensus LGs of the soybean genome, representing the 20 soybean chromosomes, were reported25 with a joined map from three different populations spanning 2400 cM in length. A total of 420 SSR markers were added to the integrated genetic linkage map, and its length was expanded to 2523.6 cM of Kosambi map distance across 20 LGs.26
Later, 1141 single-nucleotide polymorphism (SNP) markers were located on the soybean genetic map.27
Among the 20 LGs, we are interested in LG D2 because the recessive gene conditioning resistance to BLP, rxp,28 was mapped to LG D2 only 3.9 cM away from Satt372.29
Also, the Rxp locus is linked to the malate dehydrogenase (Mdh) locus with an estimated recombination frequency of 15.2 ± 3.8%.30
In the process of BAC clone selection for chromosome walking around the Rxp region, we were able to create two contigs, which represent homoeologous regions of the soybean genome. Here, we describe the consequences of the duplication events around the Rxp region. Annotation, gene arrangement, and evolutionary events estimated by Ks (the number of synonymous substitutions per synonymous site) are also presented.
| 2. Methods |
|---|
2.1. Primary BAC library screening
The Iowa State BAC library of soybean Williams 82 (gmw1)31 was used for primary screening.
Basic PCR protocols were followed as described, with minor alterations,31 using a PTC-110 Peltier Thermal Cycler (MJ Research, Inc., Watertown, MA, USA). Each 20 µL reaction contained 0.5 U of Taq polymerase (Invitrogen, Carlsbad, CA, USA); the remaining components were as described.31
Cycling conditions were an initial denaturation at 94°C for 3 min, followed by 35 cycles of 94°C for 30 s, 48°C for 30 s, and 72°C for 30 s, and a final step at 72°C for 2 min. The amplified PCR products were analyzed on 1.5% agarose gels stained with ethidium bromide.
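As a rough cross-check of the thermal cycling program, the sketch below encodes the stated steps and sums the programmed hold times. The step labels (denaturation, annealing, extension) and the `Step` class are illustrative names rather than anything taken from the cited protocol, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str      # illustrative label, not from the cited protocol
    temp_c: int     # block temperature in degrees Celsius
    seconds: int    # programmed hold time


# Temperatures and times are those stated in the cycling conditions.
INITIAL = Step("initial denaturation", 94, 180)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 48, 30),
    Step("extension", 72, 30),
)
N_CYCLES = 35
FINAL = Step("final extension", 72, 120)


def total_hold_seconds() -> int:
    """Sum of programmed hold times; ramping between steps is not counted."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


if __name__ == "__main__":
    print(f"{total_hold_seconds() / 60:.1f} min of programmed hold time")
```

Under these assumptions the program holds for 3450 s (57.5 min) in total; actual run time on the cycler would be longer because of ramping.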
The BAC library was screened systematically in several rounds using, in order, full-plate super pool DNA, individual full-plate pools, row and column super pools, and row and column pools. All PCRs were performed as described earlier, and DNA of Williams 82 was used as a positive control in all screening steps.
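The pooling hierarchy narrows a positive PCR signal to a plate, then to a row and a column on that plate. A minimal sketch of the final decoding step follows, assuming clone names encode plate, row, and column as in gmw1-29F06 (read here as plate 29, row F, column 06). That reading of the name, the function name, and the example pool results are assumptions made for illustration only.

```python
from itertools import product


def candidate_clones(library, plates, rows, cols):
    """List clone names at every intersection of PCR-positive pools.

    library: library prefix such as "gmw1"
    plates:  plate numbers whose plate pool was positive
    rows:    row letters whose row pool was positive
    cols:    column numbers whose column pool was positive
    """
    return [
        f"{library}-{plate}{row}{col:02d}"
        for plate, row, col in product(sorted(plates), sorted(rows), sorted(cols))
    ]


if __name__ == "__main__":
    # One positive plate, row, and column pool gives a single address.
    print(candidate_clones("gmw1", [29], ["F"], [6]))
    # Two positive rows and two positive columns give four candidates,
    # which would need individual PCR confirmation.
    print(candidate_clones("gmw1", [24], ["A", "M"], [1, 16]))
```

When more than one row and column pool is positive on the same plate, the intersections are ambiguous, which is one reason a positive control and confirmation PCR on individual clones remain necessary.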
2.2. Shotgun plasmid library and DNA sequencing
After BAC DNAs were prepared with a Plasmid Midi Kit (Qiagen, Hilden, Germany) and the insert size of each selected BAC clone was estimated,31 the random plasmid library for shotgun sequencing was constructed with 10–15 µg of the extracted BAC clone DNA and pUC118 vector using the Takara BKL Kit (Takara Bio, Inc., Otsu, Japan). The remaining steps were performed as described previously.32
Full sequencing of each BAC DNA was performed with the BigDye Terminator (v. 3.1) cycle sequencing kit (Applied Biosystems, Foster City, CA, USA). Cycle conditions for sequencing and the analysis of BAC sequences were as described previously.32
The individual sequences were assembled with Phred/Phrap software, and the remaining gaps of each clone were closed by direct sequencing using plasmid DNA.32
Image v. 3.0 and FPC v. 4.7.9 were used to confirm BAC contig assignment.33,34
2.3. Secondary BAC library screening and sequence analysis
After the sequences of each BAC clone were aligned,35 BAC end sequences (BES) were selected for extending BAC contigs. The Primer3 program36 was used to design primers for secondary BAC library screening. With primers derived from BES, the BAC library screening was performed again as described earlier. In addition to the Iowa State BAC library, the Missouri soybean BAC library (gmw2), constructed from Williams 82 DNA partially digested with BstYI, was also used for screening.37
After BAC contigs were confirmed, the contigs were aligned and the alignment results were inspected with GBrowse (http://www.gmod.org/?q=node/71) and SynBrowse (http://www.synbrowse.org/). Gene annotation was conducted with the web-based gene prediction programs FGENESH (http://sun1.softberry.com/berry.phtml) and GeneMark (http://exon.gatech.edu/GeneMark/) against the Medicago (legume plant) database. Putative amino acid sequences from the predicted genes were used as queries to search for similar known proteins using BLASTP. For each predicted gene of GmA, EST information was first searched against the G. max EST database at the G. max Genome Database (http://bionary.agry.purdue.edu/GmaxGDB/index.php). If no ESTs corresponding to a predicted gene in GmA were identified, nucleotide blast or tblastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi) was used to search for EST information with the predicted genes of GmA' or MtA.
The rate of non-synonymous nucleotide substitution (Ka) and the number of synonymous substitutions per synonymous site (Ks) were obtained with the CODEML program38 of the PAML package.39
Ks was used to estimate the divergence time between two sequences. Coding sequences of the predicted genes from the two contigs were therefore used for the analysis of Ks, as described.35
Divergence times (T) were calculated using a synonymous mutation rate of $6.1 \times 10^{-9}$ substitutions per synonymous site per year18,40 as $T = K_s / (2 \times 6.1 \times 10^{-9})$.
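The divergence-time formula can be written as a small helper. This is a minimal sketch: the function name and the conversion to millions of years are choices made here, and the factor of 2 reflects that substitutions accumulate independently along both diverging copies.

```python
SYNONYMOUS_RATE = 6.1e-9  # substitutions per synonymous site per year (rate stated in the text)


def divergence_time_mya(ks: float, rate: float = SYNONYMOUS_RATE) -> float:
    """Return T = Ks / (2 * rate), expressed in millions of years (MYA)."""
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6


if __name__ == "__main__":
    # Ks = 0.10 is an illustrative input, not a value from the study.
    print(f"{divergence_time_mya(0.10):.1f} MYA")
```

For the illustrative input Ks = 0.10, the helper gives about 8.2 MYA; halving the assumed rate would double the estimate.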
2.4. SNP detection, SNP genotyping, and generation of linkage map
To locate each BAC contig on the LGs, contig-specific regions longer than 4.5 kb were surveyed. Seven different primer sets (Supplementary Table S1) were designed from these contig-specific regions using Primer3 (http://primer3.sourceforge.net/). SNPs in the contig-specific regions were then detected between two soybean genotypes, Pureunkong and Jinpumkong 2, as described.3
The SNP capture probe (5'-GTT TTT TCA TCA ATC TTC CTC TAA A-3') was designed to be complementary to the 5' region from the SNP site within an amplicon using SBEPrimer version 1.1,41 and single base extension reactions followed by fluorescence polarization (FP) measurements were performed on a Victor3 microplate reader (PerkinElmer Life Science, Boston, MA, USA).42
SNP primers were tested using genomic DNA of each parent and a mixture of both parents as artificial heterozygotes. An SNP primer was accepted as an SNP marker only if the results of genotyping by AcycloPrime FP analysis were confirmed by sequencing data. Accepted markers were then used for genotyping in a segregating population, an F2-derived soybean population of 90 recombinant inbred lines (RILs) from the cross of Pureunkong and Jinpumkong 2.43
Genotyping data were automatically transferred to Microsoft Excel, and the genotypes of the segregating population were determined if the clusters were separated by at least 40 mP (thousandths of a polarization unit), at least seven times the standard deviation of the negatives (>99% significance level).44
The linkage map was constructed from the SNP marker genotyping data, and these markers were integrated on the LGs, as described.43
SNP genotyping data from heterozygous lines were treated as missing data. The five SSRs located on GmA' sequences were additionally used for accurate mapping of GmA', after the SSRs were identified by Sputnik, a DNA microsatellite repeat search utility (http://cbi.labri.fr/outils/Pise/sputnik.html). LGs were designated according to the USDA genetic map,26 and MapChart v. 2.1 was used to generate the linkage map.45
2.5. Accession numbers
Accession nos: gmw1-20O10, EU028328; gmw1-24M16, EU028329; gmw1-29F06, EU028330; gmw1-89M01, EU028331; gmw2-77P21, EU028332. Sequence data from this article can be found in the GenBank/EMBL data libraries.
| 3. Results |
|---|
3.1. Identification of soybean duplicated regions by BAC selection
To obtain BAC clones around the Rxp locus, we screened the gmw1 BAC library with three SSR markers, Satt372, Satt486, and Satt498. Among a total of six BAC clones identified by these SSR markers, we selected gmw1-29F06 and gmw1-24M16 for sequencing because they carried an SSR marker and had long inserts (Fig. 1).
The DNA sequences of gmw1-29F06 and gmw1-24M16 were aligned to create the GmA contig of 264 239 bp, including an overlap of 73 kb (Fig. 1). Primers were then designed from the BES of GmA to extend the contig. Clone gmw1-20O10 (120 kb) was selected with primers from the BES of gmw1-29F06, and primers designed from the BES of gmw1-24M16 selected gmw1-89M01, creating an apparent extension of 175 kb. We sought to extend the contig with the fully sequenced gmw1-20O10 and gmw1-89M01 clones. However, when the full DNA sequences of gmw1-20O10 and gmw1-89M01 were compared with GmA, the expected overlapping regions showed only an approximate 90% match. To close the gap between gmw1-20O10 and gmw1-89M01, we selected gmw2-77P21 (94 kb) from another soybean BAC library, gmw2. After the DNA sequence of gmw2-77P21 was aligned with gmw1-20O10 and gmw1-89M01, the GmA' contig (gmw1-20O10_gmw2-77P21_gmw1-89M01, 292 895 bp) was formed with a 100% match (Fig. 1).
3.2. Mapping of soybean duplicated regions
To locate GmA' on the soybean genetic linkage map, SNP genotyping was performed. First, unique regions longer than 4.5 kb in GmA' were surveyed. Seven different unique regions were identified, and seven different primer sets were randomly designed from these seven contig-specific regions (Supplementary Table S1). One SNP locus between Pureunkong (deletion) and Jinpumkong 2 (A) was identified by primers (forward, 5'-TTC GTG CTA AGT GGA ACT TCT G-3'; reverse, 5'-TAC AAC AAC GAT GTT CAT GAC G-3') designed between 159 723 and 160 465 bp of GmA'.
SNP genotyping of GmA' was conducted with the RIL population from the cross of Pureunkong and Jinpumkong 2. The SNP marker locus was incorporated into the frame map,43 placing GmA' at the top of LG A1, 1.9 cM away from Satt684 (Fig. 1). Five SSRs identified by Sputnik on GmA' were additionally analyzed. Only one designed SSR showed polymorphism between Pureunkong and Jinpumkong 2 (data not shown), and this turned out to be Satt684, which was positioned between 64 495 and 64 682 bp of GmA'. On the basis of all genotyping and mapping data, we determined that the duplicated regions are located on LG A1 for GmA' and LG D2 for GmA (Fig. 1).
3.3. Alignments, annotations, and Ks estimation
After BAC contigs were confirmed and inspected with GBrowse and SynBrowse, genes were annotated with FGENESH or GeneMark against the Medicago database. Fig. 2 shows a schematic representation of approximate gene lengths, gene locations, and homologous regions (linked by shaded lines). In total, 54 and 58 genes were predicted in GmA and GmA', respectively (Fig. 2). Gene density along these two sequenced BAC contigs was approximately one gene per 5.0 kb. A similar gene density (45 predicted genes along 219 028 bp) was also detected in Medicago along Contig 962B (MtA, as of January 2007 at http://www.medicago.org). The Medicago contig showed homology with the two soybean contigs. Gene order was conserved among syntenic blocks, except for one case (GmA_18 versus GmA'_10 and GmA_27 versus GmA'_10), and the predicted genes had the same orientation. Gene order was maintained in Medicago, although linearity was fragmented (Fig. 2).
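The gene densities quoted here follow directly from the contig lengths and predicted gene counts given in the text. A short sketch of that arithmetic, with a hypothetical helper name, is:

```python
# (contig length in bp, number of predicted genes), as stated in the text.
CONTIGS = {
    "GmA": (264_239, 54),
    "GmA'": (292_895, 58),
    "MtA": (219_028, 45),
}


def kb_per_gene(length_bp: int, n_genes: int) -> float:
    """Average spacing between predicted genes, in kb per gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return length_bp / n_genes / 1000.0


if __name__ == "__main__":
    for name, (length_bp, n_genes) in CONTIGS.items():
        print(f"{name}: one gene per {kb_per_gene(length_bp, n_genes):.2f} kb")
```

The three contigs come out at roughly 4.89, 5.05, and 4.87 kb per gene, consistent with the rounded figure of one gene per 5.0 kb.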
Supplementary Table S2 lists all pairwise comparisons of the predicted genes among homoeologous contigs. Segments showing no homology with known genes were excluded from this table. The EST information corresponding to the predicted genes was also included in Supplementary Table S2 after searches with three BLAST tools (the G. max Genome Database, nucleotide blast, and tblastn). Eight regions were unique to GmA, whereas GmA' had 13 unique regions. In contrast, 22 of the 45 segments of MtA were not similar to any segment from the two soybean homoeologous regions. Twenty-seven pairwise comparisons between soybean homoeologous regions are also listed in Supplementary Table S2. When each pair was searched with BLASTP, nine of the 27 comparisons between soybean homoeologous regions showed >90% identity with putative genes at low E-values. However, a wide range of conservation was detected in the putative promoter regions and the introns between GmA and GmA', with average similarities of 59.7% and 54.9%, respectively (Supplementary Table S2). Many homoeologous regions contained kinases and proteins related to signal transduction pathways. Interestingly, some unique segments from GmA and GmA' showed homology with transposase (GmA'_07) or integrase (GmA_13, 14 and 15) (Supplementary Table S2). Among the 27 pairwise comparisons between soybean homoeologous regions, nine showed alignment with MtA (Supplementary Table S2).
Using the maximum-likelihood method in the CODEML program,39 synonymous (Ks) and non-synonymous (Ka) distances were estimated. This method was based on the F3×4 model of codon substitution,38 which accounts for both transition/transversion and codon usage biases. Supplementary Table S3 shows the results of analysis between homologous gene pairs from each contig, along with percent identity of amino acid and cDNA sequences. The median Ka value (0.0426) between the two soybean contigs was about 3.5 times smaller than the median Ks value (0.1472). Only one Ka/Ks ratio was higher than 1 (1.5479, GmA_39 versus GmA'_40) (Supplementary Table S3). This Ka/Ks ratio might be non-significant because of the moderate length of the exons (282, 51, and 159 bp). When the substitutions per synonymous site (Ks) were plotted against the fraction of duplication events, secondary peaks were observed in the distribution for the two contigs (data not shown). A first secondary peak (mode Ks = 0.10–0.15) followed by two smaller bulges (mode Ks = 0.20–0.25 and 0.30–0.35) was displayed, indicating a burst of gene duplications in soybean. For GmA versus MtA, the median Ka value was 0.2888, which is 2.7 times smaller than the median Ks value (0.7755). MtA_12 showed homology with both GmA_21 and GmA_23. The Ks value for GmA_23 versus MtA_12 was extremely high because they aligned along only 107 amino acids (Supplementary Table S3). The median Ka value (0.2876) between GmA' and MtA was about 2.8 times smaller than the median Ks value (0.8003).
To determine the timing of the duplication event giving rise to the two contigs, the Ks value was used. Synonymous substitutions are thought to be evolutionarily neutral because the mutations cause no amino acid change35 and therefore accumulate stochastically over time. Ks values less than 0.05 and greater than 1 were excluded when searching for mixtures of normal distributions.19
Divergence times (T) were estimated from the Ks values, assuming a mutation rate of $6.1 \times 10^{-9}$ substitutions per synonymous site per year.18,40
T ranged from 5.55 to 39.92 MYA between GmA and GmA', and the median T was 12.3 MYA, corresponding to a low median Ks value (0.1498) (Fig. 3, Supplementary Table S3). For comparisons with MtA restricted to 0.05 < Ks < 1.0, median Ks values were 0.7654 (0.5986 to 0.9342) and 0.6877 (0.4592 to 0.9128) for GmA and GmA', respectively. Therefore, MtA and the soybean homoeologous contigs diverged 56.4–62.7 MYA (Fig. 3, Supplementary Table S3). The two soybean homoeologous contigs were duplicated more recently, in agreement with the previous study.17
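The two steps described in this paragraph, restricting Ks to the 0.05 to 1.0 window and converting a median Ks to a divergence time, can be sketched as follows. The function names are hypothetical, the window is treated as inclusive (an assumption), and the short `demo` list is made-up input used only to show the filter; the per-gene Ks values themselves are in Supplementary Table S3 and are not reproduced here.

```python
from statistics import median

RATE = 6.1e-9  # substitutions per synonymous site per year, as assumed in the text


def within_window(ks_values, low=0.05, high=1.0):
    """Keep Ks values inside the window; bounds are treated as inclusive (assumption)."""
    return [ks for ks in ks_values if low <= ks <= high]


def to_mya(ks, rate=RATE):
    """Convert Ks to divergence time in millions of years: Ks / (2 * rate)."""
    return ks / (2.0 * rate) / 1e6


# Median Ks values reported in the text.
REPORTED_MEDIANS = {
    "GmA vs GmA'": 0.1498,
    "GmA vs MtA": 0.7654,
    "GmA' vs MtA": 0.6877,
}

if __name__ == "__main__":
    for pair, ks in REPORTED_MEDIANS.items():
        print(f"{pair}: {to_mya(ks):.1f} MYA")
    demo = [0.03, 0.08, 0.15, 0.40, 1.70]  # hypothetical values, illustration only
    kept = within_window(demo)
    print(kept, median(kept))
```

With the reported medians this reproduces 12.3 MYA for the two soybean contigs and 62.7 and 56.4 MYA for the comparisons with MtA.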
| 4. Discussion |
|---|
4.1. Paleopolyploidy of the soybean genome
Diploidization or gene duplication is a process of switching from tetrasomic to disomic inheritance and a common process in plant genome evolution.13
Although BAC-end sequences were used for BAC-by-BAC selection, the alignment of our two contigs was not perfect and gaps in the alignment were observed. To locate these two homologous contigs, SNP genotyping was performed with the one SNP between Pureunkong (deletion) and Jinpumkong 2 (A). This SNP marker locus for GmA' was located 1.9 cM away from Satt684 on LG A1, and SSR marker analysis positioned Satt684 between 64 495 and 64 682 bp of GmA' (Fig. 1). Linkage maps from the Soybean Breeders Toolbox (http://soybase.org) were compared to locate the duplicated region (GmA versus GmA'). A comparison between the soybean composite maps for LG A1 and D2 indicated that homoeologous regions exist between them. In addition to the two homoeologous contigs, five RFLP markers were also common between the two LGs. These results suggest that GmA and GmA' are indeed homoeologous.
4.2. Genome dynamics among homoeologous regions
Both soybean homoeologous contigs showed a gene density of 1 gene per 5 kb in the present study (Fig. 2). The gene density of soybean on LG G, estimated from 28 BAC ends11 and subclone sequences from two contigs, was approximately 1 gene per 14 kb. Arabidopsis and tomato showed similar gene densities, with an average of 1 gene per 5 kb, although the genome sizes of these two species differ more than sevenfold.11,48
Wheat, barley, and rice were compared near the Lrk locus and showed a maximal density of 1 gene per 4–5 kb.49
The comparison of the physical map and the genetic map in the distal region of LG A1 revealed an unusual relationship between physical distance and genetic distance for the homoeologous region on LG A1 (Fig. 1). The physical distance between two markers (gmw1-89M01-32 and Satt684) on GmA' was 100 kb, whereas their genetic distance was 1.9 cM on the genetic map of LG A1. The resulting physical-to-genetic distance ratio of 52.6 kb/cM might indicate a high-recombination region, because duplicated genes and dispersed gene duplicates (single-gene duplications located on different chromosomes) occurred more often in high-recombination regions50 and distal chromosomal regions showed high recombination rates.50
The rice genome also showed a high recombination frequency, with 244 kb as the average physical distance per centimorgan, although the physical-to-genetic distance ratio differed depending on position along the chromosome.51
High-resolution mapping with various markers and genome sequencing would clarify the relationship between physical distance and genetic distance for this homoeologous region on LG A1.
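The physical-to-genetic distance ratio discussed above is a simple quotient; the sketch below recomputes it from the stated 100 kb and 1.9 cM and sets it beside the rice genome average of 244 kb/cM quoted in the text. The function and variable names are illustrative.

```python
def kb_per_cm(physical_kb: float, genetic_cm: float) -> float:
    """Physical distance per unit of genetic distance (kb per centimorgan)."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_kb / genetic_cm


RICE_AVERAGE_KB_PER_CM = 244.0  # rice genome average quoted in the text

if __name__ == "__main__":
    soybean_region = kb_per_cm(100.0, 1.9)
    print(f"GmA' interval: {soybean_region:.1f} kb/cM")
    # A smaller kb/cM value means more recombination per unit of physical length.
    print(f"rice average is {RICE_AVERAGE_KB_PER_CM / soybean_region:.1f}x larger")
```

A lower kb/cM value corresponds to more recombination per kilobase, so on these figures the GmA' interval recombines several times more often per unit length than the rice average.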
Genic regions of the two soybean contigs and MtA retained gene structure in both order and orientation (Fig. 2). Similar conservation has also been observed previously.13,17,52
Sequence similarity was >60% in intergenic regions of most soybean homologous regions. This high level of similarity was also seen in previous studies,11,13,52 suggestive of either a relatively recent duplication or concerted evolution.53
A homology search using BLASTP identified high homology with kinases and proteins related to signal transduction pathways within the homoeologous regions. Among them, the first aligned segment (GmA_01 versus GmA'_10) was very similar to the putative receptor-like protein kinase INRPK1 (Ipomoea nil receptor-like protein kinase), which shows homology with Xa21, the rice pathogen recognition receptor.54
INRPK1 and Xa21, as typical RPKs, are composed of an extracellular leucine-rich repeat domain, a transmembrane domain, and a cytoplasmic kinase domain. Twenty-one dominant loci responsible for resistance to bacterial blight disease indicated that a multigene family was involved in resistance to Xanthomonas oryzae pv. oryzae. Sequence analysis of two classes of Xa21 family members, defined at the predicted amino acid sequence level, suggested that duplication played a role in the evolution of the Xa21 gene family.55
Thus, as with Xa21, important defense-related genes contained in the soybean homoeologous regions could have been duplicated. However, it is difficult to say that Rxp encodes an RPK that recognizes the pathogen (Xag) through its extracellular domain and transduces the signal of pathogen attack through an intracellular kinase, because INRPK1 showed only 30% homology with Xa21 at the amino acid sequence level.
Supplementary Table S2 provides information on insertion/deletion of sequences in terms of genome dynamics in Medicago and soybean. Several blocks of segments from each contig showing no homology with any other homologous regions were identified, indicating insertion or deletion of sequences within homologous contigs. Some segments draw attention because they were similar to transposase from Medicago (GmA'_07) and integrase from Medicago (GmA_13, 14 and 15). Potential transposable element (TE) activity is interesting because TEs could be used for transposon mutagenesis in gene cloning and functional genomics in plants.56,57
Many researchers have put effort into identifying and cloning soybean TEs by finding mutable alleles because no active TE has yet been isolated from soybean.57
The w4-mutable line, which contains an autonomous TE at the W4 locus, and the unstable k2 Mdh1-n y20 chromosomal region, caused by a non-autonomous TE, were identified in soybean.57,58
Interestingly, this unstable chromosomal region was mapped on LG H, and this Mdh gene is also located on LG D2,30 near the Rxp locus studied in this research.
4.3. Genome evolution
When the sequences of the same gene from two species or of members of a gene family are compared, the numbers of non-synonymous changes (which alter the amino acid sequence) and synonymous changes (which do not) are good indicators of the degree of divergence between two sequences.59
A total of 23 homologous regions between the two soybean contigs showed Ka values ranging from 0.0168 to 0.2807 and Ks values ranging from 0.0318 to 0.4797 (Supplementary Table S3). It has been suggested that the Ka/Ks ratio can be used to assess the protein-coding potential of genomic regions.60
With Ks taken as the background rate of evolution, the selection pressure on protein-coding regions can be explained by deviations of this ratio.59
In this study, the ratio was less than 1 except for one comparison (GmA_39 versus GmA'_40), which had a very low Ks. Short exons or highly divergent sequences (<70% nucleotide identity) can cause the Ka/Ks ratio to be higher than 1.60
Although their identity at the nucleotide level was >95%, this homologous region (GmA_39 versus GmA'_40) showed a very low Ks because of the very short length of exon 2 (50 bp).
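The conventional reading of the Ka/Ks ratio can be sketched in a few lines. The function names are hypothetical, the labels follow the usual convention (below 1 suggests purifying selection, above 1 suggests positive selection), and the sketch makes no statement about statistical significance, which, as noted above, is doubtful when exons are short. The ratio computed from the two medians is only an illustration and is not the same as the median of the per-gene ratios.

```python
def ka_ks(ka: float, ks: float) -> float:
    """Return the Ka/Ks ratio; Ks must be positive."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


def selection_regime(ratio: float) -> str:
    """Conventional interpretation of Ka/Ks, with no significance test."""
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "positive selection"
    return "neutral"


if __name__ == "__main__":
    # Median Ka and Ks between GmA and GmA' as reported in Section 3.3.
    ratio = ka_ks(0.0426, 0.1472)
    print(f"{ratio:.3f} -> {selection_regime(ratio)}")
    # The single ratio above 1 reported for GmA_39 versus GmA'_40.
    print(f"1.5479 -> {selection_regime(1.5479)}")
```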
Paleopolyploid species like soybean reveal the presence of duplications by showing secondary peaks in the age distributions of paralogous pairs.17
Three secondary peaks were observed in this study. The distribution showed a major, first peak at mode Ks = 0.10–0.15. These data are consistent with a previous study, in which a first secondary peak at the same mode was identified after comparisons of pairs of paralogous genes in 14 model plant species including soybean.17
In addition, two minor bulges were identified in our study, indicating additional duplication events in this homologous region between the two contigs.
To estimate divergence time for gene duplication, the Ks values were used, assuming a rate of synonymous substitution of $6.1 \times 10^{-9}$ substitutions per synonymous site per year.18,40,61
This soybean homologous region was mainly duplicated at 12.3 MYA in this study, and the speciation event of soybean from Medicago at 60 MYA was also suggested (Fig. 3), in agreement with the rapid diversification of legumes between 50 and 60 MYA.2,61
However, estimated ages of the secondary peaks can differ depending on the assumed substitution rate. One estimate gave a rate of silent-site substitution of 6.1 per silent site per billion years.18,40,61
In contrast, a synonymous rate of $1.5 \times 10^{-8}$ substitutions per synonymous site per year for dicots has been used to calculate the absolute dates of duplication events.17,62
With a synonymous rate of $1.5 \times 10^{-8}$ substitutions, the average divergence time was 6.3 MYA in this study (median = 5.0 MYA), similar to the estimate of the recent duplication (3.3–5.0 MYA) in soybean.17
However, these estimates are only approximations because the rate of synonymous substitution differs among genes and species, and generation time is also a factor controlling the mutation rate.17
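The dependence of the estimated age on the assumed substitution rate can be made concrete by applying both rates discussed above to the same median Ks (0.1498, reported in Section 3.3). This is a minimal sketch with a hypothetical function name; it uses the relation T = Ks / (2 × rate) from the Methods.

```python
def age_mya(ks: float, rate_per_site_per_year: float) -> float:
    """Divergence time in millions of years for a given Ks and substitution rate."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year) / 1e6


MEDIAN_KS = 0.1498  # median Ks between GmA and GmA', from Section 3.3

# The two synonymous substitution rates discussed in the text.
RATES = {
    "6.1e-9": 6.1e-9,
    "1.5e-8": 1.5e-8,
}

if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"rate {label}: {age_mya(MEDIAN_KS, rate):.1f} MYA")
```

The same Ks gives about 12.3 MYA under the slower rate and about 5.0 MYA under the faster one, a roughly 2.5-fold difference that comes entirely from the choice of rate.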
Our study provides additional evidence of the paleopolyploidy of the soybean genome. We also showed that organization and sequence homology between duplicated segments were very similar. In this study, homoeologous regions were so similar that the contig on LG A1 was originally sequenced instead of that on LG D2, even though BAC-end sequences located near Rxp locus on LG D2 were used for BAC selection in genome sequencing. Thus, in future studies, to avoid walking in the wrong direction, BAC by BAC soybean genome sequencing should be performed in concert with whole-genome physical mapping because of high level of similarity between homologous contigs.
| Supplementary Data |
|---|
Supplementary data are available online at www.dnaresearch.oxfordjournals.org.
| Acknowledgements |
|---|
This research was supported by a grant for genome sequencing funded by Agricultural R&D Promotion Center, Technology Development Program for Agriculture and Forestry, the Ministry of Agriculture and Forestry, the Republic of Korea, and in part by a grant (code no. CG3121) for genetic mapping from the Crop Functional Genomics Center of the 21st Century Frontier Research Program funded by the Ministry of Science and Technology, the Republic of Korea. Dr K. Van and Mr Kim are the recipients of a fellowship from the BK21 program granted by the Ministry of Education and Human Resources Development (ME and HRD), the Republic of Korea. We also thank the National Instrumentation Center for Environmental Management at Seoul National University in Korea.
| Footnotes |
|---|
* To whom correspondence should be addressed. Tel. +82 2-880-4545. Fax. +82 2-873-2056. E-mail: sukhalee{at}snu.ac.kr
| References |
|---|
- Young N. D., Shoemaker R. C. Genome studies and molecular genetics, Part 1: model legumes exploring the structure, function and evolution of legume genomes. Curr. Opin. Plant Biol. (2006) 9:95–98.
- Shoemaker R. C., Schlueter J., Doyle J. J. Paleopolyploidy and gene duplication in soybean and other legumes. Curr. Opin. Plant Biol. (2006) 9:104–109.
- Van K., Hwang E.-Y., Kim M. Y., Kim Y.-H., Cho Y.-I., Cregan P. B., Lee S.-H. Discovery of single nucleotide polymorphisms in soybean using primers designed from ESTs. Euphytica (2004) 139:147–157.
- Sato S., Tabata S. Lotus japonicus as a platform for legume research. Curr. Opin. Plant Biol. (2006) 9:128–132.
- Town C. D. Annotating the genome of Medicago truncatula. Curr. Opin. Plant Biol. (2006) 9:122–127.
- Doyle J. J., Doyle J. L., Rauscher J. T., Brown A. H. D. Diploid and polyploid reticulate evolution throughout the history of perennial soybeans (Glycine subgenus Glycine). New Phytol. (2004) 161:121–132.
- Zhu H., Choi H.-K., Cook D. R., Shoemaker R. C. Bridging model and crop legumes through comparative genomics. Plant Physiol. (2005) 137:1189–1196.
- Shoemaker R. C., Polzin K., Labate J., Specht J., Brummer E. C., Olson T., Young N., Concibido V., Wilcox J., Tamulonis J. P., Kochert G., Boerma H. R. Genome duplication in soybean (Glycine subgenus soja). Genetics (1996) 144:329–338.
- Lee J. M., Bush A., Specht J. E., Shoemaker R. C. Mapping duplicate genes in soybean. Genome (1999) 42:829–836.
- Lee J. M., Grant D., Vallejos C. E., Shoemaker R. C. Genome organization in dicots. II. Arabidopsis as a bridging species to resolve genome duplication events among legumes. Theor. Appl. Genet. (2001) 103:765–773.
- Foster-Hartnett D., Mudge J., Larsen D., Danesh D., Yan H., Denny R., Peñuela S., Young N. D. Comparative genomic analysis of sequence sampled from a small region on soybean (Glycine max) molecular linkage group G. Genome (2002) 45:634–645.
- Yan H. H., Mudge J., Kim D.-J., Larsen D., Shoemaker R. C., Cook D. R., Young N. D. Estimates of conserved microsynteny among the genomes of Glycine max, Medicago truncatula and Arabidopsis thaliana. Theor. Appl. Genet. (2003) 106:1256–1265.
- Schlueter J. A., Vasylenko-Sanders I. F., Deshpande S., Yi J., Siegfried M., Roe B. A., Schlueter S. D., Scheffler B. E., Shoemaker R. C. The FAD2 gene family of soybean: insights into the structural and functional divergence of a paleopolyploid genome. The Plant Genome (A supplement to Crop Sci.) (2007) 47.
- Pagel J., Walling J. G., Young N. D., Shoemaker R. C., Jackson S. A. Segmental duplication within the Glycine max genome revealed by fluorescence in situ hybridization of bacterial artificial chromosomes. Genome (2004) 47:764–768.
- Zhu H., Kim D.-J., Baek J.-M., Choi H.-K., Ellis L. C., Küester H., McCombie W. R., Peng H.-M., Cook D. R. Syntenic relationships between Medicago truncatula and Arabidopsis reveal extensive divergence of genome organization. Plant Physiol. (2003) 131:1018–1026.
[Abstract/Free Full Text] - Gaut B. S., Doebley J. F. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA (1997) 94:6809–6814.
[Abstract/Free Full Text] - Blanc G., Wolfe K. H. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell (2004) 16:1667–1678.
[Abstract/Free Full Text] - Schlueter J. A., Dixon P., Granger C., Grant D., Clark L., Doyle J. J., Shoemaker R. C. Mining EST databases to resolve evolutionary events in major crop species. Genome (2004) 47:868–876.[Medline]
- Pfeil B. E., Schlueter J. A., Shoemaker R. C., Doyle J. J. Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families. Syst. Biol (2005) 54:441–454.[CrossRef][Web of Science][Medline]
- Narvel J. M., Jakkula L. R., Phillips D. V., Wang T., Lee S. H., Boerma H. R. Molecular mapping of Rxp conditioning reaction to bacterial pustule in soybean. J. Hered. (2001) 92:267–270.
[Abstract/Free Full Text] - Hartwig E. E., Johnson H. W. Effect of the bacterial pustule disease on yield and chemical composition of soybeans. Agron. J. (1953) 45:22–23.
[Free Full Text] - Weber C. R., Dunleavy J. M., Fehr W. R. Effect of bacterial pustule on closely related soybean lines. Agron. J. (1966) 58:544–545.
[Abstract/Free Full Text] - Kennedy B. W., Tachibana H. Bacterial diseases, In. In: Soybeans: Improvement, Production, and Uses—Caldwell B. E., ed. (1973) Madison, WI: American Society of Agronomy. 491–504.
- Groth D. E., Braun E. J. Growth kinetics and histopathology of Xanthomonas campestris pv. glycines in leaves of resistant and susceptible soybeans. Phytopathology (1986) 76:959–965.[Web of Science]
- Cregan P. B., Jarvik T., Bush A. L., Shoemaker R. C., Lark K. G., Kahler A. L., Kaya N., VanToai T. T., Lohnes D. G., Chung J., Specht J. E. An integrated genetic linkage map of the soybean genome. Crop Sci. (1999) 39:1464–1490.
[Abstract/Free Full Text] - Song Q. J., Marek L. F., Shoemaker R. C., Lark K. G., Concibido V. C., Delannay X., Specht J. E., Cregan P. B. A new integrated genetic linkage map of the soybean. Theor. Appl. Genet. (2004) 109:122–128.[CrossRef][Web of Science][Medline]
- Choi I.-Y., Hyten D. L., Matukumalli L. K., Song Q., Chaky J. M., Quigley C. V., Chase K., Lark K. G., Reiter R. S., Yoon M.-S., Hwang E.-Y., Yi S.-I., Young N. D., Shoemaker R. C., van Tassell C. P., Specht J. E., Cregan P. B. A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics (2007) 176:685–696.
[Abstract/Free Full Text] - Bernard R. L., Weiss M. G. Qualitative genetics. In: Soybeans Improvement, Production, and Uses—Caldwell B. E., ed. (1973) Madison, WI: American Society of Agronomy.
- Van K., Ha B.-K., Kim M. Y., Moon J. K., Paek N.-C., Heu S., Lee S.-H. SSR mapping of genes conditioning soybean resistance to six isolates of Xanthomonas axonoppodis pv. glycines. Kor. J. Genetics (2004) 26:47–54.
- Palmer R. G., Lim S. M., Hedges B. R. Testing for linkage between the rxp locus and nine isozyme loci in soybean. Crop Sci. (1992) 32:681–683.
[Abstract/Free Full Text] - Marek L. F., Shoemaker R. C. BAC contig development by fingerprint analysis in soybean. Genome (1997) 40:420–427.[Medline]
- Choi S.-H., Kim I.-C., Kim D.-S., Kim D.-W., Chae S.-H., Choi H.-H., Choi I., Yeo J.-S., Song M.-N., Park H.-S. Comparative genomic organization of the human and bovine PRNP locus. Genomics (2006) 87:598–607.[CrossRef][Web of Science][Medline]
- Soderlund C., Longden I., Mott R. FPC: a system for building contigs from restriction fingerprinted clones. Bioinformatics (1997) 13:523–535.
[Abstract/Free Full Text] - Sulston J., Mallett F., Staden R., Rurbin R., Horsnell T., Coulson A. Software for genome mapping by fingerprinting techniques. Comput. Appl. Biosci. (1998) 4:125–132.[CrossRef]
- Yang T.-J., Kim J. S., Kwon S.-J., Lim K.-B., Choi B.-S., Kim J.-A., Jin M., Park J. Y., Lim M.-H., Kim H.-I., Lim Y. P., Kang J. J., Hong J.-H., Kim C.-B., Bhak J., Bancroft I., Park B.-S. Sequence-level analysis of the diploidization process in the triplicated FLOWERING LOCUS C region in Brassica rapa. Plant Cell (2006) 18:1339–1347.
[Abstract/Free Full Text] - Rozen S., Skaletsdy H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. (2000) 132:365–386.[Medline]
- Wu X., Lee G.-J., Blake S., Pyatek K., Huang S., Wan J., Stacey G., Nguyen H. T. Six-dimensional BAC DNA pools—a new resource for soybean genome mapping, In. In: Plant and Animal Genomes XIII Conference (2005) San Diego, CA, USA. Abstract 430.
- Goldman N., Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. (1994) 11:725–736.[Abstract]
- Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. (1997) 13:555–556.
[Free Full Text] - Lynch M., Conery J. S. The evolutionary fate and consequences of duplicated genes. Science (2000) 290:1151–1155.
[Abstract/Free Full Text] - Kaderali L., Deshpande A., Nolan J. P., White P. S. Primer design for multiplexed genotyping. Nucleic Acids Res. (2003) 31:1796–1802.
[Abstract/Free Full Text] - Cai C. M., Van K., Kim M. Y., Lee S.-H. Optimization of SNP genotyping assay with fluorescence polarization detection. Kor. J. Crop Sci. (2005) 50:361–367.
- Kim M. Y., Ha B.-K., Jun T.-H., Hwang E.-Y., Van K., Kuk Y.-I., Lee S.-H. Single nucleotide polymorphism discovery and linkage mapping of lipoxygenase-2 gene (L
2) in soybean. Euphytica (2004) 135:169–177.[CrossRef][Web of Science] - Chen X., Levine L., Kwok P. Y. Fluorescence polarization in homogeneous nucleic acid analysis. Genome Res. (1999) 9:492–498.
[Abstract/Free Full Text] - Voorrips R. E. MapChart: software for the graphical presentation of linkage maps and QTLs. J. Hered. (2002) 93:77–78.
[Free Full Text] - Schlueter J. A., Scheffler B. E., Schlueter S. D., Shoemaker R. C. Sequence conservation of homeologous bacterial artificial chromosomes and transcription of homeologous genes in soybean (Glycine max L. Merr.). Genetics (2006) 174:1017–1028.
[Abstract/Free Full Text] - Cai C. M., Van K., Lee S.-H. Gene duplications revealed during the process of SNP discovery in soybean [Glycine max (L.) Merr.]. J. Crop Sci. Biotech. (2007) 10:237–242.
- Ku H. M., Vision T., Liu J. P., Tanksley S. D. Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA (2000) 97:9121–9126.
[Abstract/Free Full Text] - Feuillet C., Keller B. High gene density is conserved at syntenic loci of small and large grass genomes. Proc. Natl. Acad. Sci. USA (1999) 96:8265–8270.
[Abstract/Free Full Text] - Gaut B. S., Wright S. I., Rizzon C., Dvorak J., Anderson L. K. Recombination: an underappreciated factor in the evolution of plant genomes. Nat. Rev. Genet. (2007) 8:77–84.[CrossRef][Web of Science][Medline]
- Chen M., Presting G., Barbazuk W. B., et al. An integrated physical and genetic map of the rice genome. Plant Cell (2002) 14:537–545.
[Abstract/Free Full Text] - Zhang X.-C, Wu X., Findley S., Wan J., Libault M., Nguyen H. T., Cannon S. B., Stacey G. Molecular evolution of lysine motif-type receptor-like kinases in plants. Plant Physiol. (2007) 144:623–636.
[Abstract/Free Full Text] - Wendel J. F., Schnabel A., Seelanan T. Bidirectional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proc. Natl. Acad. Sci. USA (1995) 92:280–284.
[Abstract/Free Full Text] - Lee S.-W., Han S.-W., Bartley L. E., Ronald P. C. Unique characteristics of Xanthomonas oryzae pv. oryzae AvrXa21 and implications for plant innate immunity. Proc. Natl. Acad. Sci. USA (2006) 103:18395–18400.
[Abstract/Free Full Text] - Song W.-Y., Pi L.-Y., Wang G.-L., Gardner J., Holsten T., Ronald P. C. Evolution of the rice Xa21 disease resistance gene family. Plant Cell (1997) 9:1279–1287.[Abstract]
- Ramachandran S., Sundaresan V. Transposons as tools for functional genomics. Plant Physiol. Biochem. (2001) 39:243–252.[CrossRef][Web of Science]
- Xu M., Palmer R. G. Genetic analysis and molecular mapping of a pale flower allele at the W4 locus in soybean. Genome (2005) 48:334–340.[Medline]
- Xu M., Palmer R. G. Molecular mapping of k2 Mdh1-n y20, an unstable chromosomal region in soybean [Glycine max (L.) Merr.]. Theor. Appl. Genet. (2005) 111:1457–1465.[CrossRef][Web of Science][Medline]
- Hurst L. D. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. (2002) 18:486–487.[CrossRef][Web of Science][Medline]
- Nekrutenko A., Makova K. D., Li W.-H. The Ka/Ks ratio test for accessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res. (2002) 12:198–202.
[Abstract/Free Full Text] - Lavin M., Herendeen P. S., Wojciechowski M. F. Evolutionary rates analysis of Leguminosae implicates as rapid diversification of lineages during the tertiary. Syst. Biol. (2005) 54:575–594.[CrossRef][Web of Science][Medline]
- Koch M. A., Haubold B., Mitchell-Olds T. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. (2000) 17:1483–1498.
[Abstract/Free Full Text]