DNA Research Advance Access originally published online on February 12, 2007
DNA Research 2006 13(6):267-274; doi:10.1093/dnares/dsm001
Identification and Mapping of Expressed Genes, Simple Sequence Repeats and Transposable Elements in Centromeric Regions of Rice Chromosomes
1 National Institute of Agrobiological Sciences, 1-2 Kannondai 2-chome, Tsukuba, Ibaraki 305-8602, Japan
2 Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, 446-1 Ippaizuka, Kamiyokoba, Tsukuba, Ibaraki 305-0854, Japan
Received 2 November 2006; revised 14 January 2007
| Abstract |
|---|
The genomic sequences derived from rice centromeric regions were analyzed to facilitate the comprehensive understanding of the rice genome. A rice centromere-specific satellite sequence, RCS2/TrsD/CentO, was used to screen P1-derived artificial chromosome (PAC) and bacterial artificial chromosome (BAC) genomic libraries derived from Oryza sativa L. ssp. japonica cultivar Nipponbare. Physical maps of the centromeric regions were constructed by DNA fingerprinting methods and the aligned clones were analyzed by end sequencing. BLAST analysis revealed the composition of genes, centromeric satellites and other repetitive elements, such as RIRE7/CRR, RIRE8, Squiq, Anaconda, CACTA and miniature inverted-repeat transposable elements. Fiber-fluorescence in situ hybridization analysis also indicated the presence of distinct clusters of RCS2/TrsD/CentO satellite interspersed with other elements, instead of a long homogeneous region. Several expressed genes, sequences representative of ancestral organellar insertions, relatively long simple sequence repeats (SSRs), and sequences corresponding to 5S and 45S ribosomal RNA genes were also identified. Thirty-one gene sequences showed high similarity to rice full-length cDNA sequences that had not been matched to the published rice genome sequence in silico. These results suggest the presence of expressed genes within and around the clusters of RCS2/TrsD/CentO satellites in unsequenced centromeric regions of the rice chromosomes.
Key words: genome; FL-cDNA; centromere; transposon
| 1. Introduction |
|---|
The centromere of a chromosome plays a key role during mitosis and meiosis. The centromeric region contains species-specific satellite sequences and various kinds of transposable elements1
Rice is considered to be a model monocot plant because of its relatively small genome size and its high synteny with other cereal crops.10
As the first step in the functional characterization of the rice genome, the International Rice Genome Sequencing Project (IRGSP) completed a high-quality map-based sequencing of the genome of Oryza sativa L. ssp. japonica cv. Nipponbare.11–15
Even though it covers >95% of the estimated 390 Mb total genome sequence, the published rice genomic sequence includes the centromeric regions of chromosomes 4, 5 and 8, and only portions of the centromeric regions of the 9 remaining chromosomes. The overall sequence data corresponds to only 13% of the full set of rice centromeric regions.
The rice centromeric region consists mainly of repeats of 155–165 bp RCS2/TrsD/CentO satellite DNA and the centromere-specific retroelement RIRE7/CRR.16–21
In addition to the repetitive elements, high-quality sequence data of the centromeres of chromosomes 4 and 8 revealed that genes are predicted to lie between and around the satellite sequences,16,17 and 12 of these genes are expressed in leaf and root tissues.18
Do other rice centromeric regions contain active genes? Although the Rice Genome Research Program (RGP) attempted to construct sequence-ready PAC/BAC physical maps of the centromeric regions, interference from repetitive sequences prevented further chromosomal walking and subsequent genomic sequencing of the regions.11
Here we describe a comprehensive analysis that identifies and maps centromeric sequences derived from the 12 rice chromosomes. We reveal the occurrence of expressed genes, organellar insertions, relatively long simple sequence repeats (SSRs), and ribosomal RNA genes in rice centromeric regions. In addition, we describe the composition and distribution of centromeric repetitive sequences. Finally, we evaluate our PAC/BAC library and discuss strategies that might be used to complete the genomic sequencing of rice centromeric regions.
| 2. Materials and Methods |
|---|
2.1. PCR screening
PAC and BAC libraries were constructed from genomic DNA derived from the rice cultivar Nipponbare (JP 229579 in NIAS GenBank, Oryza sativa L. ssp. japonica) that was generated by the Rice Genome Research Program. PCR screening was performed as described previously.22
2.2. BAC/PAC end sequencing
The BAC and PAC clones were grown for 22 h with shaking. DNA of each BAC/PAC clone was prepared in a 96-well plate using a liquid-handling robot (Quadra 3, Tomtec, Hamden, CT, USA). Bacterial cells were harvested by centrifugation and lysed according to the standard alkaline lysis protocol. The insert DNA was purified using isopropanol precipitation, dissolved in deionized H2O and sheared into ~1 kb fragments by sonication (Sonifier 450, Branson, Danbury, CT, USA). Sequencing was performed using the Big Dye terminator reaction mix (ABI, Foster City, CA, USA) on ABI3700 capillary sequencers (ABI). Repetitive elements were identified by BLAST analysis using the TIGR repetitive sequences library (http://www.tigr.org/tdb/e2k1/osa1/blastsearch.shtml). A sequence similarity search was performed against the rice full-length complementary DNA (FL-cDNA) sequences,23,24 including previously unmapped sequences (identity <95%, total coverage <90%).
2.3. DNA fingerprinting
Agarose-gel-based fingerprinting methodology25 was used to generate HindIII fingerprints of PAC/BAC clones. PAC/BAC DNAs were isolated using a REAL kit (Qiagen, Hilden, Germany) and then digested with HindIII. The restriction digestion fragments were electrophoresed in 1.2% agarose gels. The gels were then stained with SYBR Green I (Invitrogen, Carlsbad, CA, USA). Positions of DNA bands on the gel images were identified using the program Image25 and checked manually. DNA fragments derived from vector DNA (one fragment from pBeloBAC11 or four fragments from pCYPAC2) were removed manually. Fingerprints derived from PAC/BAC clones in pericentromeres were integrated to determine the chromosomal location of each contig. The initial fingerprint assembly was performed automatically using fpc (V4.7)25–27 at a Sulston score cut-off of 1e-12 and a tolerance of 2. To correct errors, the contigs were edited manually at cut-offs of 1e-8 to 1e-13 and a tolerance of 2.
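The Sulston score used as the assembly cut-off can be read as the probability that two clones share at least a given number of fingerprint bands purely by chance. The following Python sketch implements the binomial form in which this score is commonly described; it is an illustration, not the fpc source code. The gel length (`gel_len`) and the band counts in the usage lines are hypothetical values chosen for the example and are not settings reported in this study.

```python
from math import comb


def sulston_score(n_low, n_high, shared, tolerance, gel_len):
    """Chance probability that at least `shared` bands match between two clones.

    n_low, n_high : band counts of the two clones (n_low <= n_high)
    shared        : number of bands observed to match within `tolerance`
    tolerance     : allowed difference between two band positions
    gel_len       : number of resolvable positions on the gel (assumed value)
    """
    if not 0 <= shared <= n_low <= n_high:
        raise ValueError("require 0 <= shared <= n_low <= n_high")
    # Chance that a random band lands within the tolerance window of one given band.
    b = 2.0 * tolerance / gel_len
    # Chance that a band matches none of the n_high bands of the other clone.
    p_none = (1.0 - b) ** n_high
    p_match = 1.0 - p_none
    # Binomial tail: at least `shared` of the n_low bands match by coincidence.
    return sum(
        comb(n_low, m) * p_match ** m * p_none ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


def clones_overlap(n_low, n_high, shared, tolerance=2, gel_len=3300, cutoff=1e-12):
    """Treat two clones as overlapping when the score is at or below the cut-off."""
    return sulston_score(n_low, n_high, shared, tolerance, gel_len) <= cutoff


# Hypothetical clones with 20 and 25 bands.
print(clones_overlap(20, 25, shared=20))  # every band shared
print(clones_overlap(20, 25, shared=2))   # only two bands shared
```

A lower score means a coincidental match is less likely, so tightening the cut-off from 1e-8 to 1e-13 demands more shared bands before two clones are joined. Clones with very few bands cannot reach such small scores at all, which is relevant to the short-insert clones discussed in section 3.4.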
2.4. Genetic mapping
Genetic mapping was performed according to previously published methods28
with minor modifications. In brief, after a BLAST search against the TIGR rice repeat database and the RGP genomic sequence database, unique sequences were selected. A rice molecular linkage map constructed from 186 F2 plants derived from a single cross between the japonica cultivar Nipponbare and the indica cultivar Kasalath was used.
2.5. Fluorescence in situ hybridization (FISH)
Fiber-FISH analysis was performed according to previously published methods29
with minor modifications. PCR fragments amplified from centromere-specific repeats or retrotransposon sequences in the centromeric region of chromosome 8 were used as DNA probes. DNA fragments of ~5 kb were amplified by PCR and then mixed to make a DNA probe for retrotransposons. Probes were labelled with digoxigenin- or biotin-labelled dUTP and detected with a fluorescein isothiocyanate-conjugated anti-digoxigenin antibody or Cy3-conjugated avidin. All fluorescent images were captured using a fluorescence microscope (BX51, Olympus, Tokyo, Japan) with a charge-coupled device camera (CoolSNAP HQ, Roper Scientific, Tucson, AZ, USA).
| 3. Results and Discussion |
|---|
3.1. Screening of PAC/BAC clones that cover the 12 centromeric regions of rice
In this work we sought to comprehensively screen PAC/BAC clones from the 12 rice centromeric regions. Because all rice centromeres contain centromere-specific tandem repeats, RCS2/TrsD/CentO satellite sequences,19
In addition to the RCS2/TrsD/CentO satellite sequence, other rice centromere-specific motifs include the RCH1, RCH2, RCH3 and RCS1 sequences.19
3.2. Composition and structure of repetitive sequences in rice centromeric regions
We analyzed the sequence of both ends for a total of 545 rice centromere-specific clones (Supplementary Table 1). BLASTN analysis using various databases revealed the composition of centromere sequences. The analysis identified the RCS2/TrsD/CentO satellite sequence and various repetitive elements, including RIRE7/CRR, RIRE8, Squiq, miniature inverted-repeat transposable elements (MITEs), Anaconda and CACTA, in rice centromeric regions (Table 2). The RCS2/TrsD/CentO satellite accounted for 34.7% of the total region sequenced in this study, and RIRE7/CRR, a centromere-specific Ty3/gypsy type retroelement of rice,20,21 accounted for 14.9% (Table 2). Thus, these 2 centromere-specific repetitive elements comprised 49.6% of the sequenced region. The MITE family30,31 accounted for 12.7% of the total region sequenced (Table 2). MITEs have a pronounced bias toward euchromatic regions11,13,14 and are considered to be associated with genes.32,33 This propensity suggests that genes might be present near the MITEs in rice centromeric regions.
A cytogenetic approach was applied to analyze the structure of these repetitive sequences. Fiber-FISH analysis using RCS2/TrsD/CentO and three repetitive sequences we described in Table 2 as probes indicated that RCS2/TrsD/CentO sequences are arranged in discrete clusters, instead of being distributed uniformly throughout the centromeric region of rice (Fig. 1A). This result is somewhat contrary to the previous finding from pachytene FISH analysis,21
3.3. Identification of expressed genes, SSRs and organellar insertions in rice centromeric regions
To identify expressed genes in rice centromeric regions, we conducted a sequence similarity search between the end sequences of the screened PAC/BAC clones and rice full-length cDNA (FL-cDNA) sequences23
We identified SSRs in rice centromeric regions. Among the 47 distinctive motif families of rice SSRs,11
The sequences we identified also include data suggestive of insertion of ancestral organellar DNA in the rice nuclear genome. P0698F05_F is similar to the DNA sequence that encodes mitochondrial NADH dehydrogenase subunit 1 (Supplementary Table 1). This result suggests that mitochondrial insertions occur even in centromeric regions and similar organellar insertions account for 0.18–0.19% of the published nuclear genome of rice.11
3.4. Mapping of PAC/BAC sequence data, genes and repetitive sequences
We constructed clone-based physical maps of the 12 rice centromeric regions. A total of 545 centromere-specific clones and 42 selected clones (position markers, Supplementary Table 2) were subjected to DNA fingerprinting analysis to make contigs. Contigs were mapped to a specific centromeric region if they contained marker clones. The locations of four newly mapped PAC/BAC sequences corresponded to genetically identified centromeric regions of the 12 rice chromosomes. In total, 13 contigs and 4 clones were mapped genetically or in silico (Supplementary Figure 1), corresponding to 2.98 Mb of the 12 rice centromeric regions. In addition, 2 contigs totaling 360 kb in size have not been anchored yet, and 2 FL-cDNA sequences (AK067951, AK063425) belonged to unmapped contigs (Table 3, Supplementary Figure 1). Our findings suggest that these contigs might exist as gene-containing islands in centromeric regions.
Mapping the 13 contigs allowed us to assign the genes and transposable elements described earlier to their respective chromosomes in silico, so there was no need to map each gene or motif individually (Table 4). In addition to the repetitive elements, the 5S rDNA genes were mapped on chromosome 11 (Table 4). It was previously reported that the 5S rDNA cluster lies close to the centromeric region on the short arm of chromosome 11.34
We confirmed the presence of the single long cluster of 5S rDNA near the RCS2/TrsD/CentO cluster by using fiber-FISH and found that the gap between these 2 clusters is <50 kb (Fig. 1E). The 45S rDNA consisting of 17S, 5.8S and 25S rDNA coding units was identified on chromosome 1 (Table 4). However, fiber-FISH and mitotic FISH both failed to identify the 45S rRNA cluster near the RCS2/TrsD/CentO satellite (data not shown); the 45S rDNA cluster previously had been detected at the end of the short arm of chromosome 9.35
Our finding suggests that 45S rDNA sequences are not clustered but are dispersed or fragmented in the centromeric region.
We encountered difficulty in constructing a complete physical map using PAC/BAC clones because the clones only minimally extended the contigs that had been constructed by chromosomal walking. Overall, 71% (427 of 598) of the PAC/BAC clones have not been mapped because they contain too few DNA fragments to be assigned to contigs by DNA fingerprinting methods (Supplementary Figure 2). The distribution of insert length indicates that 47.2% (188 of 398) of centromeric PAC clones have inserts <60 kb (Fig. 2), compared with 13.6% in the total library.36
3.5. Toward the complete identification of genes in centromeric regions
In this study, we obtained PAC/BAC clones and constructed a partial physical map of unsequenced rice centromeric regions (Supplementary Figure 1), which could be useful for analysis of the centromeres by clone-by-clone genomic sequencing and subsequent prediction of genes. Comparison between the end sequences of the PAC/BAC clones and sequences of rice FL-cDNAs (Table 3) suggested that expressed genes exist within and around the RCS2/TrsD/CentO clusters in the centromeric regions. Previous studies have also identified new genes in the centromeres of chromosomes 416
It is important to note that our discussion is limited to the centromeric regions represented by the PAC/BAC clones we obtained. Although we tried to analyze rice centromeric sequences comprehensively, the sequences we obtained in this study may not completely cover the entire rice centromeric region. The percentage of RCS2/TrsD/CentO in the rice genome is ~1.8% (6.9 of 390 Mb) according to an estimate based on the signal intensity of RCS2/TrsD/CentO satellites in FISH analysis21 but only 0.46% in the PAC/BAC libraries used here (Table 1). This suggests that the PAC/BAC clones obtained are insufficient to cover the overall rice centromeric region. In addition, we sequenced only the ends of the clones. Because the average read length was 536 bp after trimming of the vector sequence, a total of 0.58 Mb (including redundancy) was sequenced, corresponding to 8.4% (0.58 of 6.9 Mb) of the estimated length of the 12 rice centromeric regions.
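The coverage figures quoted in this paragraph can be reproduced with simple arithmetic. The Python sketch below assumes that both ends of all 545 centromere-specific clones yielded a read of the average trimmed length, which the text does not state explicitly; the variable names are illustrative only.

```python
# Values taken from the text.
CLONES = 545            # centromere-specific clones that were end-sequenced
ENDS_PER_CLONE = 2      # assumption: both ends of every clone gave a usable read
MEAN_READ_BP = 536      # average read length after vector trimming
SATELLITE_MB = 6.9      # FISH-based estimate of total RCS2/TrsD/CentO content
GENOME_MB = 390         # estimated rice genome size

# Total end sequence in Mb, rounded to two decimals as in the text.
sequenced_mb = round(CLONES * ENDS_PER_CLONE * MEAN_READ_BP / 1e6, 2)

# Share of the genome occupied by the satellite, in percent.
satellite_pct = round(SATELLITE_MB / GENOME_MB * 100, 1)

# Share of the estimated centromeric length covered by end sequences, in percent.
coverage_pct = round(sequenced_mb / SATELLITE_MB * 100, 1)

print(f"end sequence: {sequenced_mb} Mb")
print(f"satellite share of genome: {satellite_pct}%")
print(f"centromeric coverage: {coverage_pct}%")
```

The coverage percentage is computed from the rounded 0.58 Mb total, following the text. Because the 0.58 Mb includes redundant reads, the share of unique centromeric sequence sampled would be lower than this figure.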
Complete genomic sequencing of the remaining centromeric regions, which are composed mainly of repetitive elements, clearly presents a considerable challenge. We consider that almost all remaining regions, typically long clusters of RCS2/TrsD/CentO satellites (Figure 1), would not be clonable because they contain few restriction enzyme sites and because the clones would be unstable in Escherichia coli (Fig. 2). Therefore, PAC/BAC clone-based sequencing and fundamentally different methods for genome analysis, such as fiber-FISH, will likely need to be used in concert to completely characterize the 12 rice centromeric regions.
| Supplementary data |
|---|
Supplementary data are available online at www.dnares.oxfordjournals.org.
| Acknowledgements |
|---|
The authors thank Dr B. A. Antonio for critical reading of the manuscript. This work was supported by Grants-in-Aid for Scientific Research for Young Scientists (B) 17710165 from the Ministry of Education, Culture, Sports, Science and Technology of Japan to H.M., and grants GD-2007 from the Ministry of Agriculture, Forestry and Fisheries of Japan.
| Footnotes |
|---|
*To whom correspondence should be addressed. Tel: +81 29 838 7441, Fax: +81 29 838 7468, E-mail: mat{at}nias.affrc.go.jp
Communicated by Satoshi Tabata
| REFERENCES |
|---|
1. Henikoff, S. and Dalal, Y. 2005, Centromeric chromatin: what makes it unique? Curr. Opin. Genet. Dev., 15, 177–184.
2. Hall, A. E., Keith, K. C., Hall, S. E., Copenhaver, G. P., and Preuss, D. 2004, The rapidly evolving field of plant centromeres, Curr. Opin. Plant Biol., 7, 108–114.
3. Houben, A. and Schubert, I. 2003, DNA and proteins of plant centromeres, Curr. Opin. Plant Biol., 6, 554–560.
4. Lamb, J. C. and Birchler, J. A. 2003, The role of DNA sequence in centromere formation, Genome Biol., 4, 214.
5. Dawe, R. K. 2005, Centromere renewal and replacement in the plant kingdom, Proc. Natl Acad. Sci. USA, 102, 11573–11574.
6. Feschotte, C., Jiang, N., Wessler, S. R. 2002, Plant transposable elements: where genetics meets genomics, Nat. Rev. Genet., 3, 329–341.
7. Cooke, H. J. 2004, Silence of the centromeres - not, Trends Biotechnol., 22, 319–321.
8. Kumekawa, N., Hosouchi, T., Tsuruoka, H., Kotani, H. 2000, The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 5, DNA Res., 7, 315–321.
9. Kumekawa, N., et al. 2001, The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 4, DNA Res., 8, 285–290.
10. Devos, K. M. 2005, Updating the crop circle, Curr. Opin. Plant Biol., 8, 155–162.
11. International Rice Genome Sequencing Project. 2005, The map-based sequence of the rice genome, Nature, 436, 793–800.
12. Sasaki, T., Matsumoto, T., Yamamoto, K., et al. 2002, The genome sequence and structure of rice chromosome 1, Nature, 420, 312–316.
13. Feng, Q., Zhang, Y., Hao, P., et al. 2002, Sequence and analysis of rice chromosome 4, Nature, 420, 316–320.
14. Rice Chromosome 10 Sequencing Consortium. 2003, In-depth view of structure, activity, and evolution of rice chromosome 10, Science, 300, 1566–1569.
15. Rice Chromosomes 11 and 12 Sequencing Consortia. 2005, The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications, BMC Biol., 3, 1–18.
16. Zhang, Y., Huang, Y., Zhang, L., et al. 2004, Structural features of the rice chromosome 4 centromere, Nucleic Acids Res., 32, 2023–2030.
17. Wu, J., Yamagata, H., Hayashi-Tsugane, M., et al. 2004, Composition and structure of the centromeric region of rice chromosome 8, Plant Cell, 16, 967–976.
18. Nagaki, K., Cheng, Z., Ouyang, S., et al. 2004, Sequencing of a rice centromere uncovers active genes, Nat. Genet., 36, 138–145.
19. Dong, F., Miller, J. T., Jackson, S. A., Wang, G. L., Ronald, P. C., Jiang, J. 1998, Rice (Oryza sativa) centromeric regions consist of complex DNA, Proc. Natl Acad. Sci. USA, 95, 8135–8140.
20. Kumekawa, N., Ohmido, N., Fukui, K., Ohtsubo, E., Ohtsubo, H. 2001, A new gypsy-type retrotransposon, RIRE7: preferential insertion into the tandem repeat sequence TrsD in pericentromeric heterochromatin regions of rice chromosomes, Mol. Genet. Genomics, 265, 480–488.
21. Cheng, Z., Dong, F., Langdon, T., et al. 2002, Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon, Plant Cell, 14, 1691–1704.
22. Wu, J., Mizuno, H., Hayashi-Tsugane, M., et al. 2003, Physical maps and recombination frequency of six rice chromosomes, Plant J., 36, 720–730.
23. Kikuchi, S., Satoh, K., Nagata, T., et al. 2003, Collection, mapping, and annotation of over 28 000 cDNA clones from japonica rice, Science, 301, 376–379.
24. The Rice Annotation Project. 2007, Curated Genome Annotation of Oryza sativa ssp. japonica and Comparative Genome Analysis with Arabidopsis thaliana, Genome Res., 17, 175–183.
25. Sulston, J., Mallett, F., Staden, R., Durbin, R., Horsnell, T., and Coulson, A. 1988, Software for genome mapping by fingerprinting techniques, Comput. Appl. Biosci., 4, 125–132.
26. Soderlund, C., Humphray, S., Dunham, A., and French, L. 2000, Contigs built with fingerprints, markers, and FPC V4.7, Genome Res., 10, 1772–1787.
27. Ness, S. R., Terpstra, W., Krzywinski, M., Marra, M. A., Jones, S. J. 2002, Assembly of fingerprint contigs: parallelized FPC, Bioinformatics, 18, 484–485.
28. Harushima, Y., Yano, M., Shomura, A., et al. 1998, A high-density rice genetic linkage map with 2275 markers using a single F2 population, Genetics, 148, 479–494.
29. Mizuno, H., Wu, J., Kanamori, H., et al. 2006, Sequencing and characterization of telomere and subtelomere regions on rice chromosomes 1S, 2S, 2L, 6L, 7S, 7L and 8S, Plant J., 46, 206–217.
30. Bureau, T. E. and Wessler, S. R. 1992, Tourist: a large family of small inverted repeat elements frequently associated with maize genes, Plant Cell, 4, 1283–1294.
31. Bureau, T. E. and Wessler, S. R. 1994, Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants, Plant Cell, 6, 907–916.
32. Bureau, T. E. and Wessler, S. R. 1994, Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses, Proc. Natl Acad. Sci. USA, 91, 1411–1415.
33. Mao, L., Wood, T. C., Yu, Y., et al. 2000, Rice transposable elements: a survey of 73 000 sequence-tagged-connectors, Genome Res., 10, 982–990.
34. Kamisugi, Y., Nakayama, S., Nakajima, R., Ohtsubo, H., Ohtsubo, E., Fukui, K. 1994, Physical mapping of the 5S ribosomal RNA genes on rice chromosome 11, Mol. Gen. Genet., 245, 133–138.
35. Shishido, R., Sano, Y., Fukui, K. 2000, Ribosomal DNAs: an exception to the conservation of gene order in rice genomes, Mol. Gen. Genet., 263, 586–591.
36. Baba, T., Katagiri, S., Tanoue, H., et al. 2000, Construction and characterization of rice genomic libraries: PAC library of Japonica Variety, Nipponbare and BAC library of Indica Variety, Kasalath, Bull. Natl Inst. Agrobiol. Resour., 14, 41–52.