DNA Research Advance Access originally published online on September 16, 2006
DNA Research 2006 13(3):111-121; doi:10.1093/dnares/dsl003
Whole-Genome Microarray in Arabidopsis Facilitates Global Analysis of Retained Introns
Department of Plant Sciences, Weizmann Institute of Science Rehovot 76100, Israel
Received 2 May 2006; revised 19 July 2006
| Abstract |
|---|
Alternative splicing (AS) is an important post-transcriptional regulatory mechanism that can increase protein diversity and affect mRNA stability. Different types of AS have been observed; these include exon skipping, alternative donor or acceptor sites and intron retention. In humans, exon skipping is the most common type while intron retention is rare. In contrast, in Arabidopsis, intron retention is the most prevalent AS type (40%). Here we show that direct transcript expression analysis using high-density oligonucleotide-based whole-genome microarrays (WGAs) is particularly amenable to assessing global intron retention in Arabidopsis. By applying a novel algorithm, retained introns are detected in 8% of the transcripts examined. A sampling of 14 transcripts showed that 86% can be confirmed by RT-PCR. This rate of detection predicts an overall total AS rate of 20% for Arabidopsis, compared with 10-22% based on EST/cDNA-based analysis. These findings will facilitate monitoring constitutive and dynamic whole-genome splicing on the next generation WGA slides.
Key words: microarray; TILING; alternative splicing; Arabidopsis; intron retention; NPR-1; GIGANTEA
| 1. Introduction |
|---|
Alternative RNA processing pathways result from combining different splice junctions that are present in pre-mRNA transcripts. In this way, a variety of mRNAs and proteins can be created from the same gene. Alternative splicing (AS) is thought to play a major role in expanding the potential informational content of eukaryotic genomes. Recent evidence indicates a high incidence (32-60%) of AS in the human genome, predominantly in the form of exon skipping, with intron retention as a minor form (5-16%).1 Plants are thought to exhibit less AS (10-22%).3,7,8 Computational analysis of AS in Arabidopsis by EST-pair alignment identified 436 alternatively spliced genes.9 Unexpectedly, intron retention was found to be the most common type (45%). Retained introns were shown to be present in RNA derived from polyribosomes, demonstrating that these intron retention events are not the byproduct of incomplete splicing but are found in a translatable context in the cytoplasm.9 Iida et al.7 aligned 248 514 RAFL (RIKEN Arabidopsis Full-Length) cDNA/EST sequences to the Arabidopsis genome using a BLAST-based method.10 They identified 15 214 transcription units (TUs) containing at least two sequences each and observed AS for 11.6% of these TUs, of which 44% were retained introns.7 In a recent study using a large collection of EST/cDNA, 4707 (21.8%) of the transcripts showed 8264 AS events. Approximately 56% of these events were of the intron retention type.8 These studies confirmed that a low percentage of genes (10-22%) are alternatively spliced and that intron retention is the most prevalent AS type in Arabidopsis.
Calculating the rate of AS using expressed sequence data (ESTs and cDNAs) can underestimate the amount of AS for the following reasons. Collectively, the EST and cDNA database in plants is relatively small and, thus, may lack low-abundance transcripts. In addition, AS predictions based on cDNA data are in many cases biased towards the termini of transcripts owing to the preponderance of end-sequence reads among ESTs and oligo(dT)-based priming for reverse transcription. Recently, high-density oligonucleotide-based whole-genome microarrays (WGAs) have emerged as a novel platform for genomic analysis beyond gene expression profiling. Unbiased WGAs can be used for epigenomic mapping11 and have been used to monitor exon skipping in humans.12,13 By comparing the expression levels of all the exons within genes, it was shown that a large fraction (80%) of expressed genes on chromosomes 21 and 22 exhibit exon skipping. This estimate is significantly higher than previous predictions, which were based on computational analysis of ESTs and cDNAs.4
The array-based approach to analyzing AS also has limitations. For example, the detection of relatively rare splicing events is difficult. Furthermore, in a manner analogous to obtaining multiple EST libraries from different tissues, a large number of distinct tissues or cell populations must be surveyed to obtain sufficient confidence to call a splice variant at a specific splice junction.
In this work, using published WGA information, we monitor intron retention in Arabidopsis and show that it is readily detectable and that it generally leads to transcripts containing new reading frames that would result in truncated gene products.
| 2. Materials and methods |
|---|
2.1. Data sources
Arabidopsis gene, intron and coding sequence annotations and their genomic locations were obtained from the TAIR 6.0 release databases:
- ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR6_seq_20051108
- ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR6_intron_20060221
- ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR6_cds_20051108
- ftp://ftp.arabidopsis.org/home/tair/home/tair/Sequences/blast_datasets/TAIR6_3_UTR_20060126
- ftp://ftp.arabidopsis.org/home/tair/home/tair/Sequences/blast_datasets/TAIR6_5_UTR_20060126
- ftp://ftp.arabidopsis.org/home/tair/Maps/seqviewer_data/sv_gene_feature.data
In order to allocate each probe to its TUs, programs were written in PERL. Those programs analyzed the BLAST results of the probes against the different databases. Programs are available upon request from H.N-G.
2.2. Normalization of the whole-genome array (WGA)
The hybridization values from a set of 12 oligonucleotide arrays representing 94% of the Arabidopsis genome sequence (110 Mb) were used.14 Each array contains 835 000 oligos of 25 mer size. Four RNA populations were hybridized to these arrays.14 The probe sequences and the hybridization values for the 12 slides were taken from the GEO Accession viewer (http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?) using the GEO accession GPL slide numbers (GPL432-GPL443). Hybridization data were normalized by taking the log2 of the data of all probes and then dividing each probe intensity value in a given experiment by the median intensity value of all probes in that experiment. As reported, the median intensity value of all probes was determined to be a good estimate of the background noise level in a given experiment.14
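The normalization described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' PERL code: the function name `normalize_experiment` is hypothetical, and the sketch assumes that the median is taken over the log2-transformed values of a single experiment and that all raw intensities are positive.

```python
import math
from statistics import median


def normalize_experiment(raw_intensities):
    """Log2-transform probe intensities, then divide by the experiment median.

    Assumes every raw intensity is > 0 and that the median of the
    log2 values is non-zero (otherwise the division is undefined).
    """
    log_values = [math.log2(v) for v in raw_intensities]
    experiment_median = median(log_values)
    return [v / experiment_median for v in log_values]


# Hypothetical raw intensities for one experiment.
normalized = normalize_experiment([2.0, 4.0, 8.0])
print(normalized)
```

On this scale a value of 1.0 corresponds to the median probe, which is why a cut-off such as 1.09 can be read as "above the estimated background level".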
2.3. Retained-intron analysis
An intron was defined as a candidate constitutive or retained intron based on the combined hybridization values of its probes relative to the mean hybridization value of the other introns and exons in the transcript. In order to define retained introns in an expressed transcript, the probes of its exons were treated as one group, while each intron that contained at least three probes was treated as a separate group. A balanced one-way ANOVA comparing the means of all the groups in a transcript was performed. The ANOVA test evaluates the hypothesis that the samples all have the same mean against the alternative hypothesis that the means are not all the same. To ensure a False Discovery Rate (FDR) of at most 0.05, the Benjamini-Hochberg (BH) method was used.15 Transcripts in which the exon mean was at least 1.09 and the intron mean was less than 1.09 were ranked according to their ANOVA P-value. The ANOVA P-value was then compared with the BH critical value to select significant values. The BH critical values and the P-value for each ANOVA test are shown in Supplementary Table S2 available at www.dnaresearch.oxfordjournals.org. Using these results, a test was then performed to determine which pairs of means are significantly different and which are not. This test employs Tukey's least significant difference procedure. PERL programs were written in order to differentiate groups that have the same mean from groups that differ from each other. Programs are available upon request from H.N-G.
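The comparison of ranked P-values with BH critical values can be illustrated as follows. This is a sketch of the standard Benjamini-Hochberg step-up procedure under stated assumptions, not the authors' implementation; the function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which tests are significant.

    Standard BH step-up rule: sort the m P-values, find the largest
    rank i with p_(i) <= (i / m) * fdr, and call every test with
    rank <= i significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        critical_value = rank / m * fdr  # BH critical value for this rank
        if p_values[idx] <= critical_value:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical ANOVA P-values for four transcripts.
print(benjamini_hochberg([0.039, 0.001, 0.205, 0.008]))
```

Because the rule is step-up, a P-value that exceeds its own critical value can still be called significant when a higher-ranked P-value passes.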
2.4. Gene ontology
Gene ontology classification of transcripts was carried out using the first level categories of the Gene Ontology classification of the biological processes of their protein products (ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20060121.txt). Gene ontology categories that deviate from the cumulative binomial distribution, with probability P = 0.08 for retained-intron genes and 1 - P for constitutive-intron genes and with the significance threshold set at a P-value of 0.05, are summarized.
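A cumulative binomial test of this kind can be sketched as below. The sketch assumes that, for a category containing n analyzed genes of which k carry a retained intron, k is compared with a Binomial(n, 0.08) distribution; the function names and the example counts are hypothetical and are not taken from Table 3.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Probability of at most k successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def category_tails(k, n, p=0.08):
    """Return (lower, upper) tail probabilities for k retained-intron genes.

    lower = P(X <= k) flags under-representation,
    upper = P(X >= k) flags enrichment.
    """
    lower = binom_cdf(k, n, p)
    upper = 1.0 - binom_cdf(k - 1, n, p) if k > 0 else 1.0
    return lower, upper


# Hypothetical category: 200 analyzed genes, 28 with a retained intron.
lower, upper = category_tails(28, 200)
print(lower, upper, upper < 0.05)
```

A category would be reported as enriched when the upper tail falls below 0.05 and as under-represented when the lower tail does.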
2.5. Growth of plants, cell culture, RNA extraction and RT-PCR
Arabidopsis thaliana (ecotype Columbia) seeds were surface sterilized and placed onto sterilized MS-containing Petri dishes. The seeds were cold-treated at 4°C for 2 days and then incubated for 7 days under a 16 h light/8 h dark cycle. Flower buds were collected from mature Arabidopsis plants (5-7 weeks old) grown on soil. Arabidopsis suspension-culture cells were a generous gift from Dr Gideon Grafi. The culture was maintained under a 16 h light/8 h dark cycle with constant shaking and was subcultured and harvested every week. Root tissue was obtained by incubating surface-sterilized seeds in flasks containing 100 ml Gamborg's liquid medium. The seedlings were incubated for 2 weeks with constant agitation under a 16 h light/8 h dark cycle, and root tissue was collected and frozen.
RNA was extracted from the four different tissues using RNeasy (Qiagen, Hilden, Germany). RT-PCR was conducted as previously published.16 Oligonucleotide primers were designed for simultaneous amplification of both the spliced and unspliced variants or by using an internal intron probe in which the opposing primer was from the next non-adjacent exon. A complete list of the PCR primers used in this study is shown as Supplementary Table S6 available at www.dnaresearch.oxfordjournals.org.
| 3. Results |
|---|
3.1. Division of probes in WGA to accommodate gene structure-criteria
A unique set of whole-genome arrays representing 94% of the Arabidopsis genome sequence is composed of 12 microarrays (6 for the forward strand and 6 for the reverse complement), each containing 835 000 oligos of 25 mer size. Each array contains an end-to-end tile of oligonucleotides (from head to tail) with no overlap in their DNA sequences. Only perfect match probes were included in the whole-genome array design. The standard Affymetrix Arabidopsis gene expression controls are present on each of these chips. Recently, RNA from four sources was used to probe this array to determine by empirical analysis the transcriptional activity of the Arabidopsis genome.14
We wished to ascertain whether knowledge about alternative transcripts could be refined from these data. In order to parse the data according to the established annotation of genes and their exon and intron structure, the gene structures predicted by TAIR (The Arabidopsis Information Resource; http://www.arabidopsis.org/) were used. The gene structure includes the genomic location and sequence of each exon and intron. In cases in which one gene includes more than one transcript (e.g. known AS), the transcript structure that included the most exons was used so that a gene was divided into as many distinct groups as possible. The oligos were aligned to the exon and intron sequences in order to allocate oligos to exons and introns. The rest of the oligos, which include control probes, probes with more than one hit in the genome, intergenic probes and probes that fall on intron/exon borders, were removed. Thus, in total, 1 682 698 unique probes of the introns or exons were found to be amenable to analysis (Supplementary Table S1 is available at www.dnaresearch.oxfordjournals.org).
3.2. Criteria for differentiating introns and exons in the whole-genome screen
Our aim was to devise statistical tests for each transcriptional unit so that aberrant behavior of introns and exons could be registered. As hybridization values are normally distributed on a log scale, we transformed all the values to log2 and then normalized them by dividing each probe intensity value in a given experiment by the median intensity value of all probes in that experiment, as done previously.14
We first examined the possibility of assessing retained introns by comparing the expression data of individual introns with a transcript's exon hybridization value. An intron was defined as a candidate constitutive or retained intron based on its mean hybridization value relative to the mean hybridization value of all introns and exons in a TU. To establish global statistical parameters that can be used to distinguish between intron and exon scores in a transcriptional unit, we first explored the hybridization values of introns and exons in the full genome expression database. Genes that do not contain introns or that have multiple non-unique probes in their introns are not included in this analysis. The hybridization values of 18 275 genes were used (Table 1). The mean hybridization value of all the probes that belong to the exons of a specific gene was calculated, as well as the mean hybridization value of all the probes that belong to the introns of that gene. This assumes that the vast majority of plant introns are constitutively spliced. The distribution of the mean hybridization values of the exons and introns of all the transcripts is shown in Fig. 1. The two samples are statistically different based on the goodness-of-fit hypothesis test (two-sample Kolmogorov-Smirnov, P < 10^-100). A minimum cut-off of 1.09 for the mean hybridization value of the exons of a transcript was chosen for analysis, as at that level 58% (10 574) of the transcripts could be analyzed but only 1.5% of the intron mean hybridization values were above this level.
3.3. Examining intron retention in whole-genome arrays
In order to identify retained introns in the 10 574 transcripts, the probes of the exons in each transcript were treated as one group, while each intron that contained at least three probes was treated as a separate group. The three-probe limitation was used to increase the significance of the statistical tests. A balanced one-way ANOVA comparing the means of all the groups in a transcript was performed. To ensure an FDR of at most 0.05, the BH method was used.15 In Class 1 transcripts, the hybridization values of all the introns are different from those of the exon group (Fig. 2A). In the example shown, the AT2G36530 transcript is defined as containing only constitutive introns, as the exon group is significantly different from all the introns. In Class 2 transcripts, one intron is statistically similar to the exon group. Thus, in the case of transcript AT2G47470, intron 5 is defined as a retained intron (Fig. 2B). Intermediate cases are exemplified in Class 3 (intermediate retained). In this case the exceptional intron is expressed at a level that is between the exon group and all the other introns. In the example shown in Fig. 2C, the hybridization values of intron 6 of the AT5G11200 transcript lie between those of the exon and intron groups.
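The grouping used for the ANOVA (exon probes pooled into one group, each intron with at least three probes as its own group) can be sketched as follows. The code computes only the one-way ANOVA F statistic; converting it to a P-value requires the F distribution and is omitted. The function names, intron labels and probe values are hypothetical, and this is not the authors' PERL implementation.

```python
from statistics import mean


def build_groups(exon_probes, intron_probes, min_probes=3):
    """Pool all exon probes; keep each intron with >= min_probes as a group."""
    groups = {"exons": list(exon_probes)}
    for name, probes in intron_probes.items():
        if len(probes) >= min_probes:
            groups[name] = list(probes)
    return groups


def one_way_f(groups):
    """One-way ANOVA F statistic for a dict of name -> list of values.

    Requires at least two groups and non-zero within-group variance.
    """
    samples = list(groups.values())
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    group_means = [mean(s) for s in samples]
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, group_means))
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, group_means) for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical normalized probe values for one transcript.
groups = build_groups(
    exon_probes=[1.30, 1.25, 1.40, 1.35],
    intron_probes={"intron1": [0.90, 0.95, 1.00],
                   "intron2": [0.92, 0.97],          # dropped: < 3 probes
                   "intron3": [1.28, 1.33, 1.31]},
)
print(sorted(groups), one_way_f(groups))
```

In this toy transcript, `intron3` has values close to the exon group, which is the pattern the text describes for a candidate retained intron; deciding this formally requires the follow-up pairwise comparison of means.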
Examination of the rest of the transcripts shows that 8206 out of 10 270 (80%) fall into one of the intron classes. In the remaining 2064 cases, there is no clear statistical differentiation of the intron and exon groups. In these cases, the transcript exon and intron groups are compared again using a lower confidence level (α = 0.1 instead of 0.05). This causes the interval around the mean of each group to contract. At this probability level conclusions could be drawn for an additional 620 cases. In the remaining 14% of transcripts the statistical variance was too great and no clear definition of the intron/exon expression boundaries could be made. In these cases, the transcripts were termed undifferentiated. The results of applying the algorithms above to the whole-genome arrays are summarized in Table 1. Thus, 8% of the transcripts that could be measured contained retained introns. This is slightly higher than previous EST and cDNA-derived estimates for retained introns of between 3 and 6% of total transcripts.3
3.4. Confirmation of retained introns by RT-PCR analysis
To test the veracity of using WGA expression data to characterize AS, a sample of 14 AS events and 5 constitutive-type introns was examined by RT-PCR. The genes were selected such that they represent low- and high-expression value transcripts, have retained introns in different parts of the transcripts and contain intermediate and retained-type introns. Each PCR was carried out using RNA that was extracted from four different samples (seedlings, flowers, suspension-culture cells and roots) that were combined together. Retained introns were traced either by primers that flank the intron or by using an internal intron probe in which the opposing primer was from the next non-adjacent exon. In this way, the retained fragment (I) and the smaller spliced fragment (S) could be recovered. However, if a fragment larger than that predicted by I was detected, i.e. a fragment that included the non-adjacent but normally constitutively spliced intron, this indicated that the RT-PCR amplification used pre-mRNA as its template, and the result was rejected. All fragments containing putative introns were confirmed by sequencing (data not shown). The results shown in Fig. 3 and summarized in Table 2 indicate that 12 out of 14 introns (86%) that were predicted to be retained by the expression analysis were detected by RT-PCR. In a similar manner, when constitutive introns were examined, 4 out of 5 could be verified as being constitutive.
3.5. Feasibility of detecting exon skip in whole-genome arrays
In theory, a modification of the same algorithm should be applicable to estimating exon skip. However, inspection of Fig. 1 shows that the distribution of the intron mean hybridization values is much narrower than the distribution of the exon mean hybridization values. Thus, the lower hybridization values expected from potentially skipped exons would fall well within the normal distribution of exon values, making the direct application of WGA difficult. The limitation of this application is exemplified when the same statistical methods used for detection of retained introns are applied to detect exon skip. To detect exon skip in a transcript, the probes of its introns were treated as one group, while each exon that contained at least three probes was treated as a separate group. Exon skips of the first or last exons were excluded, as these are probably due to 5' and 3' transcription start/finish differences and do not necessarily arise from splicing. Furthermore, in that case, any data retrieved would be impossible to verify owing to the lack of flanking sequence for RT-PCR priming. The results of the search for exon skips were divided as for the introns. Examination of internal exons using the same statistical tests and parameters as those applied to the introns showed an exon skip level of more than 8% (Table 1). A sample of previously unannotated AS events in Arabidopsis was experimentally tested by RT-PCR. Each PCR was done using RNA that was extracted from four different samples, as carried out for the intron retention analysis. However, in this case only a low rate of verification was achieved, 3 out of 8 (37.5%; Supplementary Table S3 available at www.dnaresearch.oxfordjournals.org). Thus, owing to the difficulty of separating the exon distribution from the intron distribution in the available database, the detection of exon skip is unreliable.
3.6. Comparison of the results for intron retention to EST-derived estimates
Identification of alternative transcripts has relied on comparison of sequences from EST and cDNA databases, a method that is inherently different from the type of analysis carried out here, and it is therefore worthwhile to compare the results. The TAIR6 release contains 31 407 putative genes, of which 3159 (10%) have annotated splice variants. Using the TAIR annotation (ftp://ftp.arabidopsis.org/home/tair/Maps/seqviewer_data/SV_gene_feature.data), the type of each splice variant was determined. Out of the 3159 genes, 2251 can be analyzed in the WGA experiments and were composed of two transcripts that could be compared to one another. Of these, 468 were shown to be of the retained-intron type, a rate that is considerably lower than the more than 40% that was previously observed.3,7-9 In order to compare the efficiency of both methods, the limitations applied to the WGA analysis were applied to the retained introns in the TAIR database. Of the 468 retained introns, 257 met the criteria employed to analyze the existing WGA data. Of these, 45 (18%) were detected in common by the WGA method (Supplementary Table S4 is available at www.dnaresearch.oxfordjournals.org). These results indicate the overlapping but complementary nature of both approaches as tools to determine AS.
3.7. Distribution of biological functionalities of transcripts with retained introns
Examining the assigned global functional activity of transcripts with alternatively spliced retained introns will contribute to understanding their function in the biology of the plant. Genes that are analyzed in this study are specific to flowers, root, cell culture or seedling and as a result their gene ontology will be different from the total genes ontology. Therefore, in our case, changes in gene ontology distribution were sought by comparing the retained introns group with the constitutive introns group analyzed in this work rather than to total genes ontology. As nearly 8% of the genes contain retained introns we expect a similar distribution to be emulated in the gene ontology. Deviations from the expected binomial distribution with the P-value set at 0.05 are summarized in Table 3. Strikingly, the groups that are enriched in the retained introns group include mostly RNA processing and signal transduction. In contrast, transcripts related to protein metabolism are particularly underrepresented in the intron retention group.
3.8. Physical characterization of retained introns
Differences in the physical characteristics of introns, such as size or flanking sequence information, may serve to characterize this class of AS. The characteristics that were measured include the position of the intron in the transcript (i.e. in the UTR or CDS), whether it contains an open reading frame (ORF), the intron size, its GC content and the borders of the intron. Interestingly, as a group, the median size of the retained introns is significantly smaller than that of the constitutive introns, irrespective of the presence of an ORF (117 and 176 bp, respectively). The size distributions are statistically different based on the goodness-of-fit hypothesis test (two-sample Kolmogorov-Smirnov, P = 9.5 x 10^-44).
We note that the median size of introns calculated from the TAIR database (ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR6_intron_20060221) is 99 bp. The higher values in all types of introns measured here are a direct result of the requirement that introns be represented by at least three probes (i.e. 3 probes x 25 bp size/probe). Interestingly, the GC content of retained introns is higher than that of constitutive introns (37% compared with 33%) but lower than the GC content of coding regions (44%).17 The two GC content distributions are statistically different based on the goodness-of-fit hypothesis test (two-sample Kolmogorov-Smirnov, P = 1.5 x 10^-139; Supplementary Figure S1 is available at www.dnaresearch.oxfordjournals.org). Higher GC contents have recently been detected in retained introns that were identified by genome EST/cDNA alignments.8 The possibility exists that the higher GC content reflects an artifact that is associated with enhanced non-specific hybridization of probes, i.e. they may be more sensitive to background noise. In order to assess this possibility, we examined the distribution of the GC content of the intron probes whose hybridization values were above or equal to 1.09 (Supplementary Figure S2 is available at www.dnaresearch.oxfordjournals.org). In this case, the R^2 for the correlation of GC content and hybridization value was found to be 0.06. Hence, for the range of hybridization values used in our survey, non-specific hybridization is not correlated with GC content and the values measured are a result of true specific hybridization.
An additional quality of note in the characterization of retained introns is that relatively more retained introns originate from the 5'- and 3'-UTR regions compared with the constitutive intron distribution (13.4% compared with 7.3%; Table 4). By comparison, the global analysis predicts that at least 9% of Arabidopsis genes have introns in their UTRs.18,19 Strikingly, inspection of Table 4 shows that 609 (86.6%) of the retained introns are inside the CDS. Yet of these, only 4.6% maintain an in-frame ORF (Table 4). Interestingly, this rate, while low, is still more than 3-fold higher than that noted for constitutive introns (1.5%). Finally, intron borders can play a crucial role in splicing efficiency.20 However, a comparison between the borders of the constitutive and retained introns showed no differences (Supplementary Figure S3 is available at www.dnaresearch.oxfordjournals.org). Thus, border sequence cannot be used to predict alternative intron splicing.
| 4. Discussion |
|---|
We demonstrate that direct transcript expression analysis using high-density oligonucleotide-based WGA is particularly amenable to assessing global intron retention in Arabidopsis. However, the detection of exon skip is unreliable owing to the difficulty in separating the distributions of the expression values that distinguish exons from introns. We note that detection of alternative 5' and 3' splicing sites by analysis of WGA would be limited to those alternative sites that span a number of probe lengths (in the present case a minimum of 75 bp). As alternative 5' and 3' splice choices are usually of shorter length, they would not generally be detected.
Genome-wide detection of retained introns in Arabidopsis reveals that at least 8% of genes that can be surveyed by microarray include a transcript with a retained intron. A sample of these transcripts was tested using RT-PCR and 86% could be verified. In our analysis, results of the WGA data were combined to enable the generation of statistically reliable outcomes. Thus, tissue-specific intron retention may be lost, as combining RNA from diverse biological material can obscure the AS signal. In the future, the use of sample repeats from the same tissue will undoubtedly improve the statistical veracity of the results and optimize the detection of aberrant intron behavior. Nonetheless, as intron retention comprises above 40% of all AS events,3,7,9 this indicates that the overall rate of AS in Arabidopsis is 20%. This rate is considerably higher than the 10-14% that was garnered from compilation of EST and cDNA data from different sources3,7 but similar to the one that was recently reported from a large collection of EST/cDNA.8 Similarly, the recent application of WGA to human chromosomes 21 and 22 showed a much higher rate of AS than previously detected.13
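The 20% figure can be reproduced with a simple proportion, written out here as a sketch. It assumes that the 8% of surveyed transcripts carrying a retained intron and the roughly 40% share of intron retention among all AS events can be combined directly; the symbols f_RI, s_RI and R_AS are introduced only for this illustration.

```latex
R_{\mathrm{AS}} \approx \frac{f_{\mathrm{RI}}}{s_{\mathrm{RI}}} = \frac{0.08}{0.40} = 0.20
```

Since both inputs are approximate ("at least 8%" and "above 40%"), the result is best read as a rough estimate rather than an exact rate.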
The retained-intron type is distinct from the other AS types because it represents an absence or failure of spliceosome action rather than an alternative choice of splice junctions. Our analysis detects several features that are specific to retained introns. They are smaller, have higher GC content, are relatively enriched in the UTR, and those in the CDS tend to have a slightly higher chance of containing an ORF when compared with constitutive-type introns. The size of introns has been shown to play a role in AS biology. In Drosophila, the size and location of the flanking introns control the frequency and the type of AS that a pre-mRNA transcript undergoes.21 Furthermore, it is possible that the attribute of higher GC content could contribute to a weaker U-rich splicing signal as a result of less efficient AU protein binding.22 This is because intronic U-rich or AU-rich elements can influence splice site selection and splicing efficiency, particularly in plants (reviewed in Simpson and Filipowicz23). These sequences are likely to bind proteins (analogous to animal hnRNP proteins), which may permit intron recognition and recruit or allow access to spliceosomal components and the formation of commitment splicing complexes.
Of particular note is that 82.6% (581 out of 703) of the transcripts containing retained introns have them in the CDS, where they would shift reading frames and introduce an early stop codon. Of these, in 68% (477 out of 703) of the cases the intron that is retained is not the last one (data not shown). The position of these stop codons is important in the context of potential nonsense-mediated decay (NMD) effects. NMD serves as a surveillance mechanism that removes erroneous mRNAs containing a premature termination codon.24,25 Thus, if an NMD mechanism exists in plants, its selective power of degradation is incomplete, as the transcripts with AS events are readily detected. The existence of limited use of NMD in plants has recently been shown in the analysis of transcripts containing premature termination codons.26 However, the coupling of AS and NMD could be an important post-transcriptional regulation mechanism to adjust the level of transcript isoforms.27-29
Inspection of the intron retention database provides biological insight. RNA processing and signal transduction transcripts appear to be differentially enriched in the retained-introns group. A serine/arginine-rich (SR) family of proteins is implicated in constitutive and alternative splicing of pre-mRNAs. They appear in alternatively spliced forms30,31 and at least one of them, SR1, contains a retained intron.32 Transcription factors that show AS variants were described for five genes and in one of them, At1g26260, the first intron is retained.33 Numerous studies have contributed to the view that SR proteins play a general role in splicing and can modulate splice site selection in a concentration-dependent manner (Ref. 34 and reviewed by Manley and Tacke35). One imaginable consequence of this is that cells may regulate the expression or activity of individual SR proteins, or their antagonists, to control the expression of one or more target genes in a tissue-specific and/or developmentally regulated fashion. A transcript that contains an intron may serve as a negative regulator of the SR protein concentration.
The potential regulatory power of transcripts with retained introns can be gleaned from examination of the database (Supplementary Table S2 is available at www.dnaresearch.oxfordjournals.org). Two examples are NPR1 (At1g64280) and GIGANTEA (At1g22770). NPR1 is a key regulator of the salicylic acid (SA)-mediated systemic acquired resistance (SAR) pathway and it confers resistance to pathogens. Known mutations of this gene cause loss of SAR induction, loss of expression of PR genes and increased susceptibility to infections.36,37 WGA analysis predicts that the last intron of this gene is retained and contains an in-frame stop codon. GIGANTEA promotes flowering under long days in a circadian clock-controlled flowering pathway, together with CONSTANS (CO) and FLOWERING LOCUS T (FT). Mutations in this gene cause a late-flowering phenotype.38 WGA analysis predicts intron 10 (out of 13) to be retained and to contain a stop codon.
The biological function of retained introns remains an enigma. A recent survey of human transcripts shows that one-third of the alternatively spliced forms contained premature termination codons that make them apparent targets of NMD. It was proposed that regulated unproductive splicing may be a means of modifying the level of protein expression or of introducing modified gene products.39 This work provides a novel database of retained introns and their genomic sites and lays a framework for future application of WGA to dynamic screening of changes in splicing during the plant's life and in response to environmental input.
| Acknowledgements |
|---|
This research was supported by Israel Science Foundation grant No. 388/02 and by BARD, the United States-Israel Binational Agricultural Research and Development Fund, grant No. IS-3454-03.
Supplementary Data: Supplementary data is available at http://www.dnaresearch.oxfordjournals.org
| Footnotes |
|---|
*To whom correspondence should be addressed. Tel. +972-8-9342175. Fax. +972-8-9344181, E-mail: robert.fluhr{at}weizmann.ac.il
Communicated by Kazuo Shinozaki
| References |
|---|
1. Kan, Z., States, D., Gish, W. 2002, Selecting for functional alternative splices in ESTs, Genome Res., 12, 1837-1845.
2. Carninci, P., Kasukawa, T., Katayama, S., et al. 2005, The transcriptional landscape of the mammalian genome, Science, 309, 1559-1563.
3. Nagasaki, H., Arita, M., Nishizawa, T., Suwa, M., Gotoh, O. 2005, Species-specific variation of alternative splicing and transcriptional initiation in six eukaryotes, Gene, 364, 53-62.
4. Modrek, B. and Lee, C. 2002, A genomic view of alternative splicing, Nat. Genet., 30, 13-19.
5. Goodison, S., Yoshida, K., Churchman, M., Tarin, D. 1998, Multiple intron retention occurs in tumor cell CD44 mRNA processing, Am. J. Pathol., 153, 1221-1228.
6. Mansilla, A., Lopez-Sanchez, C., de la Rosa, E.J., et al. 2005, Developmental regulation of a proinsulin messenger RNA generated by intron retention, EMBO Rep., 6, 1182-1187.
7. Iida, K., Seki, M., Sakurai, T., et al. 2004, Genome-wide analysis of alternative pre-mRNA splicing in Arabidopsis thaliana based on full-length cDNA sequences, Nucleic Acids Res., 32, 5096-5103.
8. Wang, B.B. and Brendel, V. 2006, Genomewide comparative analysis of alternative splicing in plants, Proc. Natl. Acad. Sci. USA, 103, 7175-7180.
9. Ner-Gaon, H., Halachmi, R., Savaldi-Goldstein, S., Rubin, E., Ophir, R., Fluhr, R. 2004, Intron retention is a major phenomenon in alternative splicing in Arabidopsis, Plant J., 39, 877-885.
10. Altschul, S.F., Madden, T.L., Schaffer, A.A., et al. 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389-3402.
11. Martienssen, R.A., Doerge, R.W., Colot, V. 2005, Epigenomic mapping in Arabidopsis using tiling microarrays, Chromosome Res., 13, 299-308.
12. Kapranov, P., Cawley, S.E., Drenkow, J., et al. 2002, Large-scale transcriptional activity in chromosomes 21 and 22, Science, 296, 916-919.
13. Kampa, D., Cheng, J., Kapranov, P., et al. 2004, Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22, Genome Res., 14, 331-342.
14. Yamada, K., Lim, J., Dale, J.M., et al. 2003, Empirical analysis of transcriptional activity in the Arabidopsis genome, Science, 302, 842-846.
15. Benjamini, Y. and Hochberg, Y. 1995, Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B, 57, 289-300.
16. Savaldi-Goldstein, S., Aviv, D., Davydov, O., Fluhr, R. 2003, Alternative splicing modulation by a LAMMER kinase impinges on developmental and transcriptome expression, Plant Cell, 15, 926-938.
17. Arabidopsis Genome Initiative. 2000, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, Nature, 408, 796-815.
18. Zhu, W., Schlueter, S.D., Brendel, V. 2003, Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping, Plant Physiol., 132, 469-484.
19. Haas, B.J., Delcher, A.L., Mount, S.M., et al. 2003, Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies, Nucleic Acids Res., 31, 5654-5666.
20. Rogozin, B., Sverdlov, V., Babenko, N., Koonin, V. 2005, Analysis of evolution of exon-intron structure of eukaryotic genes, Brief Bioinform., 6, 118-134.
21. Fox-Walsh, K.L., Dou, Y., Lam, B.J., Hung, S.P., Baldi, P.F., Hertel, K.J. 2005, The architecture of pre-mRNAs affects mechanisms of splice-site pairing, Proc. Natl. Acad. Sci. USA, 102, 16176-16181.
22. Brown, J.W. 1996, Arabidopsis intron mutations and pre-mRNA splicing, Plant J., 10, 771-780.
23. Simpson, G.G. and Filipowicz, W. 1996, Splicing of precursors to mRNA in higher plants: mechanism, regulation and sub-nuclear organisation of the spliceosomal machinery, Plant Mol. Biol., 32, 1-41.
24. Nagy, E. and Maquat, L.E. 1998, A rule for termination-codon position within intron-containing genes: when nonsense affects RNA abundance, Trends Biochem. Sci., 23, 198-199.
25. Baker, K.E. and Parker, R. 2004, Nonsense-mediated mRNA decay: terminating erroneous gene expression, Curr. Opin. Cell Biol., 16, 293-299.
26. Hori, K. and Watanabe, Y. 2005, UPF3 suppresses aberrant spliced mRNA in Arabidopsis, Plant J., 43, 530-540.
27. Lejeune, F. and Maquat, L.E. 2005, Mechanistic links between nonsense-mediated mRNA decay and pre-mRNA splicing in mammalian cells, Curr. Opin. Cell Biol., 17, 309-315.
28. Lareau, L.F., Green, R.E., Bhatnagar, R.S., Brenner, S.E. 2004, The evolving roles of alternative splicing, Curr. Opin. Struct. Biol., 14, 273-282.
29. Yoine, M., Nishii, T., Nakamura, K. 2006, Arabidopsis UPF1 RNA helicase for nonsense-mediated mRNA decay is involved in seed size control and is essential for growth, Plant Cell Physiol., 47, 572-580.
30. Reddy, A.S. 2004, Plant serine/arginine-rich proteins and their role in pre-mRNA splicing, Trends Plant Sci., 9, 541-547.
31. Iida, K. and Go, M. 2006, Survey of conserved alternative splicing events of mRNAs encoding SR proteins in land plants, Mol. Biol. Evol.
32. Lazar, G. and Goodman, H.M. 2000, The Arabidopsis splicing factor SR1 is regulated by alternative splicing, Plant Mol. Biol., 42, 571-581.
33. Gong, W., Shen, Y.P., Ma, L.G., et al. 2004, Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes, Plant Physiol., 135, 773-782.
34. Jumaa, H. and Nielsen, P.J. 1997, The splicing factor SRp20 modifies splicing of its own mRNA and ASF/SF2 antagonizes this regulation, EMBO J., 16, 5077-5085.
35. Manley, J.L. and Tacke, R. 1996, SR proteins and splicing control, Genes Dev., 10, 1569-1579.
36. Cao, H., Glazebrook, J., Clarke, J.D., Volko, S., Dong, X. 1997, The Arabidopsis NPR1 gene that controls systemic acquired resistance encodes a novel protein containing ankyrin repeats, Cell, 88, 57-63.
37. Ryals, J., Weymann, K., Lawton, K., et al. 1997, The Arabidopsis NIM1 protein shows homology to the mammalian transcription factor inhibitor I kappa B, Plant Cell, 9, 425-439.
38. Eimert, K., Wang, S.M., Lue, W.I., Chen, J. 1995, Monogenic recessive mutations causing both late floral initiation and excess starch accumulation in Arabidopsis, Plant Cell, 7, 1703-1712.
39. Lewis, B.P., Green, R.E., Brenner, S.E. 2003, Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans, Proc. Natl. Acad. Sci. USA, 100, 189-192.
40. Berardini, T.Z., Mundodi, S., Reiser, L., et al. 2004, Functional annotation of the Arabidopsis genome using controlled vocabularies, Plant Physiol., 135, 745-755.