DNA Research 2005 12(4):257-267; doi:10.1093/dnares/dsi010

© The Author 2006. Kazusa DNA Research Institute
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

Preparation of a Set of Expression-Ready Clones of Mammalian Long cDNAs Encoding Large Proteins by the ORF Trap Cloning Method

Daisuke Nakajima1, Kenji Saito2, Hisashi Yamakawa1, Reiko F. Kikuno1, Manabu Nakayama1,2, Reiko Ohara1, Noriko Okazaki1, Hisashi Koga1,3, Takahiro Nagase1,* and Osamu Ohara1,2,4

1Department of Human Gene Research, Kazusa DNA Research Institute 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
2Laboratory of Pharmacogenomics, Graduate School of Pharmaceutical Sciences, Chiba University 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
3Chiba Industry Advancement Center 2-6 Nakase, Mihama-ku, Chiba 261-7126, Japan
4RIKEN Research Center for Allergy and Immunology 1-7-22 Suehiro, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan

Received 9 March 2005; revised 22 July 2005


    Abstract
 Top
 Abstract
 1. Introduction
 2. Materials and Methods
 3. Results
 4. Discussion
 Supplementary Material
 Acknowledgements
 References
 
Although we have so far identified and sequenced >2000 human long cDNAs, known as KIAA cDNAs, half of them have yet to be functionally annotated. Expression-ready cDNA clones derived from these genes, where the open reading frame (ORF) of the gene of interest is placed under the control of an appropriate promoter, are critical for functional characterization of these gene products. In this study, we attempted to systematically convert original cDNA clones to expression-ready forms for native and fusion proteins. For this purpose, we developed a new method for ORF cloning based on a homologous recombination in Escherichia coli to avoid laborious manipulations and artificial introduction of mutations in ORF. Using 1589 putative full-length ORFs (from 1002 KIAA genes, 119 human known genes and 468 mouse genes) with an average size of 2.8 kb, we successfully prepared expression plasmids for 1463 native proteins and for 1343 fusion proteins by this method. The resultant expression-ready clones were examined using an in vitro transcription/translation system followed by SDS–polyacrylamide gel electrophoresis and by transient expression of GFP-fusion proteins in human embryonic kidney (HEK) 293 cells. This set of expression-ready clones of long cDNAs encoding large proteins would open a new route to experimentally analyze their functions on a proteomic scale, since unavailability of expression-ready clones for mammalian large proteins has been a major obstacle to the functional analysis of these cDNAs.

Key words: large protein; cDNA; expression clone; proteomics; subcellular localization


    1. Introduction
 
In the last decade, whole genome sequences have been revealed from bacteria to humans in a complete or a draft form. This invaluable information is expected to shift the paradigm of molecular biology by making it possible to predict a complete catalog of protein-coding genes in an organism. However, the genome sequence information alone is not sufficient to achieve this. Information about transcribed sequences is indispensable for deducing protein primary structures encoded by the mammalian genome because exon–intron structures are difficult to predict. Furthermore, because the protein primary structures do not necessarily enable us to deduce their biological functions, we need various types of genomic resources to experimentally explore them in a genome-wide manner. In particular, a set of expression plasmids for all the proteins in an organism serves as an indispensable reagent for functional genomics. Thus, many efforts have been made to prepare sets of defined protein-expression clones, so-called ‘ORFeome cloning’, to analyze protein function on a proteomic scale.1–7 Although an open reading frame (ORF) is different from a protein-coding sequence (CDS) by definition, we hereafter use ORF rather than CDS for protein-coding sequence to avoid unnecessary confusion, following the conventions used by other researchers.

Although ORF cloning into expression vectors has already been attempted in mammals,6,8–10 it involves many technical difficulties at the ORFeome level. The most serious problem is the difficulty in predicting the ORFeome from the mammalian genomic information. Thus, to achieve reliable prediction of protein-coding genes, a large volume of sequence information of cDNAs was accumulated to complement the genomic sequence information in the public domain.11–18 Furthermore, rapidly growing data of genomic and transcriptomic sequences from several different mammals allow us to make comparisons and thereby assign mammalian ORFeomes more convincingly than was possible a few years ago. Therefore, we believe that ORFeome cloning in mammals is a research area that should be accorded high priority.

Our cDNA sequencing project was unique, focusing on long cDNAs (>4 kb) encoding relatively large proteins from the brain. This was because of our interest in cDNA clones encoding multidomain proteins, many of which play crucial roles, such as signal transduction, cell–cell communication, cell structure/motility and gene regulation, in multicellular organisms.19,20 More than 2000 newly isolated cDNAs have been entirely sequenced (referred to as KIAA plus a 4-digit number; the average number of amino acid residues in KIAA ORFs was 949). The KIAA cDNA sequences and clones are continuously updated if their ORFs are found to be spuriously interrupted or truncated.21 We have recently started to collect cDNA clones for mouse homologs of human KIAA (mKIAA) genes and polyclonal antibodies against mKIAA proteins to explore the function of KIAA gene products using an animal model in vivo.17,22–24 The information about KIAA and mKIAA genes is available through our databases (HUGE for human KIAA genes; http://www.kazusa.or.jp/huge and ROUGE for mouse KIAA genes; http://www.kazusa.or.jp/rouge).25 Approximately half of the KIAA proteins have not been functionally annotated by Gene Ontology.26 To overcome this situation, it is essential to express KIAA proteins in cells and analyze their functions. Thus, we considered it urgent to prepare cDNA clones in an expression-ready format to accelerate the functional study, at the molecular level, of KIAA gene products with unknown function. A set of KIAA ORF clones would constitute an indispensable subset of the human ORFeome collection. To achieve this, we needed to solve some technical problems in transferring long ORFs to an expression vector in a systematic manner. In this study, we developed a new method, termed the ORF trap method, for ORFeome cloning of long cDNAs. We describe the details of the ORF trap method, its application to long ORF cloning, and characterization of the protein products of the resultant ORF clones. The results indicate that this method worked effectively and that the resultant clone set could be successfully used as a versatile reagent for functional characterization of large proteins.


    2. Materials and Methods
 
2.1. Materials
pTRAP1 plasmid DNA was constructed by the Gateway BP reaction with pDONR201 plasmid (Invitrogen, USA) and the attB1-SD-Kozak-Not I-attB2 fragment (5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCGAAGGAGATAGAACCA TGGGGGATGTGAAGCTGGTTGCCTCGTCACACATTTCCAAAACCTCCCT CAGTGTGGCGGCCGCCTCATTGTGGAAACGATGGAGGAAGGTGAAGGG GAAGGGGAAGAGGAAGAAGAGTGGAACCCAGCTTTCTTGTACAAAGTGGTCCCC-3') according to the instructions provided by the supplier, followed by insertion of the DNA fragment harboring the chloramphenicol resistance and ccdB genes amplified by PCR with pDONR201 into the NotI site. The expression clones including various ORFs were constructed by the Gateway LR reaction with the entry clones amplified by the TempliPhi Amplification Kit (Amersham Biosciences, UK) and pcDNA-DEST47 for expression in an in vitro transcription/translation system or in mammalian cells.

2.2. Cultured cells
The human embryonic kidney (HEK) 293 cell line was obtained from the Health Science Research Resources Bank (HSRRB) of the Japan Health Sciences Foundation. The cell line was grown in DMEM supplemented with 10% of Tet System Approved Fetal Bovine Serum (BD Biosciences, USA).

2.3. Preparation of JC8679 competent cells for electroporation
E. coli strain JC8679 (recB21, recC22, sbcA23, thr-1, leuB6, phi-1, lacY1, galK2, ara-14, Xyl-5, mtl-1, proA2, his-4, argE3, rpsL31, tsx-33, supE44, his-328) was obtained from HSRRB of the Japan Health Sciences Foundation and used for bacterial transformation. The competent bacterial cells were prepared as follows: a single colony of JC8679 was picked and grown in 30 ml of SOB medium overnight at 37°C. After 5 ml of the overnight culture was inoculated into 500 ml of SOB medium, the cells were cultured with agitation at 37°C until they reached OD600 = 0.35–0.4. The cells were then chilled on ice for 15 min and harvested by centrifugation at 2300 g for 10 min at 2°C. After the supernatant was discarded, the cells were suspended in 5 ml of ice-cold H2O, then 500 ml of ice-cold H2O was added and the cells were centrifuged and then washed twice. After washing with 20 ml of ice-cold 10% glycerol, the cells were suspended in 1.25 ml of ice-cold 10% glycerol. Finally, the cells were divided into aliquots in separate tubes, frozen in dry ice-ethanol and stored in a –80°C freezer.

2.4. Homologous recombination
A linear plasmid containing gene-specific sequences was generated by PCR in a 50 µl reaction including 2.5 units of LA Taq DNA polymerase (Takara, Japan), 0.25 mM each of four dNTPs and 5 ng of plasmid template, pTRAP1 (94°C for 5 min; 5 cycles of 94°C for 30 s, 40°C for 30 s and 72°C for 2 min; 20 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 2 min) using 2 µM each of a gene-specific orfn-primer (5'-N41CATGGTTCTATC-3', where N41 indicates a 41-nt anti-sense sequence after the start codon of the ORF) and a gene-specific orfc-primer (5'-N'42XXXXXXXACCCAGCTTTC-3', where N'42 indicates a 42-nt sense sequence before the stop codon of the ORF). The run of X is a discriminator between the native and fusion types of construct and consists of one of the following sequences: TA[A/C]GTAG, TG[A/C]GCAG, TG[A/T]CTAG(A) or T[A/C]GTACG, where square brackets mark the single degenerate position and (A) is the first base of the adjoining vector sequence. In the fusion-type construct, the resulting sequences TACGTA, TGCGCA, TCTAGA and CGTACG can be digested by SnaBI, FspI, XbaI or BsiWI restriction endonucleases, respectively. Hereafter, PCR products thus amplified are tentatively referred to as trap vectors. The linear trap vector was then purified together with 5–10 µg of plasmid DNA containing the cDNA corresponding to the respective gene-specific primer sequences (original clones) using the Wizard SV 96 PCR Clean-Up System (Promega, USA) and finally dissolved in 5 µl of H2O. JC8679 competent cells (20 µl) were transformed with a mixture of linear trap vector and circular or linear original clone (2 µl) in a cuvette (Bio-Rad, USA, 0.1 cm electrode) by electroporation using a Bio-Rad Gene Pulser set at 1.67 kV, 200 ohms and 25 µF. The transformed cells were incubated in 500 µl of SOC at 37°C for 2 h with agitation and were plated on LB-kanamycin (50 µg/ml) agar. To select recombinant clones, plasmid DNAs were prepared from at least eight colonies per electroporation by the nucleic acid purification system, MFX-9600-Magnia with MagExtractor (Toyobo, Japan) in a 96-well format and the recombined regions were confirmed by single-pass sequencing.
Single-pass sequencing was performed from both ends of the insert DNA with GatewaySeqL-A (5'-TCGCGTTAACGCTAGCATGGATCTC-3') and GatewaySeqL-B (5'-GTAACATCAGAGATTTTGAGACAC-3') primers and the ABI BigDye Terminator Cycle Sequencing Kit Ver. 3.1 (Applied Biosystems, USA) using an ABI3700 DNA sequencer. After the correctly recombined plasmid clones were arrayed in a new 96-well plate, the plasmid DNAs were introduced into competent DH10B cells to resolve plasmid multimers formed in JC8679 and were amplified. At the same time, single-pass sequencing was performed from both ends of the insert DNA again and the sizes of the plasmid DNAs in a covalently closed circular form were estimated by agarose gel electrophoresis with a molecular weight marker for supercoiled DNA (Supercoiled DNA Ladder, Invitrogen, USA) followed by ethidium bromide staining and FragmeNT analysis of gel image data obtained by FluorImager SI (Molecular Dynamics, USA). Although most of the original clones whose backbones were pBluescript II SK (+) vector were adapted into linear form, the cDNA fragments cloned in the pSPORT1 vector were cut out with appropriate restriction enzymes before the recombination reaction to prevent undesired recombination. For some cDNA clones with truncation of a few amino acid residues at the predicted N-terminal sequence, entry clones were constructed by using gene-specific primers containing complementing sequences to make full-length ORFs.
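
The primer layout described in this section can be sketched in a few lines of Python. This is only an illustration, not software used in the study: the function and variable names are hypothetical, and the discriminator table assumes that each degenerate discriminator quoted above has exactly one variable base (giving a stop codon in the native form and a SnaBI, FspI, XbaI or BsiWI site in the fusion form). The vector-derived tails are copied from the primer formulas in the text.

```python
# Illustrative sketch only; names and the discriminator table are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Vector-derived 3' tails quoted in the primer formulas of Section 2.4.
ORFN_TAIL = "CATGGTTCTATC"
ORFC_TAIL = "ACCCAGCTTTC"

# enzyme -> (native discriminator, fusion discriminator), expanded from the
# degenerate 7-nt sequences under the one-variable-base assumption.
DISCRIMINATORS = {
    "SnaBI": ("TAAGTAG", "TACGTAG"),
    "FspI": ("TGAGCAG", "TGCGCAG"),
    "XbaI": ("TGACTAG", "TGTCTAG"),
    "BsiWI": ("TAGTACG", "TCGTACG"),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_trap_primers(orf: str, enzyme: str = "SnaBI") -> dict:
    """Build orfn and both orfc variants for an ORF given from ATG to stop."""
    orf = orf.upper()
    if set(orf) - set("ACGT"):
        raise ValueError("ORF must contain only A, C, G and T")
    if not orf.startswith("ATG") or orf[-3:] not in {"TAA", "TAG", "TGA"}:
        raise ValueError("ORF must run from the start codon to a stop codon")
    if len(orf) < 48:
        raise ValueError("ORF is too short for 41-nt and 42-nt homology arms")
    native_x, fusion_x = DISCRIMINATORS[enzyme]
    n41 = reverse_complement(orf[3:44])  # anti-sense of the 41 nt after ATG
    n42 = orf[-45:-3]                    # sense 42 nt before the stop codon
    return {
        "orfn": n41 + ORFN_TAIL,
        "orfc_native": n42 + native_x + ORFC_TAIL,
        "orfc_fusion": n42 + fusion_x + ORFC_TAIL,
    }


if __name__ == "__main__":
    # Toy ORF used purely to show the layout of the three primers.
    toy_orf = "ATG" + "A" * 41 + "C" * 42 + "TAA"
    for name, seq in design_trap_primers(toy_orf).items():
        print(name, len(seq), seq)
```

In practice the orfc primer is used as a mixture of the native and fusion variants, so both strings would be ordered (or synthesized as one degenerate oligomer) for each gene.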

2.5. In vitro transcription/translation assay
pcDNA-DEST47 destination vector (Invitrogen, USA) was used for the Gateway LR reaction with the entry clones we constructed to produce proteins by an in vitro transcription/translation system using rabbit reticulocyte lysate. For the fusion type of expression clones, a C-terminal Cycle 3 green fluorescent protein (GFP)-fusion vector was used. Plasmid DNAs used for in vitro expression were prepared by MFX-9600-Magnia in a 96-well format from overnight-cultured bacteria and were purified using MultiScreen, MAFB NOB (Millipore, USA) according to the instructions provided by the manufacturer. Plasmids (100–200 ng) were subjected to an in vitro transcription/translation system (TNT T7 Quick Coupled Reticulocyte Lysate System, Promega, USA) in the presence of 4 µl of Quick Mix, 0.2 µl of FluoroTect GreenLys tRNA (Promega, USA) and 0.1 µl of 1 M methionine in a final reaction volume of 5.3 µl. The products were resolved on MDG-267 Real Gel Plate (concentration gradient: 5–10%, Biocraft, Japan). Prestained protein size markers (BenchMark size markers, Invitrogen, USA) were used for estimation of the apparent molecular masses of the in vitro products. When multiple discrete bands were observed from a single clone, the size of the largest band was estimated.

2.6. Subcellular localization of GFP-fusion protein
GFP-fusion protein expression clones used for subcellular localization were constructed and prepared as described above. Resultant purified DNAs (100–200 ng) were transfected into HEK293 cells using FUGENE 6 Transfection Reagent (Roche, USA) in an 8-well chambered coverglass (Nalge Nunc International, USA). Cells (200 µl; 4 x 10^4) were plated 24 h before the transfection experiment. After 40–50 h of transfection, the subcellular localization of the GFP-fusion proteins was observed with a fluorescence microscope (Axiovert S100, Zeiss, Germany) and recorded (DP70, Olympus, Tokyo).

2.7. Western blotting
For western blot analysis of GFP-fusion proteins, expression clones were constructed as described above and prepared using the QIAGEN Plasmid Kit (Qiagen, Germany). HEK293 cells (1.5 x 10^5) were plated in a 24-well culture plate (Becton Dickinson, San Jose, CA) 24 h before transfection. GFP-fusion protein expression clones (250 ng) were used for the transfection per well. Cell extracts were prepared 48 h after the transfection by washing cells with PBS and dissolving them in 50 µl of the 2x SDS sample buffer [100 mM Tris-Cl (pH 6.8), 4% SDS, 20% glycerol, 0.1% BPB]. After brief sonication, 2-mercaptoethanol was added to the samples to a final concentration of 144 mM and the samples were heated at 95°C for 5 min. The sample (1–5 µl) was applied to SDS–polyacrylamide gel electrophoresis (SDS–PAGE) (MDG-267 Real Gel Plate, concentration gradient: 5–10%, BIOCRAFT, Japan). At the same time, some expression clones were subjected to the in vitro transcription/translation system according to the above-described method except for the labeling using FluoroTect GreenLys tRNA, and 5 µl of the sample was used for SDS–PAGE. After the electrophoresis, gels were immersed in the transfer buffer [25 mM Tris-Cl (pH 8.3), 192 mM glycine, 20% (v/v) methanol] for 15 min and the separated proteins were electrophoretically transferred onto PVDF membrane (FluoroTrans W, Pall, USA) using the transfer buffer with the BIOCRAFT BE-300 semidry transfer device. Detection of GFP-fusion protein was performed at room temperature as follows. The transfer membrane was washed with TBS [20 mM Tris-Cl (pH 7.5) and 150 mM NaCl] including 0.05% Tween-20 (TBST) buffer with gentle agitation for 10 min followed by incubation with blocking solution (TBST containing 5% skim milk) for 60 min. The membrane was then incubated with 1:1000 anti-GFP mouse IgG1-k antibody (Nacalai Tesque, Japan) in blocking solution for 60 min.
After washing with TBST for 5 min four times, the membrane was incubated with 1:4000 horseradish-peroxidase (HRP)-conjugated anti-mouse IgG antibody (Dako Corporation, Carpinteria, CA) in blocking solution for 60 min. GFP-fusion proteins on the membrane were finally detected by ECL plus (Amersham Biosciences, UK) according to the instructions provided by the manufacturer after washing the membrane with TBST for 5 min four times and exposing it to Biomax X-ray films (Kodak, USA) or recording by Luminescent Image Analyzer LAS1000 (FUJIFILM, Japan). Prestained protein size markers (Magic Mark XP, Invitrogen, USA) were used for estimation of the apparent molecular masses of the GFP-fusion proteins.


    3. Results
 
3.1. Cloning of ORFs by homologous recombination in E. coli
In conventional ORF cloning, PCR has been extensively used as a convenient method for excising ORF from parental cDNA clones. Although care was taken to minimize artificial introduction of mutations in ORF during PCR, accidental isolation of artificially mutated clones was always a possibility. This risk obviously increases as the size of ORF increases. Thus, we were strongly motivated to develop an alternative way to isolate ORF clones, because our ORFs of interest were considerably longer than those manipulated in previous reports.1–4,6 In this study, we used a homologous recombination system in E. coli JC8679, which has the recBC sbcA genotype,27–29 for ORF cloning. This recombination cloning system enabled us to construct multiple clones harboring only ORFs (in practice, from the initiation codon to the termination codon) in parallel through the following three major steps: (i) generation of gene-specific trap vector fragments by PCR using pairs of gene-specific primers and pTRAP1 plasmid DNA as a template; (ii) transformation of E. coli JC8679 cells with a gene-specific trap vector fragment and a plasmid DNA harboring the ORF of interest by electroporation; (iii) clone selection/confirmation by single-pass sequencing after growth on agar plates containing appropriate antibiotics. An outline of the ORF trap cloning method is schematically shown in Fig. 1A. In brief, pTRAP1 vector was amplified by PCR using the orfn and orfc primers, which contained gene-specific sequences corresponding to a 41-nt anti-sense sequence after the start codon of ORFs and a 42-nt sense sequence before the stop codon of ORFs, respectively, which serve as tags for homologous recombination. The gene-specific sequence of orfn was flanked by the Shine–Dalgarno (SD)–Kozak sequence while orfc was a mixture of oligomers carrying the gene-specific sequence followed by a termination codon and a non-termination codon. The structures of orfn and orfc primers are shown in Fig. 1B.
Resultant linear trap-vector fragments, which have gene-specific sequences at the extremities, were co-introduced into E. coli JC8679 cells with the corresponding cDNA clones. From clones appearing on agar plates containing antibiotics, the flanking sequences of the recombination sites of multiple independent clones were determined by single-pass sequencing using vector primers to confirm the recombination at an appropriate site and to remove clones with mutations derived from small amounts of incorrectly synthesized primers and/or PCR errors. Mutations introduced in the vector backbone sequence did not matter because the vector backbone was eventually replaced by the expression vectors. In particular, we designed this cloning method so as to be able to simultaneously obtain ORF clones both for production of C-terminal fusion protein and for native protein. For this purpose, the orfc primer was a mixture of two primers with and without the ORF termination codon. As shown in Fig. 1B, to discriminate native and C-terminal fusion type of expression clones, a unique restriction endonuclease cleavage site was introduced in the fusion type of clone at the region where the termination codon was removed. As a result, any tag sequence of interest can be introduced in the unique restriction cleavage site. For maximum versatility, the resultant ORF clones are in a format compatible with an in vitro lambda phage recombinational cloning system, known as the Gateway system.30 Once the ORF clones were successfully obtained, we were able to prepare expression plasmids for their native, N-terminal fusion and C-terminal fusion proteins in various expression systems on demand using the Gateway recombination cloning system. Using 1589 putative full-length ORFs (from 1121 human and 468 mouse genes) with an average size of 2.8 kb, we successfully prepared expression plasmids for 1463 native proteins and for 1343 fusion proteins by this method.
Putative full-length ORFs were chosen on the basis of sequence information on homologous genes and GENSCAN predicted CDSs. Information about the entry clones thus prepared is available through the Supplementary Table (available at www.dnares.oxfordjournals.org) and the HUGE protein database.


Figure 1
Schematic representation of the ORF trap cloning. (A) Strategy for cloning of ORFs. Trap vector is prepared by PCR using the gene-specific orfn and orfc primers with pTRAP1 vector as a template. The trap vector is introduced into JC8679 cells with the original clones corresponding to the gene-specific primers to induce homologous recombination. The resultant recombinant clones termed pENT can be used for Gateway recombination cloning to produce a variety of expression vectors. Kanamycin-resistant (KanR), chloramphenicol-resistant (CmR) and ampicillin-resistant (AmpR) genes are boxed. Shine–Dalgarno (SD) and Kozak consensus sequences, stop codon (stop) or non-stop codon (fusion), ccdB gene and tag sequences are also indicated in the open boxes. Gateway recombination sequences (attL) are shown in the closed boxes. ORFs and untranslated sequences are indicated as gray and open boxes. (B) Sequences of orfn and orfc primers. The consensus sequences of gene-specific primers and the corresponding vector sequences are boxed. SD, Kozak and stop codon sequences are indicated in boldface. The gene-specific sequences are shown as N(41) and N(42) (see text for details). Discriminator sequences in the orfc primer are indicated as X and the respective discriminators for the fusion-type clones are shown with a unique restriction endonuclease site, in which the degenerated nucleotides are underlined.

 
Unlike PCR, homologous recombination-based cloning systems are not expected to introduce point mutations easily because the cloned sequence is amplified in vivo by the endogenous E. coli replication machinery.31 In fact, the sequencing analysis of coding regions derived from 14 cDNA clones, which totaled ~50 kb, revealed no nucleotide substitutions, insertions or deletions after homologous recombination-based cloning (data not shown).
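
To put the "no changes in ~50 kb" observation in perspective, the short sketch below computes the upper confidence bound on a per-base error rate that is compatible with seeing zero errors. This is an added statistical illustration, not an analysis reported in the paper: it assumes that bases are independent trials, treats ~50 kb as exactly 50,000 sequenced bases, and uses a hypothetical function name.

```python
import math


def zero_event_upper_bound(n_bases: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-base error probability p after observing
    zero errors in n_bases independent bases.

    Solves (1 - p) ** n_bases = 1 - confidence for p.
    """
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # expm1/log1p keep precision when p is very small.
    return -math.expm1(math.log1p(-confidence) / n_bases)


if __name__ == "__main__":
    # Assumption: ~50 kb is taken as exactly 50,000 bases.
    bound = zero_event_upper_bound(50_000)
    print(f"95% upper bound on the per-base error rate: {bound:.2e}")
    print(f"rule-of-three approximation (3/n):          {3 / 50_000:.2e}")
```

Under these assumptions, an error-free 50 kb sample is compatible with a per-base error rate of up to roughly 6 in 100,000, so the observation supports a low mutation frequency but does not by itself show that recombination-based cloning is error-free.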

3.2. Evaluation of recombinant proteins produced in vitro by SDS–PAGE
To characterize the recombinant proteins, 1530 native type and 1442 GFP-fusion proteins were synthesized in vitro and evaluated by SDS–PAGE. Although some protein products migrated faster or slower than expected, apparent molecular mass estimated by SDS–PAGE is known to vary considerably according to the amino acid sequences and/or post-translational modifications on proteins.32 Thus, we focused our efforts on comparing the size-differences between the native and the fusion protein to evaluate the recombinant products, rather than on their absolute apparent molecular masses. As a result, the apparent molecular masses of 1334 (87%) native recombinant proteins and 1283 (89%) GFP-fusion proteins estimated by SDS–PAGE fell in a range from 70 to 160% of the predicted molecular weights (Supplementary Table is available at www.dnares.oxfordjournals.org). Fig. 2A shows examples of the analysis of the fluorescence-labeled proteins (clones described in ORFT16 and ORFT17 plates in the Supplementary Table which is available at www.dnares.oxfordjournals.org). Although some small bands, probably resulting from degradation or accidental translation from an internal methionine codon, were seen in some lanes, the sizes of the largest proteins were always compared with the predicted molecular masses. A comparison of the apparent molecular masses of 1179 pairs of native and GFP-fusion proteins is shown in Fig. 2B. The average size of fluorescence-labeled native proteins was 111 kDa and the average size-difference between native and GFP-fusion proteins was 29.6 kDa with a standard deviation of 5.3 kDa. Because fusion of the GFP moiety was predicted to increase the molecular mass of the recombinant proteins by 26.8 kDa, the increase of apparent molecular mass of the GFP-fusion protein by ~30 kDa on SDS–PAGE indicated that the reading frame of the recombinant protein was correctly maintained after the ORF trap cloning.
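
The two checks used in this evaluation, the 70–160% window around the predicted mass and the native-versus-fusion size difference, can be written down as a minimal Python sketch. The function names and the example band sizes are hypothetical and serve only to show the arithmetic; the 70–160% window and the 26.8 kDa GFP increment are the values quoted above.

```python
from statistics import mean, stdev

GFP_TAG_KDA = 26.8  # predicted mass added by the GFP moiety (from the text)


def within_expected_range(apparent_kda: float, predicted_kda: float,
                          low: float = 0.70, high: float = 1.60) -> bool:
    """True if the apparent mass is 70-160% of the predicted mass."""
    if predicted_kda <= 0:
        raise ValueError("predicted mass must be positive")
    ratio = apparent_kda / predicted_kda
    return low <= ratio <= high


def summarize_size_differences(pairs):
    """Mean and sample standard deviation of (fusion - native) masses.

    pairs: iterable of (native_apparent_kda, fusion_apparent_kda) tuples.
    """
    diffs = [fusion - native for native, fusion in pairs]
    if len(diffs) < 2:
        raise ValueError("at least two pairs are needed")
    return mean(diffs), stdev(diffs)


if __name__ == "__main__":
    # Hypothetical band sizes, not data from the paper.
    example_pairs = [(100.0, 128.0), (150.0, 181.0), (80.0, 110.0)]
    avg, sd = summarize_size_differences(example_pairs)
    print(f"mean shift {avg:.1f} kDa (SD {sd:.1f}); expected about {GFP_TAG_KDA} kDa")
    print(within_expected_range(111.0, 100.0))  # inside the 70-160% window
    print(within_expected_range(65.0, 100.0))   # below the window
```

A mean shift close to the predicted GFP increment is what indicates that the tag is in frame, which is why the paired comparison is more informative than either absolute mass alone.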


Figure 2
Evaluation of ORF trap clones by estimation of the size of their recombinant proteins. (A) Fluorescence-labeled recombinant proteins produced in an in vitro transcription/translation system. 303 native (N) and GFP-fusion (F) proteins for respective genes were separated by SDS–polyacrylamide gel as a pair and detected with FluorImager (Molecular Dynamics, USA). The positions of BenchMark protein size markers are indicated on the left. (B) Apparent molecular masses of 1179 pairs of native and GFP-fusion proteins. The apparent molecular masses of recombinant clones for native (blue) and GFP-fusion proteins (red) estimated by SDS–PAGE are shown next to each other. The vertical axis indicates the apparent molecular masses in kDa. The horizontal axis indicates the number of protein pairs.

 
3.3. Analysis of transiently expressed GFP-fusion proteins in cultured cells
To demonstrate the usefulness of the set of long ORF clones, we first examined the subcellular localization of transiently expressed GFP-fusion proteins in living HEK293 cells, following the systematic analysis for subcellular localization of expressed fusion proteins developed by Simpson and coworkers.8 Examples of subcellular localizations of transiently expressed GFP-fusion proteins are shown in Fig. 3A. We identified the subcellular localizations for 719 GFP-fusion proteins in HEK293 cells with a success rate of 82%, and they were tentatively classified into 13 groups as shown in Table 1. Nearly 34% of the fusion proteins were dispersed in the whole cytoplasm, while some proteins were localized in unspecified organelles within the cytoplasm in a speckled distribution (the ‘cytoplasmic organelle’ group in Table 1). Some speckles had the potential to translocate with detectable speed in the cytoplasm, suggesting that their organization is dependent on a microtubular system similar to that of peroxisomes (data not shown).33 The next most frequently observed localization was the nucleus (12.6%). Although some fusion proteins were localized as interchromatin granule clusters in the nucleus (the ‘nuclear vesicle’ group in Table 1), their detailed localization has yet to be studied. The fluorescent images of HEK293 cells expressing GFP-fusion proteins are accessible through the InGaP database (http://www.kazusa.or.jp/ingap).34


Figure 3
Recombinant proteins expressed in HEK293 cells. Twelve GFP-fusion protein expression clones were analyzed as described below: lane 1, L1CAM: neural cell adhesion molecule L1 precursor; lane 2, PCDHGA8/KIAA0588: protocadherin gamma A12 precursor; lane 3, MCLC/KIAA0761: mid-1-related chloride channel; lane 4, KCND2/KIAA1044: potassium voltage-gated channel subfamily D member 2; lane 5, SCN3B/KIAA1158: sodium channel beta-3 subunit precursor; lane 6, TPCN1/KIAA1169: two pore segment channel 1; lane 7, C11orf11/KIAA0659: neural stem cell-derived dendrite regulator; lane 8, NRXN2/KIAA0921: neurexin 2-beta precursor; lane 9, PLXND1/KIAA0620: plexin D1 precursor; lane 10, PLXNB3/KIAA1206: plexin B3 precursor; lane 11, ERR3/KIAA0832: estrogen-related receptor gamma isoform 2; and lane 12, PDZRN3/KIAA1095: PDZ domain containing RING finger 3. (A) Subcellular localizations of the transiently expressed GFP-fusion proteins. The fluorescence image (left) shows localization of GFP-fusion proteins in white; the phase contrast image (right) shows cell shape. A scale bar is indicated in the panel of L1CAM as a white bar. Gene names are shown below the photos. (B) Western blots of transiently expressed GFP-fusion proteins: 12 fusion type entry clones were transferred into pcDNA-DEST47 vector. Resultant GFP-fusion protein expression clones were transfected into HEK293 cells. The cell extracts (5 µl of the sample) were electrophoretically separated by SDS–PAGE and the GFP-fusion proteins were detected by ECL plus using the anti-GFP antibody after transfer to the membrane and by exposing to X-ray films. The lane numbers correspond to the expression clones described above. Multiple bands that migrated slower than expected are indicated by white closing brackets. (C) Comparison of the apparent molecular masses between GFP-fusion proteins produced in HEK293 cells and rabbit reticulocyte lysates. The lane numbers correspond to the expression clones described above. The GFP-fusion proteins produced in the cultured cells (Cell Ext.: 1 µl of the sample) and in rabbit reticulocyte lysate (RRL: 1 µl of the sample) were electrophoretically separated by SDS–PAGE and detected as described above except that the data were recorded by Luminescent Image Analyzer LAS1000. Major products detected by the antibody in the lanes of RRL are indicated by arrows. Molecular masses determined by the protein-size marker are shown in kDa on the left-hand side of the panels in (B) and (C).

 

Subcellular localization of exogenously expressed GFP-fusion proteins in 293 cells

 
To confirm biochemically that the exogenous gene products were produced in HEK293 cells, we next analyzed the transiently expressed GFP-fusion proteins by protein blot analysis using an anti-GFP antibody. The production levels of the GFP-fusion proteins varied considerably from gene to gene, although all of them were under the control of the same CMV promoter (Fig. 3B, lanes 1–12). Of the 12 GFP-fusion proteins examined, some showed extra bands that migrated more slowly on SDS–PAGE (white closing brackets in Fig. 3B, lanes 1, 4–7) than the bands expected from the sizes of the in vitro products. Multiple faster-migrating bands were also detected in some lanes. To compare the apparent molecular masses of the recombinant proteins that gave slowly migrating extra bands in HEK293 cells (Cell Ext. in Fig. 3C) with those of the corresponding in vitro products, the GFP-fusion proteins produced in a rabbit reticulocyte lysate (RRL in Fig. 3C) were analyzed side by side by protein blot analysis using the anti-GFP antibody. The positions of the major products produced in rabbit reticulocyte lysate are indicated by black arrows in Fig. 3C. Such comparisons can reveal post-translational events, such as modification or processing, that occur on certain proteins in certain cells. For the human L1 cell adhesion molecule (L1CAM), the size difference appeared to be caused by protein modifications such as glycosylation, because this gene encodes an integral membrane glycoprotein of the immunoglobulin superfamily (Fig. 3C, lane 1). The KIAA1044, KIAA1158, KIAA1169 and KIAA0659 GFP-fusion proteins, which are potassium channel KV4.2, sodium channel beta-3 subunit precursor, two-pore channel 1 and neural stem cell-derived dendrite regulator, respectively, also showed distinct SDS–PAGE patterns when produced in vivo and in vitro (Fig. 3C, lanes 4–7).


    4. Discussion
It is widely accepted that full-length cDNA clones in an expression-ready form are essential for functional analysis of mammalian genes [35]. For comprehensive functional analysis, it is an urgent task to prepare a complete set of full-length genes that can direct the synthesis of the authentic proteins encoded by the genome, because they serve as indispensable and versatile reagents for the production of recombinant proteins. In mammalian systems, cDNA clones are extensively used for this purpose. There are many sophisticated methods for constructing cDNA libraries enriched in full-length clones, but none of them guarantees that every cDNA clone thus generated is full-length [36,37]. Therefore, cDNA clones must be carefully evaluated by multiple independent criteria to ensure that they are full-length. Since sequence information on orthologous genes from different mammals has recently become available, comparing the genomic structures of these orthologues allows us to judge with increased confidence whether a cDNA clone encodes a full-length ORF. The long cDNA clones manipulated in the present study were judged to be full-length in this way [21]. Recent technical improvements in comprehensively assigning 5'-extreme sequences could eventually confirm the 5'-end structures of human cDNAs [38–40] and, if necessary, these ORF clones should be revised accordingly. This is critical for ORF clones because even a single amino acid change could have a dramatic effect on protein function.

Large cDNAs are generally difficult targets to manipulate, particularly in a systematic manner. Although PCR is often used for ORF cloning of short cDNAs, we considered that the PCR amplification step should be avoided for long cDNAs if possible. In this study, to prevent mutations in the protein-coding sequences, we developed a homologous recombination system in E. coli for cloning large ORF regions into a vector. So far, we have attempted to construct 1121 clones for human genes and 468 clones for mouse genes, including KIAA and mKIAA genes, which encode relatively large proteins. Most of the clones were confirmed by single-pass sequencing of the recombination sites and by evaluation of their expressed proteins. Our experience indicates that this method is quite useful for ORFeome cloning in general.

When considering the role of gene products, it is useful to elucidate their subcellular distribution, even though this information is often obtained from experiments using exogenous gene products. According to the UniProt Knowledgebase (http://www.uniprot.org), the citation information in the database includes only ~80 reports on the subcellular localizations of KIAA gene products. Determination of the subcellular localization of the remaining unanalyzed KIAA gene products is urgently needed. We have been analyzing the subcellular localizations, and the records will be available through our InGaP database (http://www.kazusa.or.jp/ingap) [34]. This database also contains immunohistochemical data obtained with antibodies against mKIAA proteins, which we accumulated in another project [24]. Comparing the subcellular localizations of mKIAA-GFP fusion proteins with those of antibody-detected mKIAA proteins would provide many biological insights into the functions of mKIAA proteins. It should be noted that we analyzed 12 GFP-fusion proteins by protein blotting in parallel with fluorescent imaging analysis. Since we accumulated SDS–PAGE patterns for the respective GFP-fusion proteins produced in vitro, as described in this study, comparing the SDS–PAGE patterns of GFP-fusion proteins transiently expressed in certain cells with those of the corresponding in vitro products could reveal the occurrence of post-translational modifications (e.g. processing by protease, glycosylation and phosphorylation). Together with the information on subcellular distribution, these data may provide comprehensive insights into the biological functions of KIAA/mKIAA proteins.

According to the Ensembl database (build34, http://www.ensembl.org), 9.5% of human genes (2108 genes) encode proteins composed of ≥1000 amino acid residues. Although not every cDNA is full-length, ~70% of the entirely sequenced cDNAs corresponding to these genes for large proteins (most of which are derived from the human brain) are currently held at our Institute. On the other hand, of the 1002 putative full-length ORFs for KIAA genes treated in this study, only 20% are listed as full-length cDNA clones in human ORFeome Version 1.1 (Open Biosystems, 8266 human cDNAs) [6], based on structural comparison by FASTA analysis (ungapped identities and coverage against the amino acid sequences of KIAA proteins were >95% and >90%, respectively; data not shown). This implies that the availability of ORF clones for large proteins is quite low in the research community. As a research group making efforts to identify unknown large cDNAs, we intend to collect as many cDNAs encoding relatively large proteins as possible in a full-length, expression-ready form. The collection of mammalian long ORF clones will keep growing and will be an invaluable resource for the advancement of molecular biology and medical science.
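
The overlap criterion quoted above (ungapped identity >95% and coverage >90% against the KIAA protein sequence) can be sketched as a simple filter. This is only a minimal illustration and not the pipeline used in the study: the `Hit` record, its field names, and the way identity and coverage are computed from residue counts are assumptions made for the example, and the parsing of FASTA output is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds quoted in the text, in percent.
MIN_IDENTITY = 95.0
MIN_COVERAGE = 90.0


@dataclass(frozen=True)
class Hit:
    """One alignment of a KIAA protein against another clone (hypothetical layout)."""
    query_id: str        # identifier of the KIAA protein
    query_length: int    # length of the KIAA protein in residues (> 0)
    aligned_length: int  # ungapped aligned residues on the KIAA protein (> 0)
    identical: int       # identical residues within the aligned region

    @property
    def identity(self) -> float:
        # Assumed definition: identical residues / aligned residues.
        return 100.0 * self.identical / self.aligned_length

    @property
    def coverage(self) -> float:
        # Assumed definition: aligned residues / full KIAA protein length.
        return 100.0 * self.aligned_length / self.query_length


def is_equivalent_clone(hit: Hit) -> bool:
    """Return True when the hit exceeds both thresholds (strict inequalities)."""
    return hit.identity > MIN_IDENTITY and hit.coverage > MIN_COVERAGE


def fraction_listed(hits: list[Hit], total_queries: int) -> float:
    """Fraction of query proteins that have at least one passing hit."""
    matched = {h.query_id for h in hits if is_equivalent_clone(h)}
    return len(matched) / total_queries
```

Calling `fraction_listed` with the hits for a set of KIAA proteins and the number of proteins examined gives the share that would count as already available in the other collection under these assumed definitions.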


    Supplementary Material
Supplementary material is available online at http://dnaresearch.oxfordjournals.org


    Acknowledgements
This project was supported by grants from the Kazusa DNA Research Institute. We thank Kazuko Yamada, Keishi Ozawa, Kiyoe Sumi, Nobue Kashima, Emiko Suzuki and Masatoshi Murakami for their technical assistance.


    Footnotes
 
*To whom correspondence should be addressed. Tel. +81 438 52 3930, Fax. +81 438 52 3931, E-mail: nagase{at}kazusa.or.jp

Communicated by Michio Oishi


    References

  1. Hudson, R. J. Jr., Dawson, E. P., Rushing, K. L., et al. 1997, The complete set of predicted genes from Saccharomyces cerevisiae in a readily usable form, Genome Res., 7, 1169–1173.
  2. Zhu, H., Bilgin, M., Bangham, R., et al. 2001, Global analysis of protein activities using proteome chips, Science, 293, 2101–2105.
  3. Ho, Y., Gruhler, A., Heilbut, A., et al. 2002, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, Nature, 415, 180–183.
  4. Reboul, J., Vaglio, P., Rual, J.-F., et al. 2003, C. elegans ORFeome version 1.1: Experimental verification of the genome annotation and resource for proteome-scale protein expression, Nature Genet., 34, 35–41.
  5. Phizicky, E., Bastiaens, P. I. H., Zhu, H., Snyder, M., Fields, S. 2003, Protein analysis on a proteomic scale, Nature, 422, 208–215.
  6. Rual, J.-F., Hirozane-Kishikawa, T., Hao, T., et al. 2004, Human ORFeome version 1.1: A platform for reverse proteomics, Genome Res., 14, 2128–2135.
  7. Pearlberg, J. and LaBaer, J. 2004, Protein expression clone repositories for functional proteomics, Curr. Opin. Chem. Biol., 8, 98–102.
  8. Simpson, J. C., Wellenreuther, R., Poustka, A., Pepperkok, R., Wiemann, S. 2000, Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing, EMBO Rep., 3, 287–292.
  9. Suzuki, H., Fukunishi, Y., Kagawa, I., et al. 2001, Protein–protein interaction panel using mouse full-length cDNAs, Genome Res., 11, 1758–1765.
  10. Wiemann, S., Arlt, D., Huber, W., et al. 2004, From ORFeome to Biology: A functional genomics pipeline, Genome Res., 14, 2136–2144.
  11. Nomura, N., Miyajima, N., Sazuka, T., et al. 1994, Prediction of the coding sequences of unidentified human genes. I. The coding sequences of 40 new genes (KIAA0001–KIAA0040) deduced by analysis of randomly sampled cDNA clones from human immature myeloid cell line KG-1, DNA Res., 1, 27–35.
  12. Strausberg, R. L., Feingold, E. A., Klausner, R. D., Collins, F. S. 1999, The mammalian gene collection, Science, 286, 455–457.
  13. Wiemann, S., Weil, B., Wellenreuther, R., et al. 2001, Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs, Genome Res., 11, 422–435.
  14. Kawai, J., Shinagawa, A., Shibata, K., et al. 2001, Functional annotation of a full-length mouse cDNA collection, Nature, 409, 685–690.
  15. Mammalian Gene Collection (MGC) Program Team. 2002, Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences, Proc. Natl Acad. Sci. USA, 99, 16899–16903.
  16. Okazaki, Y., Furuno, M., Kasukawa, T., et al. 2002, Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs, Nature, 420, 563–573.
  17. Okazaki, N., Kikuno, R. F., Ohara, R., et al. 2004, Prediction of the coding sequences of mouse homologues of KIAA gene: IV. The complete nucleotide sequences of 500 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries, DNA Res., 11, 205–218.
  18. Ota, T., Suzuki, Y., Nishikawa, T., et al. 2004, Complete sequencing and characterization of 21,243 full-length human cDNAs, Nature Genet., 36, 40–45.
  19. Ohara, O., Nagase, T., Ishikawa, K.-I., et al. 1997, Construction and characterization of human brain cDNA libraries suitable for analysis of cDNA clones encoding relatively large proteins, DNA Res., 4, 53–59.
  20. Nagase, T., Kikuno, R., Ohara, O. 2003, The Kazusa cDNA project for identification of unknown human transcripts, C. R. Biol., 326, 959–966.
  21. Nakajima, D., Okazaki, N., Yamakawa, H., Kikuno, R., Ohara, O., Nagase, T. 2002, Construction of expression-ready cDNA clones for KIAA genes: Manual curation of 330 KIAA cDNA clones, DNA Res., 9, 99–106.
  22. Okazaki, N., Kikuno, R., Ohara, R., et al. 2002, Prediction of the coding sequences of mouse homologues of KIAA gene: I. The complete nucleotide sequences of 100 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries, DNA Res., 9, 179–188.
  23. Hara, Y., Shimada, K., Kohga, H., Ohara, O., Koga, H. 2003, High-throughput production of recombinant antigens for mouse KIAA proteins in Escherichia coli: computational allocation of plasmids of glutathione-S-transferase-fused antigens by an in vitro recombination-assisted method, DNA Res., 10, 129–136.
  24. Koga, H., Shimada, K., Hara, Y., et al. 2004, A comprehensive approach for establishment of the platform to analyze functions of KIAA proteins: Generation and evaluation of anti-mKIAA antibodies, Proteomics, 4, 1412–1416.
  25. Kikuno, R., Nagase, T., Nakayama, M., et al. 2004, HUGE: a database for human KIAA proteins, a 2004 update integrating HUGEppi and ROUGE, Nucleic Acids Res., 32, D502–D504.
  26. Camon, E., Magrane, M., Barrell, D., et al. 2003, The Gene Ontology Annotation (GOA) project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro, Genome Res., 13, 662–672.
  27. Gillen, J. R., Willis, D. K., Clark, A. J. 1981, Genetic analysis of the RecE pathway of genetic recombination in Escherichia coli K-12, J. Bacteriol., 145, 521–532.
  28. Oliner, J. D., Kinzler, K. W., Vogelstein, B. 1993, In vivo cloning of PCR products in E. coli, Nucleic Acids Res., 21, 5192–5197.
  29. Zhang, Y., Buchholz, F., Muyrers, J. P. P., Stewart, A. F. 1998, A new logic for DNA engineering using recombination in Escherichia coli, Nature Genet., 20, 123–128.
  30. Walhout, A. J. M., Temple, G. F., Brasch, M. A., et al. 2000, GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes, Meth. Enzymol., 328, 575–592.
  31. Muyrers, J. P. P., Zhang, Y., Stewart, F. 2001, Techniques: Recombinogenic engineering—new options for cloning and manipulating DNA, Trends Biochem. Sci., 26, 325–331.
  32. Ohara, O. and Teraoka, H. 1987, Anomalous behavior of human leukocyte interferon subtypes on polyacrylamide gel electrophoresis in the presence of dodecyl sulfate, FEBS Lett., 211, 78–82.
  33. Thiemann, M., Schrader, M., Volkl, A., Baumgart, E., Fahimi, H. D. 2000, Interaction of peroxisomes with microtubules, Eur. J. Biochem., 267, 6264–6275.
  34. Koga, H., Yuasa, S., Nagase, T., et al. 2004, A comprehensive approach for establishment of the platform to analyze functions of KIAA proteins II: public release of inaugural version of InGaP database containing gene/protein expression profiles for 127 mouse KIAA genes/proteins, DNA Res., 11, 293–304.
  35. Brasch, M. A., Hartley, J. L., Vidal, M. 2004, ORFeome cloning and systems biology: Standardized mass production of the parts from the parts-list, Genome Res., 14, 2001–2009.
  36. Maruyama, K. and Sugano, S. 1994, Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides, Gene, 138, 171–174.
  37. Carninci, P., Kvam, C., Kitamura, A., et al. 1996, High-efficiency full-length cDNA cloning by biotinylated CAP trapper, Genomics, 37, 327–336.
  38. Shiraki, T., Kondo, S., Katayama, S., et al. 2003, Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage, Proc. Natl Acad. Sci. USA, 100, 15776–15781.
  39. Hashimoto, S., Suzuki, Y., Kasai, Y., et al. 2004, 5'-end SAGE for the analysis of transcriptional start sites, Nature Biotech., 22, 1146–1149.
  40. Wei, C.-L., Ng, P., Chiu, K. P., et al. 2004, 5' Long serial analysis of gene expression (LongSAGE) and 3'LongSAGE for transcriptome characterization and genome annotation, Proc. Natl Acad. Sci. USA, 101, 11701–11706.


