[Frontiers in Bioscience 14, 1270-1282, January 1, 2009]
Regulation of bovine papillomavirus type 1 gene expression by RNA processing
Rong Jia, Zhi-Ming Zheng
HIV and AIDS Malignancy Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
1. ABSTRACT
Bovine papillomavirus type 1 (BPV-1) has served as a prototype for studying the molecular biology and pathogenesis of papillomaviruses. The expression of BPV-1 early and late genes is highly regulated at both the transcriptional and post-transcriptional levels and is strictly tied to the differentiation of keratinocytes. BPV-1 infects keratinocytes in the basal layer of the skin and replicates in the nucleus of infected cells in a differentiation-dependent manner. Although viral early genes begin to be expressed in the infected, undifferentiated basal cells, viral late genes are not expressed until the infected cells enter the terminal differentiation stage. Both BPV-1 early and late transcripts are intron-containing bicistronic or polycistronic RNAs that bear more than one open reading frame and are polyadenylated at either an early or a late poly (A) site. Nuclear RNA processing of these transcripts by RNA splicing and poly (A) site selection has been extensively analyzed in the past decade, and various viral cis-elements and cellular factors involved in regulation of viral RNA processing have been discovered, leading to a better understanding of the gene expression and biology of human papillomaviruses.
2. INTRODUCTION
Bovine papillomavirus type 1 (BPV-1) is a small DNA tumor virus that causes benign fibropapillomas in cattle. BPV-1 has an 8-kb double-stranded circular DNA genome (1) which can be separated into three distinguishable regions: a long control region (LCR) or noncoding region (NCR), an early or transforming region, and a late region encoding structural capsid proteins. These three regions are divided by two polyadenylation (pA) sites: early pA (AE) and late pA (AL). The early region encodes six early open reading frames (ORFs): E1, E2, E5, E6, E7, and E8, and a late ORF, E4 (2-5); the late region encodes two ORFs: L1 and L2. All ORFs are located on one strand of the double-stranded genome.
The LCR contains a DNA replication origin (6,7), four promoters (P7185, PL, P7940, and P89), and 12 enhancer elements (8-12) which are activated by the full-length E2 protein. Transcription of BPV-1 genes is very complex. The early region transcripts of BPV-1 are transcribed from at least six promoters: P7185, P7940, P89, P890, P2443, and P3080, but all of these early transcripts are polyadenylated at the same AE site (Figure 1).
The P89 promoter is a classical promoter, with a TATA box about 30 nucleotides (nt) upstream of the E6 initiation site and a CAAT box at position -83 (13). P89 is one of the most active promoters in BPV-1 transformed cells and can be transactivated by E2 in vivo (9,14,15). However, an upstream E2 responsive enhancer (E2RE) is not required for efficient transcription in vitro from P89 (16). Three transcription factor Sp1 sites are located at nt 7800, 7833 and 7854 upstream of the P89 start site. These Sp1 sites are critical for basal and E2-transactivated transcription from the P89 promoter (17). The P89 transcripts are responsible for the expression of transforming E6 (14) and possibly for E7 (18,19). A spliced isoform (nt 304/528) of the P89 transcripts presumably results in the production of an E6^E7 fusion (14,15), but this has never been confirmed.
Transcripts from P2443 and P3080 are the most abundant BPV-1 transcripts in the fibromatous portion of the fibropapilloma (13). P2443 does not have a classical TATA box, but does have the sequence TAATATT located about 30 nt upstream of the transcription start site. Two transcripts are expressed from P2443. An unspliced message encodes a full-length E2 protein and a spliced form (nt 2505/3225) encodes the E5 protein. The spliced E5 form accounts for 90% of transcripts (14,15). P2443 is also transactivated by E2 through the E2 responsive enhancer elements in the LCR (20).
An Sp1 binding site upstream of the P2443 TATA-like box is critical for the basal level of transcription from this promoter (21). P3080 is located in the middle of the E2 ORF and does not have any recognizable TATA box (13,22). Like that of P7940, P89, and P2443, the activity of P3080 is regulated by the E2 protein. P3080 transcripts encode a transcriptional repressor, E2-TR, that contains 162 amino acid (aa) residues from the C-terminal half of the E2 ORF and binds the same E2 responsive enhancer elements as the full-length E2 protein, but inhibits E2-dependent transactivation (3,4). The P890 transcripts encode another transactivation repressor, E8^E2. The E8^E2 fusion protein has 11 aa N-terminal residues from the E8 ORF and 207 aa C-terminal residues from the E2 ORF. This fusion is created from a spliced message transcribed from the P890 promoter by RNA splicing from nt 1235 to 3225 (4,5,23).
Two additional promoters in the LCR, P7185 and P7940, are also active in both bovine warts and BPV-1 transformed mouse C127 cells (ID13) (13). P7185 is the weakest BPV-1 promoter in vivo and does not have a classical TATA box, although a potential TATA box is located 30 nt upstream from the RNA start site, where it overlaps with the AATAAA polyadenylation signal for the late region RNAs (13,16). P7185 has rather strong activity in an in vitro HeLa whole cell extract transcription system and responds to cycloheximide in vivo (13,16). In contrast to P7185, P7940 has a TATA box and a CCAAT box upstream of the transcription start site. P7940 contains an E2 responsive enhancer and is responsive to E2 transactivation (9,24). The level of transcripts expressed from P7940 is approximately one-third the amount from P89 (13). Deletion of the sequence between the TATA box and the transcription start site of the P7940 promoter reduces the expression of E1 (25).
BPV-1 late region mRNAs are transcribed from a late promoter, PL.
In a fibropapilloma, the majority of transcripts from PL have 5' termini near nt 7250 and utilize a common splice donor site at nt 7385. PL is not utilized in BPV-1-transformed C127 cells or in the fibromatous portion of a fibropapilloma, but its activity increases dramatically in the granular layer of a fibropapilloma (26), where viral DNA replication takes place. Upstream of this mRNA start site is a tandemly repeated sequence element homologous to the SV40 late promoter sequence (13,27,28).
3. BPV-1 GENE EXPRESSION IS REGULATED BY RNA SPLICING
After promoter induction and initiation of viral transcription, removal of introns from a nascent pre-mRNA transcript by RNA splicing is a major step toward maturation of mRNA. Thus, regulation of RNA splicing at the posttranscriptional level plays an important role during the life cycle of BPV-1 and other papillomaviruses. BPV-1 transcripts are either bicistronic or polycistronic and contain only one type of intron: the U2-type GU-AG or GC-AG intron (Figure 2), which is the most common intron in all eukaryotic transcripts. The U12-type AU-AC intron is rare in eukaryotic transcripts and has not been identified in any papillomavirus transcript to date. The bicistronic or polycistronic features of BPV-1 transcripts, each composed of several ORFs (Figure 1), are common to RNA transcripts of all other papillomaviruses. In addition, alternative RNA splicing, which disrupts or removes an ORF from a given transcript to direct a particular ORF usage for protein translation (29), is an evolutionarily conserved function in the gene expression of all papillomaviruses.
3.1. Splicing of viral early transcripts
Transcripts from the P89 promoter have two introns and three exons. RNAs spliced from nt 304 to 528 cannot encode the E6 or E7 protein, whereas transcripts that retain this intron can encode only the E6 protein. It remains unclear whether this transcript also encodes the E7 protein.
While it is presumed that E1, E6, and E7 proteins are made in basal cells from transcripts originating at promoter P89, this has yet to be proven. In BPV-1 transformed mouse C127 cells, a transcript with a 5' end at nt 429 was hypothesized to direct the synthesis of E7 protein, suggesting the presence of a separate putative promoter for E7 transcription (15). However, this hypothesis was not validated in a separate study (22). The second intron of P89 transcripts, between nt 865 and 3224, is usually spliced, but can be retained in a small portion of the transcripts. Splicing from nt 304 to 3225 in P89 transcripts deletes the exon from nt 528 to 864 and creates a transcript which has the potential to encode a fusion protein, E6^E4, using an AUG initiation codon at nt 91-93 and a UGA termination codon at nt 3527-3529 (15). Another P89 transcript spliced from nt 304 to 2558 may encode a full-length E2 protein (14,30).
The E1 ORF is the largest in the BPV-1 genome and its expression is essential for initiation and elongation of viral DNA replication. E1 proteins are well-conserved among bovine and human papillomaviruses. Transcripts containing the E1 ORF region without splicing are very rare, but are detectable in cycloheximide-treated, BPV-1-infected bovine cells by a probe from a central portion of the E1 region (31). The E1 protein (68 kDa) is also detectable in BPV-1 transformed mouse C127 cells (32). Intron 2 in the E1 ORF has an alternative 5' splice site (5' ss) at nt 1235. Splicing of the transcript from the nt 1235 5' ss to the nt 3225 3' splice site (3' ss) produces a transcript that encodes a 23-kDa E1M protein, which contains 129 amino acid residues from E1 and 13 amino acid residues from downstream sequences outside of the E1 ORF (5,32). However, the biological function of E1M protein in papillomavirus replication remains uncertain (33).
Transcripts expressed from the P890 promoter may splice from the nt 1235 5' ss to one of two alternative 3' ss, at nt 2558 or at nt 3225. Transcripts using the nt 3225 3' ss encode a transcriptional repressor, E8^E2 (5). Transcripts using the nt 2558 3' ss have the potential to express a full-length E2 protein. These transcripts can be found in mouse cells transfected with BPV-1 DNA, and their level of expression is the same as that of the unspliced transcripts expressed from P2443, which also encode the full-length E2 transactivator (30). Transcripts from the P2443 promoter spliced from nt 2505 to 3225 encode the E5 oncoprotein, which is required for cell transformation.
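The early splice junctions described in this section can be collected into a small lookup table. The sketch below is only an illustrative summary, not an analysis tool from the literature: the names `EarlyTranscript`, `EARLY_TRANSCRIPTS`, `products_using_acceptor`, and `products_from_promoter` are hypothetical, each entry records a single junction even though some transcripts carry more than one intron, and nt 2505 is used for the P2443 5' splice site, as given for this intron elsewhere in the review.

```python
from typing import List, NamedTuple, Optional, Tuple


class EarlyTranscript(NamedTuple):
    promoter: str
    # (5' splice site, 3' splice site) in BPV-1 nt positions; None = intron retained
    junction: Optional[Tuple[int, int]]
    product: str


# Junctions and products as quoted in the text; hedged labels mirror the text's wording.
EARLY_TRANSCRIPTS = [
    EarlyTranscript("P89", (304, 528), "E6^E7 (presumed, unconfirmed)"),
    EarlyTranscript("P89", (304, 3225), "E6^E4 (potential)"),
    EarlyTranscript("P89", (304, 2558), "E2 (possible)"),
    EarlyTranscript("P890", (1235, 3225), "E8^E2"),
    EarlyTranscript("P890", (1235, 2558), "E2 (potential)"),
    EarlyTranscript("P2443", None, "E2"),
    EarlyTranscript("P2443", (2505, 3225), "E5"),
]


def products_using_acceptor(acceptor: int) -> List[str]:
    """Products of spliced transcripts that use the given 3' splice site."""
    return [
        t.product
        for t in EARLY_TRANSCRIPTS
        if t.junction is not None and t.junction[1] == acceptor
    ]


def products_from_promoter(promoter: str) -> List[str]:
    """Products of all listed transcripts initiated at the given promoter."""
    return [t.product for t in EARLY_TRANSCRIPTS if t.promoter == promoter]


if __name__ == "__main__":
    print(products_using_acceptor(3225))
    print(products_from_promoter("P2443"))
```

Grouping by 3' splice site makes it easy to see that the nt 3225 acceptor is shared by transcripts from several early promoters.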
3.2. Splicing of viral late transcripts
The BPV-1 life cycle is tightly linked to the differentiation state of its host keratinocytes (26,34). The structural proteins, L1 and L2, are only expressed in cells of the granular or most differentiated layer of the epithelium. The PL promoter is activated in both the upper spinous and granular cell layers of the bovine fibropapilloma (26). Transcripts from the late promoter PL start at nt 7250 and share the common immediate downstream 5' ss at nt 7385, but utilize two alternative 3' ss, the nt 3225 3' ss or the nt 3605 3' ss, for their expression (34). Splicing from nt 7385 to 3225 and polyadenylation at the AL site form a large mRNA that encodes L2 protein. Splicing from nt 7385 to 3605 activates the removal of another intron downstream, from nt 3765 to 5608 (26). This alternative splicing produces a smaller transcript that encodes L1, a major capsid protein. However, a small portion of the late transcripts that use the nt 3605 3' ss for splicing do not remove the downstream intron and consequently produce an RNA species, L2-S. This transcript is smaller in size than the L2 RNA (also called L2-L), which is spliced at the nt 3225 3' ss. Northern analysis of bovine fibropapilloma tissues has shown that both L1 and L2 mRNAs are abundantly expressed transcripts (26).
ORF E4 completely overlaps with ORF E2 in a different reading frame. The function of BPV-1 E4 is unknown. E4 does not promote mouse C127 cell transformed focus formation and has no transcriptional activities (35). Transcripts from the PL or P7185 promoters spliced from nt 7385 to 3225 and polyadenylated at the AE have the potential to encode E4 (13); those spliced at nt 3605 have the potential to encode E5 if polyadenylated at the AE site. Several other splice sites in the E1 region, such as the nt 1613 and nt 1940 5' ss and the nt 1866, nt 1032, and nt 1024 3' ss, are rarely utilized and have been identified only in ID13 cells (5).
3.3. Splicing elements in BPV-1 transcripts
A splicing element is any element in an RNA transcript that is required for RNA splicing. Prior to 1993, these splicing elements were usually those elements in the RNA introns that are necessary for spliceosome-mediated intron recognition and splicing. These include an intron 5' end for U1 binding (Figure 2A and 2B) and several elements at the intron 3' end that are important for the identification and accurate removal of an intron from a pre-mRNA. The intron 3' end has three cis elements: a branch point (BP), followed by a polypyrimidine tract (PPT), and a terminal AG dinucleotide. During spliceosome assembly, U2AF65 and U2AF35 bind to the polypyrimidine tract and terminal AG dinucleotide, respectively, and facilitate binding of U2 to the branch point sequence via RNA base-pairing (Figure 2A) (36-42). However, the identification in 1993 of an exon recognition sequence which enhances the splicing of a GU-AG intron (43) was surprising and immediately prompted a search for exonic elements that stimulate efficient RNA splicing. This search led to the discovery of a series of exonic splicing enhancers (ESEs) and exonic splicing suppressors or silencers (ESSs) that regulate alternative splicing of BPV-1 late transcripts (44).
As described above, BPV-1 L1 and L2 are transcribed from the same late promoter, PL, as a bicistronic transcript that contains three exons and two introns. Intron 1 of the late transcripts is a typical GU-AG intron and bears two alternative 3' ss, at nt 3225 and nt 3605, whereas intron 2 is a GC-AG intron (Figure 2A) and contains the L2 ORF. In bovine warts, selection of the nt 3225 3' ss allows retention of intron 2 and L2 expression. In contrast, selection of the nt 3605 3' ss can activate splicing of intron 2, leading to L1 expression. Notably, the nt 3225 3' ss is also a common 3' ss for intron 2 splicing of all early viral transcripts (Figure 1).
Characterization of the BPV-1 nt 3225 and nt 3605 3' ss has shown that both are functionally suboptimal, with a nonconsensus BP sequence and a weak PPT interspersed with purines (Figure 2A and 2C). Thus, the selection of these suboptimal 3' ss is regulated by ESEs and ESSs (45). Five cis elements are involved in the regulation of the selection of the two alternative 3' ss, including three ESEs (SE1, SE2, and SE4) and two ESSs (ESS1 and ESS2) (Figure 3). SE1, ESS1, and SE2 are located between the nt 3225 3' ss and the nt 3605 3' ss (Figure 3A). SE1 and SE2 are purine-rich ESEs and synergistically promote the selection of the nt 3225 3' ss over the suppression by ESS1, which contains a suppressive core motif GGCUCCCCC (45-48). With respect to the nt 3605 3' ss, SE1 and SE2 are also intronic elements and may have the potential to regulate usage of the nt 3605 3' ss. SE4 and ESS2 are located between the nt 3605 3' ss and the nt 3764 5' ss. SE4 is an AC-rich element and promotes selection of the nt 3605 3' ss over the suppression by ESS2. ESS2 contains a UGGU core suppressor motif, which may base pair with the functional CACCACCAC motif of SE4. Two copies of UGGU are more inhibitory for in vitro RNA splicing (49). Although SE4 may function as an ESE in the context of different 3' ss, ESS2 functions as an ESS only in a splice site-specific manner (49). This feature of ESS2 is in contrast to ESS1, which functions in a less splice site-specific manner and inhibits not only the nt 3225 3' ss in BPV-1 late pre-mRNA, but also suboptimal 3' ss in human immunodeficiency virus tat-rev pre-mRNA, Drosophila dsx pre-mRNA, and Rous sarcoma virus src pre-mRNA. Optimization or strengthening of the nt 3225 3' ss counteracts the repression function of ESS1 (48). Whether a splicing element is required to regulate alternative splicing of early viral transcripts remains unknown.
As the nt 3225 3' ss is also a major 3' ss for many early viral transcripts, its utilization in splicing of viral early transcripts might, by analogy, also be associated with the activities of SE1, ESS1, and SE2, the three exonic cis-elements that regulate alternative RNA splicing of viral late transcripts (Figure 1). This may also be true for the expression of viral E2. The E2 protein plays an important role in viral DNA replication and transactivation of viral genes. As discussed above, two types of RNA transcripts, P890 and P2443, have the potential to encode a full-length E2 that functions as a transcriptional transactivator (E2-TA). The P890 transcript contains an intron with a 5' ss at nt 1235 and two alternative 3' ss, one at nt 2558 and the other at nt 3225 (Figure 4). Thus, the P890 transcripts can splice from the nt 1235 5' ss to either the nt 2558 3' ss for the expression of a full-length E2-TA or the nt 3225 3' ss for the expression of an E8^E2 fusion that functions as a transcriptional repressor. The P2443 transcript has an intron with a 5' ss at nt 2505 and a 3' ss at nt 3225. Retention of this intron in the transcript leads to the expression of a full-length E2-TA (Figure 4), but splicing of this intron can result in the production of E5 (Figure 1). Thus, it is conceivable that delicate regulation of RNA splicing of these two transcripts would be essential for a well-balanced expression of E2-TA, E8^E2, and E5.
3.4. Cellular splicing factors in BPV-1 RNA splicing
BPV-1 SE1 and SE2 are purine-rich ESEs and each has two ASF/SF2 (alternative splicing factor/splicing factor 2)-binding sites similar to the ASF/SF2 consensus sequence RGAAGAAC (R=A or G): GAAGGAC and GAAGGAG for SE1 and GGAAGAAG and GGAAGAAC for SE2 (Figure 3B) (46). Binding of ASF/SF2 to SE1 and SE2 enhances RNA splicing at the nt 3225 3' ss. Cells with a high level of ASF/SF2 increase usage of the nt 3225 3' ss over the nt 3605 3' ss (50).
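The similarity between the four SE1/SE2 sites and the RGAAGAAC consensus quoted above can be made concrete with a simple mismatch count. This is a minimal sketch under stated assumptions: it uses an ungapped sliding comparison, which is an illustrative choice and not the method by which the sites were originally defined, and the names `IUPAC`, `mismatches`, and `best_mismatch_count` are hypothetical.

```python
# Only the codes needed here: plain RNA bases plus R (purine, A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG"}

CONSENSUS = "RGAAGAAC"  # ASF/SF2 consensus as quoted in the text

# Candidate binding sites as listed in the text.
SITES = {
    "SE1": ["GAAGGAC", "GAAGGAG"],
    "SE2": ["GGAAGAAG", "GGAAGAAC"],
}


def mismatches(site: str, pattern: str) -> int:
    """Count positions where a base is not allowed by the pattern code."""
    return sum(base not in IUPAC[code] for base, code in zip(site, pattern))


def best_mismatch_count(site: str, consensus: str = CONSENSUS) -> int:
    """Fewest mismatches over all ungapped placements of site within consensus."""
    if len(site) > len(consensus):
        raise ValueError("site must not be longer than the consensus")
    offsets = range(len(consensus) - len(site) + 1)
    return min(
        mismatches(site, consensus[o:o + len(site)]) for o in offsets
    )


if __name__ == "__main__":
    for element, sites in SITES.items():
        for site in sites:
            print(element, site, best_mismatch_count(site))
```

Because the two SE1 sites are one nucleotide shorter than the consensus, the function tries both possible placements and reports the better one.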
Notably, ASF/SF2 expression in the upper epithelial layers with highly differentiated keratinocytes is significantly reduced when compared with the levels in basal or suprabasal layers with undifferentiated keratinocytes (51). Other SR (serine/arginine-rich) proteins also bind SE1 and SE2. These proteins are SC35, SRp55, and SRp75 (46) (Jia R and Zheng ZM, unpublished data). However, the binding motifs for these proteins in both SE1 and SE2 remain unidentified.
BPV-1 ESS1 is a 48-nt pyrimidine-rich sequence motif (Figure 3B) and has a U-rich 5' region interacting with U2AF and PTB proteins, a central C-rich part binding 35- and 54- to 55-kDa SR proteins, and an AG-rich 3' end interacting with ASF/SF2. In general, the PTB binding site overlaps the U2AF65 binding site in the PPT region of an RNA 3' ss. Binding of PTB to this site prevents the binding of U2AF65 to the PPT and negatively regulates splicing of the downstream 3' ss. Although both U2AF65 and PTB bind the ESS1 5' U-rich region, their binding to this region is not essential for ESS activity. Instead, the most critical region of ESS1 is the central C-rich core GGCUCCCC. Binding of SR proteins to the central core, along with additional SR protein binding to the sequence immediately downstream of the core, is sufficient for partial suppression of spliceosome assembly and splicing of BPV-1 pre-mRNAs (47). It remains unknown what binds ESS2 and executes its negative role in selection of the nt 3605 3' ss.
As an AC-rich ESE, BPV-1 SE4 interacts with SRp20 and YB-1, and regulates alternative RNA splicing of viral late pre-mRNAs. Binding of SRp20 to SE4 suppresses selection of the BPV-1 late-specific nt 3605 3' ss for L1 expression, but favors usage of the nt 3225 3' ss for splicing to express viral L2 and early proteins. Thus, cells with deficient expression of SRp20 have more L1 expression.
In terminally differentiated keratinocytes of bovine wart tissues, L1 expression inversely correlates with levels of SRp20 (Jia R and Zheng ZM, unpublished data). In contrast, binding of YB-1 to SE4 has only a minimal, positive effect on selection of the nt 3605 3' ss.
4. REGULATION OF BPV-1 GENE EXPRESSION BY ALTERNATIVE POLYADENYLATION
RNA polyadenylation plays an important role in the control of eukaryotic and viral gene expression. This process involves cleavage of the nascent transcript (52,53) and addition of a poly (A) tail with 150-200 adenylate residues following RNA splicing in the nucleus. Cleavage and polyadenylation are tightly coupled events that are triggered through recognition of three RNA signals by the cellular polyadenylation machinery: a highly conserved AAUAAA hexamer, a cleavage site generally positioned 10-30 nt downstream of AAUAAA, and a G/U- or U-rich element that is ~10-30 nt further downstream of the cleavage site (54). Cellular polyadenylation factors include the cleavage and polyadenylation specificity factor (CPSF) that binds to the AAUAAA hexamer, the cleavage stimulation factor (CstF) that binds to the G/U- or U-rich sequence, the cleavage factors I and II (CFI and CFII) that bind to the cleavage site, poly (A) polymerase (PAP) for poly (A) addition, and the poly (A) binding protein (PABP) that associates with the nascent poly (A) tail to facilitate RNA export and translation (55,56).
The BPV-1 genome contains two major polyadenylation signals: the major early polyadenylation signal AE at nt 4180 and the major late polyadenylation signal AL at nt 7156. Recognition of the AE at nt 4180 promotes the cleavage of all early transcripts at nt 4203 for addition of a poly (A) tail (15) and precludes the expression of viral L1 and L2 genes. This requires that all viral early transcripts terminate before reaching the viral late AL signal and preferentially select the viral early AE.
Recognition of the AL at nt 7156 leads to the cleavage of viral late transcripts L1 and L2 at nt 7175 for polyadenylation (13), but this polyadenylation is restricted to keratinocytes undergoing terminal differentiation. Since all viral transcripts, early and late, are initiated from a promoter positioned upstream of the major early poly (A) signal AE, the viral late transcripts L1 and L2 must bypass the early AE signal to preferentially use the downstream AL signal for their polyadenylation. Keratinocyte differentiation probably increases the readthrough of the early AE signal for late gene expression. This might be due in part to changes in the activities and levels of polyadenylation and other RNA processing factors, in addition to viral DNA amplification and induction of the late promoter PL. Thus, it is conceivable that efficient usage of the viral early AE signal would regulate the expression of viral late genes. However, mutation of the AE has no effect on the levels of viral late transcripts or on polyadenylation at the AL; rather, it induces preferential use of a noncanonical polyadenylation site, UAUAUA, approximately 100 nt upstream of the major AE (57). It has been noted that all viral early transcripts are indeed polyadenylated mainly at the viral early AE. By doing so, the viral early transcripts terminate their transcription within the late region, beginning approximately 1000 nt downstream of the AE and before reaching the AL (58), and undergo an endonucleolytic cleavage at nt 4203, with slight site heterogeneity, for polyadenylation.
Viral cis-elements also control polyadenylation of papillomavirus late transcripts (59-61). This was initially discovered in BPV-1 late transcripts.
A 53-nt negative regulatory element identified in the BPV-1 late 3' UTR upstream of the AL (Figure 1) contains a sequence motif homologous to a 5' ss or U1 binding site and inhibits poly (A) tail addition, leading to destabilization of viral late transcripts (59,62). The 53-nt element does not appear to function by destabilizing polyadenylated cytoplasmic RNA (59). Similar U1 binding sites also exist in the HPV16 and HPV31 late 3' UTR and inhibit polyadenylation of those late transcripts (61,62). The mechanism of inhibition involves U1 snRNP binding. In the presence of high levels of free U1 snRNP, BPV-1 late transcripts can be cleaved, but cannot be polyadenylated at the RNA 3' end, and consequently are degraded rapidly. This mechanism is different from that in HIV-1, in which the pre-mRNA cleavage is inhibited by the interaction between U1 snRNP and a true 5' ss downstream of the pA site in the 5' LTR (63). The inhibition in BPV-1 late transcripts is mediated by the direct interaction between U1 70K protein, one component of U1 snRNP, and poly (A) polymerase (PAP). U1 70K has four PAP inhibitory motifs. Mutation of all four motifs in U1 70K could inhibit PAP to polyadenylate BPV-1 late transcripts (64). However, it remains unknown how the late transcripts can be polyadenylated at the AL in the presence of the 53-nt inhibitory element in differentiated keratinocytes. A logical presumption is that the polyadenylation inhibition of BPV-1 late transcripts may be relieved by a decreased level of free U1 snRNP or its associated factors during the differentiation of keratinocytes (64). U1A is another component of U1 snRNP and is involved in the inhibition of its own pre-mRNA polyadenylation (65) and of the usage of a secretory pA site in immunoglobulin M (IgM) RNA in B cells.
Reduction of U1A during the differentiation of B cells activates the selection of the secretory pA site, suggesting that the repression of polyadenylation by U1 snRNP is associated with cell differentiation (66).
5. COUPLING OF RNA SPLICING AND POLYADENYLATION TO RNA EXPORT
After splicing and polyadenylation, a mature mRNA in the nucleus is ready to be exported to the cytoplasm for protein translation. Since RNA export requires association with cellular export factors, identifying the factors essential for BPV-1 early and late mRNA export may elucidate other mechanisms of regulation. Various studies indicate that RNA splicing, polyadenylation, and export are coupled events (67,68). RNA splicing creates an exon-exon junction and recruits several proteins, including hUpf3B, Aly/REF, SRm160, NXF1, RNPS1, Magoh, and Y14/RBM8, to bind to an RNA region approximately 20 nt upstream of the junction (69,70). Aly/REF is one of the proteins in this protein complex, the exon-junction complex (EJC), and was once thought to be an important factor for all RNA export. Recent studies indicate that Aly/REF plays no role in RNA export (71-74), despite the fact that Aly/REF interacts with several important export factors, including UAP56, a splicing factor and an important mediator in RNA export (75), and NXF1/TAP, a required RNA transport factor that acts as a heterodimer with p15/Nxt to catalyze translocation of the mature mRNA into the cytoplasm (76).
In addition to the proteins in the EJC, two small members of the SR protein family, SRp20 and 9G8, may have some role in BPV-1 RNA export. SRp20 and 9G8 interact with TAP and mediate RNA export (77-80). Recent studies demonstrated that SRp20 preferentially binds to the A/C-rich SE4 in both viral early and late transcripts to regulate viral RNA splicing (Jia R and Zheng ZM, unpublished data), but a possible role for it in viral RNA export cannot be excluded.
Several factors involved in RNA polyadenylation may also contribute to viral RNA export. Specifically, PABP binds to the nascent poly (A) tail and facilitates RNA export and translation (55,56). We have noticed that BPV-1 transcription from a BPV-1 late minigene (deletion of a large region with all viral early promoters) in 293 cells terminates gradually from nt 6509 to nt 6840, but only those transcripts that are cleaved and polyadenylated at the AE site are stable and can be exported into the cytoplasm (Liu XF and Zheng ZM, unpublished data), indicating the importance of mRNA 3' end polyadenylation in the stabilization and export of BPV-1 mRNAs.
6. REMARKS AND PERSPECTIVES
BPV-1 has served as a prototype for studying the viral life cycle, molecular biology, and pathogenesis of papillomaviruses. This is mainly attributable to the following reasons: (1) The BPV-1 genome was one of the first two papillomavirus genomes completely sequenced in the early 1980s (1,81); (2) A simple focus assay was established to study BPV-1 transformation in the late 1970s (82), which allowed researchers to study the viral gene functions involved in BPV-1-induced proliferation and transformation of mouse C127 cells or NIH 3T3 cells in in vitro cell culture; (3) A complete transcription map of BPV-1 in mouse C127 cells and bovine fibropapillomas was constructed in the late 1980s (83) and has been the best transcription map for understanding the complex gene expression of other papillomaviruses. The transcription mapping of BPV-1 and, later, other papillomaviruses has provided a foundation to study viral gene structure and expression, leading to the conclusion that all viral transcripts expressed along with keratinocyte differentiation are bicistronic or polycistronic transcripts, with two or more ORFs in each transcript, using either an early or a late poly (A) signal for polyadenylation.
This fundamental consensus further led to the identification of viral RNA cis-elements and cellular factors involved in regulation of BPV-1 RNA processing in the early 1990s. Studies of BPV-1 as a prototypical virus have greatly advanced our knowledge of papillomavirus biology and pathogenesis. Despite the shift in focus over the past few years toward the more applied goal of preventing human papillomavirus infection, many fundamental questions remain in papillomavirus basic research, as we discussed in our previous review (2). With respect to viral gene expression and RNA processing, we have only limited knowledge of which viral transcript encodes which viral protein in the context of the virus genome, as almost all viral transcripts are transcribed as bicistronic or polycistronic RNAs with two or more ORFs. The BPV-1 E6 and E1 ORFs contain a classical U2-type GU-AG intron and the L2 ORF a U2-type GC-AG intron. Additional studies are needed to determine what causes intron retention during RNA splicing of these three transcripts and how the transcripts with a retained intron are exported from the nucleus to the cytoplasm for translation into a protein. Transcription of viral late genes and the processing of their transcripts are tightly coupled to keratinocyte differentiation. It will be important to determine what cellular and viral factors contribute to the differentiation-specific regulation of viral late gene expression, which leads to an early-to-late switch of the virus life cycle. The finding that cellular SRp20 partially controls the viral early-to-late switch (Jia R and Zheng ZM, unpublished data) has laid some groundwork for more mechanistic studies.
Together with the finding that ASF/SF2 binds the HPV16 3' UTR control element and its expression is regulated during differentiation of virus-infected epithelial cells (51), we propose that a reduced expression of SR proteins or other factors for RNA processing along with keratinocyte differentiation is involved in the viral early-to-late switch (Figure 5).
7. ACKNOWLEDGEMENTS
We thank Barbara Spalholz and the NIH fellows editorial board for their critical reading of the manuscript. This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.
8. REFERENCES
1. E. Y. Chen, P. M. Howley, A. D. Levinson & P. H. Seeburg, The primary structure and genetic organization of the bovine papillomavirus type 1 genome, Nature 299, 529-534 (1982)
2. Z. M. Zheng & C. C. Baker, Papillomavirus genome structure, expression, and post-transcriptional regulation, Front Biosci. 11, 2286-2302 (2006)
45. Z. M. Zheng, P. He & C. C. Baker, Selection of the bovine papillomavirus type 1 nucleotide 3225 3' splice site is regulated through an exonic splicing enhancer and its juxtaposed exonic splicing suppressor, J. Virol. 70, 4691-4699 (1996)
Key Words: Papillomaviruses, Gene expression, RNA splicing, RNA polyadenylation, Post-transcriptional regulation, Review
Send correspondence to: Zhi-Ming Zheng, 10 Center Dr. Rm. 6N106, Bethesda, MD 20892-1868, Tel: 301-594-1382, Fax: 301-480-8250, E-mail: zhengt@exchange.nih.gov