[Frontiers in Bioscience 14, 4968-4977, June 1, 2009]
RNA processing in the polyoma virus life cycle
Yingqun Huang, Gordon G. Carmichael
1. ABSTRACT
Gene regulation in polyoma is not only interesting in itself; it has also proven to be highly informative and illustrative of a number of novel concepts in gene regulation. Of special interest and importance are the mechanisms by which this virus switches from the expression of early gene products to late gene products after the onset of viral DNA replication. This switch is mediated at least in part by changes in transcription elongation and polyadenylation in the late region, and by the formation and editing of dsRNA in the nucleus. In this review we summarize the regulation of RNA synthesis and processing during polyoma infection, and point out in particular those aspects that have been most novel.

2. INTRODUCTION
Murine polyoma virus lytically infects mouse cells in tissue culture. In wild mice it is not known to be pathogenic, but in baby mice and in other rodents it is highly tumorigenic, and it efficiently transforms rat or hamster cells in culture (1). The polyoma genome is small and compact, and viral gene expression is carefully regulated during infection in order to maximize mature viral output and to minimize effective host antiviral responses. Because the genome is so small, slightly larger than 5000 base pairs, viral gene expression and its regulation rely heavily on the host cell machinery. This small size allows ease of manipulation of the viral genome and makes polyoma a good model system for studying not only the molecular biology of cell transformation and tumorigenesis, but also mechanisms of regulation of eukaryotic gene expression. As we will point out below, there are a number of aspects of polyoma RNA processing that offer unique insights into several novel modes of mammalian gene control.

3. GENOME ORGANIZATION
The polyoma genome is a circular DNA molecule of about 5300 base pairs. The genome is divided into "early" and "late" regions, which are expressed and regulated differently as infection proceeds (2-5) (Fig. 1A). The early and late transcription units extend in opposite directions around the circular genome from startsites near the unique, bidirectional origin of DNA replication (5, 6). Primary RNA products from the early transcription unit are alternatively spliced to yield three early mRNAs which code for the large T antigen (100 kDa), the middle T antigen (56 kDa) and the small T antigen (22 kDa). Large T binds to sequences in or near the DNA replication origin region (7-10) and is involved in the initiation of DNA replication, indirectly in the autoregulation of early-strand RNA levels (11-13) and indirectly in the activation of high levels of expression from the late promoter (13, 14). The other two early proteins are dispensable for lytic infection, but are important for cell transformation. Late primary transcripts accumulate after the onset of DNA replication and are also spliced in alternative ways to give mRNAs which code for the three viral structural proteins VP1, VP2 and VP3. Fig. 2 shows the relevant features of the early and late regions. Note in particular that in the early region there are two alternative 5' splice sites and two alternative 3' splice sites, while in the late region there is a single 5' splice site and three 3' splice sites, one lying upstream of the 5' splice site.

4. TEMPORAL REGULATION OF VIRAL GENE EXPRESSION
Gene expression during lytic infection of permissive mouse cells proceeds in a well-defined, temporally regulated manner (2, 15, 16) (see Fig. 1B). Immediately after infection, RNA from the early transcription unit (E-RNA) begins to accumulate; however, RNA from the late transcription unit (L-RNA) accumulates more slowly. At 12 hours after infection, the early to late RNA ratio is about 4 to 1 (2, 15, 17), and in the presence of DNA replication inhibitors the ratio is 10 to 1 or higher. At 12-15 hours post-infection, viral DNA replication commences and L-RNA begins to accumulate rapidly while E-RNA accumulates at a slower rate. In fact, the absolute amount of E-RNA in the cell is similar at 12 hr and 24 hr post-infection (2, 15, 17). Thus there is a dramatic change in the relative abundances of E- and L-RNA; by 24 hours post-infection, the early to late RNA ratio is as low as 1 to 50 (2, 15, 17). This early-late "switch" is dependent on viral DNA replication; if replication is inhibited, E-RNA accumulates to abnormally high levels with minimal accumulation of L-RNA (11, 12, 17-19). It has been commonly accepted in the field for many years that the early-late switch is the result of T antigen repression of the early promoter, coupled with a transactivation of the late promoter (1). In fact, there is little experimental support for this notion, and it is not true. This temporally regulated switch is not controlled mainly at the level of transcription initiation, but results from changes in transcription elongation and/or RNA stability (13, 17, 20, 21). Work from our laboratory has shown clearly that late RNA accumulation is regulated post-transcriptionally, by what appears to be a novel RNA titration event (21), while early RNA levels are regulated by nuclear antisense RNA, which results in extensive editing of early-strand RNA molecules at late times in infection (see below) (21).

5. THE EARLY PROMOTER
The polyoma early promoter resembles other known viral and cellular promoters, and the regulation of early gene expression has been extensively studied. About 30 nt before the startsite there is a TATA box sequence and far upstream lies a 240 bp enhancer region which consists of two basic enhancer elements, A and B, both required for wild type levels of early transcription (22-25). These elements can function independently to stimulate early transcription, with differing cell specificities (23). Specific sequences in the enhancer region that are required for early promoter function have been revealed by deletion analysis (23, 24, 26-31). Interestingly, wild type polyoma cannot grow in embryonic mouse cells, owing to a deficiency in essential transcription factors in undifferentiated cells. However, mutations in the enhancer B region can allow viral expression in these cells (32-37).

6. EARLY RNA SPLICING
The mechanism by which early mRNAs are produced by alternative splicing is interesting and unusual. In particular, the introns for mRNAs for both small T and middle T antigen are very short (62 nt for the middle T intron and only 48 nt for the small T intron) (Fig. 3A). Further, the branchpoint for middle T lies 18 nt upstream of its 3' splice site, while one prominent (mapped in vitro) branchpoint for small T is at the same position, lying only 4 nt upstream of the small T 3' splice site (38). This distance between branchpoint and 3' splice site is far shorter than that found in other systems. Curiously, and by a mechanism that remains unclear, there is essential cooperation between these two splice sites, both in vitro (38) and in vivo. Thus, remarkably, when we mutated either the middle T 3' splice site dinucleotide AG to AA (mutant Py808A) (39) or the middle T branchpoint from A to G (Py791G), the splicing of both small and middle T antigen mRNAs was completely blocked (Fig. 3B, C). At the same time, however, large T antigen mRNA was still efficiently spliced, even though it uses the same 3' splice site as small T. Rather than being a mere curiosity, this system may prove of interest in future studies on the mechanisms of regulation of spliceosome assembly on closely spaced, alternative 3' splice sites.

7. THE LATE PROMOTER
Late transcription startsites are heterogeneous; there are at least 15 startsites, within a 96 bp region, from nt 5077-5170 (40-45). More than 90% of these are in a 25 bp region, just upstream of the late leader exon. Almost every purine in this region can be used; pyrimidines are not used. Several labs, including ours, have studied the late promoter (14, 46-51). While there is some evidence that the early and late promoters may share common elements (47-51), both we (14, 46) and others (47) have reported similar results indicating that the major contributing element to late promoter function lies in the enhancer A region. We have shown (L. Rapp and G. Carmichael, unpublished) that our minimal late promoter consists of an initiator element (Inr) (52) that specifies the major late startsite, and which is neither GC-rich, as are Inrs of the housekeeping gene type (53), nor homologous to the terminal deoxynucleotidyl transferase type (54). It binds the zinc finger protein Miz-1, which also binds the cellular c-myc protein (55). Although late promoter activity increases in the presence of large T antigen, there is no specific effect on the late promoter, and much of the increase in late-strand transcription may be the consequence of template amplification through DNA replication, or replication-induced template conformation (14, 20).

8. LATE RNA SPLICING AND EXPORT
Late viral gene expression has a number of unusual features that have turned out to be useful for helping to unravel fundamental aspects of RNA synthesis, processing, regulation, and mRNA transport from the nucleus. Late-strand pre-mRNA molecules are processed into mature mRNAs using a highly unusual pathway that involves inefficient polyadenylation and ordered splice site selection from precursors containing tandemly repeated introns and exons. Unlike early primary transcripts, late nuclear pre-mRNAs are heterogeneous in size, the result of inefficient transcription termination and polyadenylation, and range from about 2.5 Kb to over 60 Kb in length (40, 41, 56-59). Most late pre-mRNAs are not polyadenylated (41). Further, most late RNA sequences never leave the nucleus as they are removed during mRNA splicing, and are subsequently degraded (57, 60). Mature late mRNA molecules have polyadenylated 3'-ends which map very close to those of the early RNAs (4, 18). At their 5'-ends, late messages contain tandem repeats of the 57-base noncoding "late leader" sequence, which appears only once in the viral genome. In multigenomic late-strand transcripts, the repeats of this exon are spliced to one another, removing genome-length introns before leader-to-body splicing generates the final VP1, VP2 or VP3 messages (Fig. 4). Thus, leader-leader splicing is part of an alternative splicing pathway that is unique in the field: during RNA processing, precise alternative splice site choices are made, but between identical splice sites! For example, if late transcription proceeds around the genome five times, there will be five copies of the VP1 3' splice site in the transcript, but only the one nearest to the site of polyadenylation will be selected for splicing, and the others, though identical, will be skipped. 
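As a rough illustration of the processing pathway described above, the following Python sketch models a late primary transcript that spans several genome copies. The genome size and the 57-nt leader length come from the text; the offsets `VP1_3SS` and `POLYA`, and the function names, are hypothetical values chosen only for illustration. The sketch considers VP1 messages only and makes no claim about real coordinates.

```python
GENOME_LEN = 5300   # approximate genome size in bp (from the text)
LEADER_LEN = 57     # late leader exon length in nt (from the text)
VP1_3SS = 1200      # HYPOTHETICAL offset of the VP1 3' splice site from the leader start
POLYA = 2400        # HYPOTHETICAL offset of the late poly(A) site from the leader start


def process_late_transcript(passes):
    """Model a late transcript that contains `passes` copies of the leader.

    Returns (primary_len, exons), where exons is a list of (start, end)
    half-open intervals on the primary transcript.
    """
    if passes < 1:
        raise ValueError("passes must be >= 1")
    # The transcript starts at the first leader and ends at the poly(A)
    # site reached on the final pass around the circular genome.
    primary_len = (passes - 1) * GENOME_LEN + POLYA
    # One leader copy per pass; leader-leader splicing retains all of them
    # and removes the genome-length introns in between.
    leaders = [(k * GENOME_LEN, k * GENOME_LEN + LEADER_LEN) for k in range(passes)]
    # Every pass carries an identical VP1 3' splice site, but only the copy
    # nearest the poly(A) site is selected; the others are skipped.
    candidates = [k * GENOME_LEN + VP1_3SS for k in range(passes)]
    chosen = max(candidates)
    body = (chosen, primary_len)
    return primary_len, leaders + [body]


def mature_length(exons):
    """Total length of the spliced message (before poly(A) tail addition)."""
    return sum(end - start for start, end in exons)


primary_len, exons = process_late_transcript(5)
print(primary_len, len(exons) - 1, mature_length(exons))
```

With these assumed offsets, a five-pass transcript of 23600 nt yields a message carrying five tandem leaders and a single body exon, 1485 nt in total, so most of the primary transcript is discarded as intron.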
Experimental analysis of this phenomenon revealed that in polyoma, splicing requires the concomitant selection of both ends of exons, even if one of them is a poly(A) site (61, 62).
What is the function of the late leader exon, and does its multiplicity at the 5'-ends of late mRNAs confer any advantage to the virus? Since the leader exon contains two regions having strong complementarity to the 3'-end of mouse 18S rRNA, the possibility existed that it might play a role in facilitating ribosome loading and translation efficiency. However, replacing the leader sequence with a variety of other inserts did not affect the translation of viral late proteins (63), and there is no evidence that the leader sequence, per se, has any role other than to participate in the late pre-mRNA splicing described above (64-66).
How are the relative abundances of late mRNAs regulated? Polyoma virus late pre-mRNAs contain a single 5' splice site and two message body 3' splice sites (Fig. 2), which are not used at equal frequencies. Owing to alternative splicing, about 5% of all late mRNAs encode VP2 (no message body splice chosen), about 15% encode VP3 (promoter-proximal 3' splice site chosen), and about 80% encode VP1 (promoter-distal 3' splice site chosen). Interestingly, splice site strength does not appear to determine the ratio of spliced products. Constructs containing duplicated or rearranged 3' splice sites and sequences throughout the late region indicated that the 3' splice site closest to the polyadenylation site (the one giving the shortest terminal exon) is always used preferentially. Thus, in polyoma, late splicing choices appear to be determined largely not by sequence, but rather by the relative position of the message body 3' splice sites (67). In contrast to mVP1 and mVP3, mVP2 messages (which represent only about 5% of all late mRNAs) have no body splice, and a fraction of them are completely unspliced.
This balance between unspliced and alternatively spliced products is reminiscent of the situation found in retroviruses, which must produce both spliced and unspliced messages. Interestingly, although as many as 50% of all mVP2 RNAs are unspliced, many of these mRNAs are nevertheless exported to the cytoplasm. This is of interest because splicing is generally a prerequisite for mRNA export, unless specific cis-acting sequences in the unspliced mRNAs (often found in retroviral genomes) override retention. Examination of the intracellular distribution of late viral messages revealed, however, that mVP2 molecules are exported less efficiently than mVP1 and mVP3, in which the 5' splice site has been removed by splicing (68). Point mutations and deletion analyses demonstrated that the efficiency of mVP2 export is inversely correlated with the strength of the 5' splice site and that unused 3' splice sites present in the message have little or no effect on export (68). These results suggested that the unused 5' splice site is a key player in mVP2 export. Further, results comparing spliced and unspliced forms of mVP2 molecules indicated that the process of splicing does not enhance nuclear export. Since mVP2 and some of its mutant forms can accumulate in the cytoplasm in the absence of splicing, it was proposed that in the polyoma virus system, removal of splicing machinery from mRNAs may be required, but that splicing itself is not essential (68).

9. REGULATION OF THE EARLY-LATE SWITCH
Contrary to the straightforward notion that the early-late switch is controlled primarily at the level of transcription initiation, the regulation has turned out to be much more complex, and consequently much more interesting. Both reporter assays (14, 46) and nuclear run-on analyses (17) revealed that the early and late promoters are of similar strength, and do not appear to be differentially regulated in response to viral DNA replication or early proteins such as large T antigen. Thus, at both early and late times in infection, transcription initiates at similar rates from both promoters. However, at early times late-strand RNAs fail to accumulate efficiently, and at late times early-strand RNAs appear to accumulate to much lower levels than late-strand RNAs. The result is that before viral DNA replication commences early mRNAs outnumber late mRNAs by a factor of 5 or more, while at late times in infection late mRNAs can accumulate to levels 20-50 times more abundant than early mRNAs.

9.1. Downregulation of early RNAs at late times in infection via late-strand antisense
When late RNAs accumulate, late-strand transcription termination is inefficient (Fig. 4). This gives rise to giant primary transcripts that include intronic late-strand sequences that are antisense to early-strand transcripts. Mutations that destabilize these naturally-occurring nuclear antisense RNAs (splice site mutations in the late region) always have the phenotype of overexpression of early RNAs, as one would expect if early RNAs were in fact regulated by nuclear antisense RNA (13). In addition, if late RNAs are overexpressed, early-strand RNAs are repressed even more. This natural regulation can be quantitatively mimicked using nuclear antisense expression vectors (13, 21). If the polyadenylation signal is replaced with a hammerhead ribozyme sequence then RNA polymerase II transcripts are not exported from the nucleus, and accumulate there (69).
Such molecules can act as specific and effective antisense inhibitors of gene expression. Nuclear antisense RNA leads to double-stranded RNA (dsRNA) formation in the nucleus. These dsRNA molecules serve as substrates for promiscuous adenosine-to-inosine editing by the ADAR enzyme at late times in infection (70). Editing of long dsRNAs converts up to about 50% of all A's to I's within these molecules, and editing occurs on both strands (71). Promiscuously edited RNAs are quantitatively retained in the nucleus and therefore are not translated into mutant proteins in the cytoplasm (70, 72). Thus, RNA editing dramatically reduces the amount of cytoplasmic translatable early-strand mRNAs at late times. On the other hand, edited late-strand sequences lie within a large intron which is removed and degraded in the nucleus, so that editing does not directly impact late-strand mRNAs.

9.2. Activation of late RNA accumulation
Why do late mRNAs accumulate to high levels only after the onset of viral DNA replication? Nonreplicating genomes express only very low levels of late-strand transcripts, while early-strand RNAs accumulate normally (13). However, late genes from nonreplicating genomes can be turned on if a replicating polyoma genome is introduced into the same cell (13). By systematically altering the replicating genome and looking at the effects of mutations on its ability to activate late gene expression from the nonreplicating genome, it was demonstrated that late genes are not activated by large, middle or small T antigens, late virion structural proteins or even replicating DNA molecules (13). Rather, late mRNA accumulation appeared to be most strongly affected by late transcripts themselves! Thus, there is something about late transcripts that, in concert with the onset of viral DNA replication, can lead to the activation of late genes from a nonreplicating genome in the same cells.
If a mutation that destabilizes late RNAs in the nucleus was introduced into an otherwise wild type genome, then that genome could not transactivate the late genes from the nonreplicating genome, even though DNA replication was normal for both and early gene expression (including T antigen expression) was efficient (13). Thus, the mechanism of activation of late gene expression was unexpected, and quite different from the incorrect but commonly assumed mechanism of late promoter transactivation by large T antigen (which, in fact, is based on very little, if any, direct experimental data). In order to obtain more molecular details about this novel regulation of late RNA accumulation by late-strand RNA, a number of additional late region deletion and mutation constructs were made, and sequences responsible for activating late RNA accumulation were narrowed down to the late polyadenylation signal, or to the region containing it (73). Close inspection of this region of the genome revealed an interesting feature: the early and late polyadenylation signals overlap one another (Fig. 5). Thus, at all times in infection, there is the potential for the early-strand and late-strand transcripts to anneal at their 3'-ends, leading to a duplex region of at least 45 base pairs. Could this overlap lead to dsRNA formation and ADAR editing of the poly(A) signals? If so, this would provide the basis for a model that can account for one of the major keys to the early-late switch. Further, this model would represent a novel way in which gene expression can be regulated in mammalian cells - regulation of polyadenylation by dsRNA formation. Our current working model is presented in Fig. 6. Before the onset of viral DNA replication, transcripts from both the early and late promoters are produced. From these, early mRNAs are made and accumulate. 
However, late-strand primary transcripts that contain only a single leader exon splice inefficiently, and are rapidly degraded in the nucleus (66). This may be related to the fact that the VP1 and VP3 3' splice sites are weak (67). Thus, since leader-leader splicing is important for late mRNA accumulation, late-strand RNAs are turned over rapidly in the nucleus (Fig. 6A). As DNA replication commences, something changes in the nature of viral RNA processing. While we do not yet know the precise nature of this molecular trigger, all available data are consistent with the notion that it involves promoting the annealing of the 3'-ends of early-strand and late-strand RNAs. This in turn could lead to the editing of at least some of them. Since polyadenylation and transcription termination are intimately linked, poly(A) site editing (or perhaps just dsRNA formation) would lead to transcription readthrough, as is observed. On the late strand, this would generate multigenomic transcripts that would allow leader-leader splicing and late mRNA accumulation. On the early strand, transcription readthrough would lead to transcripts that cannot be productively processed and which would therefore most likely be degraded rapidly in the nucleus (Fig. 6B). Further, in this model dsRNA and/or editing and polyadenylation/transcription termination would be in competition with each other. Since editing would occur before polyadenylation only some of the time, there would be a given probability of each occurring and this would lead to the heterogeneity observed in the late-strand primary transcripts in the nucleus. This new model makes a number of predictions that have been tested and which are being reported elsewhere (74). One of these is that knockdown of ADAR editing activity in mouse cells should interfere with the viral life cycle, which is observed. 
Another, and critical, prediction is that mutant viruses in which the 3'-ends of early- and late-strand transcripts cannot overlap should be defective for the early-late switch, even when all other known regulatory signals are present and unchanged. This is also the case. In conclusion, studies of the polyoma virus life cycle have revealed a number of new and interesting insights not only into how a small DNA virus can effectively use the host RNA processing machinery to regulate its temporal pattern of gene expression, but also into novel underlying mechanisms of regulation of mammalian gene expression.

10. ACKNOWLEDGMENTS
We thank N. Barrett for help with some of the work shown in Figure 3. This work was supported by grant CA04382 from the National Cancer Institute.

11. REFERENCES
1. J. Tooze, ed.: Molecular Biology of Tumour Viruses, 2nd edition: DNA Tumour Viruses. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1980)
51. F. G. Kern and C. Basilico: Transcription from the polyoma late promoter in cells stably transformed by chimeric plasmids. Mol. Cell. Biol., 5, 797-807 (1985)

Key Words: Polyoma, Gene Regulation, Antisense RNA, Editing, Polyadenylation, Review

Send correspondence to: Gordon G. Carmichael, Department of Genetics and Developmental Biology, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030-3301, Tel: 860-679-2259, Fax: 860-679-8345, E-mail: carmichael@nso2.uchc.edu