[Frontiers in Bioscience 14, 900-917, January 1, 2009]
Evolutionary and biophysical relationships among the papillomavirus E2 proteins

Dukagjin M. Blakaj1, Narcis Fernandez-Fuentes2, Zigui Chen3, Rashmi Hegde4, Andras Fiser1, Robert D. Burk3,5, Michael Brenowitz1
1. ABSTRACT

Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sequences in which highly conserved four base pair sequences flank a variable 'spacer' of identical length. E2 proteins directly contact the conserved DNA but not the spacer. Variation in naturally occurring spacer sequences results in differential protein affinity that depends on the proteins' sensitivity to the spacer DNA's unique conformational and/or dynamic properties. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics that are associated with risk of virally caused malignancy. The amino acid sequence, three-dimensional structure and electrostatic features of the E2 protein DNA binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues. Rather, regions of high conservation are localized to small surface patches. Implications for cancer biology are discussed.

2. INTRODUCTION

Human papillomaviruses (HPVs) are small, double-stranded DNA viruses that infect cutaneous and mucosal epithelial tissues. Worldwide screenings for papillomaviruses have identified over 120 viral types, about one third of which infect the epithelium of the genital tract (1-4). The viral types associated with development of anogenital cancers, including those of the cervix, are denoted 'high risk' while 'low risk' viruses induce benign genital warts or cause minimal or no cytological effects (5). The taxonomy used to classify the relationship among the different papillomaviruses has been based on the L1 open reading frame (ORF) DNA sequence (1). 
A new papillomavirus type is recognized if the L1 ORF differs by more than 10% from its closest relative. Sequence differences of 2 - 10% and < 2% define a subtype and a variant, respectively (6). The papillomavirus types have recently been reclassified into species, groups and higher order taxonomy (1). Of the viral types associated with cancer, HPV16 is associated with half of all cervical cancers. At least twenty four variant lineages of HPV16 have been identified; these variants are divided broadly into European and Non-European lineages. Studies investigating HPV16 variants and risk for cancer of the cervix and its precursor high grade lesions indicate an increased risk of disease associated with the Non-European variants (7-9). For instance, an epidemiological study of 10,000 women in Costa Rica revealed that those infected with Non-European HPV16 variants were eleven times more likely than those infected with the prototype HPV16 to be diagnosed with CIN3/cervical cancer (7). The important role played by the E2 protein in the papillomavirus life cycle and human infection is supported by epidemiological, evolutionary and clinical studies (1, 7, 10-13). The HPV E2 protein represses the transcription of the E6 and E7 genes in integrated papillomavirus genomes (14) and, together with the E1 protein, is required for viral replication. Whether the E2 protein activates or represses gene transcription depends on the composition of the E2 DNA binding site and its position within the Long Control Region (LCR) of the viral genome. E2 also participates in DNA replication by binding to the E1 helicase and recruiting it to the viral origin of replication (15). The regulation of E6 and E7 oncoprotein expression by E2 has clinical significance; loss of their E2-dependent repression through viral integration contributes to cancer progression (15, 16). 
The intracellular concentration of E2, its binding site affinity, cooperative interactions between E2 proteins bound to multiple sites, and its interaction with E1 are critical to control of the viral life cycle (17-19). A goal of our comparative analysis is to extrapolate our understanding of E2 protein function from the few viral types and variants for which detailed structural, biophysical and biochemical studies have been conducted (20).
The solution of atomic resolution structures of the DNA binding domain (E2/D) from several papillomavirus types, free and bound to DNA (20-25), together with detailed binding and thermodynamic analyses, allows nuanced inquiries into how direct and indirect readout of DNA sequence determine the affinity and specificity of the E2 proteins (26-30). More limited structural information about the transactivation domain provides comparable insight into the protein-protein interactions that also contribute to the biological function of the E2 protein (31-33).
This article uses comparative analysis of E2 protein structure and function to explore the protein's role in the viral life cycle, virulence and the oncogenic potential of clinically important papillomavirus types. Our analysis of the amino acid sequence, three-dimensional structure, and the electrostatic features of the E2/D shows high conservation among all papillomavirus types, indicating that the specific interactions between the E2 protein and its binding sites on DNA have been highly conserved through evolution.

2.1. Overview of the structure of the E2 protein

The HPV16 E2 gene encodes a DNA binding protein of 360 amino acids that dimerizes to regulate viral gene expression and replication (Figure 1A). The E2 protein consists of an N-terminal transactivation domain (Figure 1B) and a C-terminal DNA binding domain (Figure 1C) connected by a flexible linker. The protein binds as a dimer to its cognate DNA sequence (20, 34). Structures of the DNA binding domain (abbreviated E2/D as noted above) from several viral types have been reported, free and in complex with different cognate DNA sequences. The solved E2/D crystal structures include those from the high risk, cancer associated types HPV16, HPV18 and HPV31 (23-25, 32), the low risk type HPV6 (30), which causes benign genital warts in humans, and the bovine type BPV1, which causes warts in cows. Solved NMR solution structures include the E2/D from HPV16 (35), HPV31 (24, 36) and BPV1 (37, 38).
The E2/D is part of a novel structural class forming a dimeric β-barrel, with each subunit contributing a 4-stranded β-sheet "half-barrel" (Figure 1C). The α1 helix is termed the 'recognition helix' and contains all the amino acids involved in direct readout. The dimer interface is made up of hydrogen bonds between subunits and a substantial hydrophobic β-barrel core (20). This topology is unusual since secondary, tertiary and quaternary structure are coupled. Unfolding experiments with either urea or acid suggest early dimerization as a step in the folding pathway (39). The monomer to dimer transition for this system occurs at nM concentrations or less (34), indicating that the protein is likely a stable dimer in the cell. The sequence conservation among the numerous E2/Ds ranges from 80% identity among closely related viral types to 30% sequence identity among distant viral types (Figure 3). The N-terminal transactivation domain activates gene transcription and viral replication (Figure 1). Crystal structures of the HPV16 (31) and HPV18 (32) transactivation domains have been solved. The HPV18 protein has also been solved in complex with the E1 helicase domain (33). These structures are very similar; the C-alpha backbones of 189 residues superpose with a root mean square deviation (RMSD) of ~1.2 angstroms (33). The transactivation domain contains two sub-domains, a curved anti-parallel beta-sheet domain and a helical domain containing three anti-parallel helices, arranged to give the module an overall L shaped appearance (Figure 1B).
A 'linker' of 40 - 200 amino acids, depending on the viral type (Figure 1A), connects the E2/D and transactivation domains. This region is poorly conserved and is believed to be unstructured. Little functional information is available for it, although some evidence suggests that it is important to E2 function, including nuclear localization (40). Phosphorylation of serine residues in the BPV1 E2 linker is required for viral DNA replication (41) and the linker is necessary for regulation during transcription and viral DNA replication of HPV11 E2 (40).

2.2. Overview of E2 protein DNA binding

The E2 protein binds to specific DNA sequences in the viral long control regions (LCR), thereby regulating transcription of viral genes (Figure 2A). The consensus recognition sequence is ACCG NNNN CGGT, where highly conserved four base pair sequences flank a four base pair 'spacer'; the E2/D homodimer binds these sequences with nM affinity (Figure 2; (20, 21, 23, 25, 26, 42, 43)). The backbone and side chains of the recognition helix mediate direct sequence-specific contacts with the DNA (Figure 2; (23)) while the variable nucleotides of the central spacer region are not contacted (20, 21, 25). The sequence of the spacer is variable and profoundly modulates the E2 protein binding affinity (26, 27, 44-46). As will be discussed in detail later in this article, unique conformational and/or dynamic properties of the spacer sequence modulate the relative affinity of E2 proteins for the binding sites present in the viral genomes (20, 27, 47-49).

3. Conserved residues yield conserved structure and function

Papillomaviruses can be grouped into three phylogenetic clusters designated α, β, and other. Figure 3 summarizes the differences in amino acid conservation among the three groups for the E2/D by mapping the degree of conservation onto the HPV16 structure. These representations show that significant conservation is only present at the DNA binding and dimerization interfaces (Figure 3A & 3C). 
The alpha genus is the most conserved. Inspection of atomic resolution structures of the E2/D reveals that the absolutely conserved Gly293 is located in the loop connecting the recognition helix to the strands of the beta-barrel (Figure 4A). Glycine residues are well known to reside within tight turns where side chains larger than a hydrogen atom would sterically clash with adjacent side chains (50-53). Mutating Gly293 to Val or Phe in silico disrupts the predicted structure, confirming that spatial constraints preclude insertion of a larger side chain into the E2 protein at this position. Molecular models of the G293V and G293F proteins (the amino acid numbering is based on the HPV16 E2 structure assignment (22)) reveal a shift in the recognition helix that prevents it from contacting the DNA without distortion of one of the macromolecules (Figure 4B). Other residues are also highly conserved among all the aligned papillomavirus sequences. Residues with > 90% conservation include Asn296, Lys299, Cys300 and Gln349, which are located on the surface of the recognition helix and mediate direct contact with the completely conserved nucleotides of the palindromic E2 recognition sequence (Figure 2B). Only conservative substitutions, Glu for Asn, Arg for Lys, Ser for Cys and Glu (predominantly) for Gln, occur at these positions (Appendix 1). The residues forming the dimerization interface of the E2 DNA binding domain are also highly conserved. For example, Ser313 resides within a loop that contributes to the stabilization of the dimerization interface and is conserved in 88% of the analyzed sequences. Again, substitutions are conservative; Thr substitutes for Ser in most cases, the exception being Ile in one type (ChPV; Appendix 1). Trp319, Trp321 and Pro353 participate in intersubunit contacts and are conserved in > 90% of the papillomavirus types. Val333 and Leu335 are located within the subunit interface and are characterized by ~ 75% conservation.

3.1. The electrostatic surface of E2/D types and variants

The high degree of sequence identity among the E2 DNA binding domains of the 122 HPV types and 24 HPV16 variants analyzed suggests corresponding conservation of structure (54). This hypothesis is supported by the low variability among the amino acids that dock the recognition alpha-helix on the beta-barrel and form the dimer interface, as discussed above. To confirm the expected conservation of structure, homology models (54) of each of the 122 viral types were built using crystal structures from the alpha genus and other papillomavirus genera as templates (Figure 5). The average root mean square deviation (RMSD) of the modeled structures is ≤ 2 angstroms, indicating that they are essentially the same. No deviations were observed from the overall fold of the E2 DNA binding domain (data not shown). Thus, the amino acid variations of the studied virus types do not compromise the integrity of the domain's overall fold that is crucial for virus viability.
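The RMSD criterion used above can be illustrated with a minimal sketch. It assumes that the two coordinate sets are already superposed and list equivalent atoms in the same order; the function name `rmsd` and the toy coordinates are hypothetical and are not taken from the modeling pipeline used in this study.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy C-alpha coordinates in angstroms (invented for illustration only).
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(model, template))
```

In practice an optimal superposition (for example a least-squares fit) would precede this calculation; it is omitted here to keep the sketch self-contained.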
A subtle but potentially important source of variability among the papillomavirus types is the nature of the protein-DNA interface. The electrostatic potential of the DNA binding surfaces of HPV16 and HPV18 E2 proteins differ with regard to both net charge and charge distribution (20). Since electrostatic interactions contribute significantly to protein-DNA interactions in general, and in a unique way to the E2 protein in particular (27), we used molecular modeling to explore differences in the electrostatic potential. The goal of this analysis was to assess whether the E2 DNA binding surfaces from HPV types defined as oncogenic (i.e., high risk) (5) have unique characteristics that might contribute to their ability to cause disease. We utilized the E2/D homology models from α papillomavirus types to compare the electrostatic potential of the E2 DNA binding surface. An electrostatic potential map was generated for each model by centering the domain within a 65 x 65 x 65 Å cubic lattice with 1.5 angstrom spacing. We limited our analysis to the DNA binding surfaces and the alpha papillomaviruses. Each surface is analyzed against all the others to generate a similarity index (SI); zero (black) and one (white) denote dissimilar and identical surfaces, respectively (Figure 6A). The SI is a composite measure of similarity in net charge and charge distribution among the analyzed DNA binding surfaces. Given the high degree of amino acid sequence similarity, it is not surprising that the overall electropositive nature of the surface is conserved; the smallest SI among the DNA binding surfaces is about 0.65 (Figure 6A).
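The all-against-all comparison described above can be sketched as follows. The text does not state the formula behind the SI, so this example substitutes the Hodgkin similarity index, a commonly used measure for comparing potentials sampled on a shared grid; that choice, the function names and the toy potential values are all assumptions made for illustration. Note that the Hodgkin index ranges from -1 to 1, whereas the SI reported here is described on a 0 to 1 scale.

```python
def hodgkin_similarity(pot_a, pot_b):
    """Hodgkin index 2*sum(a*b) / (sum(a*a) + sum(b*b)) for potentials on the same grid points."""
    if len(pot_a) != len(pot_b):
        raise ValueError("potentials must be sampled on the same grid points")
    numerator = 2.0 * sum(a * b for a, b in zip(pot_a, pot_b))
    denominator = sum(a * a for a in pot_a) + sum(b * b for b in pot_b)
    if denominator == 0.0:
        raise ValueError("both potentials are zero everywhere")
    return numerator / denominator

def similarity_matrix(surfaces):
    """All-against-all comparison; `surfaces` maps a label to a list of potential values."""
    names = sorted(surfaces)
    return {(m, n): hodgkin_similarity(surfaces[m], surfaces[n]) for m in names for n in names}

# Invented potential values at three shared grid points (arbitrary units, hypothetical labels).
toy_surfaces = {
    "typeA": [1.0, 2.0, 3.0],
    "typeB": [2.0, 4.0, 6.0],
}
for pair, value in sorted(similarity_matrix(toy_surfaces).items()):
    print(pair, round(value, 3))
```

The index equals one only when the two potentials are identical at every grid point, so it responds to differences in both magnitude and distribution, which is the behavior the SI is described as capturing.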
Figure 6B compares the electrostatic potential maps of the DNA binding surfaces of the HPV6 (low risk) and HPV16 (high risk) E2 proteins, whose SI of 0.70 places them among the most diverse pairs of proteins (Figure 6A); blue denotes a positive and red denotes a negative potential. While the HPV16 surface contains an increased electropositive surface potential within the middle of the DNA binding surface and the surface containing the recognition helix, a clear distinction between the surfaces of the high and low risk human mucosal papillomavirus types is not observed. The overall differences are minor and reflect the magnitude of the potentials rather than their distribution. The SI scores denote strong electrostatic conservation with no correlation with epidemiological classifications. The overall conclusion drawn from this analysis is that the electrostatic nature of the DNA binding surface of the E2 proteins is highly conserved and lacks the heterogeneity necessary to explain the oncogenic potential of the mucosal human papillomaviruses.
3.2. Transactivation domain sequence conservation at the E2 - E1 Interaction Interface

The assembly of the E1 and E2 proteins at the viral origin is required for the initiation of papillomavirus DNA replication (55, 56). The E1 protein is a helicase that, on its own, binds with low affinity and specificity to the origin of replication; specific binding is accomplished by the cooperative binding of the E1 and the E2 proteins to adjacent sites. Once the complex is formed, E2 is displaced and additional E1 molecules are added to the origin in an ATP dependent step (33, 57). The interaction of the HPV E1 helicase and E2 transactivation domains is well defined crystallographically (Figure 8B; (33)). It is less certain whether the DNA binding domain of the human E2 protein interacts with E1. Three bases separate the E1 and E2 binding sites at the origin of replication in BPV1; this close proximity is highly suggestive that the DNA binding domains of the two proteins interact (15, 33, 58). In contrast, the alpha papillomavirus E1 and E2 binding sites are further separated; significant distortion of the DNA would be required to bring the bound DNA binding domains into direct contact (15). An amino acid sequence alignment analysis comparable to that described above for the E2/D was performed for the E2 transactivation domain to explore conservation in relation to the domain's three-dimensional structure. The alignment revealed 36 - 100% similarity between any two papillomavirus transactivation domains, indicating structural conservation among the papillomavirus types and variants comparable to that seen for the E2/D. Unlike the E2/D, the transactivation domain does not have extensive surfaces of highly conserved residues. Rather, the regions of very high conservation are localized to small surface patches (Figure 8). The residues Pro60, Ile73 and Gly156 are absolutely conserved (Figure 7 & Figure 8, colored red). Pro60 is within an alpha-turn-alpha motif. 
Gly156 is within a beta-turn-beta motif. Both residues are likely required for the stability of their respective turns and thus for the maintenance of the functional conformation of the domain.
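The percent conservation values quoted for individual residues can be illustrated with a minimal sketch that scores each column of a multiple sequence alignment by the frequency of its most common residue. The function name, the decision to ignore gap characters, and the toy alignment are assumptions for illustration; they do not reproduce the alignment analyzed in this study.

```python
from collections import Counter

def column_conservation(aligned_seqs):
    """Return (most common residue, percent of non-gap sequences carrying it) per column."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for i in range(length):
        # Gaps ('-') are excluded from the count; this is a design choice of this sketch.
        column = [seq[i] for seq in aligned_seqs if seq[i] != "-"]
        if not column:
            result.append((None, 0.0))
            continue
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, 100.0 * count / len(column)))
    return result

# Invented four-sequence alignment (not real E2 sequences).
toy_alignment = ["PGIW", "PGIW", "PAIW", "PGLW"]
for position, (residue, percent) in enumerate(column_conservation(toy_alignment), start=1):
    print(position, residue, percent)
```

A column scoring 100% corresponds to an 'absolutely conserved' position in the sense used above; whether substitutions are conservative would need a separate, chemistry-aware comparison.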
High conservation is observed for a number of residues, including several implicated in the interaction between the HPV11 E1 and E2 proteins (59). Tyr19, Glu20, Trp33 and Lys93 cluster around a pocket that binds a peptide which inhibits the E1-E2 interaction (Figure 3D, surrounding the green peptide; (59)). Mutation of Tyr19 to Ala inhibited the E2-E1 interaction (59). Arg37, Ala69 and Ile73 form a second cluster of residues on the opposite surface of the transactivation domain (Figure 3D, asterisk). Mutagenesis of Ile73 to Ala inhibits the E2-inhibitor interaction, suggesting interference with the E1-E2 interaction and that the surface defined by this cluster of conserved residues participates in the inter-protein interaction (59). The Arg37 and Ile73 residues were found to be crucial for interaction with Brd4 (60). It has been suggested that this conserved surface helps regulate viral gene transcription (60-62). Highly conserved residues for which no functional correlation is available are Val59, Trp134 and Phe170.

4. Analysis of HPV16 variants

Twenty four variants of HPV16 have been identified by clinical screening and fully sequenced (12, 63). These variants are closely related and group into five phylogenetic branches designated European (E), Asian (As), Asian American (AA), African-1 (Af1) and African-2 (Af2) (Figure 7; (12)). An increased risk of squamous cell cervical carcinoma and its precursor high grade lesions is associated with non-European (NE) variants of HPV16 (7-9, 64). Patients infected with the non-European variants were 11 times more likely to be diagnosed with cervical cancer relative to infection with the prototype European-related HPV16 variants (7). The correlation between viral variant and disease may be due to differences in transcriptional regulation, the biological activities of the proteins encoded by HPV16 variants or in the ability of the host to mount an immunological response to specific viral epitopes (65). 
The summary of this analysis is shown in Figure 7 and includes the transactivation domain, the 'linker' region connecting the two domains, and the DNA binding domain. Since there is > 90% identity among the HPV16 variants, only substitutions relative to the prototype HPV16 E2pro reference sequence are shown (66).
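Reporting only the substitutions relative to a reference sequence, in the residue-position-residue notation used throughout this section (for example T310K), can be sketched as below. The function name, the `start` parameter and the toy sequences are hypothetical; positions are numbered along the reference, so columns where the reference carries a gap are not counted.

```python
def list_substitutions(reference, variant, start=1):
    """List substitutions such as 'T11K' between two aligned sequences.

    `start` is the residue number of the first reference residue. Gap columns
    ('-') are skipped: insertions and deletions are not reported by this sketch.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    position = start - 1
    for ref_res, var_res in zip(reference, variant):
        if ref_res == "-":
            continue  # insertion relative to the reference: no reference number
        position += 1
        if var_res != "-" and var_res != ref_res:
            substitutions.append(f"{ref_res}{position}{var_res}")
    return substitutions

# Invented aligned fragments (not real HPV16 E2 sequences).
print(list_substitutions("ATHRW", "AKHQW", start=10))
```

Applied to each variant in turn against the prototype sequence, a routine of this kind yields the kind of per-variant substitution summary that Figure 7 presents.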
The HPV16 variants can be dichotomized into European and Non-European clinical isolates (Figure 7); some variants, denoted in Figure 7 as transitional, show characteristics of both the European and Non-European groups. The shared characteristics may be due to a common ancestor followed by separate evolution. The DNA binding domain of the Non-European variants is characterized by the T310K (also noted in two European variants), W341C, and D344E amino acid variations. The substitutions W341C and D344E map to the surface of this domain that interacts with the p53 protein and can thus affect apoptosis (Figure 8A; (67)). The T310K variation may increase the electropositive character of the recognition helix. The transactivation domain substitutions H35Q, T135K, H136Y and R165Q that are characteristic of the Non-European variants map to surfaces implicated in the E2 - E1 interaction at the origin of replication (Figure 8B; (31, 33)). This analysis shows that while there is > 93% amino acid identity among the E2 proteins of the 24 HPV16 variants analyzed, the remaining variation between the European and Non-European viruses predominantly maps to the E2 - E1 interaction surface necessary for the initiation of viral replication. This clustering of the variations at an interface critical to the viral life cycle suggests functional significance.
We also asked whether the HPV16 E1 protein has amino acid substitutions unique to the European and Non-European variants and, if so, whether they map to the proposed inter-protein surfaces. Our analysis of the nine E1 protein sequences available at the time revealed the amino acid substitutions Q78E, C168S, I326M and E452D. A structural correlation can be drawn only for E452D, as it is located within the E1 helicase domain for which structural information is available. This residue is on the E2 - E1 interaction surface, consistent with the conclusion drawn from the analysis of the E2 protein variants (Figure 7 & 8B; (59)).
Lastly, we note a difference in the E2/D between the European and Non-European HPV16 variants. These amino acid substitutions are likely to affect the cooperative binding of E1 and E2 at the replication origin and promoters. It is possible that alteration of the balance between viral replication and expression of the E6 and E7 oncoproteins might play a role in the increased oncogenic potential of the non-European viruses towards the development of high grade squamous intraepithelial lesions of the cervix (68, 69) and life threatening malignancy (10).
5. Role of the E2 protein in malignancy and its interaction with p53

In cervical cancer, the genomes of high risk HPV types are often integrated into the host genome, disrupting the E2 open reading frame (15, 55, 70-72). Since the E6 and E7 open reading frames remain intact in the integrated genome, loss of E2 expression can de-repress them, resulting in expression of the E6 and E7 oncoproteins. These events abrogate cell cycle control, favor cell proliferation and thus contribute to oncogenesis (15, 20, 55, 67). Recent studies suggest that the HPV16 E2 proteins might also regulate cell proliferation and cell death through a direct interaction with p53 that induces apoptotic cell death (15, 67, 73, 74). It has been suggested that the E2 protein of high risk HPVs may also function as a tumor suppressor protein (15, 74). In contrast, the E2 proteins from low risk human papillomaviruses such as HPV6 and HPV11 do not bind p53 (67). The three residues implicated in the E2 - p53 interaction by alanine mutagenesis are W341, D344 and D338. Mutation of these residues in HPV16 eliminates the E2 - p53 interaction and the induction of apoptosis in non-HPV transformed cell lines (67). Thus, the amino acid variations W341C and D344E (Figure 8A) that distinguish the European and Non-European variants may influence the balance of proapoptotic signals by altering their interaction with p53; direct biochemical studies will be necessary to validate this hypothesis.
6. E2/D-DNA AFFINITY AND SPECIFICITY
The ability of proteins to 'read' DNA sequence is the net result of noncovalent interactions that include formation of enthalpically favorable protein-DNA contacts, entropically favorable release of bound water and ions and conformational changes in either or both partners. Base-specific interactions between protein and DNA, such as hydrogen bonds inferred from atomic resolution structures, are typically referred to as 'direct readout'. It is not unusual for conformational changes in either or both macromolecules to improve the configuration of direct interactions. In some cases, the propensity of duplex DNA to assume or change conformation is dependent upon the properties of nucleotides that do not directly contact the protein. These contributions to binding are typically referred to as 'indirect readout' (20, 27, 75-78). The E2 protein utilizes both direct and indirect readout to bind its recognition sequences (Figure 1; (20)).
The high level of primary sequence conservation in the E2/D results in conservation of tertiary structure and the amino acid residues that mediate direct interactions as discussed in the preceding sections (Figure 3). This conservation extends to the DNA sequence that is bound by the E2 protein; the two-fold symmetric four base pair sequences of the palindromic binding site directly bound by the E2 protein (20, 21, 23, 25, 26, 42, 43) are virtually invariant among papillomavirus genomes. Significant variability is only observed for positions -3 and +3 among the viruses analyzed in this study (Figure 2b). A little over half of the binding sites have palindromic C and G, respectively, at these two positions. The frequency for a C in position +3 or a G in position -3 is 61% and 65%, respectively. Only 10% of the remaining possible combinations of nucleotides are palindromes. The frequency of T, A or C is no greater than 15% at either position; the directly contacted base pairs are highly conserved. Taken together, these observations show that the direct component of DNA sequence specific binding by the E2 protein is critical to PV biology and has thus been 'locked in' by evolution. 'Fine tuning' of the affinity, structure and dynamics of the protein-DNA interaction can be attributed to the 'indirect' component of the reaction (26, 27).
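Locating candidate E2 binding sites and extracting their spacers can be sketched with a regular expression for the strict consensus ACCG NNNN CGGT given earlier. The function name and the toy sequence are hypothetical, and the strict pattern is an assumption: it would miss natural sites that deviate from the consensus at the more variable positions noted above. Because the consensus is palindromic, scanning one strand is sufficient for this pattern.

```python
import re

# Lookahead so that overlapping candidate sites are also reported.
E2_SITE = re.compile(r"(?=(ACCG([ACGT]{4})CGGT))")

def find_e2_sites(sequence):
    """Return (1-based start, 12 bp site, 4 bp spacer) for each strict consensus match."""
    sequence = sequence.upper()
    return [(m.start() + 1, m.group(1), m.group(2)) for m in E2_SITE.finditer(sequence)]

# Invented sequence containing two sites with the AATT and ACGT spacers discussed in this article.
toy_sequence = "TTACCGAATTCGGTAAGGACCGACGTCGGTCC"
for start, site, spacer in find_e2_sites(toy_sequence):
    print(start, site, spacer)
```

Tabulating the spacers returned for the sites in a genome is a natural first step toward relating spacer sequence to the affinity differences discussed in the following paragraphs.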
The nature of the E2-DNA complex interface has been studied using a number of point mutations designed to affect amino acids directly within the interface of this complex (28). According to this analysis, the sum of the individual amino acid contributions differs from the total interaction energy by about 1.0 kcal/mol; that is, roughly 10% of the interaction energy is due to indirect readout. This study also suggests that more water molecules are present at the molecular interface than are visualized crystallographically for the HPV18-E2/D and BPV1-E2/D DNA complexes (20). Solvent is an important component of the E2-DNA complex (27). The relative contributions of direct and indirect interactions are likely to be dependent on solution conditions. As discussed below, the contribution of indirect readout to binding can be gleaned from careful studies conducted as a function of salt concentration and type (Figure 9; (27)). Although the cognate 'spacer' sequence is not contacted by the protein, spacer sequence differences among the binding sites present in each HPV genome result in 30 - 100 fold changes in binding affinity (26, 27). For example, the E2 cognate binding sites containing the spacer sequences AATT, TTAA and ACGT have distinct structural propensities (29, 47). An analysis combining gel electrophoretic mobility measurements with X-ray crystallographic analysis and theoretical structural prediction has shown that DNA sequences containing AATT are curved by ~ 17 degrees while those containing TTAA are curved by ~ 11 degrees (47). Net curvature is not observed for the ACGT sequence.
The sequence dependent effects of complex formation by the E2/D-DNA interaction have also been studied by directly comparing the binding of the E2/D from the HPV-16 and BPV-1 types. Utilizing quantitative gel-mobility shift experiments as well as solution equilibrium experiments, it was shown that the BPV-1 E2/D has moderate sensitivity to the sequence of the spacer, while the HPV-16 E2/D has a clear preference (30 - 100 fold greater) for spacer sequences rich in A:T base pairs, especially at high monovalent cation concentration or in the presence of divalent cations (Figure 9; (26, 27)). Nicked and gapped DNA sequences in the spacer region are detrimental to HPV-16 E2 binding, whereas only minimal effects on BPV-1 E2 binding were detected (26). Extending the consensus binding site by adding an AT or a GC base pair to each end results in tighter binding by the HPV16 E2 DNA binding domain when compared to adding a CG or TA base pair to the identical sites (79).
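The 30 - 100 fold affinity differences quoted above can be placed on a free energy scale with the standard relation ΔΔG = RT ln(K1/K2). The short sketch below applies it; the temperature of 298.15 K and the function name are assumptions, since the experimental temperature is not given in this section.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)

def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Free energy difference (kcal/mol) corresponding to a ratio of binding constants."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold_change)

for fold in (30, 100):
    print(fold, round(ddg_from_fold_change(fold), 2))
```

Under these assumptions, a 30 - 100 fold range corresponds to roughly 2.0 - 2.7 kcal/mol, which gives a sense of the scale of the spacer-dependent contribution to binding.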
Structural analysis of the DNA sequences A4T4 and T4A4 shows monovalent cation-dependent bending, manifested as changes in NMR signal and electrophoretic mobility (80). Divalent cation dependence for A-tract sequences has been reported, with Mg2+ increasing the angle of curvature by ~ 2 fold (81). These reports show that the structures of the E2 binding sites are likely to differ based on the spacer sequence, and that monovalent and divalent cations play an important role in determining the structural conformation of a given DNA sequence. Analyzing DNA structure by the cyclization method enabled the prediction of variations in E2/D affinity for the cognate sites (29). The predictions proved correct for 15 of 16 sequences, with the sole exception being traced to differential magnesium ion binding. These results further highlight the importance of indirect readout with regard to both DNA structure and the role of ions in sequence specific affinity and specificity (27, 29). Cations penetrate within the grooves of duplex DNA; high resolution crystal structures show K+, Rb+ and Cs+ ions within the DNA duplex minor groove's 'spine of hydration' (82, 83). Molecular dynamics simulations show fractional occupancy of cations within the minor groove of the Dickerson dodecamer duplex (84, 85). An analysis of the electrostatic potential of this sequence identified a highly negative electrostatic potential within the 'ApT pocket' of the minor groove (86, 87). Cations are observed by NMR within the minor grooves of DNA duplexes bearing either AnTn or TnAn, although their localization within the two sequences differs (88-90). Additional experimental and theoretical studies show monovalent cations localized deep within and near the top of the minor grooves of AT rich sequences, especially A tracts (84, 91-94).
The cations localized within the minor groove are hypothesized to reduce repulsion between proximal phosphates and the electronegative O2 of thymine and N3 of adenine on the groove's floor, resulting in narrowing of the groove width and facilitating bending of the helical axis (86, 95). The minor groove widths of free E2 binding sites are 9.4 Å and 12.1 Å, respectively, for the AATT and ACGT spacers (48) (Figure 1C). The structural differences of AT rich sequences, as well as distinct differences in the spacer sequences of cognate binding sites, allow for a thorough exploration of the structural effects of DNA on E2 protein binding affinity and specificity.
Biochemical and computational studies provide compelling evidence that the structure, dynamics and flexibility of the spacer DNA are critical determinants of E2 binding affinity (26, 29). The E2 proteins bind to their cognate DNA sequences as homo-dimers and, in general, induce a bend of ~42 degrees in the spacer region of the DNA (20). The available evidence suggests that full length E2 proteins and their isolated DNA-binding domains display comparable specificity for DNA sequence and bind to the DNA in the same manner (96). Peptides of 18 amino acids derived from the alpha1 (recognition) helix bind to the cognate sequence but not to non-specific DNA (97). This analysis revealed the propensity of the derived peptide to bind to the ACCG half site, demonstrating a capacity for discrimination of nucleic acid sequences without the need for the entire protein architecture (97).
Cations play a key role in E2/D sequence-specific binding and affinity, and the sequence-specific uptake of cations into the DNA upon binding of the E2 proteins is a key contribution to this binding-site discrimination (Figure 9; (27)). Increasing the cation concentration increases the affinity of the E2 DNA binding domain for pre-bent sequences containing AT rich spacers (27). Furthermore, divalent cations increase the affinity and specificity of the human papillomavirus type 16 E2 DNA binding domain for cognate binding sites containing AT rich spacers. Thus, divalent cations in the intracellular milieu are essential to the ability of HPV16 E2 protein to discriminate among binding sites with different spacer sequences.
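The connection drawn above between cation concentration and affinity is often summarized by a thermodynamic linkage relation. The expression below is a simplified sketch and is not taken from the analysis in (27): the symbols K_obs (observed association constant), [M+] (cation concentration) and Δn (net number of cations taken up on complex formation) are introduced here for illustration only, and contributions from anions and water are neglected.

```latex
\[
\frac{\partial \ln K_{\mathrm{obs}}}{\partial \ln [\mathrm{M}^{+}]} \;\approx\; \Delta n_{\mathrm{M}^{+}}
\]
```

Under this convention, a positive slope (affinity rising with cation concentration) corresponds to net cation uptake, whereas a negative slope corresponds to net release of cations condensed around the DNA.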
The mechanism of DNA binding utilized by E2 is distinct from the generally observed mechanism in which protein binding displaces the cations that are condensed around DNA and neutralize its highly negative charge (98-100). The E2 protein thus utilizes a novel mechanism of indirect readout in which cations penetrate into the minor groove of the bound DNA (27). These cations neutralize the highly electronegative charge density within the minor groove of the spacer DNA resulting from its distortion from a canonical B-helix induced by E2 binding (20, 27, 87). Of the multitude of mechanisms that proteins use to recognize specific sequences of DNA, indirect readout is particularly intriguing since it is based upon their ability to distinguish subtle aspects of nucleic acid structure and dynamics. The results highlight differences in the contribution of electrostatics to spacer sequence discrimination by E2 DNA binding domains. Since the levels of K+ and Mg2+ are homeostatically regulated in mammalian cells, the cation dependence of binding is unlikely to be a direct regulator of the papillomavirus life-cycle. However, these dependences illuminate aspects of the underlying mechanism of DNA sequence discrimination by the E2 proteins that may differ among the various papillomavirus types.
6.1. Computational analysis of E2 structure and DNA binding Insight into the contribution of DNA deformation to formation of E2/D-DNA complexes has been obtained through molecular dynamics simulations of the free and E2/D-bound DNA (101-104). Simulations of DNA containing the ACCGAATTCGGT E2 binding sequence, which is tightly bound by HPV E2 proteins, were run from the uncomplexed or E2/D-complexed starting coordinates. Both simulations rapidly relaxed to the dynamical structure represented by the crystal structure of the free DNA.
This result shows that the structure of the bound DNA sequence is dynamically unstable in the absence of protein and arises as a consequence of conformational changes induced by the E2/D (101). Comparison of these simulations with those of an ideal canonical B form structure of the same sequence indicates a propensity for DNA bending to occur in the direction of the protein-induced conformational changes. Since the free DNA structure containing the AATT spacer sequence is bent in the direction of the E2/D-induced conformational change, complex formation with some sequences is a consequence of both intrinsic DNA structure and protein-induced structural change. The indirect readout mechanism manifests itself through the intrinsic structure and the flexibility of the sequence (101).
Simulations have also been conducted to explore the relative flexibility of the spacer and conserved half-sites of the E2 binding sequence. The ACGT spacer is more flexible than AAAC spacers, especially in the backbone dynamics of the CpG step (102). The higher affinity of BPV1 E2/D for sites with the ACGT spacer is thus likely due to the smaller penalty incurred in deforming the sequence upon protein binding. It was noted that the conserved half-sites behave identically and adopt conformations close to those seen in the bound state regardless of the spacer sequence present (104). Thus, the E2 proteins may take advantage of the invariant flanking half sites to form an initial complex whose spacer sequence subsequently relaxes to its final conformation (104). The overall effect of E2/D binding is to diminish global DNA motion and to impose and lock base displacements and helix curvature.
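The division of an E2 binding site into two conserved half-sites and a variable spacer, as discussed above, can be illustrated with a short script. This is a minimal sketch that assumes the ACCG-N4-CGGT form implied by the ACCGAATTCGGT example; the function names `split_e2_site` and `at_fraction` are hypothetical helpers introduced here, and AT content is only a crude stand-in for the structural properties explored in the simulations.

```python
import re

# Assumed consensus: conserved ACCG / CGGT half-sites flanking a 4-bp spacer.
E2_SITE = re.compile(r"^(ACCG)([ACGT]{4})(CGGT)$")


def split_e2_site(site):
    """Return (left half-site, spacer, right half-site), or None if no match."""
    match = E2_SITE.match(site.upper())
    return match.groups() if match else None


def at_fraction(spacer):
    """Fraction of A or T bases in the spacer (a rough descriptor only)."""
    return sum(base in "AT" for base in spacer.upper()) / len(spacer)


if __name__ == "__main__":
    for site in ("ACCGAATTCGGT", "ACCGACGTCGGT"):
        left, spacer, right = split_e2_site(site)
        print(site, left, spacer, right, at_fraction(spacer))
```

Sequences that do not match the assumed consensus return `None`, so callers should check the result before unpacking it.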
6.2. Modulation of E2 protein binding Cytosine methylation at CpG dinucleotides influences transcription and replication of DNA in eukaryotic organisms. DNA methylation is hypothesized to be involved in silencing gene expression; the pattern of methylation is thought to reflect the gene expression profile of a cell. The E2 protein of papillomaviruses contributes to viral transcription and viral DNA replication, both of which are dependent on its ability to bind the consensus sequences located within the long control region (LCR; Figure 2A). E2 binding sites are potential sites for DNA methylation in the mammalian host cell because they contain CpG dinucleotides. In vitro studies have shown that methylation of the CpG dinucleotides contained within the binding site blocks binding of the HPV16 E2 protein. Methylation at all four sites of the E2 binding site abolished DNA binding; partially methylated sites show decreased, but not abolished, specific binding (105). More recent studies have indicated that the ability of E2 proteins to activate transcription is inhibited by global methylation of CpG dinucleotides (106). Furthermore, in studies that detected HPV16 LCR methylation, hypomethylation was present in well differentiated epithelial cells; in contrast, hypermethylation was present in poorly differentiated epithelial cells (106). These data suggest that DNA methylation may play an important role in modulating both transcription and replication and, in turn, the viral life cycle. Studies on HPV18 have shown that methylation of CpG steps represses promoter activity (107). Since the methylation state of the viral genome within a mammalian cell may vary during its life cycle, DNA binding by the E2 protein, and hence its function, can be modulated by DNA methylation.
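To make concrete why E2 binding sites are candidate methylation targets, the sketch below locates CpG dinucleotides on both strands of a duplex site. It is illustrative only: the helper names are hypothetical, and the count of four CpG cytosines for ACCGAATTCGGT is offered as one plausible reading of the "four sites" mentioned above, not as a statement taken from (105). A spacer that itself contains a CpG step, such as ACGT, would add further positions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpg_positions(seq):
    """0-based indices of the C in each CpG dinucleotide on the given strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_cytosines_in_duplex(site):
    """Count CpG cytosines on both strands of a double-stranded site."""
    return len(cpg_positions(site)) + len(cpg_positions(reverse_complement(site)))


if __name__ == "__main__":
    site = "ACCGAATTCGGT"
    print(cpg_positions(site))            # CpG steps on the top strand
    print(cpg_cytosines_in_duplex(site))  # CpG cytosines on both strands
```

Because the site is palindromic, its reverse complement is the same sequence, so each strand contributes the same CpG positions.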
Non-covalent associations between proteins and their specific target sites on DNA play a pivotal role in transcription and replication, where the underlying mechanism involves molecular processes of protein-nucleic acid recognition and protein-protein association. The overall specificity and affinity of proteins for their target DNA involve a balancing act among diverse competing free energy components, including electrostatics, hydrogen bonding, ion and water release, and van der Waals contributions (108). Formation of sequence-specific protein-DNA complexes can also be viewed as a melding of direct and indirect readout contributions that add up to an overall favorable free energy of binding (109). The structure and flexibility of the target DNA (indirect readout) play a crucial role in determining the binding specificity and affinity of the E2 protein (27). The E2 protein represents a novel class of DNA binding protein that utilizes a novel mechanism of indirect readout in which cations penetrate into the minor groove of the bound DNA to achieve sequence-specific binding and affinity. These cations neutralize the highly electronegative charge density within the minor groove of the spacer DNA resulting from its distortion from a canonical B-helix induced by E2 binding (20, 27, 87). The DNA binding domain of the papillomavirus E2 protein is a prototype of a novel structural class of DNA-binding proteins (20). Using evolutionary and molecular modeling analyses, we have been able to extend our investigation to 146 papillomaviruses and found that the 'direct' component of DNA sequence-specific binding by the E2 protein appears to have been maintained throughout evolution. Thus, the fine tuning of the affinity, structure and dynamics of the protein-DNA interaction can be attributed to the 'indirect' component of the reaction (26, 27).
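The view of binding as a sum of direct and indirect readout contributions can be written compactly. The relations below are a sketch under the assumption that the two contributions are approximately additive; the subscripted symbols are labels introduced here for illustration, and K_d is the dissociation constant expressed relative to a 1 M standard state.

```latex
\[
\Delta G^{\circ}_{\mathrm{bind}} \approx \Delta G^{\circ}_{\mathrm{direct}} + \Delta G^{\circ}_{\mathrm{indirect}},
\qquad
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln K_{d}
\]
\[
\Delta\Delta G^{\circ} = RT \ln\!\left(\frac{K_{d,2}}{K_{d,1}}\right)
\]
```

For example, at 298 K (RT ≈ 0.59 kcal/mol) a tenfold difference in K_d between two binding sites corresponds to a ΔΔG° of about 1.4 kcal/mol. If the direct contacts to the conserved half-sites are the same for both sites, a difference of this kind would be assigned to the indirect term.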
The functional properties of the E2 protein are crucial to the viral life cycle via its regulation of gene transcription and DNA replication. As such, the conservation of the E2/D could have therapeutic implications for HPV infection and disease. Papillomavirus vaccines recently approved by the FDA are very effective in lowering viral load and preventing disease (110-113). These vaccines target the L1 protein of HPV16 and HPV18, which together account for ~70% of infections that lead to malignancies. Since the L1 targeted vaccines are type specific, they do not offer protection against the remaining viral infections that can cause malignancy. The E2 protein has been tested as a possible vaccine target in rabbits; this test vaccine lowers viral load and reduces tumor size (114). In one report, the immune responses to the E2, E6 and E7 proteins were impaired in some patients (115); these patients cannot mount a T-cell response to these antigens and are at a higher risk for disease progression (115). In contrast, specific antibodies have been raised against an HPV16 E2-DNA complex (116), showing that the human E2 protein is immunogenic. Since the structural properties of the 'high risk' human papillomavirus E2 proteins are highly conserved and immunogenic, the E2 protein should be evaluated as a vaccine candidate for prophylactic protection against a broad spectrum of HPV types.
8. REFERENCES
Footnotes: 1 Methods used in studies not previously published are described in the legends to the figures. 2 There is some ambiguity regarding the structure of DNA containing the TTAA sequence; cyclization kinetics analysis indicates that an E2 binding-site containing this spacer sequence is flexible but not curved (29).
Abbreviations: HPV: human papillomavirus; E2/D: HPV E2 DNA binding domain; ORF: open reading frame; LCR: long control region; RMSD: root mean square deviation; E: European; As: Asian; AA: Asian American; Af1: African-1; Af2: African-2; NE: non European
Key Words: Papillomavirus, DNA, Protein-DNA interactions, Electrostatics, E2, Review
Send correspondence to: Michael Brenowitz, Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, Tel: 718 430-3179, Fax: 718 430-8585, E-mail: brenowit@aecom.yu.edu