[Frontiers in Bioscience 14, 1292-1303, January 1, 2009] |
|
|
Free-energy landscapes of proteins in solution by generalized-ensemble simulations Yuji Sugita1,2
1 TABLE OF CONTENTS
1. ABSTRACT Free-energy landscapes of proteins in solution are essential for understanding molecular mechanism of protein folding, stability, and dynamics. Because of the multiple-minima problem (or quasi-ergodicity problem), the conventional molecular dynamics or Monte Carlo methods cannot provide the landscapes accurately at low temperatures. By contrast, the simulations based on the generalized-ensemble algorithms can sample wider conformational spaces than the conventional approaches, thereby providing better free-energy landscapes of proteins at low temperatures. In this article, we review two well-known generalized-ensemble algorithms, namely, multicanonical algorithm and replica-exchange method, and then introduce further extensions of the above two methods, which are applicable to larger systems with rugged energy landscapes. These simulation methods have been applied to the protein folding simulations of the C-peptide in ribonuclease A with explicit solvent. We also demonstrate how the methods and the free-energy landscapes of proteins are useful for the biological research, by showing the simulation results on the phospholamban, a reversible regulator of sarco(end)plasmic reticulum Ca2+-pump. 2. INTRODUCTION Understanding molecular mechanisms of protein folding and stability is one of the central issues in molecular biology. Folding pathways of proteins have been explored with the experimentally detected intermediate states between the native and denatured states. Recent advances in experimental studies allow us to characterize the structures of these intermediate states, including the transition states in protein-folding reactions (1-3). In theoretical studies, a different view on protein folding has appeared recently (4, 5). In this view, free-energy landscapes of proteins play essential roles for understanding the folding kinetics: rapid folding of proteins occurs in smooth free-energy landscapes, whereas proteins with rough free-energy landscapes fold more slowly, causing kinetic traps during the folding processes. This view has been derived from the free-energy landscapes of simple lattice models for proteins and summarized to the principle of minimum frustration in protein structures by Wolynes and co-workers (4). Note that this principle is essentially the same as the consistency principle in proteins proposed by Go in which he claimed that maximum consistency among various interactions is realized in the native structure (6). The free-energy landscapes of proteins from the simulations based on the all-atom representation of proteins and solvent molecules may be the most accurate ones, because all the interactions acting on the systems are taken into account in the simulations (7). However, due to the ruggedness of the landscapes at low temperatures, it is not straightforward to obtain them accurately by conventional molecular dynamics (MD) or Monte Carlo (MC) simulations. Unless the computational time is sufficiently long, simulations of proteins at low temperatures tend to get trapped at one of the local-energy-minimum states and can not sample the rest of the local-energy-minimum states. In most cases, it is difficult to obtain the global-energy-minimum state, which corresponds to the native structures of proteins, by the conventional simulation methods. To avoid the difficulty, we require much more powerful conformational sampling algorithms for proteins or other systems with rugged energy landscapes. Generalized-ensemble algorithms have been developed for spin systems with strong frustration (8-11) and have been used for biological systems recently (12). The algorithms have two advantages over the conventional simulation methods. First, generalized-ensemble simulations can avoid trapping at one of the local-energy-minimum states by performing a random walk in the potential-energy space, implying that the simulations can sample wider conformational spaces than the conventional simulations. The random walk in the potential-energy space can be achieved by non-Boltzmann weight factor used in the algorithms. Secondly, we can provide canonical-ensemble averages at any temperatures precisely just from one long production run of the generalized-ensemble simulation, by combining the histogram-reweighting techniques (13-15). Over the last 15 years, various new algorithms that can be categorized in the generalized-ensemble algorithms have been developed by different researchers (12). Of these, multicanonical algorithm (MUCA) (8, 9) and replica-exchange method (REM) (10) may be two of the most widely used algorithms. MUCA and REM have been shown to be very powerful in the simulations of small proteins or polypeptides in gas phase or in implicit solvent (16-20). However, the energy ranges that should be covered in these methods become very wide in larger biological systems, implying that a random walk in the potential-energy space may be difficult to achieve during the simulations. To solve the problem, we combined the merits of MUCA and REM, and developed new methods, namely, replica-exchange multicanonical algorithm (REMUCA) and multicanonical replica-exchange method (MUCAREM) (21-23). In this article, we first review the algorithms of MUCA and REM, and then, introduce the methods of REMUCA and MUCAREM. Finally, we demonstrate two recent applications of protein folding and stability by the generalized-ensemble simulations. In the first application, we discuss about the folding mechanism of the C-peptide in ribonuclease A in explicit solvent by using the free-energy landscape of the peptide (24). In the second application, we show the effect of phosphorylation on phospholamban (PLN), a reversible regulator of sarco(end)plasmic reticulum Ca2+-pump (SERCA), by comparing the free-energy landscapes of unphosphorylated (PLN) and phosphorylated PLN (pPLN) at room temperatures (25). These two applications indicate that the free-energy landscapes of proteins in explicit solvent by the generalized-ensemble simulations should have significant contributions for the research on biological sciences. 3. GENERALIZED-ENSEMBLE ALGORITHMS 3.1. Multicanonical algorithm Multicanonical algorithm (MUCA) has been first developed by Berg et al. for the simulations of spin glasses (8, 9). In the algorithm, the multicanonical ensemble is based on the non-Boltzmann weight factor (multicanonical weight factor), < < where < < The flat distribution implies that a free random walk in the potential-energy space is realized in the ensemble, so that the simulations based on MUCA can escape from any local-minimum-energy states. Thus, the simulations based on the algorithm can sample wider conformational spaces than the conventional simulations. MC or MD simulations based on MUCA can be performed just by replacing the total potential energy, < < where < In MD simulations, constant-temperature MD is performed by solving the modified Newton equation (26, 27): < where < In most cases, the density of states or multicanonical weight factor for the system is not a priori known, implying that we need the way to estimate the factor. In practice, < < where the best estimate of the density of states is given by the single histogram reweighting techniques (13): < 3.2. Replica-exchange method Replica-exchange method (REM) has been developed for the simulations of spin systems by Hukushima et al. (10). Swendsen and co-workers independently developed a closely related method (28). Similar method, in which the same equation is used but emphasis is laid on optimization, has also been developed (29). Because of the history, REM is also referred to as multiple-Markov chain method (30) or parallel-tempering method (PT) (11). In REM, a generalized-ensemble consists of M non-interacting copies of the original system (or replicas). Each replica has its own temperature, < Step 1: each replica in canonical ensemble at the constant temperature is simulated for a certain MD or MC steps. Step 2: a pair of replicas, for instance, i and j, which are at the neighboring temperatures < < where < Here, < In MD version of REM (REMD), we also have to deal with the momenta of the systems. We have shown that rescaling the momenta uniformly by the square root of the ratio of < Like MUCA, the canonical-ensemble average at any temperatures can be obtained from the simulations based on REM by using the equation (1.6). Here, the best estimate of the density of states is given by the multiple-histogram reweighting techniques (14, 15): < where< 3.3. Extensions of MUCA and REM These two algorithms work very well for small systems, such as small proteins or polypeptides in gas phase or in implicit solvent (16-20). However, both algorithms have their own weak points when they are applied to larger systems with rough free-energy landscapes. In MUCA, obtaining the optimal weight factor may be a very difficult and tedious task. In most cases, a large number of iterations are required to refine and update the weight factor before the production run. Wang-Landau method (31) is one of the efficient methods for determining the multicanonical weight factor. In contrast, no such effort is required in REM, because the weight factor is automatically determined in the algorithm. However, number of replicas that is required to keep sufficient probabilities of replica-exchanges increases rapidly as the system size becomes large. It implies that the simulations based on REM become very expensive for large systems, such as protein systems in explicit solvent. To overcome the difficulties, we have developed two new methods, namely, replica-exchange multicanonical algorithm (REMUCA) and multicanonical replica-exchange method (MUCAREM) (22, 23). They are hybrid approaches of MUCA and REM. In REMUCA, the multicanonical weight factor, < < and < where < Note that we need to evaluate the multicanonical potential energy, < In MUCAREM, a random walk in the potential-energy space is realized not only by the non-Boltzmann weight factor but also by the replica-exchange processes. Because the latter is a global update for the structures between the neighboring replicas, sampling efficiency of MUCAREM may be much improved compared with the original MUCA and REM. We have used MUCAREM in the simulation of G-peptide, a 16-residue polypeptide of the C-terminal end of streptococcal protein G B1 domain, in explicit solvent (34-36). In the simulation, only 8 replicas were required to cover the energy range corresponding to the temperatures between 275 K and 689 K, whereas at least 64 replicas were necessary for the original REM in the same system. 4. FOLDING SIMULATIONS OF THE C-PEPTIDES IN SOLUTION Here, we show the efficiency of the new generalized-ensemble algorithms in protein-folding studies. Molecular mechanism of folding and stability for the C-peptide in ribonuclease A has been studied both by experimental and theoretical researchers. Circular Dichroism (CD) (37, 38) and nuclear magnetic resonance (NMR) (39) experiments have shown that the C-peptide analogs have a partially distorted alpha-helix structure at room temperatures. In their studies, two side-chain interactions have been pointed out to be essential for the alpha-helical structure: one is a salt-bridge interaction between Glu2 and Arg10, the other is an aromatic-aromatic interaction between Phe8 and His12 (37, 40). At neutral pH, side chain of Glu is negatively charged, whereas that of Arg is positive, suggesting that Glu2 and Arg10 may form a stable salt bridge by electrostatic interactions. In theoretical studies, the alpha-helical structure has been obtained from the random-coil structure or fully extended structure by simulated annealing in gas phase (41), or the simulations based on the generalized-ensemble algorithms (19, 22). However, the roles of the side-chain interactions pointed out by the experiments have not been fully examined. We performed all-atom MD simulations of the C-peptide in explicit solvents (24) and then, analyzed the free-energy landscapes of the peptide at various temperatures. The simulations were based on REMUCA (21). The peptide analog comprises 13 amino-acid residues: Ala-Glu-Thr-Ala-Ala-Ala-Lys-Phe-Leu-Arg-Ala-His-Ala, in which the N and C-termini were blocked with the acetyl group and the N-methyl group, respectively. The initial configuration of the peptide was a fully extended conformation and was simulated at 1000 K in gas phase. Then, one of the structures in the simulation was solvated with 1387 water molecules in a sphere of radius 22 Å. AMBER parm99 (42, 43) and TIP3P model (44) were used for the force-field parameters of proteins and water molecules. We first performed a REMD simulation with 32 replicas (REMD1) for 100 ps per replica, covering the temperature range from 250 K to 700 K. During this simulation, replica-changes were attempted every 200 MD steps. The potential-energy distributions were used as input data to the multiple-histogram analysis (14, 15) and then, we obtained the first estimate of< To construct the free-energy landscapes of the C-peptide at various temperatures, we performed principal component analysis (PCA), which is one of the standard methods for studying conformational dynamics or fluctuations of proteins (45-47). We used the first two principal component (PC) axes as reaction coordinates, because the first and second PCs contributed significantly to the total root-mean-square fluctuations of the proteins (22.8 and 17.0 %, respectively). To obtain the landscapes at different temperatures, we employed histogram-reweighting techniques and calculated the potentials of mean force along the first two PCs: < where PC1 and PC2 are the first two PC values. < To characterize each LM, we focused on the conformations whose total potential energy was lower than -12,500 kcal/mol (including the contributions from water molecules). -12,500 kcal/mol was chosen, because it was the highest energy value that can be reached for the energy distribution at 300 K in Figure 1c. Figure 3 indicated the different characters of the LMs. In LM2, the peptide contained the salt-bridge interaction between Glu2 and Arg10, whereas two aromatic residues, Phe8 and His12, were separated. In contrast, in LM3, the two aromatic residues made contacts, whereas the salt-interaction was not observed. In LM1, which corresponds to the global free-energy-minimum state (GM), both interactions were observed. The structures in GM agreed with the X-ray structure of the C-peptide, which has a partially distorted alpha-helix with these two side-chain interactions. The two side-chain interactions in the protein have quite different characteristics: the aromatic-aromatic interaction between Phe8 and His12 is a short-range interaction mainly caused by hydrophobic forces, whereas the salt interaction between Glu2 and Arg10 occurs due to the long-range electrostatic interactions. In LM2 and LM3, the C-peptide is likely trapped at collapsed local-energy-minimum states, where one of the major stabilizing factors in the native structure is lost. Thus, the simulation reveals a funnel-like shape of the free-energy landscape of the C-peptide, where all the important interactions consistently exist only in GM. This may be the mechanism how such a short peptide can fold into a stable conformation rather than a random-coil state in solution. To understand the role of each side-chain interaction on the folding reaction of the C-peptide, the free-energy landscape is useful. In Figure 3, the green points (salt-bridge interaction) and the blue points (alpha-helix formation) do not overlap except for the region around GM. In contrast, the red points (aromatic-aromatic interaction) overlap significantly with the blue points, suggesting that the formation of alpha-helix is strongly correlated with the aromatic-aromatic interaction between Phe8 and His12. The salt interaction between Glu2 and Arg10 may contribute to prevent the structure from expanding to the N-terminus. Although the simulation trajectories of the generalized-ensemble simulations do not provide the kinetic information directly, careful analysis on the free-energy landscape can provide the information on how proteins fold from the random-coil or the denatured states. 5. CONFORMATIONAL CHANGES OF PHOSPHOLAMBAN BY PHOSPHORYLATION AT SER16 Phospholamban (PLN) is a 52-residue integral membrane protein that regulates the activity of the sarco(endo)plasmic reticulum calcium pump (SERCA) in cardiac muscle (48). Its inhibitory action is relieved when PLN is phosphorylated at Ser16 by cAMP-dependent protein kinase or at Thr17 by Ca2+/calmodulin-dependent kinase (49, 50). However, molecular mechanism of how the phosphorylation can change the inhibitory action remained to be answered. To understand the mechanism, we studied the effect of phosphorylation on the structure of PLN. We performed REMD simulations of the unphosphorylated form of PLN (PLN) and phosphorylated form of PLN (pPLN) in solution to explore the free-energy landscapes of the two forms (25). In the simulations, we did not include the transmembrane region of PLN or pPLN, because the phosphorylation site (Ser16) is far from the transmembrane region of PLN. Thus, we used the first 20 amino-acid residues of PLN in the simulations, whose amino-acid sequence is Met-Glu-Lys-Val-Gln-Tyr-Leu-Thr-Arg-Ser-Ala-Ile-Arg-Arg-Ala-Ser-Thr-Ile-Glu-Met. The C-termini of the protein was blocked with an N-methyl group. The starting structure of PLN was taken from the corresponding region of the first conformer of the NMR structures reported by Zamoon et al. (51). For pPLN, the starting structure was prepared by simply changing the side chain of Ser16 to the phosphorylated form. Since the phosphate is negatively charged at neutral pH, the phosphorylated Ser16 can form salt bridges with arginine residues in the peptide. PLN and pPLN were solvated in a sphere with a radius of 30 Å, in which 8 K+, 11 Cl- and 3405 water molecules, and 9 K+, 10 Cl- and 3399 water molecules were included, respectively. CHARMM27 CMAP potential function (52, 53) was used for proteins, and the TIP3P model (44) was used for water molecules. Other simulation details were written in the original paper (25). We used sixty replicas with different temperatures from 290 K to 653 K. In the simulations of PLN and pPLN, we performed 8 ns and 10 ns of MD simulations for each replica, respectively, and attempted replica-exchanges every 100 steps. The total simulation lengths were 480 ns for PLN and 600 ns for pPLN. The coordinates of PLN or pPLN saved every 500 steps for the final 6 ns were used to construct the free-energy landscapes. To characterize the landscapes at 310 K, we used the first (PC1) and third (PC3) axes, because different local-energy-minimum states appeared more clearly in the PC1-PC3 plane than in the PC1-PC2 plane. Contributions to the mean square fluctuation from the PC2 and PC3 were similar (8.6 and 8.4 %, respectively). In the free-energy landscape of PLN (Figure 4), three major clusters were found. In cluster I, PLN consisted of a long alpha-helix from Glu2 and Glu19, whereas the C-terminal part of the alpha-helix was unwound in clusters II (between Thr17 and Glu19) and III (between Ile12 and Glu19), suggesting that the structural ensemble of PLN in solution contains the conformations similar to the atomic model of PLN bound to SERCA (54). In contrast, the free-energy landscape of pPLN exhibited a broader distribution comprising eight clusters, which were further classified into three groups. In the first group, pPLN consisted of a long alpha-helix from Lys3 to pSer16, and contained, at most, a single salt-bridge between pSer16 and Arg13. In the second group, two salt-bridges were formed between pSer16, and two arginines (Arg9 and Arg13). Due to the two salt bridges, the C-terminal part of the alpha-helix was unwound or divided into two short helices. The third group contained disordered conformations, in which a salt bridge was formed between pSer16 and Arg14. Thus, phosphorylation of PLN at Ser16 likely form salt-bridges with arginines and increased the number of conformational substates by distorting the alpha-helix. The effect of phosphorylation at Ser16 of PLN has been extensively studied by experimentalists. However, nuclear magnetic resonance (NMR) (55) and fluorescence resonance electron transfer (FRET) (56, 57) did not provide a consistent picture of the effect of phosphorylation. NMR studies of a mutant of PLN (AFA-PLN) (55) showed that the phosphorylation unwinds the C-terminal part of the cytoplasmic alpha-helix and shifts the conformational equilibrium toward disordered states. The authors suggested that a stable salt bridge involving pSer16 is unlikely. In contrast, FRET studies suggested that pPLN takes more compact helical conformation stabilized by a salt bridge between pSer16 and Arg13 (56, 57). One of the major purposes of the simulation study is to explain this experimental inconsistency. The simulation results on PLN and pPLN agree with the NMR measurements (55), because the phosphorylation increases the degree of conformational disorder as shown in the free-energy landscapes of PLN and pPLN (Figure 4). Average alpha-helicity of each residue in PLN and pPLN at 310 K in Figure 5 represents the degree of disorder more quantitatively. Although a direct comparison between FRET and the simulations is not possible, Figure 6 shows approximately 2 Å decrease in the C-distance between Tyr6 and Met20 upon the phosphorylation at Ser16 of PLN. It is quantitatively consistent with the results of FRET (56, 57), in which the distance between Tyr6 and a fluorophore, N-(1-pyrenylmaleimide), attached to Cys24 decreased by 3 Å by the phosphorylation. The decrease was initially interpreted as an elongation of the cytoplasmic alpha-helix (56, 57). However, the free-energy landscapes of PLN and pPLN indicated that the decrease is due to the conformational disorder of alpha-helix upon the phosphorylation at Ser16. Thus, two experimental measurements and the simulation results are consistent with each other. Then, how does the phosphorylation relieve the inhibitory effect of PLN on SERCA in cardiac muscle? The current simulation indicated that the phosphorylation of PLN at Ser16 unwinds the cytoplasmic helix by forming salt bridges between pSer16 and three arginines. Based on the structural model (54), the crosslink between Lys3 in PLN and Lys397 or Lys400 in SERCA is possible only if the cytoplasmic domain takes an alpha-helical structure. These suggest that the conformational disorder upon the phosphorylation of PLN at Ser16 inhibits a stable binding of PLN to SERCA. To see the validity of the model, we should study the effect of phosphorylation of PLN bound to SERCA. 6. PERSPECTIVE Recently, folding simulations of small polypeptides or proteins have been performed extensively by using the generalized-ensemble algorithms. In particular, REMD simulations with implicit solvent models have been the most popular approach, because their computational cost is much smaller than the simulations with explicit solvent. Although implicit-solvent simulations have provided new insight on protein folding, the detailed description of the side-chain interactions during protein-folding processes may be difficult with the model. For instance, the implicit solvent model cannot describe the side-chain interactions that are bridged by a water molecule. In contrast, the simulations with explicit solvent can provide atomically detailed information on the side-chain interactions, as shown here. Another important issue on protein folding and stability is the effect of salt ions on the conformational stability of proteins. In the simulations of PLN and pPLN, we added K+ and Cl- in the solvent sphere to mimic the experimental environment. However, due to the small size of the simulation system, we cannot study the details of the salt effect on the protein. In the near future, we may extend the size of the solvent sphere and study the salt effect on the protein stability in an atomic resolution. In such simulations, MUCAREM or REMUCA should be useful, because the number of replicas for a larger simulation system is still very small compared with the original REMD. The multi-dimensional replica-exchange method (MREM) (32), in which pre-defined reaction coordinates as well as temperatures are attempted to exchange during the simulations, (it is also referred to as Hamiltonian REM (33)) may be another candidate that are applicable to the large simulation systems. In MREM, a careful choice of the reaction coordinate is required before the production dynamics. Toward better description of the free-energy landscapes of proteins, accuracy of the force-field parameters should also be essential. Yoda et al, have compared the structural tendency of six commonly used force-field parameters, by performing the generalized-ensemble simulations of C-peptide and G-peptide in explicit solvent (34, 36). The results indicated that alpha-helixis favored for AMBER94 (42) and AMBER99 (42, 43) and that beta-hairpin is favored for GROMOS96 (58), whereas CHARMM22 (52), AMBER96 (59), and OPLSAA/L (60, 61) have intermediate tendency. This result indicates that we should use different force fields depending on the folds (alpha, beta, alpha/beta, etc.) of the target proteins in their folding simulations from the first principles (from a random initial conformation). Therefore, further improvements on the force fields are necessary for studying the free-energy landscapes of protein folding or large-scale conformational changes in proteins. 7. ACKNOWLEDGEMENTS I am grateful to all co-workers contributing to the studies described here. In particular, I would like to mention the following three people. Developments of the new generalized-ensemble algorithms (REMD, MUCAREM, and REMUCA) have been done with Prof. Y. Okamoto in Nagoya University. Dr. T. Yoda in Nagahama Bio University has made a lot of contributions to the folding simulations of C-peptide and G-peptide in explicit solvents. Prof. C. Toyoshima in the university of Tokyo invited me to the studies of Ca2+-pump and PLN and provided biological insight to the simulation studies. These works were supported in part by a grant from the Ministry of Education, Culture, Sports, Science and Technology of Japan, and a grant from the Japan Society for the Promotion of Science. 8. REFERENCES 1. Dobson, C. M., A. Sali and M. Karplus: Protein folding: A perspective from theory and experiment. Angewandte Chemie-International Edition 37, 868-893 (1998) Abbreviations: CD: circular dichroism, FRET: fluorescence resonance electron transfer, GM: global minimum energy state, LM: local minimum energy state, MD: molecular dynamics, MC: Monte Carlo, MREM: multi-dimensional replica-exchange method, MUCA: multicanonical algorithm, MUCAREM: multicanonical replica-exchange method, NMR: nucleic magnetic resonance, REM: replica-exchange method, REMD: replica-exchange molecular dynamics, REMUCA: replica-exchange multicanonical algorithm, SERCA: sarco(end)plasmic reticulum Ca2+-pump, PLN, phospholamban, PC: principal component, PT: parallel tempering. Key Words: Generalized-ensemble Algorithms, Multicanonical Algorithm, Replica-Exchange Method, Replica-Exchange Multicanonical Method, Multicanonical Replica-Exchange Method, Principal Component Analysis, Histogram Reweighting Techniques, Principal Component Analysis, C-peptide in ribonuclease A, G-peptide in protein G B1 Domain, Phospholamban, Sarco(Endo)Plasmic Reticulum Calcium ATPase, Nuclear Magnetic Resonance, Fluorescence Resonance Energy Transfer, Review Send correspondence to: Yuji Sugita, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan, Tel: 81-48-462-1407, Fax: 81-48-467-4532, E-mail:sugita@riken.jp |