[Frontiers in Bioscience E3, 453-462, January 1, 2011]
HIV-1 integration site preferences in pluripotent cells

Janet L. Markman1, Robert M. Silvers1, Abibatou M.N. Ndoye1, Kyla R. Geary1, David Alvarado1, Johanna A. Smith1, Rene Daniel1,2,3
1. ABSTRACT

HIV-1-based vectors are widely used in gene therapy. In somatic cells, these vectors integrate mainly within genes, but no distinct preferences have been observed with regard to large chromosomal regions. The recent emergence of induced pluripotent stem (iPS) cells, which resemble embryonic stem (ES) cells, has raised questions about where integration occurs in these cells. In this work we investigated the integration site preferences of HIV-1-based vectors in a pluripotent, ES-like cell line. We show that approximately 30% of integrations occur in the vicinity of telomeres. We analyzed integration sites in various somatic cells, as reported by us and other groups, and found that this integration pattern is unique to the pluripotent cell line analyzed. We conclude that pluripotent cells may contain distinct cellular cofactors, absent from somatic cells, that participate in integration targeting.

2. INTRODUCTION

Integration of viral DNA into host DNA is an essential step in the replication cycle of retroviruses and, consequently, in gene transfer by retroviral vectors (1-3). Gene therapy trials currently use three classes of retroviral vectors: those based on MLV (murine leukemia virus), HIV (human immunodeficiency virus), and ASV (avian sarcoma virus). MLV-based vectors are currently the most commonly used, because their relatively simple genomes have allowed them to be thoroughly characterized. However, HIV-based vectors have made major gains in recent years and are gradually replacing MLV-based vectors. This is mainly due to the development of simple packaging systems and the removal of all HIV-1 genes that are not necessary for gene transfer (4). In addition, HIV-based vectors possess a crucial advantage over MLV-based vectors: unlike the latter, they can integrate efficiently into the genomes of nondividing cells (5, 6).
HIV-based vectors are thus more versatile than those constructed from MLV. Finally, some laboratories use ASV-based vectors (7, 8). However, these are not yet fully characterized and also integrate into nondividing cells less efficiently than HIV-based vectors (9).

It has long been observed that retroviral vectors of all classes can integrate at virtually any site in the human genome (1-3). This reflects the nonspecific binding of cellular DNA by retroviral integrases (1-3). Nevertheless, integration site selection by retroviral vectors is not random: MLV-based vectors were shown to integrate preferentially in promoter regions (10, 11). This preference was also responsible for the serious adverse outcome of a gene therapy trial in which an MLV-based vector integrated near a proto-oncogene and activated its expression (12). In contrast, HIV-1-based vectors prefer to integrate within genes but show no specific preference for promoter regions (13). Finally, ASV-based vectors show only a weak preference for genes (40% of integrations occur in genes, compared with the 33% expected for random integration (10, 14)). These integration site preferences were found in all cell types examined to date. It is not yet known what factors drive integration site selection, with the exception of HIV-1-based vectors, whose integration into genes is directed by the cellular protein LEDGF/p75, which binds the integrase protein and tethers it to chromatin (15-20). Additional factors that control integration site selection likely exist.

The studies cited above focused on integration in somatic cells, which originate from stem cells. Stem cells, however, possess unique characteristics that distinguish them from somatic cells: they are undifferentiated cells capable of self-renewal and differentiation, and they make up only a small minority of a tissue's cells.
Embryonic stem (ES) cells are a distinct class of stem cells. They are pluripotent cells that can be obtained from early-stage embryos (blastocysts) and can differentiate into derivatives of all three primary germ layers, a property termed pluripotency (21, 22). ES cells thus represent an important, very early stage of organismal development. ES cells are characterized by the expression of embryonic stem cell transcription factors (ESTFs), the most important of which are Oct4, Nanog and Sox2. Animal and other studies have suggested that ESTFs are crucial for the self-renewal and pluripotency of ES cells (21). This hypothesis was recently confirmed by reprogramming adult somatic cells into induced pluripotent stem (iPS) cells, which possess ES cell properties (23). iPS cells were developed with the long-term objective of providing a new therapeutic tool. Because they are pluripotent, they could in principle be differentiated into any desired cell type to replace cells that are lacking in the patients from whom the iPS cells were derived (e.g. insulin-producing cells for the treatment of diabetes, or replacement cells for treating genetic diseases such as ADA-SCID and Huntington's disease (24)). These cells are also attractive targets for gene transfer.

We speculated that integration site selection in pluripotent stem cells may have unique characteristics, since the protein content of these cells differs from that of somatic cells. As a model, we chose NCCIT cells, which are derived from a human embryonal carcinoma and share features with ES and iPS cells, including the ability to differentiate into derivatives of all three germ layers (22, 25, 26). We demonstrate that in these cells, integration by HIV-1-based vectors occurs in a unique pattern, characterized by an increased frequency of integration in the vicinity of chromosomal ends.

3. MATERIALS AND METHODS

3.1. Cells

NCCIT cells (CRL-2073™) were purchased from the American Type Culture Collection (ATCC, Manassas, VA) and maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum, non-essential amino acids, and penicillin/streptomycin.

HIV-1-based vectors: vesicular stomatitis virus (VSV) G-pseudotyped HIV-1-based vectors were prepared as described by Naldini and colleagues (27).

3.2. Infection

Viral particles (titer 5 × 10^6 particles per mL) were applied to each 60 mm cell culture dish at a multiplicity of infection (MOI) of 5 or 20. The viral particles were supplemented with 10 micrograms/mL DEAE-dextran to enhance infection. The medium was changed after a 16-hour incubation. Four days post-infection, the cells were trypsinized and genomic DNA was extracted using the QIAamp® DNA Mini Kit (cat. #51306; QIAGEN, Hilden, Germany). The genomic DNA was eluted in 200 microliters of the AE buffer provided with the kit, and its concentration was determined by ultraviolet (UV) spectroscopy. Approximately 1 microgram of DNA was electrophoresed on a 0.6% agarose gel to assess the size and purity of the extracted DNA.

3.3. Construction of GenomeWalker™ (GW) libraries

The following protocol is adapted from the Clontech GenomeWalker™ Universal Kit (cat. #638904; Clontech/TaKaRa Bio, Mountain View, CA). Approximately 3 micrograms of extracted genomic DNA was digested with four blunt-cutting enzymes (DraI, StuI, MscI, and MslI). Each 100-microliter reaction contained 3 micrograms of genomic DNA, 80 units of combined restriction enzymes (20 units each), 10 microliters of 10X restriction enzyme buffer 4, and nuclease-free water. The reactions were gently mixed and incubated at 37°C overnight. The following day, the digested genomic DNA was purified by phenol-chloroform extraction and ethanol precipitation, and the pellets were dissolved in 20 microliters of nuclease-free water.
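The digest is fixed at 3 micrograms of DNA in 100 microliters, so the volumes of DNA and water depend on the concentration measured by UV spectroscopy. The Python helper below is a minimal sketch of that arithmetic. The function name and the 10 units/microliter enzyme stock concentration are assumptions for illustration; the paper does not state stock concentrations.

```python
# Hypothetical helper for the 100-microliter, four-enzyme blunt-end digest.
# ASSUMPTION (not from the paper): every enzyme stock is 10 units/microliter.

def digest_volumes(dna_ng_per_ul, dna_ng=3000.0, units_per_enzyme=20.0,
                   n_enzymes=4, enzyme_stock_u_per_ul=10.0,
                   buffer_stock_x=10.0, total_ul=100.0):
    """Return component volumes (microliters) for one digest reaction."""
    dna_ul = dna_ng / dna_ng_per_ul                        # volume holding 3 micrograms
    enzyme_ul = units_per_enzyme / enzyme_stock_u_per_ul   # volume per enzyme
    buffer_ul = total_ul / buffer_stock_x                  # 10X buffer diluted to 1X
    water_ul = total_ul - dna_ul - n_enzymes * enzyme_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("DNA too dilute: components exceed the reaction volume")
    return {
        "dna": dna_ul,
        "each_enzyme": enzyme_ul,
        "buffer_10x": buffer_ul,
        "water": water_ul,
    }


if __name__ == "__main__":
    # Example with a hypothetical DNA concentration of 150 ng/microliter.
    for name, vol in digest_volumes(150.0).items():
        print(f"{name:12s} {vol:6.1f} microliters")
```

Raising an error when the DNA is too dilute flags samples that would need concentrating before digestion rather than silently exceeding the 100-microliter volume.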
To confirm digestion, 1 microgram of each sample was electrophoresed on a 0.6% agarose gel. Following electrophoresis, 4 microliters of digested, purified genomic DNA, 1.9 microliters of GenomeWalker® Adaptor (25 µM), 1.6 microliters of 10X ligation buffer, and 0.7 microliters of T4 DNA ligase (6 units/microliter) were mixed in a 0.2 mL PCR tube and incubated overnight at 16°C in a PCR thermal cycler. The reaction was stopped the following day by incubating the samples at 70°C for 5 minutes. Subsequently, 72 microliters of TE buffer (10 mM Tris, 1 mM EDTA, pH 7.5) was added to each reaction and the samples were gently vortexed for 15 seconds.

Following ligation, integration sites were amplified by PCR. First-round PCR was run using an outer adaptor primer (AP1 from the GW kit) and a custom primer, GagR2. AP1 is a forward primer located in the adaptor, and GagR2 is a reverse primer located immediately downstream of the 3' end of the U5 LTR of the virus. This primer was chosen so that neither the U3 end of the viral LTR nor an internal viral fragment would be cloned. Second-round (nested) PCR was run using an inner adaptor primer (AP2 from the GW kit) and a second custom primer, U3RU2. U3RU2 is a reverse primer located in the viral LTR immediately downstream of the 5' end of the U5 LTR. Nested PCR increases both the number of integration site copies and the specificity of amplification. The primer sequences are as follows: AP1, 5'-GTA ATA CGA CTC ACT ATA GGG C-3'; GagR2, 5'-TTT TGG CGT ACT CAC CAG TCG-3'; AP2, 5'-ACT ATA GGG CAC GCG TGG T-3'; and U3RU2, 5'-TGA GGG ATC TCT AGT TAC CAG AGT-3'. Each 50-microliter first-round PCR reaction contained 35.8 microliters nuclease-free water, 1 microliter each of AP1 and GagR2 primers (10 µM), 3 microliters DNA template, 5 microliters 10X PCR buffer, 4 microliters dNTPs (10 mM each), and 0.2 microliters TaKaRa Ex Taq DNA polymerase (cat. #RR001A; TaKaRa Bio Company, Madison, WI).
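The four primer sequences listed above can be checked quickly for length and GC content against the 55°C annealing step. The sketch below uses the common approximation Tm = 64.9 + 41 × (nGC − 16.4) / N, which ignores salt and oligo concentration; it is a rough rule of thumb, not a method reported by the authors, and the function names are hypothetical.

```python
# Rough primer checks for the GenomeWalker/LTR primers (sequences from Section 3.3).
# The Tm formula is a textbook approximation for oligos longer than ~13 nt and
# does not account for salt or primer concentration.

PRIMERS = {
    "AP1": "GTA ATA CGA CTC ACT ATA GGG C",
    "GagR2": "TTT TGG CGT ACT CAC CAG TCG",
    "AP2": "ACT ATA GGG CAC GCG TGG T",
    "U3RU2": "TGA GGG ATC TCT AGT TAC CAG AGT",
}


def clean(seq):
    """Remove the readability spaces and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def gc_fraction(seq):
    """Fraction of G and C bases."""
    s = clean(seq)
    return sum(base in "GC" for base in s) / len(s)


def basic_tm(seq):
    """Approximate melting temperature in degrees Celsius."""
    s = clean(seq)
    n_gc = sum(base in "GC" for base in s)
    return 64.9 + 41.0 * (n_gc - 16.4) / len(s)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:6s} len={len(clean(seq)):2d} "
              f"GC={gc_fraction(seq):.0%} Tm~{basic_tm(seq):.1f} C")
```

A nearest-neighbor calculator would give more realistic values; this version only needs the standard library and is meant for a first-pass comparison between primers.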
The PCR parameters were as follows: an initial denaturation step of 5 minutes at 94°C; 30 cycles of 94°C for 1 minute, 55°C for 45 seconds, and 72°C for 3 minutes; and a final extension step at 72°C for 5 minutes. Second-round PCR was performed using 1 microliter of the first-round PCR products as the template. All components remained the same, except that the volume of nuclease-free water was increased to 37.8 microliters to reach a final volume of 50 microliters and the primers were replaced by AP2 and U3RU2. The cycling parameters remained the same. To confirm the presence of PCR products, 10 microliters of each sample were electrophoresed on a 1.5% agarose gel.

3.4. Cloning using the Invitrogen™ TOPO TA Cloning® kit

Each cloning reaction contained 4 microliters of PCR product, 1 microliter of TOPO® vector (cat. #K452001; Invitrogen, Carlsbad, CA), and 1 microliter of salt solution in a 0.2 mL PCR tube. The components were mixed gently and incubated at room temperature for 30 minutes. Two microliters of the TOPO® cloning reaction were added to one vial of thawed OneShot® chemically competent E. coli cells provided with the kit. The vial contents were gently mixed and incubated on ice for 20 minutes. The cells were then heat-shocked for 30 seconds in a 42°C water bath and immediately returned to ice. To each vial, 250 microliters of S.O.C. medium was added, and the samples were placed in a 37°C shaking incubator for 1 hour. The transformed bacteria were plated on pre-warmed Miller LB agar plates (supplemented with 100 micrograms/mL ampicillin) to which 40 microliters of X-gal (40 mg/mL in DMF) had been added. After the cultures were spread evenly, the plates were incubated at 37°C overnight. White (non-blue) colonies were picked with a pipette tip and transferred to a 5 mL round-bottom tube containing 3 mL of Miller LB broth supplemented with 100 micrograms/mL ampicillin for expansion.
These tubes were placed in a 37°C shaking incubator overnight. The bacterial cultures were transferred to 1.7 mL microcentrifuge tubes and centrifuged at 6,800 × g to pellet the cells. The supernatant was removed and plasmid DNA was extracted from each sample using a QIAGEN QIAprep® Spin Miniprep Kit (cat. #27106; QIAGEN, Hilden, Germany). The samples were eluted in 50 microliters of nuclease-free water, and the concentration was measured using a UV spectrophotometer. Extracted DNA was sequenced by the Sidney Kimmel Nucleic Acid Facility at the Kimmel Cancer Center at Thomas Jefferson University (Philadelphia, PA). Each sequencing reaction contained 0.4 micrograms of plasmid DNA, 3.2 pmol of M13 forward primer (5'-GTA AAA CGA CGG CCA G-3'), and nuclease-free water to 12 microliters. The facility uses a 3730 DNA Analyzer and BigDye® Terminator Cycle Sequencing kits (Applied Biosystems, Foster City, CA).

3.5. Integration site analysis

Sequences were analyzed using the BLAT program (University of California, Santa Cruz, CA; Human Genome Project Working Draft, February 2009 freeze; http://www.genome.ucsc.edu/cgi-bin/hgBlat). Sequences were considered true integration sites if the genomic portion was positioned between a recognizable adaptor sequence and the correct viral LTR end. In addition, the genomic sequence had to match the genome with ≥98% identity, have no other high-probability match of nearly equal length, and map to a single site in the genome.

The relative distance of each site from the centromere was calculated using the arm lengths of the corresponding chromosome. For a site at position x on the p arm, where L_p is the length of the p arm:

$$ d = \frac{x}{L_p} - 1 $$

For a site at position x on the q arm, where L_q is the length of the q arm:

$$ d = \frac{x - L_p}{L_q} $$

Taking x as the position measured from the p-arm end of the chromosome, d runs from −1 at the p-arm telomere through 0 at the centromere to 1 at the q-arm telomere. Additional features of our sequences were analyzed as described (28).
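The relative-position formula of Section 3.5 can be written as a small function. This is a sketch assuming that positions are measured in base pairs from the p-arm end of the chromosome (as in UCSC coordinates) and that the p-arm length equals the centromere position; the arm lengths in the example are hypothetical, not values from the February 2009 assembly. The helper `near_telomere` encodes one reading of "the last 1/10 of the chromosome" as |d| ≥ 0.9 and is likewise an assumption.

```python
# Sketch of the signed relative distance from the centromere (Section 3.5).
# ASSUMPTIONS: position is in bp from the p-arm end; p_arm_length is the
# centromere position; "last 1/10" means |d| >= 0.9 on this scale.

def relative_arm_position(position, p_arm_length, q_arm_length):
    """Signed relative distance from the centromere.

    p arm: position / p_arm_length - 1               -> -1 at p telomere, 0 at centromere
    q arm: (position - p_arm_length) / q_arm_length  ->  0 at centromere, 1 at q telomere
    """
    if not 0 <= position <= p_arm_length + q_arm_length:
        raise ValueError("position lies outside the chromosome")
    if position <= p_arm_length:
        return position / p_arm_length - 1.0
    return (position - p_arm_length) / q_arm_length


def near_telomere(d, tail=0.1):
    """True if the site lies in the outermost `tail` fraction of its arm."""
    return abs(d) >= 1.0 - tail


if __name__ == "__main__":
    # Hypothetical chromosome: 50 Mb p arm, 100 Mb q arm.
    for pos in (1_000_000, 50_000_000, 145_000_000):
        d = relative_arm_position(pos, 50_000_000, 100_000_000)
        print(pos, round(d, 3), near_telomere(d))
```

The sign of d also identifies the arm (negative for p, positive for q), which is what a separate p-arm and q-arm analysis needs.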
In addition to the integration site sequences generated in our study, other integration site sequences were analyzed using the method described above. Schröder and colleagues produced the sequences for integration in a T cell line (13), and Shun and colleagues produced sequences for integration in a murine embryonic fibroblast (MEF) cell line and a LEDGF/p75-deficient MEF cell line (19). Integration sites in 293T cells were analyzed by us (28).

3.6. Statistics

All statistical analyses were performed using the one-tailed Fisher exact test.

4. RESULTS

Several factors complicate the analysis of integration sites in ES-like cells. First, these cells need to be grown on a feeder layer of somatic cells (23), which creates unwanted background from infection of the feeder cells. Second, the transduction efficiency of stem cells has been shown to be low, owing to blocks prior to integration (29). In addition, retroviral expression is known to be suppressed in stem cells by epigenetic mechanisms (30-32). NCCIT cells, unlike ES or iPS cells, do not need a feeder layer, which simplifies the analysis. However, we observed that expression of a vector-transduced marker in NCCIT cells is much lower than in somatic cells (Figure 1). This is probably due to the combined effect of low transduction efficiency and suppressed expression. This finding further emphasizes the similarities between NCCIT cells and other types of pluripotent cells. To overcome the low transduction efficiency, we infected NCCIT cells at a high MOI (5-20). We identified 98 integration sites in these cells and analyzed them as described below. We first analyzed the distribution of integration sites among individual chromosomes.
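The paper does not say how per-chromosome frequencies were normalized for the chromosome-by-chromosome comparison. One simple, assumed approach is to compare observed counts with the counts expected if sites fell in proportion to chromosome length, as sketched below. The function name is hypothetical, and the chromosome names, lengths and site calls in the example are toy values.

```python
from collections import Counter


def per_chromosome_enrichment(site_chromosomes, chromosome_lengths):
    """Observed/expected integration counts per chromosome.

    site_chromosomes: iterable of chromosome names, one per integration site.
    chromosome_lengths: dict mapping chromosome name -> length in bp.
    ASSUMPTION: expected counts are proportional to chromosome length.
    Sites on chromosomes missing from chromosome_lengths are ignored.
    """
    observed = Counter(c for c in site_chromosomes if c in chromosome_lengths)
    n_sites = sum(observed.values())
    genome_length = sum(chromosome_lengths.values())
    ratios = {}
    for chrom, length in chromosome_lengths.items():
        expected = n_sites * length / genome_length
        # Counter returns 0 for chromosomes with no sites.
        ratios[chrom] = observed[chrom] / expected if expected else float("nan")
    return ratios


if __name__ == "__main__":
    # Toy input: hypothetical chromosome names, lengths and site calls.
    lengths = {"chrA": 200_000_000, "chrB": 100_000_000}
    sites = ["chrA", "chrA", "chrB", "chrB", "chrB", "chrB"]
    print(per_chromosome_enrichment(sites, lengths))
```

A ratio near 1 means a chromosome receives about as many sites as its length predicts; gene density is a common alternative normalizer for HIV-1 vectors but is not modeled here.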
As shown in Figure 2, vector integration sites were distributed broadly in the genome, and we did not detect any particular differences when we compared the pattern to that described for the human SupT1 line (13) or for mouse embryonic fibroblasts (MEFs), both LEDGF/p75-proficient and -deficient (19). We then analyzed the frequency of integration in genes and found that 80.6% of all integrations occurred in genes. This finding indicates that LEDGF/p75-dependent targeting into genes still occurs in NCCIT cells. However, during the analysis, we observed a high frequency of integration events in the vicinity of chromosomal ends: almost 30% of all integration sites were located in the 1/10 of the chromosome closest to the telomere (Figure 3). To determine whether this feature is found in other cell types, we analyzed integration sites in the SupT1 line (13). Only 14.9% of integration sites were present in this chromosomal segment (Figure 4). The difference was statistically significant (p = 0.005, one-tailed Fisher exact test). One possible explanation is that the difference arises from the methods used to clone integration sites: although both laboratories employed LM-PCR, differences in the primers or in the enzymes used to digest genomic DNA could affect the results. We therefore compared NCCIT integration sites to those we cloned from 293T cells using identical methodology. Figure 5 shows that the integration pattern in 293T cells resembled that of SupT1 cells (Figure 4) and still differed from that of NCCIT cells. The difference in the fraction of integration sites in the last 1/10 of the chromosome between NCCIT and 293T cells was again statistically significant (p = 0.0489).
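Each comparison of terminal-tenth frequencies reduces to a 2×2 table per pair of cell lines: sites inside versus outside the terminal tenth. Assuming that is how the one-tailed Fisher test was applied (the paper reports only the p-values), the upper-tail hypergeometric probability can be computed with the standard library alone. The function names are hypothetical and the counts in the example are invented for illustration; they are not the study's data.

```python
from math import comb


def terminal_fraction(relative_positions, tail=0.1):
    """Fraction of sites in the outermost `tail` of an arm (|d| >= 1 - tail)."""
    cutoff = 1.0 - tail
    hits = sum(abs(x) >= cutoff for x in relative_positions)
    return hits / len(relative_positions)


def fisher_one_tailed_greater(a, b, c, d):
    """One-tailed Fisher exact test on the table [[a, b], [c, d]].

    Rows: cell lines; columns: (telomere-proximal sites, other sites).
    Returns P(X >= a) under the hypergeometric null, i.e. the chance that the
    first cell line has at least this many telomere-proximal sites when both
    lines share the same underlying frequency.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    upper = min(row1, col1)
    # comb() returns 0 for impossible cells, so no explicit lower bound is needed.
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, upper + 1)) / total


if __name__ == "__main__":
    # Made-up counts for illustration only (NOT the paper's data):
    # line 1: 12 telomere-proximal, 28 other; line 2: 5 and 35.
    print(fisher_one_tailed_greater(12, 28, 5, 35))
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` returns the same one-sided p-value.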
We observed that in NCCIT cells, chromosomal ends appear to be preferred targets for integration of HIV-1-based vectors, whereas in somatic cells integration sites appear to be distributed evenly along the whole length of the chromosomes (see above). As noted above, LEDGF/p75 is one of the major factors controlling HIV-1 integration site selection. One possible explanation for our data is that a cellular factor, present in somatic cells but missing in stem cells, targets integration away from chromosomal ends in somatic cells by binding LEDGF/p75 and controlling its cellular distribution. If so, loss of LEDGF/p75 in somatic cells should redistribute integration sites toward chromosomal ends. To test this hypothesis, we examined the integration sites in LEDGF/p75-deficient MEFs reported by Shun et al. (19). As shown in Figure 6, LEDGF/p75 deficiency resulted in a redistribution of integration sites, with some increase in the frequency of integration in the last 1/10 of chromosomes. Since this finding appeared consistent with our hypothesis, we analyzed the distribution of integration sites further, examining the p and q arms separately. In LEDGF/p75-deficient MEFs, HIV-1-based vectors integrated preferentially near the ends of p arms, whereas in NCCIT cells they integrated preferentially near the ends of q arms (Figure 7). We therefore conclude that our results do not support the original hypothesis, and that the integration pattern in NCCIT cells is unlikely to be due to a LEDGF/p75-binding cellular factor that is missing in these cells.

Finally, we examined the distribution of integration sites in NCCIT cells with respect to sequence features of human DNA (Table 1). First, we examined the frequency of integration events in long interspersed nuclear elements (LINEs), which constitute a large fraction of the human genome.
Next, we examined integrations in short interspersed nuclear elements (SINEs) and LTR repeat elements (human endogenous retroviruses, HERVs). We also compared the frequency of integration in these sequences with the results of Schröder et al., who first reported the preferences of HIV-1 and HIV-1-based vectors for integration in these genomic elements (13). Our data indicate that the fraction of integration sites in SINEs and LINEs in NCCIT cells largely corresponded to the frequency of integration in these elements in somatic cells (Table 1). We also noted a high number of integration sites in Alu elements (a type of SINE), which occur frequently in genes, consistent with published results for somatic cells (13). Finally, we analyzed the frequency of integration in the vicinity of CpG islands and upstream and downstream of genes (Table 2). Again, we did not observe significant differences between NCCIT and somatic cells.

5. DISCUSSION

In this study, we demonstrated that in a model pluripotent stem cell line, integration sites of HIV-1-based vectors are distributed in a unique pattern, with almost 30% of integration events occurring near chromosomal ends. We further demonstrated that integration still occurs predominantly in genes, indicating that LEDGF/p75 still plays a major role in targeting in this cell type. Integration site selection by retroviral vectors, including HIV-1-based vectors, has recently attracted considerable attention because of adverse events in clinical trials (12, 33). It is therefore necessary to identify the factors that influence integration site selection, with the objective of eventually controlling where vector DNA integrates. Our results show that, in addition to LEDGF/p75, unknown factors influence targeting to large chromosomal segments, irrespective of whether integration occurs in genes.
What could be the reason for the observed accumulation of integration sites near chromosomal ends in pluripotent cells? These cells are defined by the expression of stem cell transcription factors (see the Introduction), but they also possess other features that distinguish them from somatic cells. These include widespread DNA demethylation, a lack of lamin A/C, and the expression of other stem cell-specific genes (21). Moreover, mutations in lamin A/C have been associated with the relocalization of chromosomes within cells, with telomeres moving toward the nuclear periphery (34). It will be intriguing to determine which of these factors affects integration site selection in these cells. Understanding how integration sites are selected is necessary to target retroviral vectors to predetermined, "safe" chromosomal regions away from genes, which would increase the safety of clinical gene therapy trials. Finally, our data indicate that integration targeting differs between somatic and pluripotent cells. This is a crucial observation, since iPS cells, a recently developed type of pluripotent cell, are an attractive target for gene therapy and gene transfer. It is thus possible that newly developed methods of integration targeting will have to be tailored to the individual cell types targeted for gene transfer.

6. ACKNOWLEDGEMENTS

This work was supported by NIH grants CA125272 and CA135214 to R.D. No competing financial interests exist.

7. REFERENCES

1. Coffin, J. M., Hughes, S. H., and Varmus, H. E., Retroviruses, Cold Spring Harbor Laboratory Press, Plainview, NY (1997)

Key Words: Lentivirus, HIV-1-Based Vector, Integration Preferences, Integration Site Selection, Pluripotent Cells, Stem Cells

Send correspondence to: Rene Daniel, Division of Infectious Diseases - Center for Human Virology, Thomas Jefferson University, Philadelphia, Tel: 215-503-5725, Fax: 215-923-1956, E-mail: Rene.Daniel@jefferson.edu