[Frontiers in Bioscience E5, 814-822, June 1, 2013]

Massive microRNA sequence conservation and prevalence in human and chimpanzee introns

Aubrey E. Hill1, Eric J. Sorscher2,3

1Department of Computer Science, University of Alabama at Birmingham, 1530 3rd Avenue South, MCLM 796, Birmingham, AL 35294-0005, USA, 2Department of Medicine, University of Alabama at Birmingham, 1530 3rd Avenue South, MCLM 796, Birmingham, AL 35294-0005, USA, 3Gregory Fleming James Cystic Fibrosis Center, University of Alabama at Birmingham, 1530 3rd Avenue South, MCLM 796, Birmingham, AL 35294-0005, USA

TABLE OF CONTENTS

1. Abstract
2. Introduction
3. Materials and methods
4. Results
4.1. Genome-wide analysis of DNA hairpin sequences
4.2. Genome-wide mature microRNA analysis
4.3. Analysis of microRNA-related hairpins in the human and chimpanzee genomes
5. Discussion
6. Acknowledgements
7. References

1. ABSTRACT

Human and chimpanzee introns contain numerous sequences strongly related to known microRNA hairpin structures. The relative frequency of these sequences is precisely maintained across all chromosomes, suggesting the possible co-evolution of gene networks that depend upon microRNA regulation and whose origins correspond to the advent of primate transposable elements (TEs). While the motifs are known to be derived from transposable elements, the most common are far more numerous than expected from the number of TEs and their paralogous sequences, and exhibit striking conservation in comparison to the surrounding TE sequence context. Several of these motifs also exhibit structural complementarity to each other, suggesting a pairing function at the level of DNA or RNA. These "pseudomicroRNAs," by analogy with pseudogenes, include hundreds of thousands of vestigial paralogs of primate microRNAs, many of which may have functioned historically or remain active today.

2. INTRODUCTION

Mattick and colleagues (1-4) have postulated that introns constitute the basis for genetic regulatory networks. We previously demonstrated that complete intronic sequences transfected into human cells result in intron-specific gene expression patterns (5). However, expression of conventional microRNAs, small 22 nt sequences classically cleaved by Drosha from larger, hairpin-containing DNA motifs, could not account for these earlier observations. MicroRNA expression clearly contributes to gene network organization, but the extent and impact of microRNAs on mRNA expression, primate development and evolution remain to be determined. For example, environmental perturbations such as hypoxia dramatically alter genome-wide microRNA profiles, but the changes show limited concordance with (and poorly predict) overall shifts in the mRNA transcriptome (6).

Rodriguez et al. (7) noted that among a well described cohort of 232 mammalian microRNAs, more than half were found in introns. These same investigators observed that expression of many mammalian microRNAs may be directly linked to transcription of the surrounding DNA sequences (both protein-coding and noncoding RNAs). Others (8) have suggested that introns are the result of the propagation of replicative transposons or retrotransposons and that introns are, in fact, "broken" transposons. This latter assertion is compatible with the finding that 60% of transposable elements in both human and mouse are found in introns (9), and that a number of human microRNAs have originated from transposable elements (10). However, the bioinformatic connections between intronic DNA, the prevalence of microRNA hairpin structures in both human and nonhuman primates, and the relationships of these microRNA signatures with regard to each other or to the transposable elements from which they derive, have not been assessed in detail.

Ruby et al. (11) identified sequences from introns that resemble microRNA hairpins and showed that these "mirtrons" can sometimes be processed in the absence of Drosha, the enzyme that typically cleaves the microRNA hairpin at its base. In their model, splicing defines pre-microRNAs in a fashion similar to the classical means of cleaving microRNA hairpins from a larger RNA sequence. In the present work, we provide evidence for many hundreds of thousands of genomic sequences that may in fact be functioning (or have functioned evolutionarily) as microRNAs. The finding of a massive number of microRNA-related hairpins ("pseudomicroRNAs") suggests a critical role during establishment of primate evolutionary diversity, and may represent the same sort of historical record attributed to pseudogenes. In this report, we describe a relative frequency profile of the 10-12 most common of these sequences in human and chimpanzee. This profile is essentially the same in both species and is strikingly maintained across all chromosomes irrespective of chromosomal size. The two most common pseudomicroRNAs have been maintained at an approximate 1:1 ratio despite a twenty-one-fold difference in the number of copies of transposable elements from which the sequences were derived.

3. MATERIALS AND METHODS

All intronic sequences from human and chimpanzee genomes were obtained from the UCSC Table Browser using hg18/NCBI Build 36.1 (32). Intronic sequences were individually used as queries in a BLAST (version 2.2.16) (33) search against both the microRNA hairpins and the mature microRNAs from the miRBase database (12). We then developed Java servlets to convert results of the BLAST searches into a relational database using Microsoft Access. The resulting database was filtered by using a number of SQL queries. The results of the initial BLAST search were filtered to yield only those matches in which there were at least 18 identical nucleotides and a maximum E-value of 1 x 10^-4.
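The filtering step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used Java servlets, Microsoft Access and SQL queries. The sketch assumes the BLAST results are available as 12-column tab-separated lines (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score); the names Hit, parse_tabular and filter_hits are hypothetical, and the number of identical nucleotides is estimated from percent identity and alignment length.

```python
from typing import Iterable, List, NamedTuple

MIN_IDENTICAL = 18   # at least 18 identical nucleotides (threshold from the text)
MAX_EVALUE = 1e-4    # maximum E-value of 1 x 10^-4 (threshold from the text)


class Hit(NamedTuple):
    query_id: str    # intron identifier (naming is an assumption)
    subject_id: str  # name of the matched hairpin or mature microRNA
    identical: int   # estimated number of identical nucleotides
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tab-separated BLAST lines (assumed input layout)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        percent_identity = float(fields[2])
        alignment_length = int(fields[3])
        # Identities are recovered from percent identity x alignment length.
        identical = round(percent_identity * alignment_length / 100.0)
        hits.append(Hit(fields[0], fields[1], identical, float(fields[10])))
    return hits


def filter_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep hits with >= 18 identical nucleotides and E-value <= 1e-4."""
    return [h for h in hits
            if h.identical >= MIN_IDENTICAL and h.evalue <= MAX_EVALUE]
```

Calling filter_hits(parse_tabular(lines)) on such lines returns only the hits that meet both thresholds; a hit must satisfy the identity count and the E-value cutoff together.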

4. RESULTS

4.1. Genome-wide analysis of DNA hairpin sequences

The complete sets of known human and chimpanzee intronic sequences were used as queries against the Sanger microRNA hairpin database and the mature microRNA database (12). The result of the initial BLAST search for microRNA hairpins was filtered to yield only those matches in which there were at least 18 identical nucleotides and an E-value ≤ 1 x 10^-4. There were 698,069 hairpin hits that matched these criteria across the human genome and 762,787 across the chimpanzee genome. The distribution of hairpins by chromosome is depicted in Table 1 for human (and Table 2 for chimpanzee). Human and chimpanzee E-values ranged from 1 x 10^-4 to 4 x 10^-67 and 1 x 10^-4 to 9 x 10^-67, respectively.

Table 3 depicts the percentages of significant BLAST hits against a subset of the most common microRNA hairpins for each human chromosome, and for the entire genome (Table 4 provides the corresponding data in chimpanzees). The relative proportion of hits for each of the hairpins is remarkably similar and strongly conserved across every chromosome. Notably, each of the microRNAs presented in Table 3 is listed as occurring only once in the Sanger hairpin database (i.e. at a single chromosomal location) when a standard microRNA search was conducted, despite the presence of hundreds of thousands of closely related sequences throughout both primate genomes. The authentic frequency of these related sequences, therefore, could be missed using standard methodologies to query genomic data repositories.
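The per-chromosome percentages summarized in Tables 3 and 4 amount to a simple tally, sketched below in Python. The function name hairpin_percentages and the (chromosome, hairpin) input format are assumptions made for illustration, and any counts passed in would come from the filtered BLAST hits rather than from this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def hairpin_percentages(
    hits: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Return, per chromosome, the percentage of hits for each hairpin.

    `hits` is an iterable of (chromosome, hairpin_name) pairs, one per
    significant BLAST hit.
    """
    per_chromosome = defaultdict(Counter)
    for chromosome, hairpin in hits:
        per_chromosome[chromosome][hairpin] += 1

    percentages = {}
    for chromosome, counts in per_chromosome.items():
        total = sum(counts.values())
        percentages[chromosome] = {
            hairpin: 100.0 * n / total for hairpin, n in counts.items()
        }
    return percentages
```

A profile that is conserved across chromosomes shows up as near-identical percentage dictionaries for every chromosome key, regardless of how many hits each chromosome contributes in total.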

4.2. Genome-wide mature microRNA analysis

A BLAST search against the Sanger mature microRNA database yielded 1,661 exact microRNA matches (E-value ≤ 1 x 10^-4) in the human genome and 2,003 in the chimpanzee genome. This represents a more conventional accounting of microRNAs than the hairpin-based search shown above. None of the most numerous mature microRNAs corresponded to the frequent hairpin sequences described in Table 1; the two sets are therefore likely to represent functionally and/or evolutionarily distinct entities.

4.3. Analysis of microRNA-related hairpins in the human and chimpanzee genomes

The most common microRNA hairpin database sequences resulting from BLAST hits against the Sanger database are shown in Figure 1. There was no strong preference for the plus or minus strand among the intronic sequences. A subset of the frequently occurring human intronic hairpin sequences (colored) versus complete microRNA hairpin sequences obtained from the Sanger database is depicted, with representative alignments of known microRNAs to numerous pseudomicroRNAs. The sequences corresponding to mature microRNAs are highlighted in yellow. Figure 2 represents a summary of multiple sequence alignments (13) of twenty-five hsa-mir-566 homologs with the most significant BLAST E-values (< 1 x 10^-30), and indicates a much higher degree of conservation among hsa-mir-566 hairpin homologs versus either upstream (5') or downstream (3') DNA.
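One simple way to compare conservation inside a hairpin with conservation in flanking DNA is a per-column identity score over an existing alignment. The Python sketch below illustrates that idea only; it is not the scoring used by the alignment server cited above (13), and the function name column_conservation is hypothetical. Gap characters are treated like any other symbol.

```python
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str]) -> List[float]:
    """Fraction of sequences sharing the most common symbol in each column.

    `alignment` holds pre-aligned sequences of equal length. A score of 1.0
    means the column is identical in every sequence.
    """
    if not alignment:
        return []
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def mean_conservation(alignment: Sequence[str], start: int, end: int) -> float:
    """Average column conservation over the half-open column range [start, end)."""
    scores = column_conservation(alignment)[start:end]
    return sum(scores) / len(scores)
```

Comparing mean_conservation over the hairpin columns with the same measure over the 5' and 3' flanking columns gives a rough numerical counterpart to the pattern summarized in Figure 2.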

A high degree of similarity between the most common intronic hairpin sequences and corresponding, previously established microRNAs was observed. The two most prevalent, but previously unappreciated microRNA-related motifs represent over 280,000 occurrences of hsa-mir-566 or hsa-mir-619 paralogs in the human genome, and more than 290,000 in chimpanzee.

As an additional test of the prevalence and conservation of a specific microRNA hairpin, the first sequence of Figure 1 (sub-sequence of hsa-mir-566; occurring 19,521 times) was used as a query to retrieve common and related motifs. The fifteen most frequent among 10,861 resulting human sequences are shown in Table 5. All of these are essentially the same except for minor base substitutions. The most prevalent (which occurs 362 times) was aligned with hsa-mir-566 (Figure 3). The region includes a sub-sequence that differs from the mature hsa-mir-566 by only three bases. When the same procedure was used to retrieve common intronic sequences similar to hsa-mir-619, the fifteen most common (of 783 intronic sequences) again exhibited only minor base substitutions (not shown).

5. DISCUSSION

Although once viewed as irrelevant, intronic DNA is presently known to serve a number of crucial functions (14-25). Among these is a role for introns and their embedded transposable elements as a repository for DNA variation (7,9,10). For example, the microRNA hsa-mir-566 is derived from Alu Sg and found on human chromosome 3. Hsa-mir-619 occurs on chromosome 12, and is derived from the more ancient Alu Sx and the LINE, L1MC4. Alus Sx and J, the earliest ancestors of all Alus, had their period of greatest amplification around 55 million years ago (mya), at which time approximately 850,000 copies are thought to have appeared during the initial stages of the primate radiation. Approximately 40,000 copies of Alu Sg1, a descendant of Alus Sx and J, appeared later, around 35 mya (26). Although an approximately twenty-one-fold difference exists in the amplification of these two Alus during the primate radiation, the ratio of microRNA-related hairpin sequences derived from these retrotransposons (hsa-mir-619 and hsa-mir-566) has remained essentially constant (1.1:1.0) across every chromosome of both the human and chimpanzee genomes. The explanation for this remarkable degree of conservation among hsa-mir-619 and -566 is not known (Table 3 and Table 4). However, the ratio of sequences similar to known microRNAs is clearly a function of both the invasive success of the transposons, and the degree to which the associated microRNAs have diverged from a common founder sequence. In some cases (e.g. hsa-mir-566), the sequence of a mature microRNA has been highly conserved within numerous introns. In others, such as those related to hsa-mir-619, the transposon-derived intronic sequences do not resemble known mature microRNAs, but maintain strong homology to a microRNA hairpin, and have their ancestral basis in a microRNA paradigm. Notably, despite common origins within transposable elements, the hairpins themselves are much more highly conserved than surrounding DNA sequence, a finding that supports evolutionary preservation specifically of the hairpin-related motifs, rather than nearby sequences within the ancestral Alu or other repetitive elements.
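The arithmetic behind the twenty-one-fold figure can be checked directly. The short Python sketch below reads the Alu Sx/J amplification as roughly 850,000 copies, which is the value consistent with the stated fold difference, and uses the approximately 40,000 Alu Sg1 copies and the 1.1:1.0 hairpin ratio given above; the variable names are illustrative only.

```python
# Approximate copy numbers quoted in the text (26).
alu_sx_j_copies = 850_000   # Alu Sx and J, amplified around 55 mya
alu_sg1_copies = 40_000     # Alu Sg1, amplified around 35 mya

# Fold difference in amplification between the two Alu groups.
fold_difference = alu_sx_j_copies / alu_sg1_copies   # 21.25, i.e. about 21-fold

# Observed ratio of hairpin-related sequences (hsa-mir-619 : hsa-mir-566).
hairpin_ratio = 1.1 / 1.0

# If hairpin counts simply tracked Alu copy number, the ratio would be
# close to fold_difference rather than close to 1.
print(f"Alu amplification ratio: {fold_difference:.2f}")
print(f"Observed hairpin ratio:  {hairpin_ratio:.2f}")
```

The gap between the two printed values is the discrepancy the discussion highlights: the source elements differ by about twenty-one-fold, while the hairpin-related sequences are present in nearly equal numbers.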

The finding that hundreds of thousands of microRNA sequences have expanded in concert with the primate radiation, and persist as a predominant feature of the genome, may have significant implications with regard to the evolution of human DNA. For example, it is reasonable to imagine that a diaspora of transposable elements encoding microRNAs capable of gene network regulation might promote substantial organism diversity (and complexity), despite a comparatively modest number of discrete genes. High level sequence conservation (Table 3, Figure 1) and stoichiometry (Table 3) among microRNA hairpins suggest a recondite selective advantage important to maintenance or evolution of the primate genome. Stem-loop models of the most prevalent hairpins support the notion that hsa-mir-566 and hsa-mir-619 form structures that expose complementary bases to each other (as well as to their respective reverse complements; an example is provided for hsa-mir-566 in Figure 4). Such structures could promote RNA binding interactions, such as an adaptive role silencing transposable element expression (10). On the other hand, this sort of stem-loop complementarity might instead contribute to chromosomal pairing during meiosis as proposed earlier by Forsdyke (15), although the circumstances under which this might occur are not known with certainty. The relative paucity (by 3-15 fold; Table 1) of microRNA-related hairpin sequences on the Y chromosome (which does not extensively pair during meiosis) could be compatible with the latter interpretation. In either case, the observation that legions of microRNA-related hairpins occur in the primate genome, including tens of thousands that encode a mature microRNA, will need to be reconciled with current models of transcriptional regulation and specificity. This is especially true given recent evidence for nonconventional pathways that process microRNAs, such as those that do not require Drosha (11). For example, although hsa-mir-566 has been implicated as contributing to human hematopoiesis, the very high incidence of paralogs shown here has not been previously considered (27). Similarly, hsa-mir-548 related family members are known to regulate cancer gene expression (28). The specificity of this observation may need to be reevaluated in light of the thousands of genomic paralogs described by the current report.
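The notion of complementarity between two sequences, or between a sequence and its reverse complement, can be made concrete with a small Python sketch. This is only a base-by-base check on equal-length DNA strings and is not a thermodynamic stem-loop prediction; the function names are hypothetical, and any sequences supplied to it are the caller's own rather than sequences from this report.

```python
# Watson-Crick complement table for DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_fraction(a: str, b: str) -> float:
    """Fraction of positions at which `a` can pair with `b` when the two
    strands are aligned antiparallel (equal-length sequences only)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    partner = reverse_complement(b)
    matches = sum(1 for x, y in zip(a.upper(), partner.upper()) if x == y)
    return matches / len(a)
```

A value of 1.0 means the two strands are perfect reverse complements over their full length; intermediate values indicate partial complementarity of the kind that could, in principle, support pairing between exposed loop bases.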

In summary, a massive number of under-appreciated microRNA-related hairpin structures are present in human and chimpanzee genomes. It is not yet known whether these sequences played a role during human evolution and are now dormant, or retain partial activity and contribute to existing genetic networks in man. The findings suggest an expanded role for introns and transposable elements as a source of microRNA, and that variants of known regulatory RNAs may contribute to expression on a genome-wide basis.

6. ACKNOWLEDGEMENTS

We thank Ms. Jenny Mott for her role in preparing the manuscript. This work was supported by the National Institutes of Health (P30DK72482 to E.S.) and the Cystic Fibrosis Foundation (R464-CR11 to E.S.).

7. REFERENCES

1. Mattick, J. S. RNA regulation: a new genetics? Nat Rev Genet 5, 316-323 (2004)
http://dx.doi.org/10.1038/nrg1321
PMid:15131654

2. Mattick, J. S., M. J. Gagen. Accelerating networks in biology, engineering, and society. Science 307, 856-858 (2005)
http://dx.doi.org/10.1126/science.1103737
PMid:15705831

3. Mattick, J. S., M. J. Gagen. The evolution of controlled multitasked gene networks: The role of introns and other noncoding RNAs in the development of complex organisms. Mol Biol Evol 18, 1611-1630 (2001)
http://dx.doi.org/10.1093/oxfordjournals.molbev.a003951
PMid:11504843

4. Mattick, J. S. What makes a human? The Scientist 19, 32 (2005)

5. Hill, A. E., J. S. Hong, H. Wen, L. Teng, D. T. McPherson, S. A. McPherson, D. N. Levasseur, E. J. Sorscher. Micro-RNA-like effects of complete intronic sequences. Front Biosci 11, 1998-2006 (2006)
http://dx.doi.org/10.2741/1941
PMid:16368574

6. Guimbellot, J. S., S. W. Erickson, T. Mehta, H. Wen, G. P. Page, E. J. Sorscher, J. S. Hong. Correlation of microRNA levels during hypoxia with predicted target mRNAs through genome-wide microarray analysis. BMC Med Genomics 2, 15 (2009)
http://dx.doi.org/10.1186/1755-8794-2-15
PMid:19320992 PMCid:2667434

7. Rodriguez, A., S. Griffiths-Jones, J. L. Ashurst, A. Bradley. Identification of mammalian microRNA host genes and transcription units. Genome Res 14, 1902-1910 (2004)
http://dx.doi.org/10.1101/gr.2722704
PMid:15364901 PMCid:524413

8. Roger, A. J., P. J. Keeling, W. F. Doolittle. Introns, the broken transposons. Soc Gen Physiol Ser 49, 27-37 (1994)
PMid:7939900

9. Sela, N., B. Mersch, N. Gal-Mark, G. Lev-Maor, A. Hotz-Wagenblatt, G. Ast. Comparative analysis of transposed element insertion within human and mouse genomes reveals Alu's unique role in shaping the human transcriptome. Genome Biol 8, R127 (2007)
http://dx.doi.org/10.1186/gb-2007-8-6-r127
PMid:17594509 PMCid:2394776

10. Piriyapongsa, J., L. Marino-Ramirez, I. K. Jordan. Origin and evolution of human microRNAs from transposable elements. Genetics 176, 1323-1337 (2007)
http://dx.doi.org/10.1534/genetics.107.072553
PMid:17435244 PMCid:1894593

11. Ruby, J. G., C. H. Jan, D. P. Bartel. Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86 (2007)
http://dx.doi.org/10.1038/nature05983
PMid:17589500 PMCid:2475599

12. Griffiths-Jones, S., H. K. Saini, S. van Dongen, A. J. Enright. miRBase: tools for microRNA genomics. Nucleic Acids Res 36 (Database Issue), D154-D158 (2008)
PMid:17991681 PMCid:2238936

13. Poirot, O., E. O'Toole, C. Notredame. Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments. Nucleic Acids Res 31, 3503-3506 (2003)
http://dx.doi.org/10.1093/nar/gkg522
PMid:12824354 PMCid:168929

14. Forsdyke, D. Are introns in-series error-detecting sequences? J Theor Biol 93, 861-866 (1981)
http://dx.doi.org/10.1016/0022-5193(81)90344-1

15. Forsdyke, D. A stem-loop "kissing" model for the initiation of recombination and the origin of introns. Mol Biol Evol 12, 949-958 (1995)
PMid:7476142

16. Forsdyke, D. An alternative way of thinking about stem-loops in DNA. A case study of the human G0S2 gene. J Theor Biol 192, 489-504 (1998)
http://dx.doi.org/10.1006/jtbi.1998.0674
PMid:9680722

17. Barrette, I. H., S. McKenna, D. R. Taylor, D. Forsdyke. Introns resolve the conflict between base order dependent stem-loop potential and the encoding of RNA or protein: further evidence from overlapping genes. Gene 270, 181-189 (2001)
http://dx.doi.org/10.1016/S0378-1119(01)00477-2

18. Doyle, G. G. A general theory of chromosome pairing based on the palindromic DNA model of Sobell with modifications and amplifications. J Theor Biol 70, 171-184 (1978)
http://dx.doi.org/10.1016/0022-5193(78)90345-4

19. Ares, M., L. Grate, M. H. Pauling. A handful of intron-containing genes produce the lion's share of yeast mRNA. RNA 5, 1138-1139 (1999)
http://dx.doi.org/10.1017/S1355838299991379
PMid:10496214 PMCid:1369836

20. Frederickson, R. M., M. R. Micheau, A. Iwamoto, N. G. Miyamoto. 5' flanking and first intron sequences of the human beta-actin gene required for efficient promoter activity. Nucleic Acids Res 17, 253-270 (1989)
http://dx.doi.org/10.1093/nar/17.1.253
PMid:2911466 PMCid:331549

21. Luo, M., R. Reed. Splicing is required for rapid and efficient mRNA export in metazoans. Proc Natl Acad Sci USA 96, 14937-14942 (1999)
http://dx.doi.org/10.1073/pnas.96.26.14937

22. Fong, Y. W., Q. Zhou. Stimulatory effect of splicing factors on transcriptional elongation. Nature 414, 929-933 (2001)
http://dx.doi.org/10.1038/414929a
PMid:11780068

23. Maniatis, T., R. Reed. An extensive network of coupling among gene expression machines. Nature 416, 499-506 (2002)
http://dx.doi.org/10.1038/416499a
PMid:11932736

24. Hormuzdi, S., R. Penttinen, R. Jaenisch, P. Bornstein. A gene-targeting approach identifies a function for the first intron in expression of the alpha1(I) collagen gene. Mol Cell Biol 18, 3368-3375 (1998)
PMid:9584177 PMCid:108918

25. Beaton, M. J., T. Cavalier-Smith. Eukaryotic noncoding DNA is functional: evidence from the differential scaling of cryptomonad genomes. Proc R Soc Lond B Biol Sci 266, 2053-2059 (1999)
http://dx.doi.org/10.1098/rspb.1999.0886
PMid:10902541 PMCid:1690321

26. Roy-Engel, A. M., M. A. Batzer, P. L. Deininger. Evolution of Human Retrosequences: Alu. In Encyclopedia of Life Sciences (ELS) Wiley, Chichester, DOI: 10.1002/9780470015902.a0005131.pub2 (2008)
http://dx.doi.org/10.1002/9780470015902.a0005131.pub2

27. Kim, Y. C., Q. Wu, J. Chen, Z. Xuan, Y. C. Jung, M. Q. Zhang, J. D. Rowley, S. M. Wang. The transcriptome of human CD34+ hematopoietic stem-progenitor cells. Proc Natl Acad Sci USA 106, 8278-8283 (2009)
http://dx.doi.org/10.1073/pnas.0903390106
PMid:19416867 PMCid:2688877

28. Piriyapongsa, J., I. K. Jordan. A family of human microRNA genes from miniature inverted-repeat transposable elements. PLoS ONE 2, e203 (2007)
http://dx.doi.org/10.1371/journal.pone.0000203
PMid:17301878 PMCid:1784062

29. Notredame, C., C. Abergel. Using Multiple Alignment Methods to Assess the Quality of Genomic Data Analysis. In Andrade, M. (ed.), Bioinformatics and Genomes: Current Perspectives. Horizon Scientific Press, Norwich, pp. 30-50 (2003)

30. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31, 3406-3415 (2003)
http://dx.doi.org/10.1093/nar/gkg595
PMid:12824337 PMCid:169194

31. Mathews, D. H., J. Sabina, M. Zuker, D. H. Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288, 911-940 (1999)
http://dx.doi.org/10.1006/jmbi.1999.2700
PMid:10329189

32. Kent, W. J., C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, D. Haussler. The human genome browser at UCSC. Genome Res 12, 996-1006 (2002)
PMid:12045153 PMCid:186604

33. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402 (1997)
http://dx.doi.org/10.1093/nar/25.17.3389
PMid:9254694 PMCid:146917

Key Words: microRNAs, Introns, Gene Regulation, Transposons

Send correspondence to: Aubrey Hill, MCLM 794, 1530 3rd Avenue South, Birmingham, AL 35294, Tel: 205-996-4136, Fax: 205-934-5473, E-mail: ahill@uab.edu