[Frontiers in Bioscience S4, 1333-1343, June 1, 2012]

Computational methods for the analysis of tag sequences in metagenomics studies

Qin Chang1, Yihui Luan1, Ting Chen2,3, Jed A. Fuhrman4, Fengzhu Sun2,3

1School of Mathematics, Shandong University, Jinan, Shandong 250100, PR China, 2Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089-2910, USA, 3TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PR China, 4Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, CA 90089-2910, USA

TABLE OF CONTENTS

1. Abstract
2. Introduction
3. Operational taxonomic unit (OTU)-based analysis of metagenomics communities
3.1. Computational methods for the identification of OTUs
3.2. Comparison of communities based on OTUs
4. Phylogeny-based methods for comparing metagenomics communities
4.1. The test and the phylogenetic (P) test
4.2. UniFrac, weighted UniFrac, and variance adjusted weighted UniFrac
5. Association networks of OTUs and environmental factors
6. Discussion
7. Acknowledgements
8. References

1. ABSTRACT

Metagenomics commonly refers to the study of genetic material derived directly from environments without culturing. Several ongoing large-scale metagenomics projects related to human and marine life, as well as soil (pedology) studies, have generated enormous amounts of data. Analyzing these data efficiently is a key challenge as we try to 1) understand how microbial organisms assemble under different conditions, 2) compare different communities, and 3) understand how microbial organisms associate with each other and with the environment. To address such questions, investigators use sequencing technologies, including Sanger, Illumina Solexa, and Roche 454, to sequence either particular genes, called tag sequences (mostly 16S or 18S ribosomal RNA genes or other conserved genes), or whole-metagenome shotgun sequences of all the genetic material in a given community. In this paper, we review computational methods used for the analysis of tag sequences.