[Frontiers in Bioscience 14, 4058-4070, January 1, 2009]

A strategy for meta-analysis of short time series microarray datasets

Ruping Sun1, Xuping Fu1, Fenghua Guo2, Zhaorong Ma1, Chris Goulbourne3, Mei Jiang1, Yao Li1, Yi Xie1, Yumin Mao1

1State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Science, Fudan University, Shanghai 200433, P.R. China, 2Shanghai BioStar Genechip Institute, Shanghai 200092, P.R. China, and 3Department of Biological and Biomedical Sciences, University of Durham, Durham, UK

TABLE OF CONTENTS

1. Abstract
2. Introduction
3. Materials and Methods
3.1. Data collection and preprocessing
3.2. Differential expression analysis in each dataset
3.3. Meta-analysis and identification of commonly transient heat stress sensitive genes
3.4. Selective combination of time series data for further bioinformatic analysis
3.5. Clustering method and promoter analysis
4. Results
4.1. Collection of short time series microarray datasets
4.2. Differential expression analysis in each dataset
4.3. Meta-analysis and identification of commonly transient heat stress sensitive genes (CTHS genes)
4.4. Combination of time series data finds overall genetic response tendency
4.5. Clustering analysis of a case category identifies detailed response trends
4.6. Promoter analysis of different heat shock elements
5. Discussion
6. Summary
7. Acknowledgement
8. References

1. ABSTRACT

Many time series microarray experiments have relatively few time points (fewer than ten) and lack replicates, which weakens confidence in the results. Combining microarray data from different groups may improve the statistical power to detect differentially expressed genes. However, few efforts have been made to combine or compare time-course array datasets generated by independent groups. Here we demonstrate a suitable strategy for meta-analysis of short time series microarray datasets and apply it to four published heat shock microarray datasets of Saccharomyces cerevisiae. We first assessed the significance of each gene in each dataset on the basis of an area calculation and the null distribution of the areas. The similarity of significance values across datasets was then assessed with meta-analysis methods, yielding a set of transient heat shock stress sensitive genes. A subsequent correlation calculation allowed us to combine the transformed data of each gene at the same time points. Further bioinformatic investigation demonstrated the value of our strategy and also revealed some interesting features of the regulatory systems of S. cerevisiae during transient heat stress.
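The abstract names the first two steps of the pipeline (a per-dataset significance test based on the area under each gene's profile, then a cross-dataset meta-analysis) without giving formulas. The sketch below illustrates one way these steps could look. It rests on assumptions that are not taken from the paper: the area is computed with the trapezoidal rule, the null distribution is built by resampling log-ratios across genes at each time point, and Fisher's method is used to combine p-values. All function names, sampling times and data are hypothetical.

```python
import math
import random


def trapezoid_area(times, values):
    """Signed area between a log-ratio profile and the zero baseline (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


def null_areas(times, matrix, n_samples=2000, seed=0):
    """Assumed null: build random profiles by taking, at each time point, the value
    of a randomly chosen gene, then record the absolute area of each profile."""
    rng = random.Random(seed)
    n_genes = len(matrix)
    areas = []
    for _ in range(n_samples):
        profile = [matrix[rng.randrange(n_genes)][t] for t in range(len(times))]
        areas.append(abs(trapezoid_area(times, profile)))
    return areas


def empirical_pvalue(observed_area, null):
    """Two-sided empirical p-value (compares |area|), with +1 so it is never zero."""
    extreme = sum(1 for a in null if a >= abs(observed_area))
    return (extreme + 1) / (len(null) + 1)


def fisher_combined(pvalues):
    """Fisher's method (an assumed choice): X = -2 * sum(ln p_i) is chi-square with
    2k df under H0; for even df, P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical example: one dataset with 49 unchanged genes and one responsive gene.
times = [0, 5, 10, 20]  # hypothetical sampling times (minutes)
matrix = [[0.0, 0.0, 0.0, 0.0] for _ in range(49)] + [[0.0, 3.0, 3.0, 3.0]]
null = null_areas(times, matrix)
p_flat = empirical_pvalue(trapezoid_area(times, matrix[0]), null)
p_resp = empirical_pvalue(trapezoid_area(times, matrix[-1]), null)

# Combine the responsive gene's p-value with hypothetical p-values from three other datasets.
combined = fisher_combined([p_resp, 0.01, 0.03, 0.02])
print(p_flat, p_resp, combined)
```

Resampling across genes at each time point keeps the distribution of values at every time point but breaks each gene's temporal trend. A sign-flip null, which randomly negates log-ratios, would be a reasonable alternative under a symmetric-noise assumption.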