[Frontiers in Bioscience 7, a90-98, May 1, 2002]

STATISTICAL METHODS FOR ANALYSIS OF TIME COURSE GENE EXPRESSION DATA

Hongzhe Li, Yihui Luan, Fangxin Hong, Yueju Li

Rowe Program in Human Genetics, Departments of Medicine and Statistics, University of California, Davis, CA 95616, USA.

TABLE OF CONTENTS

1. Abstract
2. Introduction
3. Materials and Methods
3.1. Time lagged correlation coefficient analysis
3.2. The mixed-effects model for time-course gene expression data
3.3. Methods for clustering genes based on the mixed-effects model
3.4. Estimation of gene expression trajectory and missing data
3.5. Aligning several time-course gene expression profiles
4. Results
4.1. Yeast cell-cycle gene expression data
4.2. Time-lagged correlation analysis for yeast cell-cycle data
4.2.1. Time-lagged correlations for cell-cycle regulated genes
4.2.2. Time-lagged correlations and protein-protein interactions
4.3. Application of the mixed-effects model to yeast cell-cycle data
4.3.1. Clustering analysis of yeast cell-cycle data
4.3.2. Estimation of gene expression trajectory and missing data
4.4. Identification of genes that are regulated by two yeast forkhead genes
5. Other Related Work
6. Discussions and Future Directions
7. Acknowledgement
8. References

1. ABSTRACT

Because many biological systems and regulatory networks are dynamic, gene expression levels measured at different time points during a biological process can often provide more insight into the underlying system. Such data are often called time-course gene expression data. A distinctive feature of these data is their time dependency, both among the expression levels of a single gene at different time points and between the expression profiles of different genes. Statistical analysis must account for this dependency in order to make valid inferences. This paper presents several statistical methods for analyzing time-course gene expression data: the time-lagged correlation coefficient for analyzing relationships between genes; a mixed-effects model with splines for clustering genes and for estimating missing gene expression data; and a new method for aligning gene expression profiles obtained under two experimental conditions and for identifying gene clusters that show significant changes between the two conditions. We used the yeast cell-cycle gene expression data sets to illustrate these methods and obtained biologically meaningful conclusions from these analyses.
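To make the idea of a time-lagged correlation concrete, the sketch below computes an ordinary Pearson correlation between one gene's profile at time t and another gene's profile at time t + k. It is a generic illustration only, not necessarily the exact estimator defined in Section 3.1; it assumes equally spaced time points and no missing values. The function names `lagged_correlation` and `best_lag` are hypothetical.

```python
import math
from typing import Sequence


def lagged_correlation(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Pearson correlation between x[t] and y[t + lag] (illustrative sketch).

    Assumes equally spaced time points and complete data. A positive lag asks
    whether y follows x; a negative lag asks whether y leads x. Only time
    points present in both shifted series are used.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if abs(lag) > n - 2:
        raise ValueError("too few overlapping time points for this lag")
    # Align the two series so that a[i] = x[t] pairs with b[i] = y[t + lag].
    if lag >= 0:
        a, b = x[: n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: n + lag]
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / math.sqrt(var_a * var_b)


def best_lag(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag in [-max_lag, max_lag] with the largest (signed) lagged correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: lagged_correlation(x, y, k))
```

For example, if `x` is a sampled sine wave with period 8 and `y` is the same wave delayed by two time points, `lagged_correlation(x, y, 2)` is 1 up to rounding and `best_lag(x, y, 3)` returns 2, indicating that `y` follows `x` by two sampling intervals. Maximizing the signed correlation (rather than its absolute value) is a design choice here, so that an anti-correlated lag is not mistaken for the best match.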