Genetics Workshop

Query large scale microarray gene expression datasets using Bayesian model-based method with variable selection

Zhaohui Qin
Dept. of Biostatistics, University of Michigan

In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene, such as a key regulator or a disease-associated gene. Multiple studies have used various correlation measures to identify such co-expressed genes. These measures work well for small datasets, but the heterogeneity introduced by larger sample sizes inevitably reduces their sensitivity and specificity, because most co-expression relationships do not extend to all experimental conditions. Identifying functionally related genes in large and diverse microarray gene expression datasets is therefore a key challenge. We develop a model-based gene expression query algorithm built on the Bayesian model selection framework. It is capable of detecting co-expression profiles within a subset of samples or experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers. Both features are critically important for increasing the power to identify co-expressed genes in large-scale gene expression datasets. Application to the Escherichia coli compendium data identifies the majority of known regulons as well as novel potential target genes of the transcription factor leucine-responsive regulatory protein (Lrp).
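
The motivating problem in the abstract, that a correlation computed over all conditions is diluted when co-expression holds only in a subset of them, can be illustrated with a small simulation. The sketch below is not the Bayesian query algorithm described in the talk; it only compares a Pearson correlation over all conditions with one restricted to the co-expressed subset. The function names, the sample sizes (200 conditions, 40 of them co-expressed), the linear transformation (scale -2, shift 3), and the noise levels are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_conditions=200, n_coexpressed=40, seed=0):
    """Simulate a query gene and a target gene (hypothetical toy data).

    In the first `n_coexpressed` conditions the target is a linearly
    transformed, noisy copy of the query; in the remaining conditions
    the two genes are independent.
    """
    rng = random.Random(seed)
    query = [rng.gauss(0.0, 1.0) for _ in range(n_conditions)]
    target = []
    for i, q in enumerate(query):
        if i < n_coexpressed:
            # Co-expressed block: negative scaling plus a shift (assumed values).
            target.append(-2.0 * q + 3.0 + rng.gauss(0.0, 0.2))
        else:
            # Unrelated block: independent expression values.
            target.append(rng.gauss(0.0, 1.0))
    return query, target


query, target = simulate()
r_all = pearson(query, target)               # all 200 conditions
r_subset = pearson(query[:40], target[:40])  # co-expressed conditions only
print(f"all conditions: r = {r_all:.2f}")
print(f"subset only:    r = {r_subset:.2f}")
```

In this toy setting the subset is known in advance. In real compendium data it is not, which is why a method that selects the relevant conditions, such as the variable selection approach named in the title, is needed.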

Michigan State University | Department of Statistics and Probability | Statistical Genetics Lab