Chris Stoeckert's home page

Computational Biology and Informatics Lab (CBIL)


COGRIM - Clustering of Genes into Regulons using Integrated Modeling

Computational approaches for identifying regulatory modules and networks have traditionally drawn on either expression data or sequence features (ChIP binding data or binding motif data) of transcription factors (TFs). Although these approaches have proven useful, their power is inherently limited because each data resource provides only partial information: expression data provide only functional, indirect evidence, whereas binding data and binding motifs provide only physical location information. Recent efforts to integrate these data types have drawbacks, such as arbitrary parameter cutoffs or heuristic procedures with little systematic modeling.

We present a Bayesian hierarchical model and Markov chain Monte Carlo implementation that integrates heterogeneous information, including expression data and sequence features, in a principled and robust fashion. Our model, COGRIM, does not require prior clustering of expression data or many of the arbitrary parameter thresholds used by previous methods.
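To give a flavor of how binding evidence and expression data can be combined in a single probabilistic model, here is a minimal toy sketch in Python. It is not the COGRIM model or its implementation: it assumes a single TF, a known noise level, and a simple linear relationship between gene and TF expression, and every name in it (`gibbs_indicator_sketch`, `prior_prob`, `sigma`, `tau`) is hypothetical. Binding evidence enters as a per-gene prior probability of being a target, expression data enter through the likelihood, and a Gibbs sampler (one kind of Markov chain Monte Carlo) alternates between the target indicators and the regulatory effect.

```python
import numpy as np

def gibbs_indicator_sketch(expr, tf_expr, prior_prob, sigma=1.0, tau=10.0,
                           n_iter=600, burn_in=100, seed=0):
    """Toy Gibbs sampler (an illustration, not the published model).

    Assumed model: expr[i, t] ~ Normal(beta * c[i] * tf_expr[t], sigma^2),
    with c[i] in {0, 1}, P(c[i] = 1) = prior_prob[i] (e.g. from binding data),
    and beta ~ Normal(0, tau^2). Returns the posterior mean of each c[i].
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    prior_prob = np.clip(prior_prob, 1e-6, 1 - 1e-6)
    prior_log_odds = np.log(prior_prob) - np.log1p(-prior_prob)
    gf = expr @ tf_expr        # per-gene inner product with TF expression
    ff = tf_expr @ tf_expr     # squared norm of TF expression
    beta = 1.0                 # arbitrary starting value
    on_counts = np.zeros(n_genes)
    for it in range(n_iter):
        # Sample indicators given beta: log-likelihood ratio of c=1 vs c=0
        llr = (2 * beta * gf - beta**2 * ff) / (2 * sigma**2)
        log_odds = np.clip(llr + prior_log_odds, -50, 50)
        p_on = 1.0 / (1.0 + np.exp(-log_odds))
        c = rng.random(n_genes) < p_on
        # Sample beta given indicators: conjugate normal update
        precision = c.sum() * ff / sigma**2 + 1.0 / tau**2
        mean = (gf[c].sum() / sigma**2) / precision
        beta = rng.normal(mean, 1.0 / np.sqrt(precision))
        if it >= burn_in:
            on_counts += c
    return on_counts / (n_iter - burn_in)

# Synthetic example: the first 10 of 40 genes respond to the TF.
demo_rng = np.random.default_rng(1)
n_genes, n_expts, n_targets = 40, 30, 10
tf_expr = demo_rng.normal(size=n_expts)
is_target = np.arange(n_genes) < n_targets
expr = demo_rng.normal(size=(n_genes, n_expts))
expr[is_target] += 2.0 * tf_expr
prior_prob = np.where(is_target, 0.6, 0.4)   # weak, made-up binding evidence
post = gibbs_indicator_sketch(expr, tf_expr, prior_prob)
```

In this sketch no cutoff is applied to the binding evidence and no prior clustering of the expression data is needed; each gene simply receives a posterior probability of being a target, which can be ranked or thresholded afterwards.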

Our applications cover both unicellular and mammalian organisms, as well as several scenarios of data availability. We apply our model to S. cerevisiae, where large amounts of ChIP binding data and gene expression data are available. Our validation analyses show that our predicted gene-TF interactions are very likely to be biologically relevant. We also examine two transcription factors in mammals: C/EBP-beta, where TF binding site data, ChIP binding data and expression data are all available, and SRF, where only TF binding site data and gene expression data are available. In both of these applications, we demonstrate the ability to predict gene-TF interactions with reduced levels of false positives.

Our general approach of Bayesian modeling for integrating heterogeneous biological data to discover regulatory networks provides a framework for overcoming the intrinsic limitations of available methods, and should prove useful in applications to other organisms.

Citation

G. Chen, S. T. Jensen, C. Stoeckert, “Clustering of Genes into Regulons using Integrated Modeling-COGRIM”, Genome Biology, 2007 Jan 4; 8(1):R4. PMID: 17204163