Incorporating prior knowledge to the dynamic Bayesian networks modeling of pancreas development gene expression data
by
Xujing Wang
Max McGee National Research Center for Juvenile Diabetes & Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
Coauthors: Shouguo Gao
The importance of the network structure underlying genes and proteins is gaining increasing appreciation. This structure is not only fundamental to understanding genetic regulation and its functional structure, but also critical to dissecting complex diseases. Time series gene expression data offer a rich source for network inference. We have adopted the dynamic Bayesian network (DBYN) approach to model transcription regulatory and co-expression networks, and developed new algorithms to incorporate existing biological information from public databases (co-citation, GO (gene ontology) similarity, positional and binding information, etc.) as prior knowledge. We introduced, for the first time, fuzzy theory-based rules to the MCMC learning of DBYN in order to efficiently incorporate the prior biological knowledge, which is often incomplete and plagued with quality issues. Further, we defined a gene expression (phase) synchronization module and utilized it to assist the construction of the initial network structure. We show that these lead to significantly improved performance. We then applied the algorithm to investigate pancreatic development. We first compiled a list of pancreas-specific genes by: (1) manually collecting from the literature curated genes that are known to be involved in pancreas development; (2) downloading tissue-specific gene expression data from http://www.t1dbase.org. We then determined, for each gene, the Z score of its expression in pancreas versus the mean in all tissues. We focused on genes with pancreas Z>0.2 that are annotated to GO:0032502 (developmental process) or its descendant categories. Together we obtained a total of 45 genes.
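The gene selection step above can be illustrated with a minimal sketch. It assumes that the Z score is the pancreas expression minus the all-tissue mean, divided by the all-tissue standard deviation; the abstract does not state the exact formula. The function names, gene names, and expression values are hypothetical toy data, not values from T1DBase.

```python
from statistics import mean, pstdev

def pancreas_z_scores(expression, tissue="pancreas"):
    """Return {gene: z}, comparing one tissue's expression to the all-tissue mean.

    Assumed formula: z = (x_tissue - mean over tissues) / std over tissues.
    """
    scores = {}
    for gene, by_tissue in expression.items():
        values = list(by_tissue.values())
        sigma = pstdev(values)
        if sigma == 0:
            continue  # no variation across tissues, so Z is undefined
        scores[gene] = (by_tissue[tissue] - mean(values)) / sigma
    return scores

def select_genes(expression, developmental_genes, threshold=0.2):
    """Keep genes with Z above the threshold that carry the developmental annotation."""
    z = pancreas_z_scores(expression)
    return sorted(g for g, s in z.items() if s > threshold and g in developmental_genes)

# Hypothetical toy data: expression per gene per tissue.
expression = {
    "geneA": {"pancreas": 9.0, "liver": 3.0, "brain": 3.0},
    "geneB": {"pancreas": 2.0, "liver": 5.0, "brain": 8.0},
    "geneC": {"pancreas": 7.0, "liver": 1.0, "brain": 1.0},
}
# Hypothetical set of genes annotated to GO:0032502 or a descendant term.
developmental_genes = {"geneA", "geneB"}

selected = select_genes(expression, developmental_genes)
print(selected)
```

In this toy example geneC is enriched in pancreas but lacks the developmental annotation, and geneB is annotated but not enriched, so only geneA passes both filters.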
Two data sets were obtained from the RNA Abundance Database (www.cbil.upenn.edu/RAD) to perform network reconstruction: (1) study id 2, expression during mouse pancreas development at 7 time points: E14.5, E16.5, E18.5, birth, postnatal day 7, and adulthood; (2) study id 1790, mouse pancreas at 12, 24 and 48 hrs after 50% Ppx or sham operation, in animals that also received Ex-4 or vehicle every 24 hours. We found that with GO and co-citation information, the number of experimentally established relationships predicted by our DBYN improved 1.5- to 2-fold. The improvement was greater when we used experimentally confirmed gene interactions as the initial structure to train the Bayesian network.
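As a rough illustration of how prior evidence such as GO similarity and co-citation might bias an MCMC structure search, the sketch below shows a single-edge Metropolis acceptance step. It is not the authors' algorithm: the fuzzy OR (maximum) combination rule, the log-odds edge prior, the symmetric proposal, and all function names and numbers are assumptions made for illustration only.

```python
import math

def fuzzy_edge_confidence(go_similarity, cocitation):
    """Combine two evidence scores in [0, 1] with a fuzzy OR (maximum). Assumed rule."""
    return max(go_similarity, cocitation)

def log_prior_change(confidence, adding, eps=1e-6):
    """Change in log structure prior when one edge is added or removed.

    Assumes independent edges with prior probability equal to the confidence.
    """
    p = min(max(confidence, eps), 1.0 - eps)  # clamp so the logs stay finite
    log_odds = math.log(p) - math.log(1.0 - p)
    return log_odds if adding else -log_odds

def acceptance_probability(delta_log_likelihood, confidence, adding):
    """Metropolis acceptance probability for a single-edge move (symmetric proposal)."""
    log_ratio = delta_log_likelihood + log_prior_change(confidence, adding)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

# Hypothetical move: adding an edge lowers the data log-likelihood by 1.0.
supported = fuzzy_edge_confidence(go_similarity=0.8, cocitation=0.3)
with_prior = acceptance_probability(-1.0, supported, adding=True)
without_prior = acceptance_probability(-1.0, 0.5, adding=True)  # 0.5 = uninformative
print(with_prior, without_prior)
```

Under these assumptions, an edge supported by prior evidence is accepted more readily than the same edge with an uninformative prior, which is the qualitative effect the abstract attributes to incorporating GO and co-citation information.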
Date received: May 14, 2008
Copyright © 2008 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # caxj-02.