High-order dynamic Bayesian network learning with hidden common causes for causal gene regulatory network

Leung Yau LO, Man Leung WONG, Kin Hong LEE, Kwong Sak LEUNG

Research output: Journal Publications > Journal Article (refereed)

8 Citations (Scopus)

Abstract

Background: Inferring gene regulatory networks (GRNs) has been an important topic in bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Because regulatory relationships involve time delays, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of a GRN. However, previous GRN inference methods assume causal sufficiency, i.e. that there are no unobserved common causes. This assumption is convenient but unrealistic, because relevant factors may not even have been conceived of and are therefore unmeasured. An inference method that also handles hidden common cause(s) is therefore highly desirable. Moreover, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes not be observed genes.

Results: We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data. It makes some assumptions on the conditional distribution but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that the children of hidden variable(s) are not linked to each other. Moreover, because long time series are difficult to obtain, our proposed algorithm can utilize multiple short time series (not necessarily of the same length).
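
The two ideas in the Results paragraph that lend themselves to a concrete illustration are (a) pooling training windows from several short series of unequal length and (b) the structural assumptions on hidden variables. The following is a minimal sketch only, not the authors' algorithm: the function names, the `(parent, delay)` edge encoding, and the reading of "not linked to each other" as "no edge between two distinct children of the same hidden variable" are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

State = Tuple[int, ...]        # discrete levels of all observed genes at one time point
Edge = Tuple[Hashable, int]    # (parent name, time delay in steps); hypothetical encoding


def lagged_samples(
    series_list: Sequence[Sequence[State]], max_delay: int
) -> List[Tuple[Tuple[State, ...], State]]:
    """Pool (history, current) pairs from several short series.

    Series may differ in length; one shorter than max_delay + 1
    simply contributes no window.
    """
    samples = []
    for series in series_list:
        for t in range(max_delay, len(series)):
            history = tuple(series[t - max_delay:t])  # oldest time point first
            samples.append((history, series[t]))
    return samples


def satisfies_hidden_assumptions(
    parents: Dict[Hashable, Set[Edge]], hidden: Set[Hashable]
) -> bool:
    """Check a candidate structure against the stated assumptions.

    `parents` maps each node to its set of (parent, delay) edges.
    """
    children = {h: set() for h in hidden}
    for child, edges in parents.items():
        for parent, _delay in edges:
            if parent in hidden:
                if child in hidden:
                    return False  # hidden nodes may only touch observed nodes
                children[parent].add(child)
    for kids in children.values():
        if len(kids) < 2:
            return False  # each hidden variable needs at least two children
        for kid in kids:
            for parent, _delay in parents.get(kid, set()):
                # Assumed reading: siblings under one hidden cause share no edge.
                if parent in kids and parent != kid:
                    return False
    return True
```

For example, with `max_delay=2`, a series of three time points contributes one window, a series of four contributes two, and a single-point series contributes none, so short series are still usable as long as they span the longest delay considered.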

Conclusions: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The experimental results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data are needed.

Original language: English
Article number: 395
Pages (from-to): 1-28
Number of pages: 28
Journal: BMC Bioinformatics
Volume: 16
Early online date: 25 Nov 2015
DOI: 10.1186/s12859-015-0823-6
Publication status: Published - 2015

Fingerprint

Dynamic Bayesian Networks
Gene Regulatory Networks
Bayesian networks
Genes
Learning
Higher Order
Time series
Hidden Variables
Parents
Time Delay
Incomplete Data
Sufficiency
Network Algorithms
Bioinformatics
Synthetic Data
Conditional Distribution
Computational Methods

Bibliographical note

This research is partially supported by GRF Grant (Project Reference 414413) and GRF Grant (LU310111) from the Research Grants Council of the Hong Kong Special Administrative Region.

Keywords

  • Causality inference
  • Gene regulatory network
  • Hidden common cause
  • High-order dynamic Bayesian Network

Cite this

@article{78583c91e77a4196820413a2ff080f7b,
title = "High-order dynamic Bayesian network learning with hidden common causes for causal gene regulatory network",
abstract = "Background: Inferring gene regulatory network (GRN) has been an important topic in Bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, High-Order Dynamic Bayesian Network (HO-DBN) is a good model of GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and therefore un-measured. Therefore an inference method that also handles hidden common cause(s) is highly desirable. Also, previous methods for discovering hidden common causes either do not handle multi-step time delays or restrict that the parents of hidden common causes are not observed genes. Results: We have developed a discrete HO-DBN learning algorithm that can infer also hidden common cause(s) from discrete time series expression data, with some assumptions on the conditional distribution, but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can also utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. Conclusions: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experiment results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.",
keywords = "Causality inference, Gene regulatory network, Hidden common cause, High-order dynamic Bayesian Network",
author = "LO, {Leung Yau} and WONG, {Man Leung} and LEE, {Kin Hong} and LEUNG, {Kwong Sak}",
note = "This research is partially supported by GRF Grant (Project Reference 414413) and GRF Grant (LU310111) from the Research Grants Council of the Hong Kong Special Administrative Region.",
year = "2015",
doi = "10.1186/s12859-015-0823-6",
language = "English",
volume = "16",
pages = "1--28",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
}

High-order dynamic Bayesian network learning with hidden common causes for causal gene regulatory network. / LO, Leung Yau; WONG, Man Leung; LEE, Kin Hong; LEUNG, Kwong Sak.

In: BMC Bioinformatics, Vol. 16, 395, 2015, p. 1-28.

TY - JOUR

T1 - High-order dynamic Bayesian network learning with hidden common causes for causal gene regulatory network

AU - LO, Leung Yau

AU - WONG, Man Leung

AU - LEE, Kin Hong

AU - LEUNG, Kwong Sak

N1 - This research is partially supported by GRF Grant (Project Reference 414413) and GRF Grant (LU310111) from the Research Grants Council of the Hong Kong Special Administrative Region.

PY - 2015

Y1 - 2015

AB - Background: Inferring gene regulatory network (GRN) has been an important topic in Bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, High-Order Dynamic Bayesian Network (HO-DBN) is a good model of GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and therefore un-measured. Therefore an inference method that also handles hidden common cause(s) is highly desirable. Also, previous methods for discovering hidden common causes either do not handle multi-step time delays or restrict that the parents of hidden common causes are not observed genes. Results: We have developed a discrete HO-DBN learning algorithm that can infer also hidden common cause(s) from discrete time series expression data, with some assumptions on the conditional distribution, but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can also utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. Conclusions: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experiment results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.

KW - Causality inference

KW - Gene regulatory network

KW - Hidden common cause

KW - High-order dynamic Bayesian Network

UR - http://commons.ln.edu.hk/sw_master/2071

UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-84961121475&doi=10.1186%2fs12859-015-0823-6&partnerID=40&md5=f49aa225419fffd99b39d14ddc5a3f76

U2 - 10.1186/s12859-015-0823-6

DO - 10.1186/s12859-015-0823-6

M3 - Journal Article (refereed)

C2 - 26608050

VL - 16

SP - 1

EP - 28

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 395

ER -