Time delayed causal gene regulatory network inference with hidden common causes

Leung Yau LO, Man Leung WONG, Kin Hong LEE, Kwong Sak LEUNG

Research output: Journal Publications; Journal Article (refereed); Research; peer-review

5 Citations (Scopus)

Abstract

Inferring the gene regulatory network (GRN) is crucial to understanding the workings of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. that all the relevant factors in the causal network have been observed and there are no unobserved common causes. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and are therefore impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data while allowing for the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous work. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviations from a Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.
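
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not HCC-CLINDE; it only shows, under assumed linear relations and Gaussian noise, how a hidden common cause with different delays to its two children produces a lagged association between them even though neither regulates the other, and how several short time series can be pooled to detect it. All names, weights, delays, and segment sizes are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate_segment(n_steps, rng, delay_x=1, delay_y=3, weight=0.8):
    """Simulate one short series where a hidden variable H drives X and Y.

    X(t) depends on H(t - delay_x) and Y(t) on H(t - delay_y).
    There is no direct link between X and Y, and H is never returned.
    """
    warmup = max(delay_x, delay_y)
    h = [rng.gauss(0.0, 1.0) for _ in range(n_steps + warmup)]
    x, y = [], []
    for t in range(warmup, n_steps + warmup):
        x.append(weight * h[t - delay_x] + rng.gauss(0.0, 0.5))
        y.append(weight * h[t - delay_y] + rng.gauss(0.0, 0.5))
    return x, y


def pooled_lagged_correlation(segments, lag):
    """Pearson correlation of x(t) with y(t + lag), pooled over all segments."""
    xs, ys = [], []
    for x, y in segments:
        for t in range(len(x) - lag):
            xs.append(x[t])
            ys.append(y[t + lag])
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
# Forty short series of 30 time steps each, rather than one long series.
segments = [simulate_segment(30, rng) for _ in range(40)]
correlations = {lag: pooled_lagged_correlation(segments, lag) for lag in range(5)}
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print(best_lag, round(correlations[best_lag], 2))
```

With these assumed delays, the association between X and Y should peak at a lag of delay_y - delay_x = 2. A method that assumes causal sufficiency could read that peak as a direct link from X to Y with delay 2, which is the kind of error that allowing for hidden common causes is meant to avoid.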
Original language: English
Article number: e0138596
Pages (from-to): 1-47
Number of pages: 47
Journal: PLoS ONE
Volume: 10
Issue number: 9
DOI: 10.1371/journal.pone.0138596
Publication status: Published - 22 Sep 2015

Fingerprint

Gene Regulatory Networks
Genes
Time series
time series analysis
Gaussian distribution
Computational methods
Normal Distribution
Experiments
gene regulatory networks
Parents
methodology
cells

Bibliographical note

This work was supported by The Research Grants Council of the Hong Kong Special Administrative Region (http://www.ugc.edu.hk/eng/rgc/index.htm), Project References 414413: KSL and LU310111: MLW.

Cite this

LO, Leung Yau ; WONG, Man Leung ; LEE, Kin Hong ; LEUNG, Kwong Sak. / Time delayed causal gene regulatory network inference with hidden common causes. In: PLoS ONE. 2015 ; Vol. 10, No. 9. pp. 1-47.
@article{247540a996334b6a9879815c9be72a12,
title = "Time delayed causal gene regulatory network inference with hidden common causes",
abstract = "Inferring the gene regulatory network (GRN) is crucial to understanding the workings of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. that all the relevant factors in the causal network have been observed and there are no unobserved common causes. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and are therefore impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data while allowing for the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous work. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviations from a Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.",
author = "LO, {Leung Yau} and WONG, {Man Leung} and LEE, {Kin Hong} and LEUNG, {Kwong Sak}",
note = "This work was supported by The Research Grants Council of the Hong Kong Special Administrative Region (http://www.ugc.edu.hk/eng/rgc/index.htm), Project References 414413: KSL and LU310111: MLW.",
year = "2015",
month = "9",
day = "22",
doi = "10.1371/journal.pone.0138596",
language = "English",
volume = "10",
pages = "1--47",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",

}

Time delayed causal gene regulatory network inference with hidden common causes. / LO, Leung Yau; WONG, Man Leung; LEE, Kin Hong; LEUNG, Kwong Sak.

In: PLoS ONE, Vol. 10, No. 9, e0138596, 22.09.2015, p. 1-47.

TY - JOUR

T1 - Time delayed causal gene regulatory network inference with hidden common causes

AU - LO, Leung Yau

AU - WONG, Man Leung

AU - LEE, Kin Hong

AU - LEUNG, Kwong Sak

N1 - This work was supported by The Research Grants Council of the Hong Kong Special Administrative Region (http://www.ugc.edu.hk/eng/rgc/index.htm), Project References 414413: KSL and LU310111: MLW.

PY - 2015/9/22

Y1 - 2015/9/22

AB - Inferring the gene regulatory network (GRN) is crucial to understanding the workings of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. that all the relevant factors in the causal network have been observed and there are no unobserved common causes. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and are therefore impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data while allowing for the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous work. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviations from a Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.

UR - http://commons.ln.edu.hk/sw_master/5055

UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-84947720982&doi=10.1371%2fjournal.pone.0138596&partnerID=40&md5=afe98963a4db41179b5cf973ac4a84fc

U2 - 10.1371/journal.pone.0138596

DO - 10.1371/journal.pone.0138596

M3 - Journal Article (refereed)

VL - 10

SP - 1

EP - 47

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 9

M1 - e0138596

ER -