Causal Discovery from Heterogeneous/Nonstationary Data


Research output: Journal Publications / Journal Article (refereed) / peer-review

91 Citations (Scopus)


It is commonplace to encounter heterogeneous or nonstationary data, in which the underlying generating process changes across domains or over time. Such distribution shift presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find the causal skeleton and directions and to estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and to recover the skeleton of the causal structure over the observed variables. Second, we present a method to determine causal orientations by making use of the independent changes in the data distribution implied by the underlying causal model, benefiting from the information carried by changing distributions. After learning the causal structure, we investigate how to efficiently estimate the “driving force” of the nonstationarity of a causal mechanism; that is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, impose no hard restrictions on data distributions or causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even in the presence of particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) demonstrate the efficacy of the proposed methods.
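The first two steps described above (detecting changing mechanisms plus recovering the skeleton, then orienting edges) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a linear-Gaussian partial-correlation (Fisher z) test is swapped in for the nonparametric kernel-based conditional independence test the authors use, the surrogate variable C is a binary domain index, and the names `fisher_z_pvalue` and `cdnod_sketch` are hypothetical. The idea shown is to treat C as one more variable in a PC-style search, read any surviving edge between C and an observed variable as a changing mechanism, and orient edges using the fact that C cannot be caused by the observed variables. The case where both endpoints of an edge change, which the paper handles through independent changes in the distribution, is deliberately left out.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(data, i, j, cond):
    """p-value for H0: column i is independent of column j given columns `cond`.

    Assumption: a linear-Gaussian partial-correlation test (Fisher z),
    used here in place of the paper's kernel-based CI test.
    """
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected block
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def cdnod_sketch(X, c, alpha=0.001, max_cond=2):
    """Hypothetical CD-NOD-style sketch over observed X plus surrogate c.

    Returns (skeleton, changing, directed). The last column index,
    X.shape[1], stands for the surrogate variable C.
    """
    data = np.column_stack([X, c]).astype(float)
    d = data.shape[1]
    C = d - 1
    adj = {v: set(range(d)) - {v} for v in range(d)}
    sepset = {}

    # Step 1: PC-style edge removal over the augmented set {X_1..X_m, C}.
    for size in range(max_cond + 1):
        for i in range(d):
            for j in sorted(adj[i]):
                candidates = sorted(adj[i] - {j})
                if len(candidates) < size:
                    continue
                for S in itertools.combinations(candidates, size):
                    if fisher_z_pvalue(data, i, j, S) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(S)
                        break

    skeleton = sorted((i, j) for i in range(d) for j in adj[i] if i < j)
    changing = sorted(adj[C])  # variables whose mechanism depends on C

    # Step 2: C is exogenous, so every C - V edge becomes C -> V.
    directed = {(C, v) for v in changing}
    for vi in changing:
        for vj in adj[vi] - {C}:
            if vj in adj[C]:
                continue  # both change: needs the independent-change step
            if vi in sepset.get(frozenset((C, vj)), set()):
                directed.add((vi, vj))  # no collider at vi, so vi -> vj
            else:
                directed.add((vj, vi))  # collider: C -> vi <- vj
    return skeleton, changing, sorted(directed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    c = np.repeat([0.0, 1.0], n // 2)   # two domains
    x0 = 2.0 * c + rng.normal(size=n)    # mechanism of x0 shifts with c
    x1 = 1.5 * x0 + rng.normal(size=n)   # stable mechanism x0 -> x1
    x2 = rng.normal(size=n)              # unrelated variable
    skel, changing, directed = cdnod_sketch(np.column_stack([x0, x1, x2]), c)
    print("skeleton:", skel)
    print("changing:", changing)
    print("directed:", directed)
```

On this simulated two-domain example the sketch should typically keep the edges C - x0 and x0 - x1, flag only x0 as having a changing mechanism, and orient C -> x0 -> x1, because x0 separates C from x1. A linear test like this one only notices changes that show up as correlation with C (for example, mean shifts between domains); changes in variance or in nonlinear mechanisms are exactly what the paper's nonparametric kernel test is meant to capture.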
Original language: English
Number of pages: 53
Journal: Journal of Machine Learning Research
Publication status: Published, May 2020

Bibliographical note

We are grateful to the anonymous reviewers whose careful comments and suggestions helped improve this manuscript. We would like to acknowledge the support by National Institutes of Health under Contract No. NIH-1R01EB022858-01, FAIN-R01EB022858, NIH1R01LM012087, NIH5U54HG008540-02, and FAIN-U54HG008540, and by the United States Air Force under Contract No. FA8650-17-C7715. The National Institutes of Health and the U.S. Air Force are not responsible for the views reported in this article. JZ’s research was supported in part by the Research Grants Council of Hong Kong under the General Research Fund LU342213.


  • Causal discovery
  • Confounder
  • Driving force estimation
  • Heterogeneous/nonstationary data
  • Independent-change principle
  • Kernel distribution embedding

