[KDD2025] Modeling Time-evolving Causality over Data Streams (to appear).

ModePlait: Modeling Time-Evolving Causality over Data Streams

Implementation of "Modeling Time-Evolving Causality over Data Streams," Naoki Chihara, Yasuko Matsubara, Ren Fujiwara, and Yasushi Sakurai. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD2025 (to appear).

Introduction

We focus on causal relationships that evolve over time in data streams and refer to such relationships as "time-evolving causality." We present ModePlait, which aims to discover time-evolving causalities in multivariate co-evolving data streams and simultaneously forecast future values in a streaming fashion. The overview of our proposed model is as follows:

The following preview of our results shows the effectiveness of ModePlait over an epidemiological data stream. Please refer to our paper for more details on these results and the proposed method.

Requirements

This source code has been tested with the following dependencies:

  • Python == 3.9.15
  • numpy == 1.23.5
  • pandas == 1.5.3
  • matplotlib == 3.8.2
  • scikit-learn == 1.1.3
  • scipy == 1.11.4
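
As a convenience, the sketch below compares the packages installed in the current environment against the versions listed above. It is not part of the repository: the names TESTED_VERSIONS and check_versions are hypothetical, and the script only uses the standard-library importlib.metadata module. A mismatch does not necessarily mean the code will fail, only that the combination is untested.

```python
from importlib import metadata

# Versions copied from the Requirements list above.
TESTED_VERSIONS = {
    "numpy": "1.23.5",
    "pandas": "1.5.3",
    "matplotlib": "3.8.2",
    "scikit-learn": "1.1.3",
    "scipy": "1.11.4",
}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    `lookup` is injectable so the function can be tested without
    touching the real environment (a hypothetical convenience).
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_versions(TESTED_VERSIONS).items():
        print(f"{name}: tested with {want}, found {have or 'not installed'}")
```

Running the script prints one line per package whose installed version differs from the tested one, and prints nothing when the environment matches.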

Usage

  1. Clone this repository.

    git clone https://github.com/C-Naoki/ModePlait.git
  2. Construct a virtual environment and install the required packages.

    make install
    • Note that this requires pyenv and poetry.
    • If you prefer not to use them, you can instead use requirements.txt, which was generated from pyproject.toml.

    Specifically, the above command performs the following steps:

    1. If necessary, install Python 3.9.15 using pyenv, and then switch to this version.
    2. Tell poetry to use Python 3.9.15.
    3. Install the packages listed in pyproject.toml.
    4. Add a path file (i.e., *.pth) to site-packages/ to extend the module search path.

    Please check the Makefile for more details.

  3. Run a quick demo of ModePlait.

    sh bin/google.sh

    If you want the command to keep running after you log out, create a nohup/ directory and pass the -n option as shown below (this runs the command with nohup).

    mkdir nohup
    sh bin/google.sh -n
    • The execution log is saved in the nohup/ directory.
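
The last step performed by make install (adding a *.pth file to site-packages/) relies on a standard Python mechanism: every line of a .pth file that names an existing directory is appended to sys.path. The sketch below illustrates that mechanism in a temporary directory. It does not reproduce the repository's Makefile; the directory names, the file name project.pth, and the function demo_pth are assumptions made for illustration.

```python
import site
import sys
import tempfile
from pathlib import Path


def demo_pth():
    """Show that a *.pth file in a site directory extends sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        site_dir = root / "site-packages"  # stand-in for the real site-packages/
        project_dir = root / "project"     # stand-in for the repository root
        site_dir.mkdir()
        project_dir.mkdir()

        # One directory per line; existing directories are added to sys.path.
        (site_dir / "project.pth").write_text(f"{project_dir}\n")

        before = list(sys.path)
        site.addsitedir(str(site_dir))  # processes the .pth files in site_dir
        added = [p for p in sys.path if p not in before]
        sys.path[:] = before  # restore the original search path
        return str(site_dir), str(project_dir), added


if __name__ == "__main__":
    site_dir, project_dir, added = demo_pth()
    print("entries added to sys.path:")
    for entry in added:
        print(" ", entry)
```

In a real environment the interpreter processes site-packages/ at startup, so no explicit call to site.addsitedir is needed; the call here only makes the effect visible without touching the installed environment.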

Datasets

  1. covid19 [link]
  2. web-search [link]
  3. chicken-dance, exercise [link]
  • All datasets except 1. covid19 are placed in the folder ./data
  • If you run sh bin/covid19.sh, the covid19 dataset (1.) is automatically downloaded from the Google COVID-19 Open Data Repository and saved in the folder ./data.

Experiments

Baselines

We compared our algorithm with the following seven state-of-the-art baselines for causal discovery: CASPER, DARING, NoCurl, NOTEARS-MLP (NO-MLP), NOTEARS, LiNGAM, and GES. We also compared it with the following five leading competitors in time series forecasting: TimesNet, PatchTST, DeepAR, OrbitMap, and ARIMA.

Q1. Causal discovery

We ran experiments on synthetic datasets with multiple temporal sequences to cover various types of scenarios, and ModePlait outperformed all competitors in every setting.

Q2. Forecasting

ModePlait achieved a high forecasting accuracy for every dataset, including both synthetic and real-world datasets.

Q3. Ablation study

We can see that discovering the time-evolving causality adaptively is very helpful when forecasting in a streaming fashion.

Experimental setup

We conducted all of the above experiments on an Intel Xeon Platinum 8268 2.9GHz quad-core CPU with 512GB of memory, running Linux.

Citation

If you use this code for your research, please consider citing our paper.
