This project aims to learn representations of brain connectome data using an existing dynamic graph learning method, DyGFormer [2]. We constructed functional connectivity (FC) matrices from functional magnetic resonance imaging (fMRI) images by slightly modifying the approach in [1].
Our implementation is built on top of the DyGFormer repository.
We conducted two downstream tasks: (1) link prediction and (2) graph regression.
- The results showed that the DyGFormer model successfully predicted future links in a dynamic graph, even on a novel connectome dataset.
- The graph regression task also suggested that the DyGFormer model can generate graph embeddings that capture the dynamic features of brain connectome data.
Hence, the novelty of our project is twofold: (1) applying DyGFormer to a novel connectome dataset and (2) extending DyGFormer to graph-level prediction tasks.
We generated dynamic graphs from raw fMRI images taken from the "100 unrelated subjects" version of the HCP Young Adult Dataset [3]. Specifically, we downloaded the language task fMRI data, which were acquired while subjects performed a language task [4]. Due to the computational cost, we used data from 50 subjects.
For each subject, we generated binary adjacency matrices from functional connectivity (FC) matrices.
- The FC matrices were constructed following the sliding-window approach in [1]. We set the window parameters (window length and starting points) so that connectivity is captured at 50 time points.
- We kept the top 5 percent of the values in each FC matrix as edges to construct the binary adjacency matrix.
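The two steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in Dataset.ipynb: it assumes the ROI time series are already extracted as a (timepoints, 400) array, uses Pearson correlation within each window as the FC measure, and the window_length and stride values are hypothetical placeholders chosen only to produce 50 windows. It ranks all 400 × 400 entries of each matrix (diagonal included), so keeping 5 percent gives 0.05 × 160000 = 8000 edges per window.

```python
import numpy as np


def sliding_window_fc(timeseries, window_length, stride, num_windows):
    """Compute one Pearson-correlation FC matrix per sliding window.

    timeseries: array of shape (num_timepoints, num_nodes), one column per ROI.
    Returns an array of shape (num_windows, num_nodes, num_nodes).
    """
    num_timepoints = timeseries.shape[0]
    needed = (num_windows - 1) * stride + window_length
    if needed > num_timepoints:
        raise ValueError(f"need {needed} timepoints, got {num_timepoints}")
    matrices = []
    for w in range(num_windows):
        start = w * stride
        window = timeseries[start:start + window_length]
        # np.corrcoef treats rows as variables, so pass (num_nodes, window_length)
        matrices.append(np.corrcoef(window.T))
    return np.stack(matrices)


def binarize_top_percent(fc, percent=5.0):
    """Mark the top `percent` % of entries of each FC matrix as edges (1), others as 0."""
    num_windows = fc.shape[0]
    flat = fc.reshape(num_windows, -1)
    k = int(round(flat.shape[1] * percent / 100))
    adj = np.zeros(flat.shape, dtype=np.int8)
    # column indices of the k largest values in each window (order among them is arbitrary)
    top = np.argpartition(flat, -k, axis=1)[:, -k:]
    np.put_along_axis(adj, top, 1, axis=1)
    return adj.reshape(fc.shape)


# Usage on synthetic data: random time series stand in for real ROI signals,
# and window_length/stride are hypothetical values chosen only to yield 50 windows.
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((200, 400))
fc = sliding_window_fc(timeseries, window_length=50, stride=3, num_windows=50)
adj = binarize_top_percent(fc, percent=5.0)
print(fc.shape, adj.shape, adj[0].sum())
```

Ranking over the full matrix is one possible reading of "top 5 percent"; thresholding only the upper triangle, or excluding self-correlations, would give a different edge count.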
Since our brain graph has 400 nodes, the preprocessed data has shape (50, 50, 400, 400), representing (#subjects, #windows, #nodes, #nodes), with each adjacency matrix containing 8000 edges (5 percent of the 400 × 400 entries). We then transformed the preprocessed data into .csv files in the format that DyGFormer requires, which is described in DATASETS_README.md under the DG_data folder.
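A rough sketch of turning the adjacency arrays into temporal edge rows is shown below. It assumes a layout of source node, destination node, timestamp, label and edge features, uses the window index as the timestamp, and writes one file per subject; the header names, the placeholder label and feature values, and the file name are all hypothetical, and DATASETS_README.md remains the authoritative description of the required format.

```python
import csv
import os
import tempfile

import numpy as np


def adjacency_to_edge_rows(adj):
    """Convert a (num_windows, num_nodes, num_nodes) binary array into temporal edge rows.

    Each nonzero entry adj[t, u, v] becomes one interaction (u, v, t), using the
    window index t as the timestamp. np.argwhere returns indices in row-major
    order, so rows come out sorted by timestamp. The label (0) and the single
    dummy edge feature (0.0) are placeholders.
    """
    return [(int(u), int(v), float(t), 0, 0.0) for t, u, v in np.argwhere(adj)]


def write_edge_csv(rows, path):
    """Write edge rows to a .csv file with a hypothetical header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["src", "dst", "timestamp", "label", "feature"])
        writer.writerows(rows)


# Usage on a toy array with 2 windows and 3 nodes.
example = np.zeros((2, 3, 3), dtype=np.int8)
example[0, 0, 1] = 1
example[1, 2, 0] = 1
example[1, 1, 2] = 1
rows = adjacency_to_edge_rows(example)
csv_path = os.path.join(tempfile.mkdtemp(), "subject_000.csv")
write_edge_csv(rows, csv_path)
print(rows)
```

Applied to one subject's (50, 400, 400) array with 8000 edges per window, this would produce 50 × 8000 = 400,000 rows.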
The goal of the link prediction task is to predict the probability of a connection (output) between given brain nodes (Regions of Interest, RoIs) (inputs). The training procedure is as follows.
- The source node IDs and destination node IDs of a batch are fed into DyGFormer to obtain source node embeddings and destination node embeddings.
- The two types of node embeddings are added pair-wise to obtain the unified node embeddings of the batch.
- The node embeddings of the batch are fed into a simple MLP network that outputs the link probability.
- For negative sampling, we repeat the three steps above with incorrect destination node IDs.
- The predicted probability should be close to one for matched IDs and close to zero otherwise; the loss is computed with binary cross-entropy.
- We repeat these steps for all batches.
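The procedure above can be sketched for a single batch in NumPy. This is an illustration under several assumptions rather than the repository's code: the DyGFormer encoder is replaced by a hypothetical lookup table of random node embeddings (encode_nodes), the MLP has one hidden layer with random, untrained weights, negative destinations are drawn uniformly at random, and only the forward pass and loss are shown (no gradients or optimizer).

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_NODES, EMB_DIM, HIDDEN = 400, 16, 32  # EMB_DIM and HIDDEN are hypothetical

# Hypothetical stand-in for DyGFormer: a fixed table of node embeddings.
# The real model computes time-aware embeddings from each node's interaction history.
node_table = rng.standard_normal((NUM_NODES, EMB_DIM))


def encode_nodes(node_ids):
    return node_table[node_ids]


# Randomly initialised two-layer MLP (placeholder weights, not trained).
W1 = rng.standard_normal((EMB_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 1)) * 0.1
b2 = np.zeros(1)


def link_probability(src_ids, dst_ids):
    # Embed both endpoints and add the embeddings pair-wise.
    h = encode_nodes(src_ids) + encode_nodes(dst_ids)
    # MLP with ReLU, then a sigmoid to map each score into (0, 1).
    hidden = np.maximum(h @ W1 + b1, 0.0)
    logits = (hidden @ W2 + b2).ravel()
    return 1.0 / (1.0 + np.exp(-logits))


def binary_cross_entropy(probs, labels, eps=1e-7):
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


# One batch of observed (positive) edges.
src = rng.integers(0, NUM_NODES, size=64)
dst = rng.integers(0, NUM_NODES, size=64)
# Negative sampling: same sources, destinations replaced by random node IDs.
neg_dst = rng.integers(0, NUM_NODES, size=64)

pos_prob = link_probability(src, dst)
neg_prob = link_probability(src, neg_dst)
probs = np.concatenate([pos_prob, neg_prob])
labels = np.concatenate([np.ones_like(pos_prob), np.zeros_like(neg_prob)])
loss = binary_cross_entropy(probs, labels)
print(f"BCE loss for this batch: {loss:.4f}")
```

Uniform random negatives can occasionally coincide with a true edge; a sampler that excludes observed edges avoids that at extra cost.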
The goal of the graph regression task is to predict a subject's language task score (output) from the connections between brain nodes across time (input). The training procedure is as follows.
- The source node IDs and destination node IDs of a batch are fed into DyGFormer to obtain source node embeddings and destination node embeddings.
- The two types of node embeddings are averaged to obtain the unified node embeddings of the batch.
- All node embeddings of the batch are mean-pooled to obtain the graph embedding of the batch.
- The graph embedding vectors of all batches are fed into a simple CNN followed by MLP layers, which produce a single language accuracy score; the loss is the mean squared error (MSE).
- We repeat these steps for all subjects.
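The graph-level readout above can be sketched as follows. The assumptions are loud ones: per-batch DyGFormer outputs are replaced by random placeholders, the "simple CNN" is taken to be a single 1D convolution over the sequence of batch-level graph embeddings followed by ReLU and averaging over batches, and the MLP has one hidden layer; all sizes, weights and target scores are hypothetical, and only the forward pass and MSE are shown.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_BATCHES, BATCH_SIZE, EMB_DIM = 20, 64, 16   # hypothetical sizes
KERNEL, CHANNELS, HIDDEN = 3, 8, 16             # hypothetical CNN/MLP sizes


def graph_embedding(src_emb, dst_emb):
    """Average source/destination embeddings pair-wise, then mean-pool over the batch."""
    node_emb = (src_emb + dst_emb) / 2.0   # (batch_size, emb_dim)
    return node_emb.mean(axis=0)            # (emb_dim,)


def conv1d_valid(x, weight, bias):
    """1D convolution over the batch axis without padding.

    x: (num_steps, in_channels); weight: (kernel, in_channels, out_channels).
    Returns an array of shape (num_steps - kernel + 1, out_channels).
    """
    kernel = weight.shape[0]
    steps = x.shape[0] - kernel + 1
    out = np.empty((steps, weight.shape[2]))
    for t in range(steps):
        # contract the (kernel, in_channels) window against every filter
        out[t] = np.tensordot(x[t:t + kernel], weight, axes=([0, 1], [0, 1])) + bias
    return out


# Randomly initialised parameters (placeholders, not trained weights).
conv_w = rng.standard_normal((KERNEL, EMB_DIM, CHANNELS)) * 0.1
conv_b = np.zeros(CHANNELS)
mlp_w1 = rng.standard_normal((CHANNELS, HIDDEN)) * 0.1
mlp_b1 = np.zeros(HIDDEN)
mlp_w2 = rng.standard_normal((HIDDEN, 1)) * 0.1
mlp_b2 = np.zeros(1)


def predict_score(graph_embs):
    """Map a (num_batches, emb_dim) sequence of graph embeddings to one score."""
    h = np.maximum(conv1d_valid(graph_embs, conv_w, conv_b), 0.0)  # conv + ReLU
    h = h.mean(axis=0)                                             # average over batches
    h = np.maximum(h @ mlp_w1 + mlp_b1, 0.0)                       # MLP hidden layer
    return float((h @ mlp_w2 + mlp_b2)[0])                         # scalar output


def subject_graph_embeddings():
    """Random placeholder for the per-batch DyGFormer outputs of one subject."""
    return np.stack([
        graph_embedding(rng.standard_normal((BATCH_SIZE, EMB_DIM)),
                        rng.standard_normal((BATCH_SIZE, EMB_DIM)))
        for _ in range(NUM_BATCHES)
    ])


subjects = [subject_graph_embeddings() for _ in range(5)]
preds = np.array([predict_score(embs) for embs in subjects])
targets = rng.uniform(0.0, 1.0, size=5)   # placeholder scores, not HCP data
mse = float(np.mean((preds - targets) ** 2))
print(preds, mse)
```

Treating the batch sequence as a time axis lets the convolution mix information from neighbouring batches before the MLP reduces it to one score.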
DG_data
- contains the dataset extracted from the raw fMRI images, as .csv files.
models
- contains the model definitions as .py files.
preprocess_data
- contains the code for preprocessing the raw dataset in the DG_data folder.
processed_data
- contains the preprocessed dataset, which can be fed directly into the graph regression model.
saved_models
- contains the saved models for each run; the numbers indicate the index of each repeated run.
utils
- contains utility code for our models and training, originally from [2].
Dataset.ipynb
- generates the brain connectome dataset from the files downloaded from [3].
train_graph_regression.ipynb
- trains a DyGFormer model for the graph regression task on the dataset generated by Dataset.ipynb.
train_link_prediction.ipynb
- trains a DyGFormer model for the link prediction task on the same dataset.
- You can reproduce our results directly by running the cells under the Testing sections of the two training notebooks with the checkpoints from the saved_models folder, after running the necessary cells.
- If you train the models, note that the load_checkpoint parameter is False by default.
- We did not include the raw fMRI image files (.nii.gz format) in this repository because of their size. To generate the data yourself using Dataset.ipynb, first add a shortcut to our folder, CS471 Project, to your own Google Drive. The details are described in the notebook.
[1] Learning dynamic graph representation of brain connectome with spatio-temporal attention, Kim et al., NeurIPS 2021.
https://doi.org/10.48550/arXiv.2105.13495
[2] Towards better dynamic graph learning: new architecture and unified library, Yu et al., NeurIPS 2023.
https://doi.org/10.48550/arXiv.2303.13047
[3] HCP Young Adult Dataset
https://www.humanconnectome.org/study/hcp-young-adult
[4] Mapping Anterior Temporal Lobe Language Areas with FMRI: A Multi-Center Normative Study, Binder et al., NeuroImage 2011.
https://doi.org/10.1016/j.neuroimage.2010.09.048
This project is the result of equal contributions by Kim Junyup (ytrewq271828@kaist.ac.kr) and Kyaw Ye Thu (kyawyethu@kaist.ac.kr).