STEA

This repo contains the source code of the paper "Dependency-aware Self-training for Entity Alignment", accepted at WSDM 2023.

Download the data from this Dropbox directory. Decompress it and place it under STEA_code/ as shown in the folder structure below.
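
A minimal sketch of this step, assuming the downloaded archive is a zip file named STEA_data.zip and that you are one level above STEA_code/ (the actual archive name in the Dropbox directory may differ):

unzip STEA_data.zip -d STEA_code/   # hypothetical archive name
ls STEA_code/datasets/              # check that the data landed in place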

📌 The code has been tested. Feel free to create issues if you cannot run it successfully. Thanks!

Structure of Folders

STEA_code/
  |- datasets/
  |- OpenEA/
  |- scripts/
  |- stea/
    |- Dual_AMN/
    |- GCN-Align/
    |- RREA/
  |- environment.yml
  |- README.md

After you run a script, the program will automatically create an output/ folder that stores the evaluation results.

Device

The configurations of my devices are as follows:

  • The experiments on the 15K datasets were run on a GPU server with an Intel(R) Xeon(R) Gold 6128 3.40GHz CPU, 128GB of memory, 3 NVIDIA GeForce RTX 2080 Ti GPUs, and Ubuntu 20.04.
  • The experiments on the 100K datasets were run on a computing cluster running CentOS 7.8.2003, which allocated us 200GB of memory and 2 NVIDIA V100 SXM2 GPUs.

I think a basic configuration would be a 12GB GPU for the 15K datasets and a 32GB GPU for the 100K datasets.

Install Conda Environment

cd to the project directory first. Then run the following command to install the main environment packages.

conda env create -f environment.yml

Activate the environment via conda activate stea, then install the graph-tool package:

conda install -c conda-forge graph-tool==2.29

(Installing this package can be slow, so be patient.)

With the environment installed as above, you can run STEA with Dual-AMN, RREA, and GCN-Align.
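
As a quick sanity check that the environment is ready, you can try importing graph-tool (the environment name stea comes from environment.yml, as above):

conda activate stea
python -c "import graph_tool; print(graph_tool.__version__)"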

If you also want to run STEA with AliNet, install the following additional packages with pip:

pip install igraph
pip install python-Levenshtein
pip install dataclasses

Run Scripts

Shell scripts with parameter settings are provided under the scripts/ folder. Brief descriptions:

  • run_{Self-training_method}_w_{EA_Model}.sh. Runs a given self-training method with a given EA model. You can set the dataset name, the annotation amount, and other settings as needed; see the example invocation after this list.
  • run_analyze_paramK.sh. Analyzes sensitivity to the hyperparameter K.
  • run_analyze_norm_minmax.sh. Replaces the softmax-based normalisation module with a MinMax scaler to analyze the necessity of our normalisation module.
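
A hypothetical example invocation, assuming the STEA + Dual-AMN combination is named run_STEA_w_Dual_AMN.sh following the pattern above (check scripts/ for the actual file names):

bash scripts/run_STEA_w_Dual_AMN.sh   # hypothetical script name; edit its variables to change dataset and settings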

For each task, the evaluation results and other outputs can be found in a task-specific folder under the output/ directory.

Note: AliNet runs much slower than the other EA models, so you may want to explore the self-training methods with the other EA models first.

Want to Report Issues?

We would love to hear from you if you have any problems running our code, or if you find inconsistencies between your results and those reported in the paper.

Citation

Please cite this paper if you use the released code in your work.

@inproceedings{DBLP:conf/wsdm/0025LHZ23,
  author    = {Bing Liu and
               Tiancheng Lan and
               Wen Hua and
               Guido Zuccon},
  editor    = {Tat{-}Seng Chua and
               Hady W. Lauw and
               Luo Si and
               Evimaria Terzi and
               Panayiotis Tsaparas},
  title     = {Dependency-aware Self-training for Entity Alignment},
  booktitle = {Proceedings of the Sixteenth {ACM} International Conference on Web
               Search and Data Mining, {WSDM} 2023, Singapore, 27 February 2023 -
               3 March 2023},
  pages     = {796--804},
  publisher = {{ACM}},
  year      = {2023},
  url       = {https://doi.org/10.1145/3539597.3570370},
  doi       = {10.1145/3539597.3570370},
  timestamp = {Fri, 24 Feb 2023 13:56:00 +0100},
  biburl    = {https://dblp.org/rec/conf/wsdm/0025LHZ23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acknowledgement

We used the source code of RREA, Dual-AMN, OpenEA, and GCN-Align.
