State-change Relation Extraction Dataset (StaRE)

This repository contains the state-change relation extraction dataset from the paper "Enhanced Distant Supervision with State-Change Information for Relation Extraction", published at LREC 2022.

Data

This repository requires access to the LDC2011T07 (English Gigaword Fifth Edition) dataset. We provide manual annotations for our use case, along with indices into the data and a script so that the actual text can be retrieved from the Gigaword corpus.

Run ./get_original.py with the dataset directory as the input path to retrieve the actual sentences, tokens, and subject and object names in each sentence.

To set up the environment and retrieve the data, run:

conda create --name myenv python=3.9.12

conda activate myenv

conda install pip

pip install -r ./requirements.txt

conda install -c conda-forge cupy

python -m spacy download en_core_web_trf

python ./get_original.py <path_to_gigaword_unzipped_directory> <path_where_you_want_to_save_the_data>
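The final retrieval step can also be launched from Python. The following is a minimal sketch, not part of the repository: `build_retrieval_command` is a hypothetical helper that only checks that the Gigaword directory exists and assembles the same two positional arguments shown in the command above.

```python
import sys
from pathlib import Path


def build_retrieval_command(
    gigaword_dir: str,
    output_dir: str,
    script: str = "./get_original.py",
) -> list[str]:
    """Assemble the get_original.py invocation (hypothetical helper).

    Assumes the script takes exactly two positional arguments, as in the
    README command: the unzipped Gigaword directory and the output path.
    """
    source = Path(gigaword_dir)
    if not source.is_dir():
        # Fail early instead of letting the script run on a bad path.
        raise NotADirectoryError(f"Gigaword directory not found: {source}")
    # sys.executable keeps the call inside the active conda environment.
    return [sys.executable, script, str(source), str(output_dir)]
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root.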

The ./data directory contains 5 folders:

train contains all the training data. Files named train_windowsize_relationtype.txt contain all the positives for a particular relation. The folder also contains a train_negatives file with 10k fixed negatives, which are to be used in addition to the positives when training each scenario.
val contains all the static validation data for each relation type.
test contains all the static test data for each relation type.
dynamic_val contains all the dynamic validation data for each relation type.
dynamic_test contains all the dynamic test data for each relation type.
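When iterating over the train folder, it can help to split each file name into its parts. The sketch below assumes that positive files are named exactly `train_<windowsize>_<relationtype>.txt` with no extra underscores; `TrainFile` and `parse_train_filename` are hypothetical names, and both fields are kept as strings because the README does not specify their value ranges.

```python
from pathlib import Path
from typing import NamedTuple


class TrainFile(NamedTuple):
    """Parts of a training file name (hypothetical container)."""

    window_size: str
    relation_type: str


def parse_train_filename(name: str) -> TrainFile:
    """Split train_<windowsize>_<relationtype>.txt into its two fields.

    Raises ValueError for names that do not match the assumed pattern,
    which includes the train_negatives file.
    """
    parts = Path(name).stem.split("_")
    if len(parts) != 3 or parts[0] != "train":
        raise ValueError(f"Not a positive training file: {name}")
    return TrainFile(window_size=parts[1], relation_type=parts[2])
```

Under this assumption the negatives file is rejected rather than misparsed, so it can be handled separately and added to each positive set.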

Relation types in the dynamic setting:

0.txt represents P26@start
1.txt represents P26@end
3.txt represents P35@start
7.txt represents P54@start
5.txt represents P463@start

Relation types in the static setting:

0.txt and 1.txt merged together represent P26
3.txt represents P35
7.txt represents P54
5.txt represents P463
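The two listings above can be captured in a single lookup table. This is an illustrative sketch: the mapping itself is copied from the lists, while the function names are hypothetical and the static label is derived by dropping the `@start` or `@end` suffix.

```python
# File name -> dynamic relation label, copied from the listing above.
DYNAMIC_LABELS = {
    "0.txt": "P26@start",
    "1.txt": "P26@end",
    "3.txt": "P35@start",
    "7.txt": "P54@start",
    "5.txt": "P463@start",
}


def dynamic_label(filename: str) -> str:
    """Return the dynamic label, e.g. 'P26@end' for '1.txt'."""
    return DYNAMIC_LABELS[filename]


def static_label(filename: str) -> str:
    """Return the static label by stripping the @start/@end suffix."""
    return DYNAMIC_LABELS[filename].split("@")[0]


def files_for_static(relation: str) -> list[str]:
    """List the files that make up one static relation."""
    return sorted(
        name for name in DYNAMIC_LABELS if static_label(name) == relation
    )
```

For example, `files_for_static("P26")` returns both `0.txt` and `1.txt`, matching the merge described for the static setting.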

License

The code is released under the terms of the Apache-2.0 License.
