Our paper on prefix-tuning for data wrangling tasks is currently under review.
Our prefix-tuning implementation is based on this paper and code by Li et al. The language-modelling setup for the data wrangling tasks is based on this paper and code by HazyResearch. The T5 implementation is based on this project by ChainsmokersAI.
The data can also be downloaded manually from here; extract it into the 'data' directory created in the steps below.
# clone the repository
git clone https://github.com/davidvos/prefix-tuning-for-data-management.git
cd prefix-tuning-for-data-management

# download the datasets and extract them into data/
mkdir data
wget https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz -P data
tar xvf data/datasets.tar.gz -C data/
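After extraction, the benchmark folders live under data/datasets. As an optional sanity check, this short snippet (hypothetical, not part of the repository) verifies that one of the datasets used in the example command below is in place:

import pathlib

# check for one of the extracted benchmark datasets (path taken from the
# example command at the end of this README)
expected = pathlib.Path("data/datasets/entity_matching/structured/iTunes-Amazon")
print("found" if expected.is_dir() else "missing", expected)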
You should also create folders for the output models, metrics, and data:
mkdir -p outputs/{models,metrics,data}
# install the local copy of transformers in editable mode, then the remaining requirements
pip install -e transformers/
pip install -r requirements.txt
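To check that Python resolves transformers to this local copy rather than a globally installed release, you can print the module's location (a quick optional check, not a step from the original instructions):

import transformers

# this path should point inside the repository's transformers/ directory
print(transformers.__file__)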
Edit the config.yaml file to enable WandB logging. If you want to run without WandB, manually remove all WandB calls from main.py.
wandb:
  project_name: '<YOUR WandB PROJECT NAME>'
  entity: '<YOUR WandB ENTITY NAME>'
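For reference, a minimal sketch of how such a config could be consumed at run time; this is an assumption about how main.py reads config.yaml, not its actual code:

import yaml
import wandb

# read the wandb section of config.yaml (keys as in the snippet above)
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# start a WandB run with the configured project and entity
wandb.init(
    project=config["wandb"]["project_name"],
    entity=config["wandb"]["entity"],
)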
To run training, evaluation, and testing, execute the following command. For --finetune_type, choose either 'prefix' or 'fine'. For --task, choose 'entity-matching', 'error-detection', or 'data-imputation'. Details on all possible arguments can be found in 'utils/utils.py'.
python main.py \
--data_dir 'data/datasets/entity_matching/structured/iTunes-Amazon' \
--prefix_size 10 \
--finetune_type 'prefix' \
--task 'entity-matching' \
--n_samples 0 \
--batch_size 16 \
--n_epochs 10 \
--lr 5e-5 \
--seed 1234
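For orientation, here is a sketch of how the arguments above might be declared in 'utils/utils.py' with argparse. The flag names and choices mirror this README, while the defaults and help strings are assumptions; consult that file for the authoritative definitions:

import argparse

def get_args():
    # hypothetical reconstruction of the CLI used in the example above
    parser = argparse.ArgumentParser(description="Prefix-tuning for data wrangling tasks")
    parser.add_argument("--data_dir", type=str, help="path to a benchmark dataset")
    parser.add_argument("--prefix_size", type=int, default=10, help="number of prefix tokens")
    parser.add_argument("--finetune_type", type=str, choices=["prefix", "fine"])
    parser.add_argument("--task", type=str,
                        choices=["entity-matching", "error-detection", "data-imputation"])
    parser.add_argument("--n_samples", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--n_epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=5e-5)
    parser.add_argument("--seed", type=int, default=1234)
    return parser.parse_args()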