honghanhh/templateNER

L3i++ at SemEval 2023 Task 2: MultiCoNER II

Introduction

This repository contains the source code of the L3i++ team's submission to SemEval 2023 Task 2: MultiCoNER II.

Datasets

We use the dataset from SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition), which is available here. The dataset covers 12 languages (English, Spanish, Swedish, Ukrainian, Portuguese, French, Farsi, German, Chinese, Hindi, Bangla, and Italian) and is divided into three parts: train, dev, and test. Each part consists of CoNLL files, which serve as the input data for the model. The CoNLL files use the following format:

# id 0d88e010-c6e8-4409-9dec-a785e43eac16	domain=de
sie _ _ O
war _ _ O
die _ _ O
erste _ _ O
frau _ _ O
die _ _ O
beim _ _ O
großes _ _ B-Facility
auge _ _ I-Facility
beobachtet _ _ O
durfte _ _ O
. _ _ O

See the sample files in the public_data/DE-German/ folder.
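As an illustration of the format above, here is a minimal sketch of a reader for these files. It is not the code used in `models/preprocess.py`. The function names `read_conll` and `bio_to_spans` are hypothetical, and the sketch assumes that each sentence starts with a `# id` header, that token lines are whitespace-separated with the token first and the BIO tag last, and that sentences are separated by blank lines.

```python
from typing import Iterable, Iterator, List, Tuple

# (header line, tokens, BIO tags) for one sentence
Sentence = Tuple[str, List[str], List[str]]


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one (header, tokens, tags) triple per sentence (hypothetical helper)."""
    header, tokens, tags = "", [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# id"):
            # A new header also closes any sentence that was not followed by a blank line.
            if tokens:
                yield header, tokens, tags
                tokens, tags = [], []
            header = line
        elif line.strip():
            parts = line.split()
            tokens.append(parts[0])   # first column: the token
            tags.append(parts[-1])    # last column: the BIO tag
        elif tokens:
            yield header, tokens, tags
            header, tokens, tags = "", [], []
    if tokens:
        yield header, tokens, tags


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags to (start, end_exclusive, label) spans (hypothetical helper)."""
    spans, start, label = [], None, None
    # The extra "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans
```

With a file opened as `f`, `for header, tokens, tags in read_conll(f)` iterates over its sentences, and `bio_to_spans(tags)` lists the entity spans of each one.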

Requirements

Run the following command to install the required packages:

pip install -r requirements.txt

Usage

To preprocess the data, run the following command:

python ./models/preprocess.py --input_dir './public_data/DE-German/' --output_dir './preprocessed_data/' --lang 'de'

See the sample files after preprocessing steps in the preprocessed_data folder.

To train the model, run the following command:

python ./models/train.py --train './preprocessed_data/de-train.csv' --test './preprocessed_data/de-dev.csv' --output_dir './bart_de' --model 'bart'

The trained monolingual English model is also available here as an example of how a model is saved.

To run inference with the trained model and export the results, run the following command:

python ./models/inference.py --data_path './public_data/DE-German/de_test.conll' --word_max_length 4 --model 'mbart' --model_path './best_model/' --output_path './de.pred.conll'
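The README does not document `--word_max_length`. One plausible reading, stated here as an assumption and not confirmed by the source, is that it caps the number of words in each candidate entity span that the model scores. Under that assumption, candidate enumeration could be sketched as follows. The function name `enumerate_spans` is hypothetical and is not taken from `models/inference.py`.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    """List every contiguous span of at most max_len words (hypothetical helper).

    Each item is (start, end_exclusive, text). The meaning of max_len mirrors
    the assumed role of --word_max_length.
    """
    spans = []
    for start in range(len(tokens)):
        # The span may not run past the end of the sentence.
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans
```

For a sentence of n words, this produces at most n × max_len candidates, so a small limit such as 4 keeps the number of spans to score roughly linear in sentence length.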

Instead of running the three commands above one by one, you can run the following script to reproduce the results end to end:

chmod +x run.sh
./run.sh

Results

We will update the results after the leaderboard is released.
