L3i++ at SemEval-2023 Task 2: CoNER

Introduction

This repository contains the source code of the L3i++ team's system for SemEval-2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition).

Datasets

We use the dataset of SemEval-2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition), which is available here. It covers 12 languages (English, Spanish, Swedish, Ukrainian, Portuguese, French, Farsi, German, Chinese, Hindi, Bangla, and Italian) and is split into train, dev, and test sets. Each split is a set of CoNLL files, which serve as the model's input, in the following format:

# id 0d88e010-c6e8-4409-9dec-a785e43eac16	domain=de
sie _ _ O
war _ _ O
die _ _ O
erste _ _ O
frau _ _ O
die _ _ O
beim _ _ O
großes _ _ B-Facility
auge _ _ I-Facility
beobachtet _ _ O
durfte _ _ O
. _ _ O

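Each sentence starts with a "# id" line giving its unique ID and domain, followed by one token per line with its BIO entity tag in the last column (the "_" columns are placeholders); sentences are separated by blank lines. Sample files are in the public_data/DE-German/ folder.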
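To make the format concrete, here is a minimal reader sketch. It is not the repository's own loader; the names read_conll and Sentence are hypothetical, and it assumes whitespace-separated columns with the tag in the last one, exactly as in the German sample above.

```python
import io
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, TextIO


@dataclass
class Sentence:
    sent_id: str
    domain: str
    tokens: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def read_conll(stream: TextIO) -> Iterator[Sentence]:
    """Yield one Sentence per '# id' block; the tag is read from the last column."""
    current: Optional[Sentence] = None
    for raw in stream:
        line = raw.rstrip("\n")
        if line.startswith("# id"):
            # Header such as "# id <uuid>\tdomain=de": flush any open sentence first.
            if current is not None and current.tokens:
                yield current
            parts = line[len("# id"):].split()
            sent_id = parts[0] if parts else ""
            domain = next((p[len("domain="):] for p in parts[1:] if p.startswith("domain=")), "")
            current = Sentence(sent_id, domain)
        elif not line.strip():
            # A blank line closes the current sentence.
            if current is not None and current.tokens:
                yield current
            current = None
        else:
            cols = line.split()  # e.g. ["großes", "_", "_", "B-Facility"]
            if current is None:
                current = Sentence("", "")
            current.tokens.append(cols[0])
            current.tags.append(cols[-1])
    if current is not None and current.tokens:
        yield current


SAMPLE = """# id 0d88e010-c6e8-4409-9dec-a785e43eac16\tdomain=de
sie _ _ O
war _ _ O
die _ _ O
erste _ _ O
frau _ _ O
die _ _ O
beim _ _ O
großes _ _ B-Facility
auge _ _ I-Facility
beobachtet _ _ O
durfte _ _ O
. _ _ O

"""

if __name__ == "__main__":
    for sent in read_conll(io.StringIO(SAMPLE)):
        print(sent.sent_id, sent.domain, list(zip(sent.tokens, sent.tags)))
```

Reading lazily with a generator keeps memory flat even for the larger training splits.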

Requirements

Run the following command to install the required packages:

pip install -r requirements.txt

Usage

To preprocess the data, run the following command:

python ./models/preprocess.py --input_dir './public_data/DE-German/' --output_dir './preprocessed_data/' --lang 'de'

Sample outputs of the preprocessing step are in the preprocessed_data folder.
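The CSV schema written by models/preprocess.py is not described in this README. As a hypothetical illustration of what the BIO tags encode, the helpers below (bio_to_spans and spans_to_bio, names of our own choosing) convert between BIO tags and (start, end, label) entity spans, using the German sample sentence. The same conversion in reverse is what any system needs to write span predictions back out as CoNLL tags.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group BIO tags into entity spans; a stray I- tag starts a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open span extends it.
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue
        # Any other tag closes the open span (if there is one).
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def spans_to_bio(spans: List[Span], length: int) -> List[str]:
    """Inverse of bio_to_spans: write B-/I- tags over an all-O sequence."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = "sie war die erste frau die beim großes auge beobachtet durfte .".split()
tags = ["O"] * 7 + ["B-Facility", "I-Facility"] + ["O"] * 3

spans = bio_to_spans(tags)
for start, end, label in spans:
    print(label, " ".join(tokens[start:end]))  # prints: Facility großes auge
assert spans_to_bio(spans, len(tags)) == tags  # round trip recovers the tags
```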

To train the model, run the following command:

python ./models/train.py --train './preprocessed_data/de-train.csv' --test './preprocessed_data/de-dev.csv' --output_dir './bart_de' --model 'bart'

A monolingual English model trained with this code is also available here, as an example of how a trained model is saved.

To run inference with a trained model and export the predictions, run the following command:

python ./models/inference.py --data_path './public_data/DE-German/de_test.conll' --word_max_length 4 --model 'mbart' --model_path './best_model/' --output_path './de.pred.conll'
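This README does not say what --word_max_length controls. Assuming (unverified) that it caps the length, in words, of the candidate entity spans scored at inference, as in span-enumeration NER approaches, a sentence of n tokens yields n + (n-1) + ... + (n-k+1) candidates for k = word_max_length ≤ n. The hypothetical helpers candidate_spans and count_spans below illustrate this on the 12-token German sample.

```python
from typing import List, Tuple


def candidate_spans(tokens: List[str], word_max_length: int) -> List[Tuple[int, int]]:
    """Enumerate every (start, end_exclusive) span of 1..word_max_length tokens."""
    n = len(tokens)
    return [
        (start, end)
        for start in range(n)
        for end in range(start + 1, min(start + word_max_length, n) + 1)
    ]


def count_spans(n: int, k: int) -> int:
    """Closed form: a span of length l can start at n - l + 1 positions."""
    return sum(n - length + 1 for length in range(1, min(k, n) + 1))


tokens = "sie war die erste frau die beim großes auge beobachtet durfte .".split()
spans = candidate_spans(tokens, word_max_length=4)
print(len(spans), count_spans(len(tokens), 4))  # 42 42
print((7, 9) in spans)  # True: "großes auge" is among the candidates
```

Under this assumption, a larger value lets longer entities be found, while the number of candidates per sentence grows roughly as n times k.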

Alternatively, to reproduce the results end to end without running the three commands above one by one, run:

chmod +x run.sh
./run.sh
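The contents of run.sh are not shown in this README. Assuming it simply chains the three commands above, a Python equivalent might look like the sketch below. The flags are copied verbatim, so adjust paths such as --model_path to wherever your trained model was saved. By default it only prints the commands; pass --run (an option of this sketch, not of the repository) to execute them.

```python
import shlex
import subprocess
import sys
from typing import List

# The three documented commands, with flags copied verbatim from this README.
# sys.executable is used instead of a bare "python" so the current interpreter runs each step.
PY = sys.executable
COMMANDS: List[List[str]] = [
    [PY, "./models/preprocess.py",
     "--input_dir", "./public_data/DE-German/",
     "--output_dir", "./preprocessed_data/",
     "--lang", "de"],
    [PY, "./models/train.py",
     "--train", "./preprocessed_data/de-train.csv",
     "--test", "./preprocessed_data/de-dev.csv",
     "--output_dir", "./bart_de",
     "--model", "bart"],
    [PY, "./models/inference.py",
     "--data_path", "./public_data/DE-German/de_test.conll",
     "--word_max_length", "4",
     "--model", "mbart",
     "--model_path", "./best_model/",
     "--output_path", "./de.pred.conll"],
]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run each step in order; check=True aborts on the first non-zero exit code."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "--run" is a hypothetical flag of this sketch: without it, commands are only printed.
    run_pipeline(COMMANDS, dry_run="--run" not in sys.argv)
```

Using check=True mirrors a shell script run with "set -e": a failed preprocessing step stops the pipeline before training starts on missing files.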

Results

We will update the results after the leaderboard is released.

Contributors

🐮 TRAN Thi Hong Hanh 🐮
