Skip to content

Commit

Permalink
Create README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
mmaghajani authored Aug 1, 2021
1 parent 28218d8 commit a3bdb29
Showing 1 changed file with 45 additions and 0 deletions.
45 changes: 45 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Turku NER corpus

An open, broad-coverage corpus for informal Persian named entity recognition collected from Twitter.

![Example of ParsTwiNER corpus annotation](https://github.com/overfit-ir/persian-twitter-ner/raw/master/docs/example.png)

## Releases

Version 1.0:

* zip package: [ParsTwiNER-v1.0.zip](https://github.com/overfit-ir/persian-twitter-ner/releases/download/v1.0.0/ParsTwiNER.zip)

Recommended. This is the first complete, stable release of the corpus and the version used in our experiments with the data.

## Quickstart

A version of the corpus data is found in CoNLL-like format in the following files:

* `twitter_data/persian-ner-twitter-data/train.txt`: training data
* `twitter_data/persian-ner-twitter-data/dev.txt`: development data
* `twitter_data/persian-ner-twitter-data/test.txt`: test data

These files are in a simple two-column tab-separated format with IOB2 tags:

```
این O
تاج‌الدین B-PER
همونه O
که O
دخترش O
دور O
قبل O
نماینده O
اصفهان B-LOC
بود O
```

The corpus annotation marks mentions of person (`PER`), organization (`ORG`), location (`LOC`), nations (`NAT`), political groups (`POG`), and event (`EVENT`) names.


## Guidelines

The [ParsTwiNER annotations instructions](https://github.com/overfit-ir/persian-twitter-ner/blob/master/docs/README.md) are available in MD format.

## Reference

0 comments on commit a3bdb29

Please sign in to comment.