-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
28218d8
commit a3bdb29
Showing
1 changed file
with
45 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
# Turku NER corpus | ||
|
||
An open, broad-coverage corpus for informal Persian named entity recognition collected from Twitter. | ||
|
||
![Example of ParsTwiNER corpus annotation](https://github.com/overfit-ir/persian-twitter-ner/raw/master/docs/example.png) | ||
|
||
## Releases | ||
|
||
Version 1.0: | ||
|
||
* zip package: [ParsTwiNER-v1.0.zip](https://github.com/overfit-ir/persian-twitter-ner/releases/download/v1.0.0/ParsTwiNER.zip) | ||
|
||
Recommended. This is the first complete, stable release of the corpus and the version used in our experiments with the data. | ||
|
||
## Quickstart | ||
|
||
A version of the corpus data is found in CoNLL-like format in the following files: | ||
|
||
* `twitter_data/persian-ner-twitter-data/train.txt`: training data | ||
* `twitter_data/persian-ner-twitter-data/dev.txt`: development data | ||
* `twitter_data/persian-ner-twitter-data/test.txt`: test data | ||
|
||
These files are in a simple two-column tab-separated format with IOB2 tags: | ||
|
||
``` | ||
این O | ||
تاجالدین B-PER | ||
همونه O | ||
که O | ||
دخترش O | ||
دور O | ||
قبل O | ||
نماینده O | ||
اصفهان B-LOC | ||
بود O | ||
``` | ||
|
||
The corpus annotation marks mentions of person (`PER`), organization (`ORG`), location (`LOC`), nations (`NAT`), political groups (`POG`), and event (`EVENT`) names. | ||
|
||
|
||
## Guidelines | ||
|
||
The [ParsTwiNER annotations instructions](https://github.com/overfit-ir/persian-twitter-ner/blob/master/docs/README.md) are available in MD format. | ||
|
||
## Reference |