Annotated FKC Corpus

Overview

This is a Japanese text corpus of Fuman (complaint) documents with various linguistic annotations. FKC stands for Fuman Kaitori Center, a Japanese service that collects and analyzes consumer opinions. The corpus contains complaint documents from various genres, such as consumer electronics, hospital, information technology (IT), supermarket, trip, and traffic. It comprises 654 documents, totaling 1,282 sentences.

The linguistic annotations cover morphology, named entities, dependencies, predicate-argument structures (including zero anaphora), and coreferences. All annotations were produced by manually correcting the automatic analyses of the morphological analyzer Juman++ and the dependency, case structure, and anaphora analyzer KNP.

Notes on annotation guidelines

The annotation guidelines for this corpus are given in the manuals in the doc directory of ku-nlp/KWDLC. The guidelines for morphology and dependencies are described in syn_guideline.pdf, and those for predicate-argument structures and coreferences are described in rel_guideline.pdf. The guidelines for named entities are available at the IREX website (http://nlp.cs.nyu.edu/irex/).

Distributed files

knp/: the corpus annotated with morphology, named entities, dependencies, predicate-argument structures, and coreferences
org/: the raw corpus
id/: document ID files providing the train/dev/test split
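
As a minimal sketch of how the split files might be combined with the annotated files, the Python snippet below reads a list of document IDs and looks up the matching files under knp/. The layout is an assumption, not something this README specifies: it supposes that each file in id/ holds one document ID per line and that each annotated file is named after its document ID with a `.knp` suffix. The helper names `read_ids` and `knp_paths` are hypothetical.

```python
from pathlib import Path


def read_ids(id_file):
    """Read document IDs, assuming one ID per non-empty line."""
    text = Path(id_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def knp_paths(ids, knp_dir, suffix=".knp"):
    """Map document IDs to annotated files found anywhere under knp_dir.

    Assumes each file is named <document ID><suffix>; IDs without a
    matching file are skipped silently.
    """
    by_stem = {p.stem: p for p in Path(knp_dir).rglob("*" + suffix)}
    return [by_stem[doc_id] for doc_id in ids if doc_id in by_stem]
```

If the actual file names or ID file layout differ, only the `suffix` argument and the line parsing in `read_ids` should need adjusting.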

Statistics

split	# of documents	# of sentences	# of morphemes	# of named entities	# of predicates	# of coreferring mentions
train	454	885	12,496	72	4,105	565
dev	100	195	2,653	9	867	146
test	100	202	2,850	16	961	140
total	654	1,282	17,999	97	5,933	851
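
The figures above can be cross-checked with a few lines of Python. The sketch below copies the numbers from the table, confirms that the three splits add up to the totals, and derives two averages (roughly 2 sentences per document and 14 morphemes per sentence). The variable names are illustrative only.

```python
# Columns: documents, sentences, morphemes, named entities,
# predicates, coreferring mentions (values copied from the table).
STATS = {
    "train": (454, 885, 12496, 72, 4105, 565),
    "dev": (100, 195, 2653, 9, 867, 146),
    "test": (100, 202, 2850, 16, 961, 140),
}
TOTAL = (654, 1282, 17999, 97, 5933, 851)

# Column-wise sums over the three splits should equal the total row.
sums = tuple(sum(column) for column in zip(*STATS.values()))
assert sums == TOTAL

docs, sents, morphs = TOTAL[:3]
print(f"sentences per document: {sents / docs:.2f}")
print(f"morphemes per sentence: {morphs / sents:.2f}")
```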

Format of the corpus annotated with morphology, named entities, dependencies, predicate-argument structures, and coreferences

Annotations of this corpus are given in the following format.

# S-ID:fuman-trip-0000000001-1
* 2D
+ 3D
太郎 たろう 太郎 名詞 6 人名 5 * 0 * 0
は は は 助詞 9 副助詞 2 * 0 * 0
* 2D
+ 2D
京都 きょうと 京都 名詞 6 地名 4 * 0 * 0
+ 3D <NE:ORGANIZATION:京都大学>
大学 だいがく 大学 名詞 6 普通名詞 1 * 0 * 0
に に に 助詞 9 格助詞 1 * 0 * 0
* -1D
+ -1D <rel type="ガ" target="太郎" sid="fuman-trip-0000000001-1" id="0"/><rel type="ニ" target="大学" sid="fuman-trip-0000000001-1" id="2"/>
行った いった 行く 動詞 2 * 0 子音動詞カ行促音便形 3 タ形 10
EOS

A description of this format can be found in the documentation of KWDLC.
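
As a rough illustration of how the sample above might be read programmatically, the following Python sketch parses one sentence block. It is based only on the sample, so its reading of the format is an assumption: lines starting with `*` are taken as phrase (bunsetsu) boundaries, lines starting with `+` as tag units whose number is the index of the head unit (-1 for the root), and all other lines before `EOS` as morphemes whose first field is the surface form. The names `TagUnit` and `parse_sentence` are hypothetical; the KWDLC documentation remains the authoritative description.

```python
import re
from dataclasses import dataclass, field

HEAD_RE = re.compile(r"^([*+]) (-?\d+)([A-Z])")  # e.g. "* 2D" or "+ -1D <...>"
REL_RE = re.compile(r"<rel ([^>]*?)/>")          # one <rel .../> element
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')         # key="value" pairs inside <rel>
NE_RE = re.compile(r"<NE:([^:>]+):([^>]+)>")     # <NE:CATEGORY:text>


@dataclass
class TagUnit:
    head: int                                      # index of the head unit, -1 for the root
    morphemes: list = field(default_factory=list)  # surface forms only
    rels: list = field(default_factory=list)       # one dict per <rel .../>
    nes: list = field(default_factory=list)        # (category, text) pairs


def parse_sentence(text):
    """Return (sentence ID, list of TagUnit) for one sentence block."""
    sid, units = None, []
    for line in text.splitlines():
        if line == "EOS":
            break
        if line.startswith("# S-ID:"):
            sid = line[len("# S-ID:"):].split()[0]
            continue
        m = HEAD_RE.match(line)
        if m and m.group(1) == "*":
            continue  # phrase boundary; not kept in this sketch
        if m:
            unit = TagUnit(head=int(m.group(2)))
            unit.rels = [dict(ATTR_RE.findall(r)) for r in REL_RE.findall(line)]
            unit.nes = NE_RE.findall(line)
            units.append(unit)
        elif line and units:
            units[-1].morphemes.append(line.split(" ")[0])
    return sid, units


SAMPLE = """\
# S-ID:fuman-trip-0000000001-1
* 2D
+ 3D
太郎 たろう 太郎 名詞 6 人名 5 * 0 * 0
は は は 助詞 9 副助詞 2 * 0 * 0
* 2D
+ 2D
京都 きょうと 京都 名詞 6 地名 4 * 0 * 0
+ 3D <NE:ORGANIZATION:京都大学>
大学 だいがく 大学 名詞 6 普通名詞 1 * 0 * 0
に に に 助詞 9 格助詞 1 * 0 * 0
* -1D
+ -1D <rel type="ガ" target="太郎" sid="fuman-trip-0000000001-1" id="0"/><rel type="ニ" target="大学" sid="fuman-trip-0000000001-1" id="2"/>
行った いった 行く 動詞 2 * 0 子音動詞カ行促音便形 3 タ形 10
EOS
"""

sid, units = parse_sentence(SAMPLE)
for index, unit in enumerate(units):
    relations = [(r["type"], r["target"]) for r in unit.rels]
    print(index, "".join(unit.morphemes), "->", unit.head, unit.nes, relations)
```

The sketch keeps only surface forms and drops phrase boundaries to stay short. In the sample, the `id` attribute of each `<rel>` element appears to point at the index of the target tag unit within the sentence named by `sid`, which is why the units are stored in a plain list.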

References

萩行正嗣, 河原大輔, 黒橋禎夫. 多様な文書の書き始めに対する意味関係タグ付きコーパスの構築とその分析, 自然言語処理, Vol.21, No.2, pp.213-248, 2014. https://doi.org/10.5715/jnlp.21.213

Acknowledgment

The creation of this corpus was supported by Insight Tech Inc. We deeply appreciate their support.

Contact

If you have any questions or problems about this corpus, please send an email to nl-resource at nlp.ist.i.kyoto-u.ac.jp.

Copyright

The copyright of the complaint documents belongs to Insight Tech Inc. The copyright of the annotation information belongs to Kurohashi Lab, Kyoto University.

License

This corpus is licensed under CC BY-NC-SA 4.0. Its use is limited to academic research.
