
Downstream task datasets

zhezhaoa edited this page Jan 6, 2021 · 26 revisions


CLUE benchmark

CLUE (Chinese Language Understanding Evaluation) is a benchmark made up of classification and machine reading comprehension tasks. The CLUE datasets are distributed in JSON format. We convert the classification datasets to TSV format so that UER can load them directly. The machine reading comprehension datasets keep their original format, and the code that pre-processes them is included in the project.
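The classification downloads in this section already contain converted TSV files, so the following is only a sketch of what a JSON-to-TSV conversion could look like. It assumes that each CLUE file holds one JSON object per line, that every record carries a label (unlabelled test splits are not handled), that the text keys are the ones passed in `text_fields` (for example `sentence`, or `sentence1` and `sentence2` for sentence-pair tasks), and that UER expects a header row of `label`, `text_a` and, for pairs, `text_b`, with integer labels. `clue_jsonl_to_tsv` is a hypothetical helper, not part of UER.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def clue_jsonl_to_tsv(
    lines: Iterable[str],
    text_fields: List[str],
    label_field: str = "label",
    label_map: Optional[Dict[str, int]] = None,
) -> Tuple[str, Dict[str, int]]:
    """Convert CLUE-style JSON lines into a TSV string with a
    `label<TAB>text_a[<TAB>text_b]` header (assumed UER-style layout).

    Hypothetical helper: field names and the output layout are assumptions.
    """
    if not 1 <= len(text_fields) <= 2:
        raise ValueError("expected one or two text fields")
    # Reuse a caller-supplied mapping so train/dev files get consistent label ids.
    label_map = {} if label_map is None else label_map
    header = ["label"] + ["text_a", "text_b"][: len(text_fields)]
    rows = ["\t".join(header)]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        label = str(example[label_field])
        # Assign integer ids in order of first appearance.
        label_map.setdefault(label, len(label_map))
        # Tabs or line breaks inside a field would break the TSV columns.
        texts = [
            str(example[f]).replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for f in text_fields
        ]
        rows.append("\t".join([str(label_map[label])] + texts))
    return "\n".join(rows) + "\n", label_map


if __name__ == "__main__":
    # Made-up single-sentence records, for illustration only.
    sample = [
        '{"label": "102", "sentence": "标题一"}',
        '{"label": "103", "sentence": "标题二"}',
    ]
    tsv, mapping = clue_jsonl_to_tsv(sample, ["sentence"])
    print(tsv, end="")
    print(mapping)
```

Passing the same `label_map` dictionary when converting the training file and then the development file keeps the label ids aligned across splits; building a fresh mapping per file could assign different ids to the same label.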

Classification:

| Dataset | Link |
| --- | --- |
| TNEWS | https://share.weiyun.com/maExfIeO |
| CSL | https://share.weiyun.com/LftIGlIT |
| CMNLI | https://share.weiyun.com/hn3kTeKm |
| OCNLI | https://share.weiyun.com/3DlKxB3q |
| AFQMC | https://share.weiyun.com/CdlEKMON |
| IFLYTEK | https://share.weiyun.com/ldiLjnZJ |
| CLUEWSC2020 | https://share.weiyun.com/RLL1ShBi |

Machine reading comprehension:

| Dataset | Link |
| --- | --- |
| CMRC2018 | https://share.weiyun.com/p3Y9INyC |
| C3 | in the project |
| ChID | https://share.weiyun.com/Mix4q2ns |

Named entity recognition:

| Dataset | Link |
| --- | --- |
| CLUENER2020 | https://share.weiyun.com/smSMtLkn |

Baidu ERNIE

The first version of ERNIE provides 5 Chinese datasets, which are used to evaluate ERNIE's performance.

| Dataset | Link |
| --- | --- |
| ChnSentiCorp | in the project |
| LCQMC | https://share.weiyun.com/5Fmf2SZ |
| XNLI | https://share.weiyun.com/mcd8EApl |
| MSRA-NER | in the project |
| NLPCC-DBQA | https://share.weiyun.com/5HJMbih |