Downstream task datasets
zhezhaoa edited this page Jan 6, 2021 · 26 revisions
CLUE is a Chinese Language Understanding Evaluation benchmark which contains classification and machine reading comprehension tasks. The datasets in CLUE are in JSON format. For classification datasets, we convert the JSON format to TSV format so that UER can load them directly. For machine reading comprehension, the original format is retained and the dataset pre-processing is included in the project.
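The JSON-to-TSV conversion described above can be sketched roughly as follows. This is only an illustrative sketch, not the script actually used to produce the files linked on this page. It assumes the input is in JSON-lines form (one JSON object per line), that the text lives in a field such as `sentence`, and that the output TSV carries a header row with columns named `label`, `text_a`, and optionally `text_b`. The function name `jsonl_to_tsv` and all field and column names are assumptions made for the example.

```python
import json


def clean(value):
    """Collapse tabs, newlines and repeated spaces so a cell stays on one TSV line."""
    return " ".join(str(value).split())


def jsonl_to_tsv(lines, text_fields, label_field="label"):
    """Convert JSON-lines records to TSV text with a header row.

    `text_fields` names one or two JSON fields holding the input text.
    The header names (label, text_a, text_b) are an assumed convention.
    """
    if not 1 <= len(text_fields) <= 2:
        raise ValueError("expected one or two text fields")
    header = ["label", "text_a", "text_b"][: 1 + len(text_fields)]
    rows = ["\t".join(header)]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        cells = [clean(record[label_field])]
        cells += [clean(record[field]) for field in text_fields]
        rows.append("\t".join(cells))
    return "\n".join(rows) + "\n"


# Hypothetical records, not taken from any CLUE dataset.
sample = [
    '{"label": "102", "sentence": "example sentence one"}',
    '{"label": "104", "sentence": "example\\tsentence two"}',
]
print(jsonl_to_tsv(sample, text_fields=["sentence"]))
```

For sentence-pair tasks, the same function could be called with two field names, for example `text_fields=["sentence1", "sentence2"]`, assuming the JSON records use those keys. Note that `clean` normalizes whitespace inside the text, which is a simplification chosen here to keep each record on a single line.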
Classification:
Dataset | Link |
TNEWS | https://share.weiyun.com/maExfIeO |
CSL | https://share.weiyun.com/LftIGlIT |
CMNLI | https://share.weiyun.com/hn3kTeKm |
OCNLI | https://share.weiyun.com/3DlKxB3q |
AFQMC | https://share.weiyun.com/CdlEKMON |
IFLYTEK | https://share.weiyun.com/ldiLjnZJ |
CLUEWSC2020 | https://share.weiyun.com/RLL1ShBi |
Machine reading comprehension:
Dataset | Link |
CMRC2018 | https://share.weiyun.com/p3Y9INyC |
C3 | in the project |
ChID | https://share.weiyun.com/Mix4q2ns |
Named entity recognition:
Dataset | Link |
CLUENER2020 | https://share.weiyun.com/smSMtLkn |
The first version of ERNIE provides 5 Chinese datasets and uses them to evaluate ERNIE's performance.
Dataset | Link |
ChnSentiCorp | in the project |
LCQMC | https://share.weiyun.com/5Fmf2SZ |
XNLI | https://share.weiyun.com/mcd8EApl |
MSRA-NER | in the project |
NLPCC-DBQA | https://share.weiyun.com/5HJMbih |