Downstream task datasets

CLUE benchmark

CLUE (Chinese Language Understanding Evaluation) is a benchmark of classification and machine reading comprehension tasks; the named entity recognition dataset CLUENER2020 is listed with it below. The original CLUE datasets are distributed in JSON format. We convert the classification datasets to TSV format so that UER can load them directly. The machine reading comprehension datasets keep their original format, and the code that pre-processes them is included in the project.
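The JSON-to-TSV conversion for the classification datasets can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to produce the linked files. The function name `clue_jsonl_to_tsv` is hypothetical. The sketch assumes that each input line is one JSON object, that the text lives in a field such as `sentence` (single-sentence tasks like TNEWS) or `sentence1`/`sentence2` (sentence-pair tasks like AFQMC), and that the output uses a header of `label`, `text_a` and, for pairs, `text_b` with integer label ids. Raw labels are mapped to ids in order of first appearance unless a mapping is passed in.

```python
import io
import json
from typing import Dict, Iterable, List, Optional, TextIO


def clue_jsonl_to_tsv(
    lines: Iterable[str],
    out: TextIO,
    text_fields: List[str],
    label_field: str = "label",
    label_map: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Write one TSV row per JSON record and return the label -> id mapping.

    Hypothetical helper: the field names and the label/text_a/text_b header
    are assumptions, not taken from the UER code base.
    """
    if len(text_fields) not in (1, 2):
        raise ValueError("expected one text field (single sentence) or two (sentence pair)")
    # Copy so the caller's mapping is not mutated; unseen labels get the next free id.
    label_map = dict(label_map or {})
    header = ["label"] + ["text_a", "text_b"][: len(text_fields)]
    out.write("\t".join(header) + "\n")
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        label_id = label_map.setdefault(str(record[label_field]), len(label_map))
        # Collapse runs of whitespace (including tabs and newlines) into single
        # spaces so that the text cannot break the TSV column layout.
        texts = [" ".join(str(record[field]).split()) for field in text_fields]
        out.write("\t".join([str(label_id)] + texts) + "\n")
    return label_map


if __name__ == "__main__":
    # Two made-up TNEWS-style records; the second contains an escaped tab.
    sample = [
        '{"label": "102", "sentence": "新闻标题一"}',
        '{"label": "103", "sentence": "新闻\\t标题二"}',
    ]
    buffer = io.StringIO()
    mapping = clue_jsonl_to_tsv(sample, buffer, text_fields=["sentence"])
    print(mapping)  # {'102': 0, '103': 1}
    print(buffer.getvalue())
```

Passing the mapping returned for the training file into the call for the dev file keeps label ids consistent across splits. The sketch expects every record to carry a label, so unlabeled test files would need separate handling.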

Classification:

Dataset	Link
TNEWS	https://share.weiyun.com/maExfIeO
CSL	https://share.weiyun.com/LftIGlIT
CMNLI	https://share.weiyun.com/hn3kTeKm
OCNLI	https://share.weiyun.com/3DlKxB3q
AFQMC	https://share.weiyun.com/CdlEKMON
IFLYTEK	https://share.weiyun.com/ldiLjnZJ
CLUEWSC2020	https://share.weiyun.com/RLL1ShBi

Machine reading comprehension:

Dataset	Link
CMRC2018	https://share.weiyun.com/p3Y9INyC
C3	in the project
ChID	https://share.weiyun.com/Mix4q2ns

Named entity recognition:

Dataset	Link
CLUENER2020	https://share.weiyun.com/smSMtLkn

Baidu ERNIE

The first version of ERNIE released 5 Chinese datasets and used them to evaluate ERNIE's performance.

Dataset	Link
ChnSentiCorp	in the project
LCQMC	https://share.weiyun.com/5Fmf2SZ
XNLI	https://share.weiyun.com/mcd8EApl
MSRA-NER	in the project
NLPCC-DBQA	https://share.weiyun.com/5HJMbih
