name: kcbert
fullname: KcBERT Pre-Training Corpus
lang: ko
category: informal
description: KcBERT Pre-Training Corpus (Korean News Comments)
license: MIT License
homepage: https://github.com/Beomi/KcBERT
version: 1.0.0
num_docs: 82990213
num_docs_before_processing: 86246286
num_segments: 82990213
num_sents: 82990213
num_words: 1088177367
size_in_bytes: 12289121362
num_bytes_before_processing: 12391020706
size_in_human_bytes: 11.45 GiB
data_files_modified: '2022-02-23 10:07:00'
info_updated: '2022-02-26 03:06:08'
data_files:
  train: kcbert-train.parquet
meta_files: {}
features:
  columns:
    id: id
    text: text
  data:
    id: int
    text: str