name: kcc
fullname: Korean Contemporary Corpus of Written Sentences
lang: ko
category: formal
description: KCC150, KCCq28, KCC940 -- Korean Contemporary Corpus of Written Sentences.
  Total 732 million words (48,878,948 sentences).
license: MIT License
homepage: http://nlp.kookmin.ac.kr/kcc/
version: 1.0.0
num_docs: 46529987
num_docs_before_processing: 48878952
num_segments: 46529987
num_sents: 46529987
num_words: 703222627
size_in_bytes: 7300422420
num_bytes_before_processing: 7707141509
size_in_human_bytes: 6.80 GiB
data_files_modified: '2022-02-23 09:49:42'
info_updated: '2022-02-26 03:06:09'
data_files:
  train: kcc-train.parquet
meta_files: {}
features:
  columns:
    id: id
    text: text
  data:
    id: int
    text: str