This repository has been archived by the owner on May 9, 2024. It is now read-only.

Korean Contemporary Corpus of Written Sentences

Sample

name: kcc
fullname: Korean Contemporary Corpus of Written Sentences
lang: ko
category: formal
description: KCC150, KCCq28, KCC940 -- Korean Contemporary Corpus of Written Sentences
  Total 732 million words (48,878,948 sentences)
license: MIT License
homepage: http://nlp.kookmin.ac.kr/kcc/
version: 1.0.0
num_docs: 46529987
num_docs_before_processing: 48878952
num_segments: 46529987
num_sents: 46529987
num_words: 703222627
size_in_bytes: 7300422420
num_bytes_before_processing: 7707141509
size_in_human_bytes: 6.80 GiB
data_files_modified: '2022-02-23 09:49:42'
info_updated: '2022-02-26 03:06:09'
data_files:
  train: kcc-train.parquet
meta_files: {}
features:
  columns:
    id: id
    text: text
  data:
    id: int
    text: str
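
The fields above can be cross-checked with a little arithmetic, and the `id`/`text` schema suggests how the training file might be read. The following is a minimal sketch, not the project's own loader: the helper names `human_gib`, `summarize`, and `iter_texts` are hypothetical, and `iter_texts` assumes that `pyarrow` is installed and that `kcc-train.parquet` is available in the working directory.

```python
# Values copied from the metadata fields listed above.
META = {
    "num_docs": 46_529_987,
    "num_docs_before_processing": 48_878_952,
    "num_words": 703_222_627,
    "size_in_bytes": 7_300_422_420,
}


def human_gib(num_bytes: int) -> str:
    """Format a byte count in binary gibibytes (1 GiB = 2**30 bytes)."""
    return f"{num_bytes / 2**30:.2f} GiB"


def summarize(meta: dict) -> dict:
    """Derive a few sanity-check figures from the metadata fields."""
    removed = meta["num_docs_before_processing"] - meta["num_docs"]
    return {
        "size": human_gib(meta["size_in_bytes"]),
        "docs_removed": removed,
        "removed_ratio": removed / meta["num_docs_before_processing"],
        "words_per_doc": meta["num_words"] / meta["num_docs"],
    }


def iter_texts(path: str = "kcc-train.parquet", batch_size: int = 65_536):
    """Yield (id, text) pairs batch by batch instead of loading the whole file.

    Assumption (hypothetical helper): pyarrow is installed and the parquet
    file exists locally with the `id` and `text` columns listed above.
    """
    import pyarrow.parquet as pq  # imported lazily; only needed for reading

    parquet_file = pq.ParquetFile(path)
    for batch in parquet_file.iter_batches(
        batch_size=batch_size, columns=["id", "text"]
    ):
        columns = batch.to_pydict()
        yield from zip(columns["id"], columns["text"])


if __name__ == "__main__":
    for key, value in summarize(META).items():
        print(key, value)
```

Reading in batches keeps memory use bounded, which matters for a corpus of this size; `summarize` only restates what the metadata already implies, such as the share of entries dropped during processing.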