name: pmc_comm
fullname: PubMed Central Open Access Corpus - Commercial Use
lang: en
category: academic
description: PMC OA Subset - Commercial Use
license: CC0, CC BY, CC BY-SA, and CC BY-ND
homepage: https://www.ncbi.nlm.nih.gov
version: 1.0.0
num_docs: 51276102
num_docs_before_processing: 51584182
num_segments: 51276102
num_sents: 297884818
num_words: 7365607900
size_in_bytes: 48595041150
num_bytes_before_processing: 48647910919
size_in_human_bytes: 45.26 GiB
data_files_modified: '2022-02-21 09:23:30'
meta_files_modified: '2022-01-13 09:50:02'
info_updated: '2022-02-26 03:06:09'
data_files:
  train: pmc_comm-train.parquet
meta_files:
  train: meta-pmc_comm-train.parquet
features:
  columns:
    id: id
    text: text
  data:
    id: int
    text: str
  meta:
    id: int
    section: str
    filename: str