This repository has been archived by the owner on May 9, 2024. It is now read-only.

pmc_noncomm.md

PubMed Central Open Access Corpus - Non-Commercial Use

Sample

name: pmc_noncomm
fullname: PubMed Central Open Access Corpus - Non-Commercial Use
lang: en
category: academic
description: PMC OA Subset - Non-Commercial Use Only
license: CC BY-NC, CC BY-NC-SA, and CC BY-NC-ND
homepage: https://www.ncbi.nlm.nih.gov
version: 1.0.0
num_docs: 14142294
num_docs_before_processing: 14208453
num_segments: 14142294
num_sents: 79748279
num_words: 1923415913
size_in_bytes: 12759929691
num_bytes_before_processing: 12767112097
size_in_human_bytes: 11.88 GiB
data_files_modified: '2022-02-21 06:46:50'
meta_files_modified: '2022-01-13 07:18:51'
info_updated: '2022-02-26 03:06:08'
data_files:
  train: pmc_noncomm-train.parquet
meta_files:
  train: meta-pmc_noncomm-train.parquet
features:
  columns:
    id: id
    text: text
  data:
    id: int
    text: str
  meta:
    id: int
    section: str
    filename: str
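
The summary statistics above can be cross-checked against each other. The following is a minimal sketch, not part of the original card: the names `STATS`, `to_gib`, and `derived_stats` are hypothetical, and the numbers are copied by hand from the metadata. It assumes `size_in_human_bytes` uses binary units (1 GiB = 2**30 bytes), which matches the "GiB" suffix.

```python
# Values copied by hand from the metadata listing above.
STATS = {
    "num_docs": 14142294,
    "num_docs_before_processing": 14208453,
    "num_sents": 79748279,
    "num_words": 1923415913,
    "size_in_bytes": 12759929691,
}


def to_gib(num_bytes: int) -> str:
    """Format a byte count in gibibytes (assumes 1 GiB = 2**30 bytes)."""
    return f"{num_bytes / 2**30:.2f} GiB"


def derived_stats(stats: dict) -> dict:
    """Compute figures that follow directly from the raw counts."""
    docs_dropped = stats["num_docs_before_processing"] - stats["num_docs"]
    return {
        "size_in_human_bytes": to_gib(stats["size_in_bytes"]),
        "docs_dropped": docs_dropped,
        "fraction_kept": stats["num_docs"] / stats["num_docs_before_processing"],
        "words_per_doc": stats["num_words"] / stats["num_docs"],
        "sents_per_doc": stats["num_sents"] / stats["num_docs"],
    }


if __name__ == "__main__":
    for key, value in derived_stats(STATS).items():
        print(f"{key}: {value}")
```

The average of roughly 136 words per document suggests that each row is a short passage rather than a full article, which would be consistent with the `section` field in the meta schema, though the card itself does not state this.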