Skip to content
This repository has been archived by the owner on May 9, 2024. It is now read-only.

Latest commit

 

History

History
52 lines (50 loc) · 1.72 KB

bigpatent.md

File metadata and controls

52 lines (50 loc) · 1.72 KB

BigPatent - U.S. Patent Documents

Sample

name: bigpatent
fullname: BigPatent - U.S. Patent Documents
lang: en
category: formal
description: 'BIGPATENT, consisting of 1.3 million records of U.S. patent documents
  along with human written abstractive summaries. Each US patent application is filed
  under a Cooperative Patent Classification (CPC) code. There are nine such classification
  categories: A (Human Necessities), B (Performing Operations; Transporting), C (Chemistry;
  Metallurgy), D (Textiles; Paper), E (Fixed Constructions), F (Mechanical Engineering;
  Lightning; Heating; Weapons; Blasting), G (Physics), H (Electricity), and Y (General
  tagging of new or cross-sectional technology)'
license: 35 USC 2
homepage: https://evasharma.github.io/bigpatent/
version: 1.0.0
num_docs: 1244053
num_docs_before_processing: 1341362
num_segments: 2488106
num_sents: 2488106
num_words: 4613882925
size_in_bytes: 24120599512
num_bytes_before_processing: 25941858276
size_in_human_bytes: 22.46 GiB
data_files_modified: '2022-02-22 01:00:10'
meta_files_modified: '2022-02-22 00:58:25'
info_updated: '2022-02-26 03:06:08'
data_files:
  train: bigpatent-train.parquet
  test: bigpatent-test.parquet
  val: bigpatent-val.parquet
meta_files:
  train: meta-bigpatent-train.parquet
  test: meta-bigpatent-test.parquet
  val: meta-bigpatent-val.parquet
features:
  columns:
    id: id
    text: text
  data:
    id: int
    text: str
  meta:
    id: int
    publication_number: str