
The original tooling for the GlotCC/OSCAR corpus rewritten in Rust


cisnlp/OSCAR-tools

This is a new set of tools for performing common tasks on the GlotCC/OSCAR corpus.

The program provides a separate set of tools for each corpus version:

  • v1: OSCAR 2019-like, text only (.txt files)
  • v2: OSCAR 22.01-like, JSON Lines, document-oriented, with annotations and line-level language identifications
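To illustrate what line-level identifications make possible in the v2 format, here is a minimal sketch that keeps only the lines of a document identified as a given language. It is not the oscar-tools API: `Document`, `Identification` and `lines_in_language` are hypothetical names. It also assumes each JSON Lines record has already been deserialized, and that the field names (`content`, `sentence_identifications`, `label`, `prob`) follow the OSCAR 22.01 schema. Check those names against your actual data.

```rust
// Hypothetical, simplified model of an OSCAR 22.01-like (v2) record after the
// JSON Lines entry has been deserialized. Field names are an assumption based
// on the OSCAR 22.01 schema; verify them against the real files.
#[derive(Debug, Clone, PartialEq)]
struct Identification {
    label: String, // language code, e.g. "en"
    prob: f32,     // identification confidence
}

#[derive(Debug, Clone)]
struct Document {
    content: String,
    // Assumed: one entry per line of `content`; `None` if a line was not identified.
    sentence_identifications: Vec<Option<Identification>>,
}

/// Hypothetical helper: returns the lines of `doc` whose line-level
/// identification is `lang` with a probability of at least `min_prob`.
fn lines_in_language<'a>(doc: &'a Document, lang: &str, min_prob: f32) -> Vec<&'a str> {
    doc.content
        .lines()
        // Pair each line with its identification; `zip` stops at the shorter side.
        .zip(doc.sentence_identifications.iter())
        .filter_map(|(line, id)| match id {
            Some(id) if id.label == lang && id.prob >= min_prob => Some(line),
            _ => None,
        })
        .collect()
}

fn main() {
    let doc = Document {
        content: "Hello world\nBonjour le monde\nGoodbye".to_string(),
        sentence_identifications: vec![
            Some(Identification { label: "en".into(), prob: 0.95 }),
            Some(Identification { label: "fr".into(), prob: 0.90 }),
            None,
        ],
    };
    let english = lines_in_language(&doc, "en", 0.8);
    println!("{:?}", english); // prints ["Hello world"]
}
```

Because `zip` silently truncates when the number of lines and identifications differ, a real tool would probably check that both lengths match before filtering.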


Languages

  • Rust 100.0%