v1.2

@cpalenmichel cpalenmichel released this 12 May 15:32
· 30 commits to main since this release
91162df
  • Added segmentation for all languages except: ben, bod, kat, kur
  • Better publication date coverage
  • Removed zero-width spaces from segmentation and tokenization output for Thai, Lao, and Khmer (the zero-width space is kept in the original paragraph text)
  • Release as described in camera-ready LREC 2022 paper
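
To illustrate the zero-width space item above, here is a minimal sketch of that kind of cleanup. It is not taken from this release's code: the function name `strip_zero_width_space` and the sample tokens are hypothetical, and the assumption is that the character in question is U+200B (ZERO WIDTH SPACE).

```python
# Hypothetical illustration; not the project's actual implementation.
ZERO_WIDTH_SPACE = "\u200b"  # assumed to be the character the note refers to


def strip_zero_width_space(tokens):
    """Remove U+200B from each token and drop tokens left empty."""
    cleaned = (token.replace(ZERO_WIDTH_SPACE, "") for token in tokens)
    return [token for token in cleaned if token]


# The original paragraph text is left untouched; only the derived
# token list is cleaned.
paragraph = "สวัสดี\u200bครับ"
tokens = ["สวัสดี", "\u200b", "ครับ"]
clean_tokens = strip_zero_width_space(tokens)
print(clean_tokens)
```

Cleaning only the derived output keeps the source paragraph byte-for-byte intact, which matches the note that the zero-width space is retained in the original text.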