
John Snow Labs Spark-NLP 1.8.4: Chunk annotators match content by sentence, sentences include id

saif-ellafi released this on 31 Mar, 08:09

This release backports several improvements from the 2.0.x line to the 1.8.x branch. The main goals are to keep the stable branch stable and to fix a few serious issues that were still pending, which makes 1.8.4 a good choice for stable deployments.


Enhancements

  • CHUNK-type annotators now match content within sentence boundaries, which improves accuracy
  • CHUNK-type annotators now include the sentence index in their metadata, which may be used to further improve matching accuracy
  • Doc2Chunk has new parameters: failOnMissing, lowerCase (case-insensitive matching), and one indicating whether startCol is token-indexed
  • SentenceDetector and DeepSentenceDetector now disable maxLength by default, and when it is set, sentences are split correctly at whitespace
  • SentenceDetector now includes the sentence ID in its metadata
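The idea behind sentence-bounded chunk matching with a sentence index in the metadata can be pictured with a small, self-contained Python sketch. It does not use Spark NLP. The `Annotation` class, the `match_chunks_by_sentence` function, and the `"sentence"` metadata key are hypothetical stand-ins chosen for illustration, and offsets are assumed to be inclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Hypothetical, simplified stand-in for an annotation (not Spark NLP's class)
    annotator_type: str
    begin: int  # inclusive start offset in the original text
    end: int    # inclusive end offset in the original text
    result: str
    metadata: dict = field(default_factory=dict)


def match_chunks_by_sentence(sentences, phrase, lower_case=False):
    """Find phrase occurrences inside each sentence only, so a match can never
    span a sentence boundary, and tag each chunk with its sentence index."""
    chunks = []
    needle = phrase.lower() if lower_case else phrase
    for idx, sent in enumerate(sentences):
        haystack = sent.result.lower() if lower_case else sent.result
        start = haystack.find(needle)
        while start != -1:
            begin = sent.begin + start
            chunks.append(Annotation(
                "chunk", begin, begin + len(phrase) - 1,
                sent.result[start:start + len(phrase)],
                {"sentence": str(idx)},  # hypothetical metadata key
            ))
            start = haystack.find(needle, start + 1)
    return chunks


# Usage: two sentences taken from one document
text = "Spark NLP is fast. NLP at scale."
sentences = [Annotation("document", 0, 17, text[0:18]),
             Annotation("document", 19, 31, text[19:32])]
chunks = match_chunks_by_sentence(sentences, "NLP")
```

Searching per sentence means a phrase such as "fast. NLP" that crosses a boundary is not matched, and the sentence index lets downstream code tell the two "NLP" chunks apart. The lowercasing shortcut assumes case folding does not change string length, which holds for ASCII text.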
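As a rough illustration of what a whitespace-aware maxLength limit means, here is a minimal Python sketch built on a hypothetical `split_long_sentence` helper that is not part of Spark NLP. `max_length=None` stands in for the disabled default. The hard cut used when no whitespace falls within the limit is an assumption of this sketch, not documented library behavior.

```python
from typing import List, Optional


def split_long_sentence(sentence: str, max_length: Optional[int] = None) -> List[str]:
    """Split a sentence into pieces of at most max_length characters,
    breaking at the last whitespace before the limit.
    max_length=None means no limit (like maxLength being disabled by default)."""
    if max_length is None:
        return [sentence]
    if max_length < 1:
        raise ValueError("max_length must be a positive integer")
    pieces = []
    rest = sentence.strip()
    while len(rest) > max_length:
        cut = rest.rfind(" ", 0, max_length + 1)
        if cut <= 0:
            # Assumption: no whitespace within the limit, so cut hard
            cut = max_length
        pieces.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        pieces.append(rest)
    return pieces


print(split_long_sentence("the quick brown fox jumps", 10))
# ['the quick', 'brown fox', 'jumps']
```

Breaking at whitespace keeps words intact, so each piece still tokenizes cleanly.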