
John Snow Labs Spark-NLP 1.4.2: Fixed protocol reading, improved Windows support and more bug fixes

@saif-ellafi released this on 12 Mar 06:23

Overview

This release adds no new features or improvements. Instead, it focuses on fixing bugs and consolidating the 1.4.0 release. Among the fixes, Windows support across the library has been improved by resolving several end-of-line character issues. We also fixed an issue affecting word embeddings and some annotators that prevented them from reading external resources stored on other storage systems, such as S3 or HDFS. Finally, this release reorganizes the Model Downloader's content and functions to provide a more consistent API.
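The end-of-line issues mentioned above belong to a common class of bug: a text resource saved on Windows ends each line with "\r\n", and code that splits only on "\n" keeps a stray "\r" on every entry. The minimal Python sketch below illustrates that failure mode and a portable alternative. It is an illustration only, not Spark-NLP's actual fix, and the function names and resource contents are hypothetical.

```python
from typing import List

# Illustrative sketch only; not Spark-NLP code. Function names are hypothetical.


def read_lines_naive(text: str) -> List[str]:
    # Splitting only on "\n" leaves a trailing "\r" on every line of a
    # file saved with Windows (CRLF) line endings.
    return text.split("\n")


def read_lines_portable(text: str) -> List[str]:
    # str.splitlines() treats "\r\n", "\n" and "\r" as line boundaries, so the
    # same resource parses identically on Windows and Unix-like systems.
    return text.splitlines()


# Hypothetical tab-separated dictionary resource saved on Windows.
windows_resource = "running\trun\r\ngoes\tgo\r\n"
print(read_lines_naive(windows_resource))     # ['running\trun\r', 'goes\tgo\r', '']
print(read_lines_portable(windows_resource))  # ['running\trun', 'goes\tgo']
```

With the naive version, a lookup for "run" would fail because the stored value is actually "run\r", which is why such bugs tend to surface only on Windows.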


Bugfixes

  • Filesystem protocols are now read properly across the library, fixing the S3:// protocol use case (thanks @avenka11)
  • The library now works properly in Windows environments
  • PySpark annotator param getters now work properly when retrieving default values
  • Fixed Stemmer serialization, which failed because of a misspelled param name
  • Renamed the Tokenizer param infixPattern to infixPatterns; the mismatched name broke PySpark serialization of this param
  • Added the missing addInfixPattern() function to PySpark, allowing patterns to be appended to the current value
  • Model Downloader clearCache now properly removes both the .zip files and their extracted content
  • Model Downloader can now properly read all model types
  • Added the missing clearCache function to PySpark
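The clearCache fix concerns a cache that holds each downloaded model in two forms: the downloaded .zip archive and its extracted content. As a rough sketch, assuming a hypothetical layout where a model called `name` is stored as `name.zip` next to a `name/` directory, clearing one entry means deleting both. The helper, the layout and the model name below are illustrative assumptions, not the library's API.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cached_model(cache_dir: Path, model_name: str) -> None:
    """Hypothetical helper: remove a cached model's archive AND its extracted folder."""
    archive = cache_dir / f"{model_name}.zip"
    extracted = cache_dir / model_name
    # Deleting only the .zip would leave the unpacked model behind, so a
    # later load could silently reuse stale content.
    if archive.exists():
        archive.unlink()
    if extracted.is_dir():
        shutil.rmtree(extracted)


# Usage: build a throwaway cache holding one hypothetical model in both forms, then clear it.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "ner_fast.zip").write_bytes(b"PK")
    (cache / "ner_fast").mkdir()
    (cache / "ner_fast" / "metadata").write_text("{}")
    clear_cached_model(cache, "ner_fast")
    print(sorted(p.name for p in cache.iterdir()))  # []
```

The helper only touches the two paths derived from the model name, so other cached models in the same directory are left intact.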

Developer API

  • Function names in the Model Downloader code have been refactored for consistency

Other

  • RocksDB rolled back to a previous version to support Windows
  • NerCRF unit test modified to reduce testing time
  • Removed training scripts from the repository
  • Updated the Spark and Scala versions used in the build