
Releases: vngrs-ai/vnlp

Python 3.10 support and small fixes

02 Mar 13:55
  • cyhunspell is replaced by spylls. Consequently, VNLP now supports Python 3.10. However, Python 3.6 support is dropped.
  • Newer versions of TensorFlow do not rely on Keras-Preprocessing anymore. This caused issues because our tokenizers were saved via pickle. They are now stored as JSON instead and are loaded in a way that does not depend on the TensorFlow version.
  • TensorFlow warnings are suppressed.
  • The Read the Docs build and files are updated due to tensorboard, protobuf and grpcio dependency issues.
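
The pickle-to-JSON change above can be illustrated with a minimal sketch. It is not VNLP's actual code: SimpleTokenizer and its methods are hypothetical stand-ins for a Keras-style tokenizer. The point it tries to show is that a pickle stores a reference to the class that created the object (so loading fails once that package is gone), while JSON stores only plain data that any code can rebuild from.

```python
import json


class SimpleTokenizer:
    """Hypothetical stand-in for a Keras-style tokenizer (not VNLP's class)."""

    def __init__(self, word_index=None, oov_token="<OOV>"):
        self.oov_token = oov_token
        self.word_index = dict(word_index or {})
        # Make sure the out-of-vocabulary token always has an id.
        self.word_index.setdefault(oov_token, len(self.word_index) + 1)

    def fit_on_texts(self, texts):
        # Assign the next free integer id to every unseen word.
        for text in texts:
            for word in text.lower().split():
                if word not in self.word_index:
                    self.word_index[word] = len(self.word_index) + 1

    def texts_to_sequences(self, texts):
        oov_id = self.word_index[self.oov_token]
        return [
            [self.word_index.get(word, oov_id) for word in text.lower().split()]
            for text in texts
        ]

    def to_json(self):
        # Only plain data is stored, with no reference to any Python class.
        state = {"oov_token": self.oov_token, "word_index": self.word_index}
        return json.dumps(state, ensure_ascii=False)

    @classmethod
    def from_json(cls, payload):
        state = json.loads(payload)
        return cls(word_index=state["word_index"], oov_token=state["oov_token"])


tokenizer = SimpleTokenizer()
tokenizer.fit_on_texts(["Merhaba dünya"])

# Round trip through JSON; no third-party package is needed to load it back.
restored = SimpleTokenizer.from_json(tokenizer.to_json())
print(restored.texts_to_sequences(["merhaba bilinmeyen"]))
```

Because the serialized form is just a vocabulary mapping, the loader does not need Keras-Preprocessing or any particular TensorFlow version to be importable.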

SPUContext Models

15 Jun 17:53
  • SentencePiece Unigram Context (SPUContext) models are added for Named Entity Recognition, Dependency Parsing, Part of Speech Tagging and Sentiment Analysis. These are the default models now.
  • SPUContext models are more compact than the earlier models, up to 4x faster, and perform significantly better. See the metrics table on the main page for a comparison.
  • SPUContext models use SentencePiece Unigram tokenization.
  • The wheel file is 80% smaller now, and each model downloads its weights when it is initialized for the first time.
  • To evaluate a DL-based model, pass the "evaluate = True" flag when initializing it, e.g., NamedEntityRecognizer(model = 'CharNER', evaluate = True). This loads the weights that were NOT trained on the test sets.
  • The former Python API has become a generic user API that abstracts over the implemented methods. The desired model is selected with the "model" argument, e.g., NamedEntityRecognizer(model = 'CharNER').
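
The "model" and "evaluate" arguments described above can be pictured with the following sketch of a generic front-end class that dispatches to a concrete implementation. It is an illustration under assumptions, not VNLP's internals: NamedEntityRecognizerSketch, the backend classes, the 'SPUContextNER' key and the weight names are all hypothetical.

```python
class _CharNERBackend:
    """Hypothetical backend; only records which weights it would load."""

    def __init__(self, evaluate):
        # evaluate=True selects weights that were not trained on the test set.
        self.weights = "char_ner_eval" if evaluate else "char_ner_prod"


class _SPUContextNERBackend:
    """Hypothetical backend for the SPUContext variant."""

    def __init__(self, evaluate):
        self.weights = "spucontext_ner_eval" if evaluate else "spucontext_ner_prod"


class NamedEntityRecognizerSketch:
    """Generic user-facing class that picks a backend by name (illustrative)."""

    # Model names here are assumptions made for this sketch.
    _MODELS = {
        "CharNER": _CharNERBackend,
        "SPUContextNER": _SPUContextNERBackend,
    }

    def __init__(self, model="SPUContextNER", evaluate=False):
        try:
            backend_cls = self._MODELS[model]
        except KeyError:
            known = ", ".join(sorted(self._MODELS))
            raise ValueError(f"Unknown model {model!r}; expected one of: {known}") from None
        self.instance = backend_cls(evaluate)


default_ner = NamedEntityRecognizerSketch()
eval_ner = NamedEntityRecognizerSketch(model="CharNER", evaluate=True)
print(default_ner.instance.weights, eval_ner.instance.weights)
```

Keeping the lookup table in one place means a new model only needs a new entry, while callers keep using the same class and arguments.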