Release John Snow Labs Spark-NLP 1.8.1: ML SentenceDetector, improved ContextSpellChecker and bugfixes · JohnSnowLabs/spark-nlp

Overview

This hotfix version of Spark-NLP improves framework support by adding Maven coordinates for OCR and allowing S3 retrieval of files.
We also included code for generating Graphs for NerDL and also for creating your own metadata files for a private model downloader.
As new features, we are including a new experimental machine learning based sentence detector, which uses NER for bounds detections.
Aside from this, we are including a few bug fixes and OCR improvements. Enjoy! and thanks again for community contributions!

New Features

New DeepSentenceDetector annotator takes Spark-NLP's NER Deep Learning models as a base to improve sentence detection

Enhancements

Improved accuracy of ContextSpellChecker by enabling re-ranking of candidate words according to a weighted levenshtein distance
OCR process now defaults to split content in rows whether paragraphs or pages are identified for improved parallelism. Maybe turned off

Examples and use cases

Added Scala examples for Sentiment analysis and Lemmatizer in Italian (Thanks Vincenzo Gaudenzi from DXC.technology for dataset and model contribution!!!)

Bugfixes

Fixed a bug in Norvig and Symmetric SpellCheckers where the pattern parameter was not provided properly in Scala side (Thanks @johnmccain for reporting!)

Framework

Added hadoop-aws dependency for remote download capabilities (e.g. word embeddings sets)

Other

Metadata files for pretrained model downloads code is now included. This may be useful if anyone wants to set up their own private local model downloader service
NerDL Graphs generation code is now included in the library. This allows the usage of custom word embedding dimensions and feature counts.

Special mentions

Vincenzo Gaudenzi (DXC.technology) for contributing Italian datasets and models. @maziyarpanahi for creating examples with them.
@correlator from Deep6.ai for contributing feedback in slack and features feedback in general
@johnmccain for reporting bugs in spell checker
@rohit-nlp for delivering maven coordinates for OCR
@haimco10 for contributing a sentence detector improvement with apostrophe's use case. Not merged due specific issues involved.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

John Snow Labs Spark-NLP 1.8.1: ML SentenceDetector, improved ContextSpellChecker and bugfixes