JohnSnowLabs · maziyarpanahi · Feb 27, 2024 · Jan 31, 2024 · Feb 6, 2024 · Feb 6, 2024
diff --git a/CHANGELOG b/CHANGELOG
@@ -1,3 +1,36 @@
+========
+5.3.0
+========
+----------------
+New Features & Enhancements
+----------------
+* **NEW:** Introducing Llama-2 and all the models fine-tuned based on this architecture. This is our very first CausalLM annotator in ONNX, and it comes with support for INT4 and INT8 quantization on CPUs.
+* **NEW:** Introducing `MPNetForSequenceClassification` annotator for sequence classification tasks. This annotator is based on the MPNet architecture and is designed to classify sequences of text into a set of predefined classes.
+* **NEW:** Introducing `MPNetForQuestionAnswering` annotator for question answering tasks. This annotator is based on the MPNet architecture and is designed to answer questions based on a given context.
+* **NEW:** Introducing `M2M100` for state-of-the-art multilingual translation. M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. The model can directly translate between the 9,900 directions of 100 languages.
+* **NEW:** Introducing a new `DeBertaForZeroShotClassification` annotator for zero-shot classification tasks. This annotator is based on the DeBERTa architecture and is designed to classify sequences of text into a set of predefined classes.
+* **NEW:** Implement retrieval feature in our `DocumentSimilarity` annotator. The new DocumentSimilarity ranker is a powerful tool for ranking documents based on their similarity to a given query document. It is designed to be efficient and scalable, making it ideal for a variety of RAG applications.
+* Add ONNX support for `BertForZeroShotClassification` annotator.
+* Add support for in-memory use of the `WordEmbeddingsModel` annotator in serverless clusters. We initially introduced the in-memory feature for this annotator for users running inside a Kubernetes cluster without `HDFS`; today it also runs without any issue locally and on Google Colab, Kaggle, Databricks, AWS EMR, GCP, and AWS Glue.
+* New Whisper Large and Distil models.
+* Update ONNX Runtime to 1.17.0
+* Support new Databricks Runtimes of 14.2, 14.3, 14.2 ML, 14.3 ML, 14.2 GPU, 14.3 GPU
+* Support new EMR 6.15.0 and 7.0.0 versions
+* Add notebook to fine-tune a BERT model for Sentence Embeddings in Hugging Face and import it to Spark NLP
+* Add notebook to import BERT for Zero-Shot classification from Hugging Face
+* Add notebook to import DeBERTa for Zero-Shot classification from Hugging Face
+* Update EntityRuler documentation
+* Improve SBT project and resolve warnings (almost!)
+
+----------------
+Bug Fixes
+----------------
+* Fix Spark NLP Configuration to set `cluster_tmp_dir` on Databricks' DBFS via `spark.jsl.settings.storage.cluster_tmp_dir` https://github.com/JohnSnowLabs/spark-nlp/issues/14129
+* Fix score calculation in `RoBertaForQuestionAnswering` annotator https://github.com/JohnSnowLabs/spark-nlp/pull/14147
+* Fix optional input col validations https://github.com/JohnSnowLabs/spark-nlp/pull/14153
+* Fix notebooks for importing DeBERTa classifiers https://github.com/JohnSnowLabs/spark-nlp/pull/14154
+* Fix GPT2 deserialization over the cluster (Databricks) https://github.com/JohnSnowLabs/spark-nlp/pull/14177
+
 ========
 5.2.3
 ========

diff --git a/README.md b/README.md
diff --git a/build.sbt b/build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)
 
 organization := "com.johnsnowlabs.nlp"
 
-version := "5.2.3"
+version := "5.3.0"
 
 (ThisBuild / scalaVersion) := scalaVer
 
@@ -144,13 +144,17 @@ lazy val utilDependencies = Seq(
     exclude ("com.fasterxml.jackson.core", "jackson-annotations")
     exclude ("com.fasterxml.jackson.core", "jackson-databind")
     exclude ("com.fasterxml.jackson.core", "jackson-core")
+    exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor")
     exclude ("commons-configuration", "commons-configuration"),
   liblevenshtein
     exclude ("com.google.guava", "guava")
     exclude ("org.apache.commons", "commons-lang3")
     exclude ("com.google.code.findbugs", "annotations")
     exclude ("org.slf4j", "slf4j-api"),
-  gcpStorage,
+  gcpStorage
+    exclude ("com.fasterxml.jackson.core", "jackson-core")
+    exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor")
+  ,
   greex,
   azureIdentity,
   azureStorage)

diff --git a/conda/meta.yaml b/conda/meta.yaml
@@ -1,13 +1,13 @@
 {% set name = "spark-nlp" %}
-{% set version = "5.2.3" %}
+{% set version = "5.3.0" %}
 
 package:
   name: {{ name|lower }}
   version: {{ version }}
 
 source:
   url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz
-  sha256: bdad9912c6f4fa36aef2169a4d7e4c33cd32d79d6ff0c628c04876d9354252e9
+  sha256: 2fa182f1850026fa7f9d5fbb7b92939856f78ddcc2cb2d87d56af5e2e90b97f0
 
 build:
   noarch: python