release/530-release-candidate #14164

Merged on Feb 27, 2024 (41 commits)

283be9a
fixed all sbt warnings
ahmedlone127 Jan 31, 2024
9377bb3
remove file system url prefix (#14132)
jiamaozheng Feb 6, 2024
db55524
SPARKNLP-942: MPNet Classifiers (#14147)
DevinTDHa Feb 6, 2024
37c4df2
adding import notebook + changing default model + adding onnx support…
ahmedlone127 Feb 6, 2024
54d6455
Sparknlp 876: Introducing LLAMA2 (#14148)
prabod Feb 6, 2024
54d4605
Doc sim rank as retriever (#14149)
wolliq Feb 6, 2024
6566239
812 implement de berta for zero shot classification annotator (#14151)
ahmedlone127 Feb 6, 2024
2e8410a
Add notebook for fine tuning sbert (#14152)
DevinTDHa Feb 6, 2024
c97e877
[SPARKNLP-986] Fixing optional input col validations (#14153)
danilojsl Feb 6, 2024
0e01a2c
[SPARKNLP-984] Fixing Deberta notebooks URIs (#14154)
danilojsl Feb 6, 2024
a2cb06b
SparkNLP 933: Introducing M2M100 : multilingual translation model (#1…
prabod Feb 6, 2024
2efa215
SPARKNLP-985: Add flexible naming for onnx_data (#14165)
DevinTDHa Feb 8, 2024
8d66d3b
Add LLAMA2Transformer and M2M100Transformer to annotator
maziyarpanahi Feb 8, 2024
41d2e1b
Add LLAMA2Transformer and M2M100Transformer to ResourceDownloader
maziyarpanahi Feb 8, 2024
bb9f58b
Merge branch 'release/530-release-candidate' of github.com:johnsnowla…
maziyarpanahi Feb 8, 2024
08e9211
bump version to 5.3.0 [skip test]
maziyarpanahi Feb 8, 2024
6010244
SPARKNLP-999: Fix remote model loading for some onnx models
DevinTDHa Feb 10, 2024
0e9b54d
used filesystem to check for the onnx_data file (#14169)
prabod Feb 11, 2024
219fc19
[SPARKNLP-940] Adding changes to correctly copy cluster index storage…
danilojsl Feb 11, 2024
f00f11a
[SPARKNLP-988] Updating EntityRuler documentation (#14168)
danilojsl Feb 11, 2024
1175050
[SPARKNLP-940] Adding changes to support storage temp directory (clus…
danilojsl Feb 14, 2024
b148e79
SPARKNLP-1000: Disable init_all_tables for GPT2 (#14177)
DevinTDHa Feb 19, 2024
3cff1f8
fixes python documentation (#14172)
ahmedlone127 Feb 19, 2024
4e59301
revert MarianTransformer.scala
maziyarpanahi Feb 19, 2024
47ab709
revert HasBatchedAnnotate.scala
maziyarpanahi Feb 19, 2024
e5cfd63
revert Preprocessor.scala
maziyarpanahi Feb 19, 2024
1bf9220
Revert ViTClassifier.scala
maziyarpanahi Feb 19, 2024
eb91fde
disable hard exception
maziyarpanahi Feb 19, 2024
5067417
Merge pull request #14156 from JohnSnowLabs/SPARKNLP-975-Fix-all-the-…
maziyarpanahi Feb 19, 2024
94f6900
Replace hard exception with soft logs (#14179)
maziyarpanahi Feb 19, 2024
59e98b3
move the example from root to examples/ [skip test]
maziyarpanahi Feb 20, 2024
67917f0
Cleanup some code [skip test]
maziyarpanahi Feb 25, 2024
e4f3310
Update onnxruntime to 1.17.0 [skip test]
maziyarpanahi Feb 25, 2024
318c3b2
Fix M2M100 default model's name [skip test]
maziyarpanahi Feb 26, 2024
e38f15e
Update docs [run doc]
maziyarpanahi Feb 26, 2024
bbbddd3
Update Scala and Python APIs
actions-user Feb 26, 2024
71ee817
Fix unit test for DocSim [skip test]
maziyarpanahi Feb 26, 2024
89457f0
Merge branch 'release/530-release-candidate' of github.com:johnsnowla…
maziyarpanahi Feb 26, 2024
e9fdbe6
Fix onnx try/catch for MPNet classifier [ski test]
maziyarpanahi Feb 26, 2024
af1536b
Update CHANGELOG [run doc]
maziyarpanahi Feb 26, 2024
fa2cb23
Publish 5.3.0 on Conda [skip test]
maziyarpanahi Feb 27, 2024
33 changes: 33 additions & 0 deletions CHANGELOG
@@ -1,3 +1,36 @@
========
5.3.0
========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing Llama-2 and all the models fine-tuned from this architecture. This is our very first CausalLM annotator in ONNX, and it supports INT4 and INT8 quantization on CPUs.
* **NEW:** Introducing `MPNetForSequenceClassification` annotator for sequence classification tasks. This annotator is based on the MPNet architecture and is designed to classify sequences of text into a set of predefined classes.
* **NEW:** Introducing `MPNetForQuestionAnswering` annotator for question answering tasks. This annotator is based on the MPNet architecture and is designed to answer questions based on a given context.
* **NEW:** Introducing `M2M100`, a state-of-the-art multilingual translation model. M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. It can translate directly between any pair of its 100 languages, covering 9,900 directions.
* **NEW:** Introducing a new `DeBertaForZeroShotClassification` annotator for zero-shot classification tasks. This annotator is based on the DeBERTa architecture and is designed to classify sequences of text into a set of predefined classes.
* **NEW:** Implement a retrieval feature in our `DocumentSimilarity` annotator. The new DocumentSimilarity ranker ranks documents by their similarity to a given query document. It is designed to be efficient and scalable, making it well suited to a variety of RAG applications.
* Add ONNX support for `BertForZeroShotClassification` annotator.
* Add support for in-memory use of the `WordEmbeddingsModel` annotator in serverless clusters. We initially introduced the in-memory feature for users inside Kubernetes clusters without `HDFS`; it now also runs without issues `locally`, on Google `Colab`, `Kaggle`, `Databricks`, `AWS EMR`, `GCP`, and `AWS Glue`.
* New Whisper Large and Distil models.
* Update ONNX Runtime to 1.17.0
* Support new Databricks Runtimes: 14.2, 14.3, 14.2 ML, 14.3 ML, 14.2 GPU, and 14.3 GPU
* Support new EMR versions 6.15.0 and 7.0.0
* Add notebook to fine-tune BERT for sentence embeddings in Hugging Face and import it into Spark NLP
* Add notebook to import BERT for Zero-Shot classification from Hugging Face
* Add notebook to import DeBERTa for Zero-Shot classification from Hugging Face
* Update EntityRuler documentation
* Improve SBT project and resolve warnings (almost!)

----------------
Bug Fixes
----------------
* Fix Spark NLP configuration to set `cluster_tmp_dir` on Databricks DBFS via `spark.jsl.settings.storage.cluster_tmp_dir` https://github.com/JohnSnowLabs/spark-nlp/issues/14129
* Fix score calculation in `RoBertaForQuestionAnswering` annotator https://github.com/JohnSnowLabs/spark-nlp/pull/14147
* Fix optional input col validations https://github.com/JohnSnowLabs/spark-nlp/pull/14153
* Fix notebooks for importing DeBERTa classifiers https://github.com/JohnSnowLabs/spark-nlp/pull/14154
* Fix GPT2 deserialization over the cluster (Databricks) https://github.com/JohnSnowLabs/spark-nlp/pull/14177

========
5.2.3
========
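The "9,900 directions" figure in the M2M100 entry is the count of ordered language pairs: each of the 100 languages can be translated into each of the other 99, and A→B is a different direction from B→A, so 100 × 99 = 9,900. A minimal Python check follows; the language codes are hypothetical placeholders, not the model's real language list.

```python
from itertools import permutations

# Hypothetical placeholder codes standing in for M2M100's 100 languages.
languages = [f"lang_{i:03d}" for i in range(100)]

# Ordered (source, target) pairs with source != target;
# A->B and B->A are counted as two separate translation directions.
directions = list(permutations(languages, 2))

print(len(directions))
```

Using `permutations` rather than `combinations` is the point of the check: combinations would count each unordered pair once and give 4,950.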
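The `DocumentSimilarity` retrieval entry describes ranking documents by their similarity to a query document. The sketch below shows that idea in plain Python, using exact cosine similarity over toy vectors. It is a generic illustration under stated assumptions (hand-made 3-dimensional vectors standing in for document embeddings, and hypothetical helpers `cosine` and `rank_documents`), not the annotator's implementation or API, which runs inside Spark.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper (not the Spark NLP API): top-k (doc_id, score), best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" with made-up values, for illustration only.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.2, 0.0]

print(rank_documents(query, docs, k=3))
```

In a RAG pipeline, the top-k ids would select the passages handed to the generator. This exhaustive scan compares the query with every document; large corpora usually call for an approximate nearest-neighbor index instead.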
141 changes: 72 additions & 69 deletions README.md

Large diffs are not rendered by default.

8 changes: 6 additions & 2 deletions build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)

organization := "com.johnsnowlabs.nlp"

-version := "5.2.3"
+version := "5.3.0"

(ThisBuild / scalaVersion) := scalaVer

@@ -144,13 +144,17 @@ lazy val utilDependencies = Seq(
exclude ("com.fasterxml.jackson.core", "jackson-annotations")
exclude ("com.fasterxml.jackson.core", "jackson-databind")
exclude ("com.fasterxml.jackson.core", "jackson-core")
+exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor")
exclude ("commons-configuration", "commons-configuration"),
liblevenshtein
exclude ("com.google.guava", "guava")
exclude ("org.apache.commons", "commons-lang3")
exclude ("com.google.code.findbugs", "annotations")
exclude ("org.slf4j", "slf4j-api"),
-gcpStorage,
+gcpStorage
+exclude ("com.fasterxml.jackson.core", "jackson-core")
+exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor")
+,
greex,
azureIdentity,
azureStorage)
4 changes: 2 additions & 2 deletions conda/meta.yaml
@@ -1,13 +1,13 @@
{% set name = "spark-nlp" %}
-{% set version = "5.2.3" %}
+{% set version = "5.3.0" %}

package:
name: {{ name|lower }}
version: {{ version }}

source:
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz
-sha256: bdad9912c6f4fa36aef2169a4d7e4c33cd32d79d6ff0c628c04876d9354252e9
+sha256: 2fa182f1850026fa7f9d5fbb7b92939856f78ddcc2cb2d87d56af5e2e90b97f0

build:
noarch: python
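In the conda recipe, the `sha256` field pins the exact source tarball that `url` points to. conda-build hashes the download and stops if the digest differs, which is why the hash changes together with `version` in this PR. The sketch below reproduces that check in Python with `hashlib`. The helper names are hypothetical, and the expected digest is the 5.3.0 value from the recipe.

```python
import hashlib
from pathlib import Path

# Digest recorded for spark-nlp-5.3.0.tar.gz in conda/meta.yaml (copied from the recipe).
EXPECTED_SHA256_530 = "2fa182f1850026fa7f9d5fbb7b92939856f78ddcc2cb2d87d56af5e2e90b97f0"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so a large tarball is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recipe(path: Path, expected_hex: str = EXPECTED_SHA256_530) -> bool:
    """Hypothetical helper: True if the file's SHA-256 equals the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, after downloading the 5.3.0 sdist from PyPI, `matches_recipe(Path("spark-nlp-5.3.0.tar.gz"))` should return `True` if the file is the tarball the recipe was written against. Any other file returns `False`, and that mismatch is the failure conda-build guards against.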