diff --git a/CHANGELOG b/CHANGELOG
index 8019da238a5cea..0d7d77f262619b 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,18 @@
+========
+2.0.3
+========
+---------------
+Overview
+---------------
+Shortly after 2.0.2, a hotfix release was made to address two bugs that prevented users from using pretrained TensorFlow models in clusters.
+Please read the release notes for 2.0.2 to catch up!
+
+---------------
+Bugfixes
+---------------
+* Fixed logger serialization, which caused issues when executors serialized TensorflowWrapper
+* Fixed contrib loading in clusters when retrieving a TensorFlow session
+
 ========
 2.0.2
 ========
diff --git a/README.md b/README.md
index 90f6e7194e1c33..40fa52f80d7dcb 100644
--- a/README.md
+++ b/README.md
@@ -43,14 +43,14 @@ Take a look at our official spark-nlp page: http://nlp.johnsnowlabs.com/ for use
 
 ## Apache Spark Support
 
-Spark-NLP *2.0.2* has been built on top of Apache Spark 2.4.0
+Spark-NLP *2.0.3* has been built on top of Apache Spark 2.4.0
 Note that Spark is not retrocompatible with Spark 2.3.x, so models and environments might not work.
 
 If you are still stuck on Spark 2.3.x feel free to use [this assembly jar](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-2.3.2-nlp-assembly-1.8.0.jar) instead. Support is limited.
 For OCR module, [this](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-2.3.2-nlp-ocr-assembly-1.8.0.jar) is for spark `2.3.x`.
 
-| Spark NLP | Spark 2.0.2 / Spark 2.3.x | Spark 2.4 |
+| Spark NLP | Spark 2.0.3 / Spark 2.3.x | Spark 2.4 |
 |-------------|-------------------------------------|--------------|
 | 2.x.x |NO |YES |
 | 1.8.x |Partially |YES |
@@ -68,18 +68,18 @@ This library has been uploaded to the [spark-packages repository](https://spark-
 
 Benefit of spark-packages is that makes it available for both Scala-Java and Python
 
-To use the most recent version just add the `--packages JohnSnowLabs:spark-nlp:2.0.2` to you spark command
+To use the most recent version just add the `--packages JohnSnowLabs:spark-nlp:2.0.3` to your spark command
 
 ```sh
-spark-shell --packages JohnSnowLabs:spark-nlp:2.0.2
+spark-shell --packages JohnSnowLabs:spark-nlp:2.0.3
 ```
 
 ```sh
-pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
+pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
 ```
 
 ```sh
-spark-submit --packages JohnSnowLabs:spark-nlp:2.0.2
+spark-submit --packages JohnSnowLabs:spark-nlp:2.0.3
 ```
 
 This can also be used to create a SparkSession manually by using the `spark.jars.packages` option in both Python and Scala
 
@@ -147,7 +147,7 @@ Our package is deployed to maven central. In order to add this package as a depe
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.11</artifactId>
-    <version>2.0.2</version>
+    <version>2.0.3</version>
 </dependency>
 ```
 
@@ -158,7 +158,7 @@ and
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-ocr_2.11</artifactId>
-    <version>2.0.2</version>
+    <version>2.0.3</version>
 </dependency>
 ```
 
@@ -166,14 +166,14 @@ and
 
 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.0.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.0.3"
 ```
 
 and
 
 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-ocr
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.0.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.0.3"
 ```
 
 Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
 
@@ -187,7 +187,7 @@ Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https:/
 
 If you installed pyspark through pip, you can install `spark-nlp` through pip as well.
 
 ```bash
-pip install spark-nlp==2.0.2
+pip install spark-nlp==2.0.3
 ```
 
 PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
 
@@ -210,7 +210,7 @@ spark = SparkSession.builder \
     .master("local[4]")\
     .config("spark.driver.memory","4G")\
     .config("spark.driver.maxResultSize", "2G") \
-    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2")\
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3")\
     .config("spark.kryoserializer.buffer.max", "500m")\
     .getOrCreate()
 ```
@@ -224,7 +224,7 @@ Use either one of the following options
 
 * Add the following Maven Coordinates to the interpreter's library list
 
 ```bash
-com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2
+com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.3
 ```
 
 * Add path to pre-built jar from [here](#pre-compiled-spark-nlp-and-spark-nlp-ocr) in the interpreter's library list making sure the jar is available to driver path
 
@@ -234,7 +234,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2
 
 Apart from previous step, install python module through pip
 
 ```bash
-pip install spark-nlp==2.0.2
+pip install spark-nlp==2.0.3
 ```
 
 Or you can install `spark-nlp` from inside Zeppelin by using Conda:
 
@@ -260,7 +260,7 @@ export PYSPARK_PYTHON=python3
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook
 
-pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
+pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
 ```
 
 Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
diff --git a/build.sbt b/build.sbt
index 6a9e1a3579daa6..b6650a339ac06d 100644
--- a/build.sbt
+++ b/build.sbt
@@ -16,7 +16,7 @@ if(is_gpu.equals("false")){
 
 organization:= "com.johnsnowlabs.nlp"
 
-version := "2.0.2"
+version := "2.0.3"
 
 scalaVersion in ThisBuild := scalaVer
 
@@ -178,7 +178,7 @@ assemblyMergeStrategy in assembly := {
 lazy val ocr = (project in file("ocr"))
   .settings(
     name := "spark-nlp-ocr",
-    version := "2.0.2",
+    version := "2.0.3",
 
     test in assembly := {},
 
diff --git a/docs/quickstart.html b/docs/quickstart.html
index bbe6096fc9f785..9df796b64a1f3f 100644
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -112,14 +112,14 @@
Requirements & Setup

To start using the library, execute any of the following lines depending on your desired use case:

-spark-shell --packages JohnSnowLabs:spark-nlp:2.0.2
-pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
-spark-submit --packages JohnSnowLabs:spark-nlp:2.0.2
+spark-shell --packages JohnSnowLabs:spark-nlp:2.0.3
+pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
+spark-submit --packages JohnSnowLabs:spark-nlp:2.0.3
Straight forward Python on jupyter notebook

Use pip to install (after you pip installed numpy and pyspark)

-pip install spark-nlp==2.0.2
+pip install spark-nlp==2.0.3
 jupyter notebook

The easiest way to get started, is to run the following code:

import sparknlp
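The rest of that code block is not preserved in this extract. A minimal sketch of how such a quickstart typically continues, using the `start()` helper defined in `python/sparknlp/__init__.py` further down in this diff; the pipeline name `explain_document_ml` is an illustrative choice rather than a quote from the page, and downloading it requires internet access:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start() builds a SparkSession with spark.jars.packages pointing at the 2.0.3 artifact.
spark = sparknlp.start()

# Download a pretrained pipeline and run it on a plain string.
pipeline = PretrainedPipeline("explain_document_ml", lang="en")
annotations = pipeline.annotate("Spark NLP 2.0.3 is a hotfix release for cluster users.")
print(annotations["token"])
```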
@@ -131,21 +131,21 @@

Straight forward Python on jupyter notebook

 .appName('OCR Eval') \
     .config("spark.driver.memory", "6g") \
     .config("spark.executor.memory", "6g") \
-    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2") \
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3") \
     .getOrCreate()
Databricks cloud cluster & Apache Zeppelin

Add the following maven coordinates in the dependency configuration page:

-com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2
+com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.3

For Python in Apache Zeppelin you may need to setup SPARK_SUBMIT_OPTIONS utilizing --packages instruction shown above like this

-export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.0.2"
+export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.0.3"
Python Jupyter Notebook with PySpark

export SPARK_HOME=/path/to/your/spark/folder
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
+pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
S3 based standalone cluster (No Hadoop)

If your distributed storage is S3 and you don't have a standard hadoop configuration (i.e. fs.defaultFS)
@@ -442,7 +442,7 @@

Utilizing Spark NLP OCR Module

Spark NLP OCR Module is not included within Spark NLP. It is not an annotator and not an extension to Spark ML. You can include it with the following coordinates for Maven:
-com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.2
+com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3

Creating Spark datasets from PDF (To be used with Spark NLP)

diff --git a/python/setup.py b/python/setup.py
index 664b00409988b0..cbaff2c2d28b92 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -40,7 +40,7 @@
     # For a discussion on single-sourcing the version across setup.py and the
     # project code, see
     # https://packaging.python.org/en/latest/single_source_version.html
-    version='2.0.2', # Required
+    version='2.0.3', # Required
 
     # This is a one-line description or tagline of what your project does. This
     # corresponds to the "Summary" metadata field:
diff --git a/python/sparknlp/__init__.py b/python/sparknlp/__init__.py
index 29a949b3d06eb8..2411335ac6371e 100644
--- a/python/sparknlp/__init__.py
+++ b/python/sparknlp/__init__.py
@@ -36,8 +36,8 @@ def start(include_ocr=False):
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
 
     if include_ocr:
-        builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.2")
+        builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3")
     else:
-        builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2") \
+        builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3") \
 
     return builder.getOrCreate()
diff --git a/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala b/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
index d8663b42346551..3cf12f78328b89 100644
--- a/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
+++ b/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
@@ -12,9 +12,9 @@ object SparkNLP {
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
 
     if (includeOcr) {
-      build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.2")
+      build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3")
     } else {
-      build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2")
+      build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3")
     }
 
     build.getOrCreate()
diff --git a/src/main/scala/com/johnsnowlabs/util/Build.scala b/src/main/scala/com/johnsnowlabs/util/Build.scala
index e236e79daf1fa7..203ad9a71ce922 100644
--- a/src/main/scala/com/johnsnowlabs/util/Build.scala
+++ b/src/main/scala/com/johnsnowlabs/util/Build.scala
@@ -11,6 +11,6 @@ object Build {
     if (version != null && version.nonEmpty)
       version
     else
-      "2.0.2"
+      "2.0.3"
   }
 }
\ No newline at end of file