Merge pull request #496 from JohnSnowLabs/203-release-candidate
Release Candidate 2.0.3
saif-ellafi authored Apr 29, 2019
2 parents 8a09ee2 + e1c5c4a commit c41f341
Showing 8 changed files with 47 additions and 32 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -1,3 +1,18 @@
========
2.0.3
========
---------------
Overview
---------------
Shortly after 2.0.2, a hotfix release was made to address two bugs that prevented users from using pretrained TensorFlow models in clusters.
Please read the release notes for 2.0.2 to catch up!

---------------
Bugfixes
---------------
* Fixed logger serialization, which caused issues when executors serialized TensorflowWrapper
* Fixed contrib loading in cluster mode when retrieving a Tensorflow session

========
2.0.2
========
30 changes: 15 additions & 15 deletions README.md
@@ -43,14 +43,14 @@ Take a look at our official spark-nlp page: http://nlp.johnsnowlabs.com/ for use

## Apache Spark Support

Spark-NLP *2.0.2* has been built on top of Apache Spark 2.4.0
Spark-NLP *2.0.3* has been built on top of Apache Spark 2.4.0

Note that Spark NLP 2.x is not backward compatible with Spark 2.3.x, so models and environments might not work.

If you are still stuck on Spark 2.3.x, feel free to use [this assembly jar](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-2.3.2-nlp-assembly-1.8.0.jar) instead. Support is limited.
For the OCR module, [this](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-2.3.2-nlp-ocr-assembly-1.8.0.jar) is the jar for Spark `2.3.x`.

| Spark NLP | Spark 2.0.2 / Spark 2.3.x | Spark 2.4 |
| Spark NLP | Spark 2.0.3 / Spark 2.3.x | Spark 2.4 |
|-------------|-------------------------------------|--------------|
| 2.x.x |NO |YES |
| 1.8.x |Partially |YES |
@@ -68,18 +68,18 @@ This library has been uploaded to the [spark-packages repository](https://spark-

The benefit of spark-packages is that it makes the library available for both Scala/Java and Python

To use the most recent version just add the `--packages JohnSnowLabs:spark-nlp:2.0.2` to your spark command
To use the most recent version just add the `--packages JohnSnowLabs:spark-nlp:2.0.3` to your spark command

```sh
spark-shell --packages JohnSnowLabs:spark-nlp:2.0.2
spark-shell --packages JohnSnowLabs:spark-nlp:2.0.3
```

```sh
pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
```

```sh
spark-submit --packages JohnSnowLabs:spark-nlp:2.0.2
spark-submit --packages JohnSnowLabs:spark-nlp:2.0.3
```

This can also be used to create a SparkSession manually by using the `spark.jars.packages` option in both Python and Scala
@@ -147,7 +147,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
<version>2.0.2</version>
<version>2.0.3</version>
</dependency>
```

@@ -158,22 +158,22 @@ and
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-ocr_2.11</artifactId>
<version>2.0.2</version>
<version>2.0.3</version>
</dependency>
```

### SBT

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.0.2"
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.0.3"
```

and

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-ocr
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.0.2"
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.0.3"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -187,7 +187,7 @@ Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https:/
If you installed pyspark through pip, you can install `spark-nlp` through pip as well.

```bash
pip install spark-nlp==2.0.2
pip install spark-nlp==2.0.3
```

PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
@@ -210,7 +210,7 @@ spark = SparkSession.builder \
.master("local[4]")\
.config("spark.driver.memory","4G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2")\
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
@@ -224,7 +224,7 @@ Use either one of the following options
* Add the following Maven Coordinates to the interpreter's library list

```bash
com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2
com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.3
```

* Add the path to the pre-built jar from [here](#pre-compiled-spark-nlp-and-spark-nlp-ocr) to the interpreter's library list, making sure the jar is available on the driver path
@@ -234,7 +234,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2
Apart from the previous step, install the Python module through pip

```bash
pip install spark-nlp==2.0.2
pip install spark-nlp==2.0.3
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -260,7 +260,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
```

Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
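A minimal sketch of that alternative, assuming you have downloaded an assembly jar locally (the jar path below is a placeholder, not an official artifact name):

```shell
# Install only the Python bindings, then hand Spark a locally
# downloaded fat jar instead of resolving --packages at launch.
# Replace the jar path with wherever you saved the assembly jar.
pip install spark-nlp==2.0.3
pyspark --jars /path/to/spark-nlp-assembly.jar
```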
4 changes: 2 additions & 2 deletions build.sbt
@@ -16,7 +16,7 @@ if(is_gpu.equals("false")){

organization:= "com.johnsnowlabs.nlp"

version := "2.0.2"
version := "2.0.3"

scalaVersion in ThisBuild := scalaVer

@@ -178,7 +178,7 @@ assemblyMergeStrategy in assembly := {
lazy val ocr = (project in file("ocr"))
.settings(
name := "spark-nlp-ocr",
version := "2.0.2",
version := "2.0.3",

test in assembly := {},

18 changes: 9 additions & 9 deletions docs/quickstart.html
@@ -112,14 +112,14 @@ <h2 class="section-title">Requirements &amp; Setup</h2>
To start using the library, execute any of the following lines
depending on your desired use case:
</p>
<pre><code class="language-javascript">spark-shell --packages JohnSnowLabs:spark-nlp:2.0.2
pyspark --packages JohnSnowLabs:spark-nlp:2.0.2
spark-submit --packages JohnSnowLabs:spark-nlp:2.0.2
<pre><code class="language-javascript">spark-shell --packages JohnSnowLabs:spark-nlp:2.0.3
pyspark --packages JohnSnowLabs:spark-nlp:2.0.3
spark-submit --packages JohnSnowLabs:spark-nlp:2.0.3
</code></pre>
<p/>
<h3><b>Straightforward Python on Jupyter notebook</b></h3>
<p>Use pip to install (after you have pip-installed numpy and pyspark)</p>
<pre><code class="language-javascript">pip install spark-nlp==2.0.2
<pre><code class="language-javascript">pip install spark-nlp==2.0.3
jupyter notebook</code></pre>
<p>The easiest way to get started is to run the following code: </p>
<pre><code class="python">import sparknlp
@@ -131,21 +131,21 @@ <h3><b>Straight forward Python on jupyter notebook</b></h3>
.appName('OCR Eval') \
.config("spark.driver.memory", "6g") \
.config("spark.executor.memory", "6g") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3") \
.getOrCreate()</code></pre>
<h3><b>Databricks cloud cluster</b> & <b>Apache Zeppelin</b></h3>
<p>Add the following maven coordinates in the dependency configuration page: </p>
<pre><code class="language-javascript">com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.2</code></pre>
<pre><code class="language-javascript">com.johnsnowlabs.nlp:spark-nlp_2.11:2.0.3</code></pre>
<p>
For Python in <b>Apache Zeppelin</b> you may need to set up <i><b>SPARK_SUBMIT_OPTIONS</b></i> using the --packages instruction shown above, like this
</p>
<pre><code class="language-javascript">export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.0.2"</code></pre>
<pre><code class="language-javascript">export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.0.3"</code></pre>
<h3><b>Python Jupyter Notebook with PySpark</b></h3>
<pre><code class="language-javascript">export SPARK_HOME=/path/to/your/spark/folder
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

pyspark --packages JohnSnowLabs:spark-nlp:2.0.2</code></pre>
pyspark --packages JohnSnowLabs:spark-nlp:2.0.3</code></pre>
<h3>S3 based standalone cluster (No Hadoop)</h3>
<p>
If your distributed storage is S3 and you don't have a standard hadoop configuration (i.e. fs.defaultFS)
@@ -442,7 +442,7 @@ <h2 class="section-title">Utilizing Spark NLP OCR Module</h2>
<p>
The Spark NLP OCR Module is not included within Spark NLP. It is neither an annotator nor an extension to Spark ML.
You can include it with the following coordinates for Maven:
<pre><code class="java">com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.2</code></pre>
<pre><code class="java">com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3</code></pre>
</p>
<h3 class="block-title">Creating Spark datasets from PDF (To be used with Spark NLP)</h3>
<p>
2 changes: 1 addition & 1 deletion python/setup.py
@@ -40,7 +40,7 @@
# For a discussion on single-sourcing the version across setup.py and the
# project code, see
# https://packaging.python.org/en/latest/single_source_version.html
version='2.0.2', # Required
version='2.0.3', # Required

# This is a one-line description or tagline of what your project does. This
# corresponds to the "Summary" metadata field:
4 changes: 2 additions & 2 deletions python/sparknlp/__init__.py
@@ -36,8 +36,8 @@ def start(include_ocr=False):
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

if include_ocr:
builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.2")
builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3")
else:
builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2") \
builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3") \

return builder.getOrCreate()
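The `start()` helper above only branches on which Maven coordinates end up in `spark.jars.packages`. A minimal, dependency-free sketch of that selection logic, using a hypothetical `nlp_packages` helper that is not part of the library:

```python
# Sketch of the coordinate selection done inside sparknlp.start():
# include_ocr only decides whether the OCR coordinates are appended
# to the "spark.jars.packages" value passed to the session builder.

def nlp_packages(include_ocr=False):
    base = "JohnSnowLabs:spark-nlp:2.0.3"
    ocr = "com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3"
    return ",".join([base, ocr]) if include_ocr else base

print(nlp_packages())                  # base coordinates only
print(nlp_packages(include_ocr=True))  # base + OCR coordinates
```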
4 changes: 2 additions & 2 deletions src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
@@ -12,9 +12,9 @@ object SparkNLP {
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

if (includeOcr) {
build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.2")
build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.0.3")
} else {
build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.2")
build.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.0.3")
}

build.getOrCreate()
2 changes: 1 addition & 1 deletion src/main/scala/com/johnsnowlabs/util/Build.scala
@@ -11,6 +11,6 @@ object Build {
if (version != null && version.nonEmpty)
version
else
"2.0.2"
"2.0.3"
}
}
