update document for pyspark #975

Merged
17 changes: 13 additions & 4 deletions python/README.md
@@ -1,16 +1,25 @@
## TiSpark (version >= 2.0) on PySpark:
**Note: If you are using a TiSpark version earlier than 2.0, please read [this document](./README_spark2.1.md) instead.**

`pytispark` is no longer necessary for TiSpark 2.0 and later.
### Usage
There are currently two ways to use TiSpark from Python:

#### Directly via pyspark
This is the simplest way; a working Spark environment is all you need.
1. Make sure you have the latest version of [TiSpark](https://github.com/pingcap/tispark) and a `jar` that bundles all of TiSpark's dependencies.

2. Add the required configurations listed in the [README](../README.md) to your `$SPARK_HOME/conf/spark-defaults.conf`.

3. For Spark 2.3.x, copy `./resources/spark-2.3/session.py` to `$SPARK_HOME/python/pyspark/sql/session.py`. For other Spark versions, edit `$SPARK_HOME/python/pyspark/sql/session.py` and change the line
```python
jsparkSession = self._jvm.SparkSession(self._jsc.sc())
```

to

```python
jsparkSession = self._jvm.SparkSession.builder().getOrCreate()
```
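For Spark versions other than 2.3.x, the `session.py` edit above can also be scripted. The following is a minimal sketch using only the Python standard library; `patch_session_py` and `session_py_path` are hypothetical helper names, not part of TiSpark or PySpark. It assumes the stock file contains the original line exactly as quoted above, and it writes a `session.py.bak` backup before rewriting.

```python
from pathlib import Path

# Line quoted in step 3 as it appears in the stock pyspark/sql/session.py.
OLD_LINE = "jsparkSession = self._jvm.SparkSession(self._jsc.sc())"
# Replacement line from step 3 (creates the JVM session through the builder).
NEW_LINE = "jsparkSession = self._jvm.SparkSession.builder().getOrCreate()"


def session_py_path(spark_home: str) -> Path:
    """Hypothetical helper: location of session.py under a Spark install."""
    return Path(spark_home) / "python" / "pyspark" / "sql" / "session.py"


def patch_session_py(path: Path) -> bool:
    """Apply the step-3 edit in place (hypothetical helper).

    Returns True if the file was changed and False if it was already patched.
    Raises ValueError if the original line is missing and the file is not
    already patched, e.g. for an unexpected Spark version.
    """
    text = path.read_text(encoding="utf-8")
    if OLD_LINE not in text:
        if NEW_LINE in text:
            return False  # already patched; leave the file and backup alone
        raise ValueError(f"{path}: expected line not found; edit it by hand")
    # Keep an untouched copy next to the original before rewriting it.
    backup = path.with_name(path.name + ".bak")
    backup.write_text(text, encoding="utf-8")
    # Substring replacement keeps the line's existing indentation.
    path.write_text(text.replace(OLD_LINE, NEW_LINE), encoding="utf-8")
    return True
```

For example, `patch_session_py(session_py_path(os.environ["SPARK_HOME"]))` patches the installation that `$SPARK_HOME` points to. Re-running it is harmless because an already patched file is left untouched.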

4. Run this command in your `$SPARK_HOME` directory:

@@ -36,7 +45,7 @@ spark.sql("select count(*) from customer").show()
#### Via spark-submit
This way is useful when you want to execute your own Python scripts.

Because of an open issue, **[SPARK-25003]**, in Spark 2.3.x and Spark 2.4.x, running Python files with spark-submit only supports the following API:

1. Run `pip install pytispark` in your console to install `pytispark`.

@@ -46,7 +55,7 @@ Note that you may need reinstall `pytispark` if you meet `No plan for reation` e
```python
import pytispark.pytispark as pti
from pyspark.sql import SparkSession
# Get the active SparkSession, or create a new one
spark = SparkSession.builder.getOrCreate()
ti = pti.TiContext(spark)

ti.tidbMapDatabase("tpch_test")
```
18 changes: 0 additions & 18 deletions python/pytispark/__init__.py

This file was deleted.

46 changes: 0 additions & 46 deletions python/pytispark/pytispark.py

This file was deleted.

File renamed without changes.
2 changes: 0 additions & 2 deletions python/setup.cfg

This file was deleted.

27 changes: 0 additions & 27 deletions python/setup.py

This file was deleted.