Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32094][PYTHON] Update cloudpickle to v1.5.0 #29114

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

What changes were proposed in this pull request?

This PR aims to upgrade PySpark's embedded cloudpickle to the latest cloudpickle v1.5.0 (See https://github.com/cloudpipe/cloudpickle/blob/v1.5.0/cloudpickle/cloudpickle.py)

Why are the changes needed?

There are many bug fixes. For example, the bug described in the JIRA:

dill unpickling fails because they define types.ClassType, which is undefined in dill. This results in the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
    return _reverse_typemap[name]
KeyError: 'ClassType'

See also cloudpipe/cloudpickle#82. This was fixed for cloudpickle 1.3.0+ (cloudpipe/cloudpickle#337), but PySpark's cloudpickle.py doesn't have this change yet.

More notably, now it supports C pickle implementation with Python 3.8 which hugely improve performance. This is already adopted in another project such as Ray.

Does this PR introduce any user-facing change?

Yes, as described above, the bug fixes. Internally, users also could leverage the fast cloudpickle backed by C pickle.

How was this patch tested?

Jenkins will test it out.

@@ -1,1362 +0,0 @@
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many diffs so seems github doesn't know that it was renamed.

from __future__ import absolute_import


from pyspark.cloudpickle.cloudpickle import * # noqa
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is same as the original cloudpickle except these two line diff. We should add pyspark to import.

@HyukjinKwon
Copy link
Member Author

This PR will be co-authored together with @codesue. Thanks for the initial efforts you made at #28950

@SparkQA

This comment has been minimized.

@HyukjinKwon
Copy link
Member Author

We should test with Python 3.8 in the CI too due to Python 3.8 specific optimization. I would like to block this PR by #29116

dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2020
…tions

### What changes were proposed in this pull request?

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

### Why are the changes needed?

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also #29114

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Was not tested. Github Actions build in this PR will test it out.

Closes #29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Jul 15, 2020

Test build #125878 has finished for PR 29114 at commit 465fd8a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

1 similar comment
@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 15, 2020

Test build #125883 has finished for PR 29114 at commit 465fd8a.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will cloudpickle_fast.py be used as automatically?

@HyukjinKwon
Copy link
Member Author

Yeah, it does with Python 3.8. It was clear before because it was conditionally chosen in __init__.py but it was changed in the latest version at cloudpipe/cloudpickle@938553f#diff-17e83ea5b8e0670ba384e6bc36815316L9

@viirya
Copy link
Member

viirya commented Jul 16, 2020

Oh I see. So you must import from cloudpickle_fast now.

@SparkQA
Copy link

SparkQA commented Jul 16, 2020

Test build #125938 has finished for PR 29114 at commit 77055c6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 16, 2020

Test build #125945 has finished for PR 29114 at commit 74ba675.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

I just did a very quick test. Seems like if your function to serialize is big, it can benefit best from it up to 1250%:

>>> from pyspark import cloudpickle
>>> import timeit
>>> x = list(range(10000000))
>>> def bigfunc(a):
...     b = x
...     return a
...
>>> timeit.timeit(lambda: spark.range(1).rdd.map(bigfunc).count(), number=1)

Before:

16.589810816000004 (sec)

After:

1.3231475199999991 (sec)

@HyukjinKwon
Copy link
Member Author

@BryanCutler, @viirya, @ueshin can you take a look when you're available?

@@ -229,7 +229,7 @@ BSD 3-Clause
------------

python/lib/py4j-*-src.zip
python/pyspark/cloudpickle.py
python/pyspark/cloudpickle/*.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it only for cloudpickle.py and cloudpickle_fast.py?

Copy link
Member Author

@HyukjinKwon HyukjinKwon Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It catches the directory python/pyspark/cloudpickle/ :-).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this use for license declaration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not big deal to have BSD 3 for other two files. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we ported __init__.py and compat.py here too and it includes all.

Looks like their license wasn't changed from the very first release (https://github.com/cloudpipe/cloudpickle/blob/master/LICENSE). I guess it might be fine..

import typing
import warnings

from .compat import pickle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks different to original change too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed it. :)

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sounds like a good improvement. It's come up before, but what do others think about moving this to the pyspark/lib folder? Now that we have been using the releases as-is for a while and there are multiple files, it might be better fit there. Could be done as a followup to this too.

@HyukjinKwon
Copy link
Member Author

@BryanCutler, yes, I do think we should do it. I filed a JIRA - SPARK-32340

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jul 17, 2020

Merged to master. Thank you so much guys <3!

@HyukjinKwon
Copy link
Member Author

Ah .. my bad. I added an empty commit to credit together to @codesue as a co-author .. but it was gone during rebasing ..

@HyukjinKwon HyukjinKwon deleted the SPARK-32094 branch July 27, 2020 07:43
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…tions

This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready:

https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161

This PR includes small related fixes together:

1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2.
3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.

Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also apache#29114

No, dev-only.

Was not tested. Github Actions build in this PR will test it out.

Closes apache#29116 from HyukjinKwon/test-python3.8-togehter.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants