chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6 #21002

EugeneTorap · 2022-08-07T08:05:10Z

Fixes #19986: installing Superset with Python 3.10 fails because pyarrow 5.0.0 doesn't have a wheel for Python 3.10.

SUMMARY

To use Python 3.10 in Superset, we need to bump PyArrow from 5.0.0 to 6.0.1.
This PR also bumps pandas to the latest minor release (from 1.3.4 to 1.4.3).

Pandas 1.4 added a wheel for Python 3.9, Apple Silicon

Pandas 1.4 introduced support for using pyarrow as an engine for reading CSVs, which brings performance improvements (see https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#multi-threaded-csv-reading-with-a-new-csv-engine-based-on-pyarrow for details). Therefore engine="pyarrow" has been added everywhere we're calling pd.read_csv.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2022-08-07T10:19:23Z

Codecov Report

Merging #21002 (122d691) into master (e214e1a) will decrease coverage by 0.09%.
The diff coverage is 63.98%.

❗ Current head 122d691 differs from pull request most recent head 213bf79. Consider uploading reports for the commit 213bf79 to get more accurate results

@@            Coverage Diff             @@
##           master   #21002      +/-   ##
==========================================
- Coverage   66.34%   66.25%   -0.10%     
==========================================
  Files        1767     1770       +3     
  Lines       67312    67526     +214     
  Branches     7144     7182      +38     
==========================================
+ Hits        44656    44737      +81     
- Misses      20828    20953     +125     
- Partials     1828     1836       +8

Flag	Coverage Δ
hive	`53.17% <45.76%> (+0.01%)`	⬆️
mysql	`80.96% <69.49%> (+0.04%)`	⬆️
postgres	`81.00% <69.49%> (+0.01%)`	⬆️
presto	`53.07% <45.76%> (+0.01%)`	⬆️
python	`81.43% <69.49%> (-0.04%)`	⬇️
sqlite	`?`
unit	`50.74% <52.54%> (+0.27%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
...packages/superset-ui-core/src/query/types/Query.ts	`100.00% <ø> (ø)`
...set-ui-core/src/ui-overrides/ExtensionsRegistry.ts	`100.00% <ø> (ø)`
...ackages/superset-ui-core/src/utils/featureFlags.ts	`100.00% <ø> (ø)`
...rts/src/BigNumber/BigNumberTotal/transformProps.ts	`0.00% <0.00%> (ø)`
...lugin-chart-echarts/src/BigNumber/BigNumberViz.tsx	`0.00% <0.00%> (ø)`
...lugin-chart-echarts/src/BoxPlot/EchartsBoxPlot.tsx	`0.00% <0.00%> (ø)`
.../plugins/plugin-chart-echarts/src/BoxPlot/types.ts	`0.00% <ø> (ø)`
.../plugin-chart-echarts/src/Funnel/EchartsFunnel.tsx	`0.00% <0.00%> (ø)`
...d/plugins/plugin-chart-echarts/src/Funnel/types.ts	`100.00% <ø> (ø)`
...ns/plugin-chart-echarts/src/Gauge/EchartsGauge.tsx	`0.00% <0.00%> (ø)`
... and 89 more



EugeneTorap · 2022-08-08T20:07:29Z

@hughhhh @betodealmeida Can you review it?

betodealmeida

This looks awesome! But I'm very concerned with how in the unit tests some of the NaNs are now being returned as zeros, since it would lead to wrong results. Any idea why that is happening here?

tests/unit_tests/pandas_postprocessing/test_contribution.py

EugeneTorap · 2022-08-08T20:29:21Z

How should I fix this test?
Pandas returns 0 instead of nan for the API

betodealmeida · 2022-08-08T22:23:33Z

How should I fix this test? Pandas returns 0 instead of nan for the API

Taking another look, I guess 0 makes sense from a contribution point of view. It should be fine in this case.

betodealmeida

I took another look and have a few questions.

superset/charts/post_processing.py

superset/examples/bart_lines.py

tests/unit_tests/pandas_postprocessing/test_contribution.py

EugeneTorap · 2022-08-16T09:33:57Z

@betodealmeida @villebro Can you review again?

villebro

LGTM, thanks for all the iterations!

zhaoyongjie

LGTM! Thanks @EugeneTorap and @villebro

betodealmeida

This is great! Thanks for the work, @EugeneTorap!

betodealmeida · 2022-08-17T13:35:04Z

superset/examples/helpers.py

+def get_example_url(filepath: str) -> str:
+    return f"{BASE_URL}{filepath}?raw=true"
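As a rough illustration of how the helper added in this hunk behaves, here is a minimal sketch. The `BASE_URL` value and the file name are placeholders invented for this example; the real constant lives in `superset/examples/helpers.py` and is not shown in this conversation.

```python
# Hypothetical base URL, used only for illustration; the real BASE_URL
# is defined in superset/examples/helpers.py and may differ.
BASE_URL = "https://example.com/examples-data/blob/master/"


def get_example_url(filepath: str) -> str:
    """Build a download URL for an example data file (same body as the diff)."""
    return f"{BASE_URL}{filepath}?raw=true"


# "bart-lines.json.gz" is an assumed file name, not taken from the PR.
url = get_example_url("bart-lines.json.gz")
print(url)
```

Centralizing the URL construction this way lets each example loader pass only a relative file path, which seems to be the intent of the "Simplify pd.read_json() and pd.read_csv() for example data" commit.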


betodealmeida · 2022-08-17T13:35:26Z

superset/utils/pandas_postprocessing/contribution.py

@@ -49,6 +49,9 @@ def contribution(
    """
    contribution_df = df.copy()
    numeric_df = contribution_df.select_dtypes(include=["number", Decimal])
+    # TODO: copy needed due to following regression in 1.4, remove if not needed:
+    # https://github.com/pandas-dev/pandas/issues/48090
+    numeric_df = numeric_df.copy()
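The defensive copy in this hunk can be sketched in isolation. The snippet below is not Superset's `contribution` function: the data frame is made up, and the row-wise share calculation is only an assumed stand-in for what a contribution computation might do. It shows the pattern of selecting numeric columns, copying the selection explicitly, and then working on the copy.

```python
import pandas as pd

# Made-up data for illustration; "label" is non-numeric and gets filtered out.
df = pd.DataFrame({"a": [1.0, 3.0], "b": [3.0, 1.0], "label": ["x", "y"]})

# Explicit copy after select_dtypes, mirroring the diff, so that later
# changes to numeric_df cannot leak back into df.
numeric_df = df.select_dtypes(include=["number"]).copy()

# Row-wise share of each numeric column (illustrative calculation only).
row_totals = numeric_df.sum(axis=1)
shares = numeric_df.div(row_totals, axis=0)

print(shares)
```

The TODO in the diff suggests the copy is a workaround tied to the linked pandas issue, so it may become unnecessary once that regression is resolved upstream.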


cwegener · 2022-08-22T03:19:36Z

Nice work! Going to test this out very soon.

I know that there used to be a problem where an empty result set from SQLAlchemy caused an exception in pandas when using PyArrow 6.0 and higher, leading to unfriendly error messages in Explore (and in charts on dashboards) instead of the friendly "No data" message.

EugeneTorap added 4 commits August 7, 2022 10:03

Bump pandas 1.4 and pyarrow 6

fb67e8f

Use engine="pyarrow" for pd.read_csv()

78debe9

Refactoring

525551f

Refactoring

ed580ee

pull-request-size bot added the size/M label Aug 7, 2022

EugeneTorap added 5 commits August 7, 2022 11:09

Refactoring

9eb34bf

Use bytes in pd.read_json()

8f6519d

Fix test_contribution

8085e28

Merge branch 'master' into feat/support-python3.10

5acf946

Fix pandas issue when 'arrays' are empty but 'names' contain values

85369fe

fix: ValueError: For argument "ascending" expected type bool, received type NoneType.

f859ba4

betodealmeida requested changes Aug 8, 2022

tests/unit_tests/pandas_postprocessing/test_contribution.py (outdated, resolved)

betodealmeida requested changes Aug 9, 2022

superset/charts/post_processing.py (outdated, resolved)

superset/examples/bart_lines.py (outdated, resolved)

Remove engine="pyarrow" and convert bytes to string

3d63498

EugeneTorap requested a review from betodealmeida August 9, 2022 06:02

villebro requested a review from zhaoyongjie August 10, 2022 13:32

villebro requested changes Aug 11, 2022

tests/unit_tests/pandas_postprocessing/test_contribution.py (outdated, resolved)

villebro and others added 2 commits August 16, 2022 08:06

make copy of selected df to fix regression

c67d43c

Simplify pd.read_json() and pd.read_csv() for example data

213bf79

pull-request-size bot added size/L and removed size/M labels Aug 16, 2022

EugeneTorap requested a review from villebro August 16, 2022 09:29

villebro approved these changes Aug 16, 2022


zhaoyongjie approved these changes Aug 16, 2022


betodealmeida approved these changes Aug 17, 2022


betodealmeida merged commit 76d6a9a into apache:master Aug 17, 2022

EugeneTorap deleted the feat/support-python3.10 branch August 17, 2022 13:38

cwegener mentioned this pull request Aug 30, 2022

ModuleNotFoundError: No module named 'werkzeug.wrappers.etag' #20723

Closed

john-bodley mentioned this pull request Jan 3, 2023

Python 3.10 support missed from 2.0 branch #22582

Closed

mistercrunch added the 🚢 2.1.3 label Feb 18, 2024

mistercrunch added 🏷️ bot and 🚢 2.1.0 labels and removed the 🚢 2.1.3 label Mar 13, 2024
