
Make the Python package compatible with Athena PySpark #33

Conversation

jeancochrane
Contributor

The Python package we introduced in #31 is not currently compatible with Athena PySpark since it depends on newer versions of Python and pandas that are not available in Athena's list of preinstalled packages. This PR updates the Python package to support the following lower bounds for tools that Athena PySpark includes in its environment:

  • Python 3.9
  • pandas 1.4
  • numpy 1.23

We also update the pytest-coverage GitHub workflow to ensure that we're testing the lower bounds of the package's dependencies as well.

Comment on lines 21 to 35
- python: "3.9"
  pandas: "1.4.*"
  numpy: "1.23.*"
- python: "3.10"
  pandas: "2.2.*"
  numpy: "2.1.*"
- python: "3.11"
  pandas: "2.2.*"
  numpy: "2.1.*"
- python: "3.12"
  pandas: "2.2.*"
  numpy: "2.1.*"
- python: "3.13"
  pandas: "2.2.*"
  numpy: "2.1.*"
Contributor Author


This matrix feels a little bit clunky to me, given that we only care about specifying the pandas/numpy versions for Python 3.9, but I couldn't think of a clearer way of specifying it. Any ideas?
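One option (a sketch, not tested against this repo's workflow) is to move the latest-version combinations into the base matrix and use the `include` key to bolt on the single pinned Python 3.9 combination, so only 3.9 carries explicit pandas/numpy pins:

```yaml
# Hypothetical restructuring of the matrix: since an `include` entry whose
# values conflict with the base matrix is added as an extra combination,
# only the Python 3.9 job needs explicit lower-bound pins.
strategy:
  matrix:
    python: ["3.10", "3.11", "3.12", "3.13"]
    pandas: ["2.2.*"]
    numpy: ["2.1.*"]
    include:
      - python: "3.9"
        pandas: "1.4.*"
        numpy: "1.23.*"
```

This produces the same five jobs as the original matrix while keeping the latest-version pins in one place.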

Comment on lines 12 to 13
"pandas>=1.4.0,<=2.3.0",
"numpy>=1.23.0,<=2.2.0"
Contributor Author


The upper bounds are only present here because the pytest-coverage workflow's matrix doesn't test anything higher. If we can find a way to remove the hardcoded upper bounds from the workflow matrix, we could drop them here too. Alternatively, we could remove the upper bounds from this config and accept that the pytest-coverage workflow may not be testing the latest versions.
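If the workflow could always exercise the newest releases, the pyproject constraints could be relaxed to lower bounds only, matching the minimum versions Athena PySpark preinstalls. A minimal sketch of what that would look like:

```toml
# Hypothetical pyproject.toml fragment: lower bounds only, pinned to the
# versions available in Athena PySpark's preinstalled environment.
dependencies = [
    "pandas>=1.4.0",
    "numpy>=1.23.0"
]
```

The tradeoff is that nothing in CI then guarantees compatibility with a future pandas/numpy major release until the workflow happens to pick it up.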

Comment on lines 18 to 27
     "mypy",
     "pytest",
     "ruff",
 ]
 docs = [
-    "Sphinx>=8.1.3",
-    "myst-parser>=4.0.0",
-    "pydata-sphinx-theme>=0.16.0",
-    "sphinx-pyproject>=0.3.0"
+    "Sphinx",
+    "myst-parser",
+    "pydata-sphinx-theme",
+    "sphinx-pyproject",
+    "sphinx-autobuild"
Contributor Author


Since these dependencies are only for development and docs, I think we can afford to be less stringent with the requirements. By removing the version strings, uv will just install the latest version of each package that's compatible with whatever Python/pandas/numpy versions are installed.

@jeancochrane jeancochrane marked this pull request as ready for review November 27, 2024 20:47
@jeancochrane jeancochrane requested a review from a team as a code owner November 27, 2024 20:47
@jeancochrane jeancochrane requested a review from dfsnow November 27, 2024 20:47
Member

@dfsnow dfsnow left a comment


@jeancochrane Your PR here inspired me to look for a more flexible way to do this testing, and I think I cracked it in ccao-data/assesspy#28. Can you take a look at that PR and see if it makes sense to apply here before I do a full review?

@jeancochrane jeancochrane force-pushed the jeancochrane/make-python-package-spark-compatible branch 4 times, most recently from ce05828 to e9f5833 Compare December 2, 2024 18:20
@jeancochrane jeancochrane force-pushed the jeancochrane/make-python-package-spark-compatible branch from e9f5833 to e34b6d7 Compare December 2, 2024 18:34
@jeancochrane
Contributor Author

Thanks for the great tox example @dfsnow! This PR is now configured to make use of it. Do you mind taking another look?

@jeancochrane jeancochrane requested a review from dfsnow December 2, 2024 20:08
@jeancochrane jeancochrane marked this pull request as draft December 2, 2024 22:32
@jeancochrane jeancochrane removed the request for review from dfsnow December 2, 2024 22:32
[tox]
min_version = 4.0
envlist =
    py{39, 310, 311}-lowest
Contributor Author


We end the lowest-direct tests at Python 3.11 since myst-parser didn't add support for Python 3.12 until version 3.0.
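For context, a `-lowest` factor like this is typically wired up (assuming the tox-uv plugin, which this repo appears to use via uv) through the `uv_resolution` setting, which tells uv to resolve each directly declared dependency at its minimum allowed version. A rough sketch of such a configuration:

```ini
; Hypothetical tox.ini sketch (assumes the tox-uv plugin is installed).
[tox]
min_version = 4.0
envlist =
    py{39, 310, 311}-lowest
    py{39, 310, 311, 312, 313}

[testenv]
deps = pytest
commands = pytest {posargs}

[testenv:py{39, 310, 311}-lowest]
; tox-uv setting: resolve direct dependencies at their lowest declared versions,
; approximating the oldest environment (e.g. Athena PySpark) we claim to support
uv_resolution = lowest-direct
```

Testing the declared lower bounds this way catches cases where the package quietly starts relying on newer pandas/numpy APIs than the bounds advertise.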

@jeancochrane jeancochrane marked this pull request as ready for review December 2, 2024 23:42
@jeancochrane jeancochrane requested a review from dfsnow December 2, 2024 23:43
@jeancochrane
Contributor Author

OK @dfsnow, ready for another look!

Member

@dfsnow dfsnow left a comment


This looks set to go now @jeancochrane. Nice work!

@jeancochrane jeancochrane merged commit ca15900 into jeancochrane/further-python-package-migration Dec 4, 2024
13 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/make-python-package-spark-compatible branch December 4, 2024 21:32