-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: bump pyspark version #974
base: dev
Are you sure you want to change the base?
Conversation
371441b
to
630c0c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
2 questions:
- Should we also give support to pyspark versions lower than 3.5? Afaik, there is nothing in the API that is not compatible. I know we have custom code to compare dataframes, something that Pyspark 3.5 supports
- We need to update the Dataproc version too. Should we make a pipeline run to test nothing breaks?
These are very nice points!
|
887b5f7
to
98d464d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool!! New PySpark, support for several Python versions (dev version set to 3.11), and migration to UV.
I suggest changing the PR title, this is clearly a lie haha
Setting up the dev environment was smooth simply by doing make setup-dev
(except that uv wasnt in my path, see minor comment below).
I haven't used uv
myself, so I can't really evaluate pros and cons. M only questions is to confirm that we are not missing anything from CI/CD after the migration?
|
||
echo "Fetching version changes..." | ||
git fetch | ||
if output=$(git status --porcelain) && [[ -n "$output" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I didn't know about --porcerlain
:D
"google-cloud-storage (>=2.14.0, <2.15.0)" | ||
] | ||
classifiers = [ | ||
"Programming Language :: Python :: 3.10", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Gentropy image will be automatically updated, right?
if ! command -v uv &>/dev/null; then | ||
echo "uv was not found, installing uv..." | ||
curl -LsSf https://astral.sh/uv/install.sh | sh | ||
source $HOME/.local/bin/env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After installing the environment, uv
wasn't on my path. Should we add export PATH="$HOME/.local/bin:$PATH"
here?
|
||
test: ## Run tests | ||
@echo "Running Tests..." | ||
@poetry run pytest | ||
@uv run pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test this yourself, but the current command goes idle without running any tests
@uv run pytest | |
@uv run pytest . |
✨ Context
This PR closes #3189
This PR closes #3680
and links to opentargets/orchestration#94
🛠 What does this PR implement
This PR focuses on allowing pyspark>=3.5.0, <3.6 along with newest hail version, that supports it.
The changes above caused trouble (errors) for poetry to resolve the dependencies. This lead to more developments, the full list is described below:
X.Y.Z-devXX
releases on push todev
branchgoogle
as dependency, added more granulargoogle-cloud-storage
(To be removed in future release with other google dependent packages)dev cluster setup
Previously dev cluster was created from the package pushed to the google cloud storage bucket under the namespace that matched the git ref name. This has now changed, as cluster now install gentropy directly from the github branch (or tag) specified in the orchestration - see changes in opentargets/orchestration#94.
Benefits of new solution
cli.py
) in the GCS.uv pip install
(pip resolution of gentropy with support for multiple python versions takes ~ 20minutes leading to initialization actions timeout, uv does the same thing with around 1.5min).🙈 Missing
🚦 Before submitting
dev
branch?make test
)?poetry run pre-commit run --all-files
)?