-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hailctl] Fix dependency installation for hailctl dataproc clusters #12510
Conversation
test_dataproc only runs on deploys AFAIK |
Like a release? I think the terminology is a little overloaded. In services code we consider "deploy" as the CI pipeline that runs whenever main is updated. |
Ah I see, all these steps cut out early for non-release deploys. So we wouldn't have released a broken package which is good, but we'd still end up with a broken deploy, which is annoying. |
Sorry about the incorrect terminology! Your conclusion is my understanding, yes. |
hail/Makefile
Outdated
@@ -268,13 +268,13 @@ $(eval $(call ENV_VAR,cloud_base)) | |||
$(eval $(call ENV_VAR,wheel_cloud_path)) | |||
|
|||
python/hailtop/hailctl/deploy.yaml: env/cloud_base env/wheel_cloud_path | |||
python/hailtop/hailctl/deploy.yaml: $(resources) python/requirements.txt | |||
python/hailtop/hailctl/deploy.yaml: $(resources) python/pinned-requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this target does not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be? It's just a file in the repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a foot-gun to me. Suppose I'm trying to experiment with a modified version of Hail. I've modified python/requirements.txt
but I've not run generate-pip-lockfile
. I run make install-hailctl
thinking that this will incorporate my changes into a new wheel file which I can use to start cluster. In reality, my changes are ignored and the old pinned requirements are used.
I think we need a rule like:
python/pinned-requirements.txt: python/requirements.txt
docker run --rm -it \
-v $(HAIL_HAIL_DIR):/hail \
python:3.7-slim \
/bin/bash -c "pip install pip-tools && cd /hail && \
pip-compile --upgrade python/requirements.txt \
--output-file=python/pinned-requirements.txt"
Or a rule like:
python/pinned-requirements.txt: generate-pip-lockfile
This latter rule will trigger generate-pip-lockfile
to always be run if we depend on the pinned-requirements.txt which isn't great. I think the best option is to have a rule for each requirements file and to also have generate-pip-lockfile as a special fast path for developers who want to update all the lock files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if we want to avoid invoking docker thrice, we could do this:
python/pinned-requirements.txt: update-pip-lockfiles
python/dev/pinned-requirements.txt: update-pip-lockfiles
python/hailtop/pinned-requirements.txt: update-pip-lockfiles
.PHONY: update-pip-lockfiles
update-pip-lockfiles: python/requirements.txt python/dev/requirements.txt python/hailtop/requirements.txt
...
And then to forcibly regenerate you gotta use -B
/--always-make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, this makes a lot of sense! I will incorporate these changes.
hail/Makefile
Outdated
@@ -324,7 +324,7 @@ install: $(WHEEL) | |||
|
|||
.PHONY: install-on-cluster | |||
install-on-cluster: $(WHEEL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this target can run without generated pinned requirements files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand entirely. Do you mean that install-on-cluster
should use unpinned requirements or is this a concern about the target / dependencies of the target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to my concerns above, there's an implicit dependency between python/requirements.txt and python/pinned-requirements.txt. At the very least we should error if the pinned-requirements.txt is older than the requirements.txt
Thank you! (this issue is blocking us now) |
hail/Makefile
Outdated
@@ -268,13 +268,13 @@ $(eval $(call ENV_VAR,cloud_base)) | |||
$(eval $(call ENV_VAR,wheel_cloud_path)) | |||
|
|||
python/hailtop/hailctl/deploy.yaml: env/cloud_base env/wheel_cloud_path | |||
python/hailtop/hailctl/deploy.yaml: $(resources) python/requirements.txt | |||
python/hailtop/hailctl/deploy.yaml: $(resources) python/pinned-requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a foot-gun to me. Suppose I'm trying to experiment with a modified version of Hail. I've modified python/requirements.txt
but I've not run generate-pip-lockfile
. I run make install-hailctl
thinking that this will incorporate my changes into a new wheel file which I can use to start cluster. In reality, my changes are ignored and the old pinned requirements are used.
I think we need a rule like:
python/pinned-requirements.txt: python/requirements.txt
docker run --rm -it \
-v $(HAIL_HAIL_DIR):/hail \
python:3.7-slim \
/bin/bash -c "pip install pip-tools && cd /hail && \
pip-compile --upgrade python/requirements.txt \
--output-file=python/pinned-requirements.txt"
Or a rule like:
python/pinned-requirements.txt: generate-pip-lockfile
This latter rule will trigger generate-pip-lockfile
to always be run if we depend on the pinned-requirements.txt which isn't great. I think the best option is to have a rule for each requirements file and to also have generate-pip-lockfile as a special fast path for developers who want to update all the lock files.
hail/Makefile
Outdated
@@ -324,7 +324,7 @@ install: $(WHEEL) | |||
|
|||
.PHONY: install-on-cluster | |||
install-on-cluster: $(WHEEL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to my concerns above, there's an implicit dependency between python/requirements.txt and python/pinned-requirements.txt. At the very least we should error if the pinned-requirements.txt is older than the requirements.txt
…n path in dataproc
Because of the unpredictable way that git clone might realize the requirements files, I removed the pinned-requirements file as a dependency of the changed targets. In both cases, regenerating that file (either in CI as part of the deploy.yaml target or on a cluster for I performed the following manual testing:
However, I'm not sure I'm actually doing this right. I checked that in step 2 I was not regenerating any pinned-requirments files, but in step 3 make updated the pinned requirements without me telling it to, I'm guessing because of the wheel's dependence on |
hail/Makefile
Outdated
HAIL_HAIL_DIR=$(HAIL_HAIL_DIR) \ | ||
../generate_pip_lockfile.sh python/hailtop/requirements.txt python/hailtop/pinned-requirements.txt | ||
|
||
python/pinned-requirements.txt: python/hailtop/requirements.txt python/requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have dependencies on hailtop here and on hailtop and hail in the dev target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/requirements.txt
includes those requirements from python/hailtop/requirements.txt
, and the dev requirements are constrained by python/requirements.txt
, so if any of these requirements files changed, it could affect the pinned-requirements.
hail/Makefile
Outdated
rm -f $@ | ||
echo "dataproc:" >> $@ | ||
for FILE in $(notdir $(resources)); do \ | ||
echo " $$FILE: $(cloud_base)/$$FILE" >> $@ || exit 1; done | ||
echo " wheel: $(wheel_cloud_path)" >> $@ | ||
echo " pip_dependencies: $(shell cat python/requirements.txt | sed '/^[[:blank:]]*#/d;s/#.*//' | grep -v pyspark | tr "\n" "|||")" >> $@ | ||
printf " pip_dependencies: " >> $@ | ||
$(PIP) install pip-tools && ../check_pip_requirements.sh python/requirements.txt python/pinned-requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that this will solve dependencies using both files as constraints, then compare the solution to the pinned (second argument)? The idea being that if I add a new dependency, a new transitive dependency, or a conflicting version to the first file, it will trigger a diff failure when the solution is compared to the existing pinned file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Removed the targets due to the reproducibility issues we discussed, but ensured that the targets that use the pinned requirements files check that they are still compatible with the requirements.txt files. I tested this by changing the requirements file (one by making an incompatible version change, one by adding a new dependency) and saw that the deploy.yaml make target failed and emitted a message to recreate the lockfiles
@danking Sorry for the delay on this. Trying to move this along faster now to unblock the release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow up with change to target naming of check-pip-lockfiles
I broke this when I separated out the dependencies for hail into two layers:
This fix does two things:
install-deps
andinstall-dev-deps
as the normal requirements files as those are meant for development (I think?) but am happy to take opinions on whether we should use fully pinned deps there as well. I have so far been going by the rule of thumb of fully-pinned for CI and production environments, more lax rules for dev environments.See here for additional context.
cc: @tpoterba, any idea why the test dataproc test succeeded?