Add GCP implementation #423
Conversation
So far, this only supports building an image and pushing it to GCP Artifact Registry. It also shares and updates the Dockerfile used by the AWS script, which was broken (the AWS script remains broken for other reasons).
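For context, the build-and-push step with the docker SDK looks roughly like the sketch below; the region, project, and repository in the image path are illustrative, not the values gcp.py actually uses:

```python
import docker

client = docker.from_env()

# Artifact Registry image paths take the form:
#   <region>-docker.pkg.dev/<project>/<repository>/<image>
repo = "us-central1-docker.pkg.dev/my-project/buildstockbatch/buildstock"  # illustrative

# Build the image from the shared Dockerfile in the current directory.
image, build_logs = client.images.build(path=".", tag=f"{repo}:latest")

# Push it to Artifact Registry. Docker must already be authenticated, e.g. via:
#   gcloud auth configure-docker us-central1-docker.pkg.dev
for line in client.images.push(repo, tag="latest", stream=True, decode=True):
    print(line)
```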
Add the beginnings of support for buildstockbatch on GCP
* Add basic terraform file.
* Cleanup
Flesh out GCP documentation for installation and project definition
* Add postprocessing support to GCP
* Show batch job progress
* Show final status
* prep for release
* Update migration file for residential workflow changes.
* using semver to compare versions
* switching resstock branch to yml-resilience-args
* fixing version validation unit test
* Revert "switching resstock branch to yml-resilience-args" (This reverts commit eb676a5.)
* updating .readthedocs.yml
* Adding black formatter pre-commit
* updating installation docs
* adding additional pre-commit hooks
* adding precommit to ci
* updating OpenStudio to 3.7.0-rc1
* updating line length for black
* adding black to dev extras_require
* Removing /tmp/scratch files at end of array job
* adding [dev] to development eagle environments
* removing local_project_dir and allowing a dir not to exist
* allow local singularity image to not exist
* mocking shutil.rmtree in eagle tests
* adding mocks to test if directories were deleted
* updating changelog

Co-authored-by: Noel Merket <noel.merket@nrel.gov>
Co-authored-by: Joe Robertson <joseph.robertson@nrel.gov>
Merge in refactoring from NREL's branch
Looks pretty good. Made some comments below. A few other notes:
- We'll need to bring this up to date with develop, probably after the other PRs merge in.
- On AWS and HPC we put the data in S3 and then run a data crawler on it so we can run SQL queries on it in AWS Athena (Trino DB). I noticed you're not doing an equivalent thing for GCP here. Is that because there's no equivalent service, or it just wasn't necessary for your use case?
Thanks for all your contributions here. Let me know how I can help get this across the finish line.
This looks really good. One note below on imports.
buildstockbatch/postprocessing.py
@@ -18,6 +18,7 @@
 from dask.dataframe.io.parquet import create_metadata_file
 from fsspec.implementations.local import LocalFileSystem
 from functools import partial
+from gcsfs import GCSFileSystem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
postprocessing.py is used by all our implementations. gcsfs is only installed if the gcp extras are installed. This will cause import errors for all other implementations where they may not have installed those extras. It looks like none of these imported fsspec filesystem classes are even used in this file. We can just remove the imports.
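For reference, a sketch of the lazy, protocol-based dispatch fsspec already provides, which is why shared code doesn't need the top-level import; the bucket path below is hypothetical:

```python
import fsspec

# fsspec resolves the filesystem class from the URL protocol at call time,
# so gcsfs is only imported when a "gs://" path is actually used.
fs = fsspec.filesystem("gs")

# The same lazy dispatch applies when opening files directly (hypothetical path):
with fsspec.open("gs://my-bucket/results/results_up00.parquet", "rb") as f:
    head = f.read(4)
```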
from gcsfs import GCSFileSystem
Good call - fixed!
- Developer Installaion
+ Developer Installation
Thanks for fixing my typo.
- Instal the buildstockbatch python library as described in :ref:`bsb-python` for
+ Install the buildstockbatch python library as described in :ref:`bsb-python` for
Spelling is hard.
We could do the same (probably with BigQuery), but we just haven't needed it so far.
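For the record, a rough sketch of what that BigQuery equivalent could look like: an external table defined over the parquet output in Cloud Storage. The project, bucket, and dataset names here are all hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Point an external table at the parquet files postprocessing wrote to GCS.
config = bigquery.ExternalConfig("PARQUET")
config.source_uris = ["gs://my-bucket/results/parquet/*"]  # hypothetical bucket

# Assumes a "buildstock" dataset already exists in the project.
table = bigquery.Table("my-project.buildstock.results")
table.external_data_configuration = config
client.create_table(table, exists_ok=True)

# Results are then queryable with standard SQL, much like Athena over S3:
rows = client.query("SELECT COUNT(*) FROM `my-project.buildstock.results`").result()
print(list(rows))
```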
Pull Request Description
Adds a GCP implementation of Buildstock Batch. This includes:

- gcp/README.md - Overview of the architecture of the job on GCP.
- gcp/gcp.py - The bulk of the implementation code. It roughly follows the structure of the AWS implementation. Where possible, shared logic was already moved into docker_base.py to avoid duplicated code.
- docs/ - Documentation for installation and setup, configuration options, and running jobs.
- schemas/v0.3.yaml - Schema for the new GCP-related project configuration options.

I recommend doing a "Squash and Merge" of this PR - we don't need the (many!) individual commits that it includes.
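For orientation, here is a structural sketch of how a Docker-based cloud backend can hang off the shared docker_base.py logic. The import path, class name, and method names are assumptions for illustration, not the actual contents of gcp/gcp.py:

```python
# Structural sketch only: the import path, class name, and method names are
# assumed for illustration and may not match the real gcp/gcp.py.
from buildstockbatch.docker_base import DockerBatchBase  # module path assumed


class GcpBatch(DockerBatchBase):  # hypothetical class name
    """GCP backend; shared image-build logic is inherited from the base class."""

    def push_image(self):
        # GCP-specific step: push the locally built image to Artifact Registry.
        raise NotImplementedError

    def start_batch_job(self):
        # GCP-specific step: submit the simulation array to the GCP Batch service.
        raise NotImplementedError
```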
Checklist

Not all may apply:

- Update minimum_coverage in .github/workflows/coverage.yml as necessary.
- There are no tests for the new gcp/ directory, but the shared code in docker_base.py has tests.