Add GCP implementation #423

Merged · 111 commits into NREL:develop · Mar 29, 2024

Conversation

@nweires (Collaborator) commented Dec 8, 2023

Pull Request Description

Adds a GCP implementation of Buildstock Batch. This includes:

  • gcp/README.md - Overview of the architecture of the job on GCP.
  • gcp/gcp.py - The bulk of the implementation code. It roughly follows the structure of the AWS implementation; where possible, shared logic has already been moved into docker_base.py to avoid duplicated code (see the sketch after this list).
  • docs/ - Documentation for installation and setup, configuration options, and running jobs.
  • schemas/v0.3.yaml - Schema for the new GCP-related project configuration options.
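A rough sketch of the code-sharing arrangement is below. The class and method names are illustrative only, not necessarily the real buildstockbatch API; the point is that provider-neutral Docker logic lives in docker_base.py and gcp/gcp.py overrides the GCP-specific pieces, mirroring the AWS implementation.

```python
# Illustrative sketch only; class and method names are hypothetical.
class DockerBatchBase:
    """Shared Docker-based batch logic (the role played by docker_base.py)."""

    def build_image(self):
        # Build the simulation Docker image locally (shared across providers).
        ...

    def push_image(self):
        # Provider-specific: push the image to that cloud's container registry.
        raise NotImplementedError


class GcpBatch(DockerBatchBase):
    """GCP-specific subclass (the role played by gcp/gcp.py)."""

    def push_image(self):
        # Push the built image to GCP Artifact Registry.
        ...
```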

I recommend doing a "Squash and Merge" of this PR - we don't need the (many!) individual commits that it includes.

Checklist

Not all may apply

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on Checks -> BuildStockBatch Tests -> Artifacts)
  • Coverage has increased or at least not decreased. Update minimum_coverage in .github/workflows/coverage.yml as necessary.
    • Added exception for the gcp/ directory, but shared code in docker_base.py has tests.
  • All other unit and integration tests passing
  • Update validation for project config yaml file changes
  • Update existing documentation
  • Run a small batch run on Kestrel/Eagle to make sure it all works if you made changes that will affect Kestrel/Eagle
  • Add to the changelog_dev.rst file and propose migration text in the pull request

lathanh and others added 30 commits October 19, 2023 21:55
This only supports building an image and pushing that image to GCP
Artifact Registry, so far.

This also shares and updates the Dockerfile used by the AWS script,
which was broken (the AWS script remains broken for other reasons).
Add the beginnings of support for buildstockbatch on GCP
* Add basic terraform file.

* Cleanup
Flesh out GCP documentation for installation and project definition
* Add postprocessing support to GCP

* Add postprocessing support to GCP
* Show batch job progress

* Show final status
* prep for release

* Update migration file for residential workflow changes.

* using semver to compare versions

* switching resstock branch to yml-resilience-args

* fixing version validation unit test

* Revert "switching resstock branch to yml-resilience-args"

This reverts commit eb676a5.

* updating .readthedocs.yml

* Adding black formatter pre-commit

* updating installation docs

* adding additional pre-commit hooks

* adding precommit to ci

* updating OpenStudio to 3.7.0-rc1

* updating line length for black

* adding black to dev extras_require

* Removing /tmp/scratch files at end of array job

* adding [dev] to development eagle environments

* removing local_project_dir and allowing a dir not to exist

* allow local singularity image to not exist

* mocking shutil.rmtree in eagle tests

* adding mocks to test if directories were deleted

* updating changelog

---------

Co-authored-by: Noel Merket <noel.merket@nrel.gov>
Co-authored-by: Joe Robertson <joseph.robertson@nrel.gov>
Merge in refactoring from NREL's branch
@nmerket (Member) left a comment

Looks pretty good. Made some comments below. A few other notes:

  • We'll need to bring this up to date with develop probably after the other PRs merge in.
  • On AWS and HPC we put the data in S3 and then run a data crawler on it so we can run sql queries on it in AWS Athena (Trino DB). I noticed you're not doing an equivalent thing for GCP here. Is that because there's no equivalent service or it just wasn't necessary for your use case?

Thanks for all your contributions here. Let me know how I can help get this across the finish line.

@nweires mentioned this pull request on Mar 27, 2024
@nweires changed the title from "DRAFT: GCP implementation" to "Add GCP implementation" on Mar 27, 2024
@nweires marked this pull request as ready for review on March 27, 2024 17:18
@nweires requested a review from nmerket on March 27, 2024 17:31
@nmerket (Member) left a comment

This looks really good. One note below on imports.

@@ -18,6 +18,7 @@ (postprocessing.py)
 from dask.dataframe.io.parquet import create_metadata_file
 from fsspec.implementations.local import LocalFileSystem
 from functools import partial
+from gcsfs import GCSFileSystem
@nmerket (Member):

postprocessing.py is used by all our implementations. gcsfs is only installed if the gcp extras are installed. This will cause import errors for all other implementations where they may not have installed those extras. It looks like none of these imported fsspec filesystem classes are even used in this file. We can just remove the imports.

Suggested change (remove this import):
- from gcsfs import GCSFileSystem
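As an aside: if a module shared by every implementation ever did genuinely need an optional dependency like gcsfs, a common alternative is to defer the import to the code path that uses it. A minimal sketch with a hypothetical helper name, assuming gcsfs is only needed for gs:// output paths (this is not the change made in this PR, which simply removes the unused import):

```python
def get_gcs_filesystem():
    """Return a gcsfs filesystem, importing gcsfs only when it is needed."""
    try:
        # Deferred import: only fails if this GCS-specific path is exercised
        # without the gcp extras installed.
        from gcsfs import GCSFileSystem
    except ImportError as err:
        raise ImportError(
            "gcsfs is required for GCS output; install buildstockbatch with the gcp extras"
        ) from err
    return GCSFileSystem()
```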

@nweires (Collaborator, Author):
Good call - fixed!

- Developer Installaion
+ Developer Installation
@nmerket (Member):
Thanks for fixing my typo.

- Instal the buildstockbatch python library as described in :ref:`bsb-python` for
+ Install the buildstockbatch python library as described in :ref:`bsb-python` for
@nmerket (Member):
Spelling is hard.

@nweires (Collaborator, Author) commented Mar 28, 2024

  • On AWS and HPC we put the data in S3 and then run a data crawler on it so we can run sql queries on it in AWS Athena (Trino DB). I noticed you're not doing an equivalent thing for GCP here. Is that because there's no equivalent service or it just wasn't necessary for your use case?

We could do the same (probably with BigQuery), but we just haven't needed it so far.
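For reference, a minimal sketch of what loading the Parquet output from GCS into BigQuery could look like, using the google-cloud-bigquery client. The project, bucket, dataset, and table names are placeholders, and none of this is part of this PR:

```python
from google.cloud import bigquery

# Placeholder names; not part of this PR.
client = bigquery.Client(project="my-gcp-project")
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
load_job = client.load_table_from_uri(
    "gs://my-bucket/results/parquet/*.parquet",  # results written by postprocessing
    "my_dataset.buildstock_results",             # destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```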

@nweires requested a review from nmerket on March 29, 2024 14:55

@nmerket merged commit 92993de into NREL:develop on Mar 29, 2024 (5 checks passed)