-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document how to enable Requester Pays access to Pangeo data for GCP hubs #662
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few quick thoughts and suggestions from me. In general this looks quite nice, thanks for explaining all of this as you figured it out!
@@ -0,0 +1,103 @@ | |||
# Pangeo Data Access via Requester Pays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should make this generic instead of Pangeo-specific, since I could imagine other hubs would want the same setup?
@@ -0,0 +1,103 @@ | |||
# Pangeo Data Access via Requester Pays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use a top-level section like Data access
and then this could be the first content in that section. I think that could bring this in-line with our Hub Administrator's guide "topic guides" here: https://pilot.2i2c.org/en/latest/
|
||
For some hubs, such as our Pangeo deployments, the communities they serve require access to data stored in other projects. | ||
Accessing data normally comes with a charge that the folks _hosting_ the data have to take care of. | ||
However, there is a method by which those making the request are responsible for the charges instead: [Requester Pays](https://cloud.google.com/storage/docs/requester-pays). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "Requestor pays" access something unique to GCP? Or are we just prototyping this first on GCP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, it is unique to GCP. I've not come across it before being introduced to the Pangeo deployment.
## Setting up Requester Pays Access on GCP | ||
|
||
```{note} | ||
We may automate these steps in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely feels like something that is semi-common for research groups, and that is not scalable in its current implementation of manual steps :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. We should open an issue to figure out how to automate this as I suspect that it can't be confined to just the terraform scripts or just the deployer module, as for the last step you need a hub up and running. In practice though, this is a 5 min manual job.
Co-authored-by: Chris Holdgraf <choldgraf@gmail.com>
@choldgraf I think I've addressed your comments, let me know what you think! |
Thanks for fixing this one, @sgibson91. I had already automated this with config connector, creating the appropriate bindings in https://github.com/2i2c-org/pilot-hubs/blob/4d84506587142a36205b56dd3592e45aca649e21/hub-templates/basehub/templates/cloud-resources/gcp/service-account.yaml#L25. So this should have maybe already worked - the question for me is why did it not? Was there a feature we needed to turn on? Are these config connector resources being created appropriately? |
@yuvipanda Some random thoughts:
Let's compare |
Something is definitely broken in that config though as I had to manually set this up for COESSING too #473 (comment) which also had the config connector enabled https://github.com/2i2c-org/pilot-hubs/blob/4d84506587142a36205b56dd3592e45aca649e21/terraform/gcp/projects/pangeo-181919.tfvars#L10 |
@sgibson91 I think it needs to be explicitly enabled with helm config like https://github.com/2i2c-org/pilot-hubs/blob/4d84506587142a36205b56dd3592e45aca649e21/config/hubs/2i2c.cluster.yaml#L65. |
Hmmm, then we need to make that more discoverable/apparent that that's what it's doing. Even from that snippet, I don't think it's clear that that is also enabling requester pays access. Any recommendations? |
@sgibson91 let's try and see if it works, and if it does, we'll have to find a way to document that. I haven't touched it since we set up the first 2i2c clusters. I didn't add any docs when I first implemented that - I'm really sorry about that :( Let's try turning it on and see if it works? |
Sure - no worries! |
@yuvipanda the config connector stuff in the helm chart doesn't work as it creates an annotation that looks like this:
But that service account doesn't exist anywhere in the Google project - I don't think it's created by terraform (we only create I'll open a new issue to track fixing this up (I think it's doable if we set a fixed display name, like |
@sgibson91 that service account should've been also created by configconnector with https://github.com/2i2c-org/pilot-hubs/blob/f6db9a19b4058e1080312a75e9d5798cb3989a60/hub-templates/basehub/templates/cloud-resources/gcp/service-account.yaml#L3. We can't create it in terraform since it's one per hub rather than one per deployment. |
I'll double check it tomorrow but I checked the Google project after running the helm chart with that config enabled and noticed the name of the service account and it wasn't there. |
I opened #669 to track this |
I've updated this PR so the manual steps reflect what we expect the automation to be doing. Can we get this merged and then concentrate on figuring out why the automation doesn't work in #669? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this one merged and figure it out why the automation is not working as part of another ticket, as @sgibson91 proposed.
thanks @sgibson91 for digging into this + the follow up issue! |
This PR adds documentation on the steps required to enable Requester Pays access to other Google Cloud storage buckets, such as the Pangeo data. These steps have already been implemented for the Pangeo staging hub.
fixes #655