Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Add Google Drive scope to GCP Auth #23290

Closed
ricardograca-scratch opened this issue Sep 19, 2022 · 21 comments · Fixed by #23644
Closed

[Feature Request]: Add Google Drive scope to GCP Auth #23290

ricardograca-scratch opened this issue Sep 19, 2022 · 21 comments · Fixed by #23644
Labels
core done & done Issue has been reviewed after it was closed for verification, followups, etc. new feature P2 python

Comments

@ricardograca-scratch
Copy link

What would you like to happen?

The current Python SDK doesn't include the https://www.googleapis.com/auth/drive scope when sending the authentication request to Google Cloud. This means that it's not possible to access Google Drive backed tables in BigQuery.

Adding the scope does indeed provide the required access. I'm only interested in the Python SDK, but this change should be easy to apply on all other SDKs as well if there's interest in that.

I can work on this change and submit a pull request.

Issue Priority

Priority: 2

Issue Component

Component: sdk-py-core

@tvalentyn
Copy link
Contributor

What happens if Drive API is not enabled for a project but we include it with a scope?

@ricardograca-scratch
Copy link
Author

I imagine the same thing that happens with any other scope, but I'll check and report back.

@ricardograca-scratch
Copy link
Author

@tvalentyn The Drive API being enabled or not has not effect on the result if the extra scope is included. However, without the scope, even if the file is shared with the service account accessing the data this error is displayed and it's not possible to access the table:

Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.

Adding the scope resolves the issue.

@tvalentyn
Copy link
Contributor

Thanks for checking.

Would adding https://www.googleapis.com/auth/cloud-platform also fix the problem or this specific drive scope is necessary?

@tvalentyn
Copy link
Contributor

If not, I would suggest adding a new pipeline option that allows such customizations.

Looks like in Java it's possible to specify a custom credential via a factory or credential instance : https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/extensions/gcp/options/GcpOptions.html . In Python we don't have this. But perhaps passing a string of custom scopes to use when authenticating would be sufficient?

@ricardograca-scratch
Copy link
Author

Would adding https://www.googleapis.com/auth/cloud-platform also fix the problem or this specific drive scope is necessary?

It's already there and it doesn't allow access to Google Drive backed tables, so the drive scope is necessary.

If not, I would suggest adding a new pipeline option that allows such customizations.

No other scope is treated this way, and it seems like I would need to do a lot of modifications to beam to enable that. I'm no beam expert, not to mention I have no experience developing Java or Go, so the likelihood of me actually pulling off what you propose is very slim, since it would require a substantial amount of time.

Can't I just add the necessary scope since it doesn't seem to cause any issues?

@kennknowles
Copy link
Member

kennknowles commented Sep 30, 2022

It seems to me that your proposed change is consistent with the current design so we could go ahead as a workaround. But at the same time the current design is odd, since it adds a totally arbitrary set of scopes instead of just the necessary set, so we should fix that.

@ricardograca-scratch
Copy link
Author

I agree with that. The ideal solution for me would be to provide a minimal set that is required to access BigQuery and allow users to specify any extra scopes needed.

I'll work on this change as it's currently proposed for now.

@lukecwik
Copy link
Member

I disagree that adding scopes is a good idea, we should instead allow users to configure the scopes themselves. Adding a credential factory to python or adding a pipeline option that enables configuring these scopes with the default being the currently enabled list makes a lot more sense to me.

@kennknowles
Copy link
Member

I agree with your comment. I am separating the immediate need with the longer term refactor. Currently, we have an arbitrary set of scopes that probably never should have been done this way. This proposal adds one, keeping the design as-is.

Refactoring the design to use only minimal scopes specified by the user, or perhaps by the IOs in the pipeline, is still the best way to go in the future.

@lukecwik
Copy link
Member

I would disagree, having existing users who upgrade to this version pick up additional scopes that they may not have been aware of is not a good experience.

@kennknowles
Copy link
Member

Again, I agree with you, but that is status quo. That is exactly what happened when SpannerIO was added.

@kennknowles
Copy link
Member

If it is easy to do The Right Thing then go ahead and do it. But forcing arbitrary discipline on new contributors that we ourselves never complied with is not how we should run things.

@lukecwik
Copy link
Member

Continuing to do something that was done incorrectly in the past and digging a bigger hole for future maintenance is a problem.

@tvalentyn
Copy link
Contributor

tvalentyn commented Oct 14, 2022

That is exactly what happened when SpannerIO was added.

It might have been covered by the google-cloud-platform scope. Java doesn't have spanner scopes, not sure if python had to reference them explicitly.

@kennknowles
Copy link
Member

@lukecwik
Copy link
Member

lukecwik commented Oct 14, 2022

5b6a0ea#diff-21a78a52eca0c898070d58302127a9bb5cdb5de512ec16b9b3d945e0b84b694c only modifies Python's scopes and not Java's, I think @tvalentyn point still is valid.

lukecwik added a commit to lukecwik/incubator-beam that referenced this issue Oct 14, 2022
This allows users to limit scopes dependent on their pipeline.

fixes apache#23290
@lukecwik
Copy link
Member

If it is easy to do The Right Thing then go ahead and do it. But forcing arbitrary discipline on new contributors that we ourselves never complied with is not how we should run things.

We should be guiding contributors to produce the right solution, this isn't arbitrary.

@lukecwik
Copy link
Member

#23644 is a proposed solution.

lukecwik added a commit to lukecwik/incubator-beam that referenced this issue Oct 14, 2022
This allows users to limit scopes dependent on their pipeline.

fixes apache#23290
lukecwik added a commit that referenced this issue Oct 15, 2022
* Make GCP OAuth scopes configurable via pipeline options.

This allows users to limit scopes dependent on their pipeline.

fixes #23290

* Fix usages of get credentials where no pipeline options was being used to pass in a default instance

* Address review comments

Drop the non pipelineoptions routes in _add_impersonation_...
@ricardograca-scratch
Copy link
Author

Thanks everyone! The final solution looks great, and I wouldn't have minded doing the right thing to help the project, but it would have taken me a long time since I'm not familiar with the project internals.

@lukecwik
Copy link
Member

I'm happy that you're happy with the solution.

Hopefully we can spend more time with you on your next change.

@tvalentyn tvalentyn mentioned this issue Nov 4, 2022
4 tasks
@tvalentyn tvalentyn added the done & done Issue has been reviewed after it was closed for verification, followups, etc. label Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core done & done Issue has been reviewed after it was closed for verification, followups, etc. new feature P2 python
Projects
None yet
4 participants