Update BigQuery connector documentation #25109
base: master
I have one more thing I am considering adding here. Use case: cross-project service accounts. Aspect: PTF functions. A little on the use case and how to set up service accounts this way:
Illustration
Setup
I did some testing with 3 Google Cloud projects.
Projects:
Service accounts:
Findings
This makes it possible to access ALL BigQuery resources that this service account has permissions to:

-Dtesting.bigquery.parent-project-id=data-2-644 <--------------- data project id set as parent project
-Dtesting.bigquery.project-id=data-2-644
-Dtesting.bigquery.credentials-key=base-30275-service-account-34@base-30275.iam.gserviceaccount.com

-- access to data project:
SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT schema_name FROM `data-2-644.region-us.INFORMATION_SCHEMA.SCHEMATA`'));
-- access to parent project:
SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT schema_name FROM `base-30275.region-us.INFORMATION_SCHEMA.SCHEMATA`'));
-- OK <--------------- has access because service account has permissions in parent
-- access to another data project:
SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT schema_name FROM `data-1-13082.region-us.INFORMATION_SCHEMA.SCHEMATA`'));
-- OK <--------------- has access because service account has permissions in other data project as well

If, however, used with service account that has no BigQuery permissions in the parent project, access is denied:

-Dtesting.bigquery.parent-project-id=data-2-644
-Dtesting.bigquery.project-id=data-2-644
-Dtesting.bigquery.credentials-key=base-30275-service-account-34@base-30275.iam.gserviceaccount.com

-- access to data project:
SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT schema_name FROM `data-2-644.region-us.INFORMATION_SCHEMA.SCHEMATA`'));
-- access to parent project:
SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT schema_name FROM `base-30275.region-us.INFORMATION_SCHEMA.SCHEMATA`'));
-- Failed to get destination table for query. Access Denied: Table base-30275:region-us. INFORMATION_SCHEMA. SCHEMATA: User does not have permission to query table base-30275:region-us. INFORMATION_SCHEMA. SCHEMATA, or perhaps it does not exist.

Summary
If a BigQuery catalog is configured with an SA JSON key and a project ID, then through PTF, one effectively gains access to all BigQuery projects that the service account has permissions for. Ultimately, do we want to mention this in the documentation?
Great job demystifying this.
Everyone was so confused for so long and wasn't able to figure out how or why it works.
Overall we might want to rename the section since it seems all about billing ... also since it seems pretty critical we might want to add an anchor and link to it from the configs for each of these properties with a "see also .."
data in multiple GCP projects, You need to create several catalogs, each
pointing to a different GCP project. For example, if you have two GCP projects,
one for the sales and one for analytics, you can create two properties files in
`etc/catalog` named `sales.properties` and `analytics.properties`, both
having `connector.name=bigquery` but with different `project-id`. This will
create the two catalogs, `sales` and `analytics` respectively.
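
As a minimal sketch of the setup described above, the two catalog files could look roughly like the following. The project IDs `sales-project` and `analytics-project` are hypothetical placeholders, and the sketch assumes the `bigquery.project-id` property from the connector's configuration example.

```properties
# etc/catalog/sales.properties
connector.name=bigquery
# hypothetical project ID, replace with your own
bigquery.project-id=sales-project

# etc/catalog/analytics.properties (a separate file)
connector.name=bigquery
# hypothetical project ID, replace with your own
bigquery.project-id=analytics-project
```

With both files in place, tables would be addressed through the `sales` and `analytics` catalogs respectively.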

### Understanding Project ID Resolution

The BigQuery connector determines the project ID to use based on the
Somewhere we need to link to some sort of docs from BigQuery that explain more about the project ID ideally .. maybe right here
Fine idea! I am having trouble finding any relevant documentation on this topic that is more than a brief mention in the BigQuery section of Google Cloud docs. There are some third party sources like https://productresources.collibra.com/docs/collibra/latest/Content/DataQuality/DBConnection/ta_bigquery-cross-account-dataset-access.htm
We could have a reference to a general cross project SA page like this one instead, WDYT? https://cloud.google.com/iam/docs/attach-service-accounts#attaching-different-project
Maybe use this link https://cloud.google.com/resource-manager/docs/creating-managing-projects .. it explicitly explains Project ID
I went ahead and added a sentence, please edit as you see fit
Link the project ID to the link I found
With regard to the query table function and it being a potential security issue .. that's already documented in a generic fashion .. adding more details specific to BigQuery would also be good.
164193f to c71334f
Thank you @mosabua! Gave it another try, PTAL
LGTM % nits
3758f98 to 211a2f1
Sorry, forgot to actually push my latest changes yesterday... All pushed now
A few nits .. then ready to go ... ping me after last push and I can merge
-The BigQuery connector can only access a single GCP project.Thus, if you have
-data in multiple GCP projects, You need to create several catalogs, each
+The BigQuery connector can only access a single GCP project. Thus, if you have
+data in multiple GCP projects, you need to create several catalogs, each
-data in multiple GCP projects, you need to create several catalogs, each
+data in multiple GCP projects, you must create several catalogs, each
@@ -74,14 +74,47 @@ bigquery.project-id=<your Google Cloud Platform project id>

### Multiple GCP projects

-The BigQuery connector can only access a single GCP project.Thus, if you have
-data in multiple GCP projects, You need to create several catalogs, each
+The BigQuery connector can only access a single GCP project. Thus, if you have
-The BigQuery connector can only access a single GCP project. Thus, if you have
+The BigQuery connector can only access a single GCP project. If you have
(bigquery-project-id-resolution)=
### Billing and data projects

The BigQuery connector determines the
Fix wrapping in this paragraph to 80 char .. currently it's weird
[project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects)
to use based on the configuration settings.
This behavior provides users with flexibility in selecting both
the project to query and the project to be billed for BigQuery operations.
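
For illustration, a catalog that reads data from one project while billing another might be configured roughly as follows. This is only a sketch: it assumes the `bigquery.project-id` and `bigquery.parent-project-id` catalog properties (the counterparts of the `testing.bigquery.*` flags used in the testing earlier in this conversation), and both project IDs are hypothetical placeholders.

```properties
connector.name=bigquery
# project that holds the data to query (hypothetical ID)
bigquery.project-id=data-project
# project to bill for BigQuery operations (hypothetical ID)
bigquery.parent-project-id=billing-project
```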
-the project to query and the project to be billed for BigQuery operations.
+the project to query and the project to bill for BigQuery operations.
@mosabua pls go ahead and change the wording yourself - seems straightforward.
I don't have time for that until Friday
Description
Add a section on project ID resolution.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: