feat(ingest): add lineage_client_project_id field to the BigQuery config #4138

Merged
merged 3 commits into from
Feb 28, 2022

Conversation

@vcs9 (Contributor) commented Feb 14, 2022

This allows users to specify which project to use when creating the BigQuery client, in case the default project_id is not used for querying.
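As a rough illustration of how the new field might be set, here is a hypothetical source section of an ingestion recipe written as a Python dict. The field names `project_id` and `lineage_client_project_id` come from this PR; the project names are placeholders, and the surrounding recipe layout is an assumption rather than something shown in this change.

```python
# Hypothetical recipe fragment; only the two field names come from this PR.
recipe_source = {
    "type": "bigquery",
    "config": {
        # Project whose datasets are ingested (placeholder value).
        "project_id": "my-data-project",
        # Project used when creating the BigQuery client for lineage
        # queries (placeholder value). Leave it out to fall back to project_id.
        "lineage_client_project_id": "my-query-project",
    },
}
```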

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@vcs9 force-pushed the bq-query-project branch 2 times, most recently from e0121f6 to 2e08675 on February 14, 2022 22:32
@@ -336,7 +337,11 @@ def _compute_bigquery_lineage_via_gcp_logging(self) -> None:
    def _compute_bigquery_lineage_via_exported_bigquery_audit_metadata(self) -> None:
        logger.info("Populating lineage info via exported GCP audit logs")
        try:
            _client: BigQueryClient = BigQueryClient(project=self.config.project_id)
            if self.config.lineage_client_project_id is None:
Contributor

Is this only for exported_bigquery_audit_metadata?

@vcs9 (Contributor Author) Feb 16, 2022

For our use case, we only need it for exported_bigquery_audit_metadata, but I can also add it to the code that creates the client in _compute_bigquery_lineage_via_gcp_logging().

@@ -336,7 +337,11 @@ def _compute_bigquery_lineage_via_gcp_logging(self) -> None:
    def _compute_bigquery_lineage_via_exported_bigquery_audit_metadata(self) -> None:
        logger.info("Populating lineage info via exported GCP audit logs")
        try:
            _client: BigQueryClient = BigQueryClient(project=self.config.project_id)
            if self.config.lineage_client_project_id is None:
                self.config.lineage_client_project_id = self.config.project_id
Contributor

I would not overwrite the config value if possible. To be on the safe side, I would store it in a local variable instead, like:

project_id: str = (
    self.config.lineage_client_project_id
    if self.config.lineage_client_project_id
    else self.config.project_id
)
_client: BigQueryClient = BigQueryClient(project=project_id)
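A minimal, self-contained sketch of the suggestion above, assuming a simplified stand-in for the config object. `ConfigSketch` and `resolve_client_project` are hypothetical names used only for illustration; only the two field names come from this PR. The point is that the fallback is computed into a return value and the config is left untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    # Hypothetical stand-in for the BigQuery source config; only the two
    # field names are taken from this PR.
    project_id: Optional[str] = None
    lineage_client_project_id: Optional[str] = None


def resolve_client_project(config: ConfigSketch) -> Optional[str]:
    # Prefer the lineage-specific project and fall back to project_id.
    # The config object itself is never modified.
    if config.lineage_client_project_id:
        return config.lineage_client_project_id
    return config.project_id


config = ConfigSketch(project_id="data-project")
resolved = resolve_client_project(config)
```

After this runs, `resolved` holds the fallback project while `config.lineage_client_project_id` is still `None`, so later code can tell whether the user actually set the field.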

        if project_id is not None:
            return [GCPLoggingClient(**client_options, project=project_id)]
        else:
            return [GCPLoggingClient(**client_options)]

    def _choose_lineage_client_project_id(self) -> Optional[str]:
Contributor

Minor: I'd make this _get for consistency and clarity (since this method doesn't actually accept any inputs)
Maybe this should be a property so the value is only computed once instead of on each function call?
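One way the "compute once" idea could look, sketched under assumptions: `SourceSketch`, `lineage_project`, and `client_kwargs` are hypothetical names, not the code merged in this PR. Note that a plain `@property` re-runs its body on every access; `functools.cached_property` (Python 3.8+) is what stores the result after the first access.

```python
from functools import cached_property
from typing import Dict, Optional


class SourceSketch:
    # Hypothetical stand-in for the ingestion source class.
    def __init__(
        self,
        project_id: Optional[str],
        lineage_client_project_id: Optional[str] = None,
    ) -> None:
        self.project_id = project_id
        self.lineage_client_project_id = lineage_client_project_id
        self.lookups = 0  # counts how often the fallback logic runs

    @cached_property
    def lineage_project(self) -> Optional[str]:
        # Runs once per instance; later accesses reuse the stored value.
        self.lookups += 1
        return self.lineage_client_project_id or self.project_id

    def client_kwargs(self) -> Dict[str, str]:
        # Pass `project` only when one is known, mirroring the if/else
        # in the diff above without duplicating the constructor call.
        project = self.lineage_project
        return {"project": project} if project is not None else {}


source = SourceSketch("data-project", "billing-project")
first = source.lineage_project
second = source.lineage_project
```

Here `source.lookups` stays at 1 after both accesses, and `client_kwargs()` could be unpacked into either client constructor.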

@rslanka (Contributor) left a comment

LGTM!

github-actions bot commented Feb 25, 2022

Unit Test Results (build & test)

71 files ±0, 71 suites ±0, duration 16m 41s (-3m 59s)
618 tests ±0: 559 passed ±0, 59 skipped ±0, 0 failed ±0

Results for commit 2e4f24d. ± Comparison against base commit bcabff8.

♻️ This comment has been updated with latest results.

github-actions bot commented Feb 25, 2022

Unit Test Results (metadata ingestion)

5 files, 5 suites, duration 42m 42s
342 tests: 342 passed, 0 skipped, 0 failed
1 557 runs: 1 526 passed, 31 skipped, 0 failed

Results for commit 2e4f24d.

♻️ This comment has been updated with latest results.

@shirshanka (Contributor) left a comment

LGTM

@shirshanka shirshanka merged commit 93ff095 into datahub-project:master Feb 28, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
…fig (datahub-project#4138)

* feat(ingest): add lineage_client_project_id field to the bigquery config

* fix linting issues

* add type annotation for arguments
5 participants