feat(profiling) - Bigquery: Ability to disable partition profiling #4228

Merged


treff7es
Contributor

  • Ability to disable partition profiling
  • By default, create BigQuery profiling temp tables in the same schema as the table being profiled.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

github-actions bot commented Feb 23, 2022

Unit Test Results (build & test)

  71 files  +1    71 suites  +1   11m 47s ⏱️ +56s
618 tests +7  558 ✔️ +6  59 💤 ±0  1 +1 

For more details on these failures, see this check.

Results for commit edc7cbb. ± Comparison against base commit 49a8ece.

♻️ This comment has been updated with latest results.

github-actions bot commented Feb 23, 2022

Unit Test Results (metadata ingestion)

       5 files  ±  0         5 suites  ±0   48m 46s ⏱️ + 5m 29s
   343 tests +11     343 ✔️ +11    0 💤 ±0  0 ±0 
1 562 runs  +55  1 531 ✔️ +62  31 💤  - 7  0 ±0 

Results for commit edc7cbb. ± Comparison against base commit 49a8ece.

♻️ This comment has been updated with latest results.

@@ -43,6 +43,7 @@ class GEProfilingConfig(ConfigModel):
    # Hidden option - used for debugging purposes.
    catch_exceptions: bool = True

    partition_profiling_enabled: Optional[bool] = True

Remove Optional since the default value True will always be present.
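
A minimal sketch of this suggestion, using a standard-library dataclass as a stand-in for `ConfigModel`. The class name `ProfilingConfigSketch` is hypothetical and is not part of DataHub; only the two field names come from the diff above.

```python
from dataclasses import dataclass


@dataclass
class ProfilingConfigSketch:
    """Hypothetical stand-in for GEProfilingConfig; not the DataHub class."""

    # Hidden option - used for debugging purposes.
    catch_exceptions: bool = True

    # Annotated as a plain bool: a default is always present,
    # so None is not a meaningful value for this flag.
    partition_profiling_enabled: bool = True


default_config = ProfilingConfigSketch()
disabled_config = ProfilingConfigSketch(partition_profiling_enabled=False)
print(default_config.partition_profiling_enabled)
print(disabled_config.partition_profiling_enabled)
```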

and not self.config.profiling.partition_profiling_enabled
):
logger.debug(
f"{dataset_name} is skipped because profiling.partition_profiling_enabled property is disabled"

probably include the partition as well in the debug message.
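
One way the message could carry the partition, as a rough sketch. The helper `skip_message` and its `partition` argument are hypothetical; the PR excerpt does not show which variable holds the partition at that point in the code.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def skip_message(dataset_name: str, partition: Optional[str]) -> str:
    """Hypothetical helper: build the skip message with the partition included."""
    return (
        f"{dataset_name} (partition: {partition}) is skipped because "
        "profiling.partition_profiling_enabled property is disabled"
    )


# Example values below are made up for illustration.
logger.debug(skip_message("my_project.my_dataset.my_table", "20220223"))
```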

f"{self.config.bigquery_temp_table_schema}.ge-temp-{uuid.uuid4()}"
)
ge_config["bigquery_temp_table"] = bigquery_temp_table
if custom_sql or self.config.limit or self.config.offset:

Can you add comments about when and why the temp table is being created? The logic has a significant change from the earlier code.
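
As an illustration of the kind of comments being requested, here is a small sketch. It is not the code from this PR: both helper names are hypothetical, and the stated reason for the temp table (profiling a query result rather than the whole table) is an assumption drawn only from the condition and the naming pattern visible in the excerpt.

```python
import uuid
from typing import Optional


def needs_temp_table(
    custom_sql: Optional[str], limit: Optional[int], offset: Optional[int]
) -> bool:
    # Assumption: a temp table is only needed when the data to profile is a
    # query result (custom SQL, LIMIT, or OFFSET) rather than the table itself.
    return bool(custom_sql or limit or offset)


def temp_table_name(temp_table_schema: str) -> str:
    # Follows the naming pattern in the diff: "<schema>.ge-temp-<uuid4>".
    # The random suffix keeps concurrent profiling runs from colliding.
    return f"{temp_table_schema}.ge-temp-{uuid.uuid4()}"


if needs_temp_table(custom_sql=None, limit=1000, offset=None):
    print(temp_table_name("my_project.my_dataset"))
```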

@@ -167,7 +171,7 @@ views by setting `profiling.bigquery_temp_table_schema` property.
:::note

Due to performance reasons, we only profile the latest partition for Partitioned tables and the latest shard for sharded tables.

If you want you can set partiton you want partiton by setting `partition.partition_datetime` property. (this will be applied to all partitioned tables)

Typos: "If you want you can set partiton you want partiton"?

@rslanka rslanka left a comment

LGTM!

@shirshanka shirshanka left a comment

LGTM!

@shirshanka shirshanka merged commit 2a5cf3d into datahub-project:master Mar 2, 2022
3 participants