Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(operation_config): change logging on is_profiling_enabled #12416

Merged
merged 7 commits into from
Jan 30, 2025

Conversation

cccs-cat001
Copy link
Contributor

due to the is_profiling_enabled method getting called for every data set, it spams log output with these info lines.
image

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

…ed to debug logging

due to the is_profiling_enabled method getting called for every data set, it spams log output with these lines.
@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels Jan 21, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Jan 21, 2025
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
...ion/src/datahub/ingestion/source/sql/sql_config.py 98.68% <ø> (-0.07%) ⬇️
...atahub/ingestion/source_config/operation_config.py 70.00% <100.00%> (+1.91%) ⬆️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 457f96e...e3e7d75. Read the comment docs.

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly it seems strange that we're repeatedly calling is_profiling_enabled

I think the correct solution is to actually cache the result of is_profiling_enabled / only call it once, not reducing the log level.

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 21, 2025
@cccs-cat001
Copy link
Contributor Author

yeah that's fair. is it worthwhile for me to try and make this change? Where would be the best spot to start? This function is called from many different sources.

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 22, 2025
@cccs-cat001
Copy link
Contributor Author

I'm no python pro or anything, but would what I did to iceberg be what we want?

I don't mind doing it for the others that use this function.

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of approach - particularly the lru cache one, feels pretty reasonable. If you're up for it, it'd make sense to make similar changes to other sources as well

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 23, 2025
@cccs-cat001
Copy link
Contributor Author

I started going through the other sources and found this - https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/sql_config.py#L121-L126

Would it not make sense to just add the caching to the operation_config function? One stop and it should fix them all eh?

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 27, 2025
@hsheth2
Copy link
Collaborator

hsheth2 commented Jan 27, 2025

@cccs-cat001 yeah I think that makes sense - if we just did what that comment suggested, this would work more nicely

We'd need to add a dependency on cachetools for any source that uses profiling, but I think that's acceptable

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 27, 2025
@cccs-cat001
Copy link
Contributor Author

alright... this is my first time contributing... so how can I fix the build 🤣🤣

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 27, 2025
@hsheth2
Copy link
Collaborator

hsheth2 commented Jan 27, 2025

You can run the linter and tests locally: https://datahubproject.io/docs/metadata-ingestion/developing/#code-style

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 27, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 27, 2025
@datahub-cyborg datahub-cyborg bot added merge-pending-ci A PR that has passed review and should be merged once CI is green. and removed needs-review Label for PRs that need review from a maintainer. labels Jan 28, 2025
@cccs-cat001
Copy link
Contributor Author

Hey looks like the CI has passed, can we merge in now? :)

@hsheth2 hsheth2 merged commit bc09b28 into datahub-project:master Jan 30, 2025
195 of 222 checks passed
@cccs-cat001 cccs-cat001 deleted the profiling-logs-too-much branch January 30, 2025 13:02
hsheth2 added a commit that referenced this pull request Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants