-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(operation_config): change logging on is_profiling_enabled #12416
refactor(operation_config): change logging on is_profiling_enabled #12416
Conversation
…ed to debug logging due to the is_profiling_enabled method getting called for every data set, it spams log output with these lines.
Codecov ReportAll modified and coverable lines are covered by tests ✅
... and 5 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly it seems strange that we're repeatedly calling is_profiling_enabled
I think the correct solution is to actually cache the result of is_profiling_enabled
/ only call it once, not reducing the log level.
yeah that's fair. is it worthwhile for me to try and make this change? Where would be the best spot to start? This function is called from many different sources. |
I'm no python pro or anything, but would what I did to iceberg be what we want? I don't mind doing it for the others that use this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of approach - particularly the lru cache one, feels pretty reasonable. If you're up for it, it'd make sense to make similar changes to other sources as well
metadata-ingestion/src/datahub/ingestion/source/iceberg/iceberg_common.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source_config/operation_config.py
Outdated
Show resolved
Hide resolved
I started going through the other sources and found this - https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/sql_config.py#L121-L126 Would it not make sense to just add the caching to the operation_config function? One stop and it should fix them all eh? |
@cccs-cat001 yeah I think that makes sense - if we just did what that comment suggested, this would work more nicely We'd need to add a dependency on cachetools for any source that uses profiling, but I think that's acceptable |
alright... this is my first time contributing... so how can I fix the build 🤣🤣 |
You can run the linter and tests locally: https://datahubproject.io/docs/metadata-ingestion/developing/#code-style |
metadata-ingestion/src/datahub/ingestion/source_config/operation_config.py
Show resolved
Hide resolved
Hey looks like the CI has passed, can we merge in now? :) |
Fixes a regression from #12416
due to the is_profiling_enabled method getting called for every data set, it spams log output with these info lines.
Checklist