Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): memory leak in memorized decorator #17319

Closed
wants to merge 1 commit into from

Conversation

zhaoyongjie
Copy link
Member

@zhaoyongjie zhaoyongjie commented Nov 2, 2021

SUMMARY

closes: #15132
Currently, we use @memorized decorator to cache function results in memory. Obviously, the original decorator does not consider:

  1. how to release memory in the application process
  2. how to define cache size

In this PR:

  1. I used a simplefunctools.lru_cache wrapper replaced with original memorized.
  2. I used memoized_func replaced with memoized when need to define a custom cache key. pickle cannot serialize SQLAlchemy engine object, raise an error TypeError: can't pickle _thread._local objects, so we we can't use memoized_func to cache this function
  3. because functools.lru_cache can't support customized cache-key, add a minor proxy function to delegate get_sqla_engine function

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue: Superset memory leak #15132
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@zhaoyongjie zhaoyongjie requested a review from villebro November 2, 2021 11:28
@zhaoyongjie zhaoyongjie changed the title fix(core): memory leak in memorized decorator [WIP]fix(core): memory leak in memorized decorator Nov 2, 2021
@zhaoyongjie zhaoyongjie changed the title [WIP]fix(core): memory leak in memorized decorator fix(core): memory leak in memorized decorator Nov 2, 2021
@codecov
Copy link

codecov bot commented Nov 2, 2021

Codecov Report

Merging #17319 (37540f2) into master (ca93d63) will increase coverage by 10.56%.
The diff coverage is 100.00%.

❗ Current head 37540f2 differs from pull request most recent head 88eea26. Consider uploading reports for the commit 88eea26 to get more accurate results

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #17319       +/-   ##
===========================================
+ Coverage   66.40%   76.96%   +10.56%     
===========================================
  Files        1641     1037      -604     
  Lines       63520    55629     -7891     
  Branches     6422     7608     +1186     
===========================================
+ Hits        42178    42817      +639     
+ Misses      19681    12562     -7119     
+ Partials     1661      250     -1411     
Flag Coverage Δ
hive ?
mysql 81.98% <100.00%> (+0.14%) ⬆️
postgres 81.99% <100.00%> (+0.10%) ⬆️
presto 81.86% <100.00%> (+29.41%) ⬆️
python 82.25% <100.00%> (-0.07%) ⬇️
sqlite 81.66% <100.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/base.py 88.58% <ø> (-0.90%) ⬇️
superset/models/core.py 90.07% <100.00%> (+1.15%) ⬆️
superset/utils/memoized.py 100.00% <100.00%> (+13.15%) ⬆️
superset/db_engines/hive.py 0.00% <0.00%> (-85.19%) ⬇️
...ntend/src/explore/components/ExploreChartPanel.jsx 14.28% <0.00%> (-57.72%) ⬇️
superset/db_engine_specs/__init__.py 46.15% <0.00%> (-42.31%) ⬇️
...tend/src/filters/components/Select/controlPanel.ts 58.33% <0.00%> (-41.67%) ⬇️
superset-frontend/src/chart/ChartErrorMessage.tsx 50.00% <0.00%> (-25.00%) ⬇️
superset/common/request_contexed_based.py 80.00% <0.00%> (-20.00%) ⬇️
...-frontend/src/visualizations/presets/MainPreset.js 80.00% <0.00%> (-20.00%) ⬇️
... and 1492 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca93d63...88eea26. Read the comment docs.

@zhaoyongjie zhaoyongjie force-pushed the fix_memory_lack branch 2 times, most recently from 1c611a7 to 46eeb54 Compare December 21, 2021 03:22
update doc

wip

wip

fix lint
@stale
Copy link

stale bot commented Apr 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 30, 2022
@mdeshmu
Copy link
Contributor

mdeshmu commented Jul 7, 2022

I am also observing memory leak. Any updates on this PR?

@stale stale bot removed the inactive Inactive for >= 30 days label Jul 7, 2022
@zhaoyongjie
Copy link
Member Author

@mdeshmu let me update this PR and try to resolve it.

@@ -349,6 +349,29 @@ def get_sqla_engine(
nullpool: bool = True,
user_name: Optional[str] = None,
source: Optional[utils.QuerySource] = None,
) -> Engine:
cache_key = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we removed @memoized from this method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using secret (un-hashed) material as a cache key seems a bit dangerous. Ideally, we would at least hash before using as a cache key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has been open for a long time, I'll finish it and ping you. Thanks for the reviewing

def __repr__(self) -> str:
"""Return the function's docstring."""
return self.func.__doc__ or ""
def wrapper_cache(func: Callable[..., Any]) -> Callable[..., Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the new version is missing the watch kwarg functionality that allowed args to be selectively included as part of the cache key

Copy link
Member Author

@zhaoyongjie zhaoyongjie Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguments of the function are implicit watch hash id.

@mdeshmu
Copy link
Contributor

mdeshmu commented Jul 12, 2022

Thanks @zhaoyongjie

@mdeshmu
Copy link
Contributor

mdeshmu commented Aug 20, 2022

@zhaoyongjie can you please revisit this PR once you get time.

@mdeshmu
Copy link
Contributor

mdeshmu commented Jan 16, 2023

@zhaoyongjie any update here?

@rusackas IMO, Its not good for an excellent tool like Superset to have a code that causes memory leak, not fixed for an year. Can you please get this fixed ASAP. Appreciate everyone's efforts in keeping the tool top notch. Thanks.

@EugeneTorap
Copy link
Contributor

@zhaoyongjie @villebro @etr2460 @craig-rueda @dpgaspar Is it possible to fix the problem and merge it?

@EugeneTorap
Copy link
Contributor

Superseded by #23139
@villebro @dpgaspar close this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Superset memory leak
5 participants