Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1462] fix(server): Memory may leak when flushQueue is full #1463

Merged

Conversation

rickyma
Copy link
Contributor

@rickyma rickyma commented Jan 17, 2024

What changes were proposed in this pull request?

Release the memory when a flush event is dropped.

Why are the changes needed?

For #1462

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

@rickyma
Copy link
Contributor Author

rickyma commented Jan 17, 2024

@zuston Can you check this out? I think this is related to your PR #1461.
If you have any ideas, please ping me anytime.

cc @jerqi @xianjingfeng

@zuston
Copy link
Member

zuston commented Jan 17, 2024

Yes. This PR could be involved in #1461

@rickyma
Copy link
Contributor Author

rickyma commented Jan 17, 2024

If you have handled the leak in #1461, I can close this PR. I didn't look very carefully into your PR, have you handled the leak mentioned in my PR? @zuston

@codecov-commenter
Copy link

codecov-commenter commented Jan 17, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (31dae44) 54.21% compared to head (9939137) 54.98%.
Report is 1 commits behind head on master.

Files Patch % Lines
...pache/uniffle/server/DefaultFlushEventHandler.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1463      +/-   ##
============================================
+ Coverage     54.21%   54.98%   +0.76%     
+ Complexity     2780     2776       -4     
============================================
  Files           426      407      -19     
  Lines         24237    21963    -2274     
  Branches       2063     2072       +9     
============================================
- Hits          13141    12077    -1064     
+ Misses        10280     9140    -1140     
+ Partials        816      746      -70     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zuston
Copy link
Member

zuston commented Jan 17, 2024

If you have handled the leak in #1461, I can close this PR. I didn't look very carefully into your PR, have you handled the leak mentioned in my PR? @zuston

No. If you don't mind, please fix this after #1461 is merged.

@rickyma
Copy link
Contributor Author

rickyma commented Jan 17, 2024

If you have handled the leak in #1461, I can close this PR. I didn't look very carefully into your PR, have you handled the leak mentioned in my PR? @zuston

No. If you don't mind, please fix this after #1461 is merged.

I'm ok. I just hope your PR can be merged a bit faster, as I need these patches for testing. :)

@zuston
Copy link
Member

zuston commented Jan 17, 2024

If you have handled the leak in #1461, I can close this PR. I didn't look very carefully into your PR, have you handled the leak mentioned in my PR? @zuston

No. If you don't mind, please fix this after #1461 is merged.

I'm ok. I just hope your PR can be merged a bit faster, as I need these patches for testing. :)

This fixes the corner case, I don't think It will leak resource on prod.

@rickyma
Copy link
Contributor Author

rickyma commented Jan 17, 2024

Yes, you are right. 90% of the time it may be a normal scenario without memory leaks, but in the remaining 10% of cases, leaks may still occur. If the 10% scenario is triggered like crazy, it can still lead to serious memory leaks. I have encountered situations with crazy dropping events in our environment.

@zuston
Copy link
Member

zuston commented Jan 18, 2024

#1461 has been merged. Could you help update this patch? @rickyma

@rickyma rickyma force-pushed the potential-mem-leak-when-event-is-dropped branch from a4e3358 to f50a1b6 Compare January 18, 2024 14:19
@rickyma
Copy link
Contributor Author

rickyma commented Jan 18, 2024

#1461 has been merged. Could you help update this patch? @rickyma

After #1461 , many of my previous code modifications are no longer needed, I think only this change needs to be merged in. You can take a look. By the way, #1461 is a great patch. @zuston

@rickyma rickyma changed the title [#1462] fix(server): Memory may leak when a flush event is dropped [#1462] fix(server): Memory may leak when flushQueue is full Jan 18, 2024
@rickyma rickyma closed this Jan 18, 2024
@rickyma rickyma deleted the potential-mem-leak-when-event-is-dropped branch January 18, 2024 14:25
@rickyma rickyma restored the potential-mem-leak-when-event-is-dropped branch January 18, 2024 14:26
@rickyma rickyma reopened this Jan 18, 2024
@@ -69,6 +69,8 @@ public DefaultFlushEventHandler(
public void handle(ShuffleDataFlushEvent event) {
if (!flushQueue.offer(event)) {
LOG.error("Flush queue is full, discard event: " + event);
// We need to release the memory when discarding the event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the metric like ShuffleServerMetrics.counterTotalDroppedEventNum.inc();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@rickyma rickyma requested a review from zuston January 19, 2024 02:52
Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @rickyma

@zuston zuston merged commit 2bcaf9e into apache:master Jan 19, 2024
36 checks passed
@rickyma rickyma deleted the potential-mem-leak-when-event-is-dropped branch May 5, 2024 08:34
zuston pushed a commit to zuston/incubator-uniffle that referenced this pull request May 27, 2024
…pache#1463)

### What changes were proposed in this pull request?

Release the memory when a flush event is dropped.

### Why are the changes needed?

For [apache#1462](apache#1462)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants