This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Memory leak in Naive engine when profiling #15375

Closed
larroy opened this issue Jun 26, 2019 · 18 comments · Fixed by #15405

@larroy
Contributor

larroy commented Jun 26, 2019

I believe there's a memory leak in Naive engine here:

https://github.com/apache/incubator-mxnet/blob/master/src/engine/naive_engine.cc#L164

I think this would be a great opportunity to refactor how the operator returned by NewOperator is released, as it is also difficult to reason about the release of operator memory in the threaded engine (it looks correct to me, but it takes quite a bit of digging to understand).

I would propose returning a unique_ptr directly, with a custom deleter whose virtual operator() is specialized for the engine type; that gets the same effect as the way operators are released now, and it would make this part of the memory management easier to reason about. A rough sketch follows below the link.

https://github.com/apache/incubator-mxnet/blob/master/src/engine/threaded_engine.cc#L270
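
Roughly what I have in mind, as a minimal sketch (the names and signatures below are illustrative only, not the actual engine API; here the virtual dispatch goes through the engine's release method, which achieves the same per-engine specialization):

```cpp
#include <memory>

// Illustrative only: simplified stand-ins for the engine types.
struct Opr {};  // stand-in for the engine's operator handle

class Engine {
 public:
  virtual ~Engine() = default;

  // Existing-style allocation/release pair (each engine specializes these).
  virtual Opr* NewOperatorRaw() = 0;
  virtual void DeleteOperator(Opr* opr) = 0;

  // Deleter that dispatches back to the engine that created the operator,
  // so the release path stays exactly what each engine implements today.
  struct OprDeleter {
    Engine* engine;
    void operator()(Opr* opr) const {
      if (opr != nullptr) engine->DeleteOperator(opr);
    }
  };
  using OprPtr = std::unique_ptr<Opr, OprDeleter>;

  // Proposed RAII wrapper: callers hold an owning pointer and cannot leak it.
  OprPtr NewOperator() { return OprPtr(NewOperatorRaw(), OprDeleter{this}); }
};
```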

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@vdantu
Contributor

vdantu commented Jun 26, 2019

@mxnet-label-bot add [bug]

marcoabreu added the Bug label Jun 26, 2019
@anirudh2290
Member

I agree with you about fixing the memory leak in the NaiveEngine, and thanks for raising that. But I disagree about the need to refactor the release of operator memory in the threaded engine. It was a conscious design decision to use an object pool for operator objects, which are manually taken from and added back to the pool. This removal and addition is supposed to be managed by the graph executors and other modules that push operators to the engine. If readability is really the issue, that is very subjective, and I don't see the benefit outweighing the cost here.
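
For context, the recycle pattern being described is roughly the following (a generic sketch only, not the actual MXNet object pool implementation):

```cpp
#include <new>
#include <utility>
#include <vector>

// Generic illustration of the recycle pattern described above; the real
// object pool is more elaborate (thread safety, page-based storage, etc.).
template <typename T>
class ObjectPool {
 public:
  template <typename... Args>
  T* New(Args&&... args) {
    void* mem;
    if (!free_list_.empty()) {   // reuse a slot that was handed back earlier
      mem = free_list_.back();
      free_list_.pop_back();
    } else {
      mem = ::operator new(sizeof(T));
    }
    return new (mem) T(std::forward<Args>(args)...);
  }

  // The caller explicitly hands the object back; the slot is recycled,
  // so no heap free happens on the hot path.
  void Delete(T* obj) {
    obj->~T();
    free_list_.push_back(obj);
  }

 private:
  std::vector<void*> free_list_;
};
```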

@cjolivier01
Member

I agree with Anirudh.

I didn't write the object pool code, but the reason it was done that way was to avoid a new/delete operation inline with the profiling. Depending on the allocation library used, those calls can introduce extra locks (and associated heap code), causing an "observer effect", which is not desired in profiling of all things. Ideally, the naive engine would use the same mechanism (a better change, imho), but since it's not used as much for production workloads, I suppose they didn't think it was worth it to code it.

@larroy
Contributor Author

larroy commented Jun 28, 2019

@anirudh2290 I proposed refactoring around a unique_ptr; I don't think it changes the way memory is released, so maybe you got a different impression of what I was suggesting. I don't know why you are opposing the proposed change right away without considering the benefits. Having a pointer released in a call hierarchy three levels down leads to bugs and memory leaks and is difficult to reason about. It is best to use RAII and return a unique_ptr or similar which holds ownership. I don't think the way resources are released or performance would be impacted in any way, but if the proposal is blocked from the start it makes no sense to even propose a PR, which is not very encouraging.

@larroy
Contributor Author

larroy commented Jun 28, 2019

@cjolivier01 I didn't understand a word of what you are writing. I'm not proposing to change the object pool, just make NewOp return a smart pointer.

@anirudh2290
Member

@larroy even less encouraging is making a PR and then having it blocked. I was trying to save you some time. Ownership of the pointer rests with the module that pushes it, so the imperative or graph executor should handle it. If you are really keen on doing this, do it in the graph executor. There is no need to touch the engine code here.

> Having a pointer released in a call hierarchy three levels down leads to bugs and memory leaks and is difficult to reason about.

This is the design: the engine relies on the object pool for allocation, and the modules depend on the engine to handle this; a module just tells the engine when it wants an operator created or destroyed.
I think the code is very logical and it makes sense to me. Also, the memory leak you found is not related to the ThreadedEngine code at all.

I did consider the benefit of your approach, "readability/understandability", in my first comment.

@larroy
Contributor Author

larroy commented Jun 28, 2019

I will make a small fix to the leak in Naive engine, and a separate PR with a refactor, so we can have a more focused and productive conversation then. Thanks for your feedback.

@larroy
Contributor Author

larroy commented Jun 28, 2019

@anirudh2290 it makes sense to you because you know the code. I would encourage you to read about the benefits of RAII. This is orthogonal to having an object pool for allocation, which I was not proposing to change.

@anirudh2290
Member

> it makes sense to you because you know the code

So who are you targeting, someone unfamiliar with the code? Everyone needs to spend some time with the code to understand it, but after they spend that time it makes sense.

I know the benefits of RAII, but they don't apply to this use case in the threaded engine. I am saying there is no need to change the threaded engine code here, which is what you are proposing to do. RAII won't help here, because the engine itself is only responsible for indicating to the object pool that the object has to be deleted; the rest is taken care of by the object pool itself. As I said, if you really want to add RAII, do it in the graph executor (although I don't think it is really needed there either).

@larroy
Contributor Author

larroy commented Jun 28, 2019

Maybe you know the code better and what I propose doesn't make sense, but my proposal was to wrap the operator in a unique_ptr with a custom deleter with a virtual operator(), so delete doesn't need to be called here:

https://github.com/apache/incubator-mxnet/blob/master/src/engine/threaded_engine.cc#L472

But it would still use the object pool; that wouldn't change.

Did I understand correctly that you think this is not possible, or that it is better to explicitly delete the operator in this case?

The leak would not have happened in the first place if a smart pointer to the operator were always passed around; that was my point. Maybe I'm missing something that really needs the raw pointer, but couldn't you pass around a unique_ptr that gets returned to the pool when needed? Having the pointer deleted manually is not exactly exception safe.
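
Concretely, something along these lines, where the deleter returns the block to the pool rather than freeing it (all names below are made up for illustration, not the real code):

```cpp
#include <memory>

// Purely illustrative stand-ins; not the real OprBlock/object-pool types.
struct OprBlock { /* engine op state */ };

struct OprBlockPool {
  void Delete(OprBlock* blk) { /* return blk to the pool's free list */ }
};

// Deleter that hands the block back to its pool instead of calling `delete`.
struct PoolDeleter {
  OprBlockPool* pool;
  void operator()(OprBlock* blk) const {
    if (blk != nullptr) pool->Delete(blk);
  }
};

using OprBlockPtr = std::unique_ptr<OprBlock, PoolDeleter>;

// Whoever holds an OprBlockPtr returns the block to the pool automatically
// when the pointer goes out of scope (including on early returns), so no
// manual delete/return call is needed at the call site.
```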

@anirudh2290
Member

First, when you talk about the leak, you are talking about the naive engine, which does things differently from the threaded engine.

Coming to the threaded engine: what will be the scope and lifetime of the unique_ptr that you will be creating, and how will the dependency engine know when it has to be deleted? It has an individual view of operators, not the full graph view.

I have spent a lot of time here, having this discussion, on something that hardly adds any value. I have said all I have to say.

larroy added a commit to larroy/mxnet that referenced this issue Jun 28, 2019
@larroy
Contributor Author

larroy commented Jun 28, 2019

> I have spent a lot of time here, having this discussion, on something that hardly adds any value. I have said all I have to say.

Hi @anirudh2290
I don't think it is very constructive to conclude your reply with that if you are asking questions. I was trying to understand your point and explain my proposal. If you don't have time to follow up, there is no need to start a discussion in the first place. I just made a suggestion on this ticket to prevent similar kinds of leaks in other places using this code. I respect that you have your own opinion, but for me, having unmanaged pointers in many places with a complex delete pattern creates an environment in which reasoning about the code is hard and prone to bugs. Saying that something hardly adds any value is not very constructive if you don't try to understand the problem first. Cheers.

@anirudh2290
Member

Okay, I will say it again: the leak is in the naive engine and not the threaded engine. Also, let me say again: for the threaded engine, there is no problem here. I would request you to spend time fixing some real issues that mxnet has.

I am going to close this issue, please feel free to open a new one for the naive engine memory leak.

@larroy
Contributor Author

larroy commented Jun 28, 2019

Did you read the title of the issue? Why are you closing this? It says the leak is in the NaiveEngine when profiling. I have said the leak is in the naive engine since the beginning.

@anirudh2290
Member

This thread has derailed a lot from the original issue, and the whole discussion has been about the threaded engine, where there is no issue. I will keep this closed; feel free to open a new issue and keep the discussion focused on the naive engine.

@larroy
Contributor Author

larroy commented Jun 29, 2019

Attaching valgrind run which shows the leaked memory.

(attached image: naive_engine_leak)

@larroy
Contributor Author

larroy commented Jun 29, 2019

(attachment: valgrind.txt)
