Background:
I found a memory leak while running the transformer model. Memory grows by about 100 KB per batch. Both the trainer and the pserver are affected.
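One way to put a number on a growth rate like this is to sample the resident set size after each batch and fit a line through the samples. The sketch below is only an illustration, not the measurement code behind this report: `rss_bytes` and `growth_per_batch` are hypothetical helper names, and reading `/proc/self/statm` assumes Linux.

```python
import os


def rss_bytes():
    """Return the current resident set size in bytes (assumes Linux /proc)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the resident set size, in pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def growth_per_batch(samples):
    """Least-squares slope of the memory samples, in bytes per batch.

    samples[i] is the memory reading taken after batch i.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var
```

In a training loop, one would append `rss_bytes()` to a list after each batch and call `growth_per_batch` on it once enough batches have run. A slope that stays positive over many batches points to steady growth rather than warm-up noise.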
Generally, memory increases for two reasons:
And I found two locations where memory was not freed, using the pprof tool to run all C++ unit tests:
But memory still increases over time even after I fixed the above.
Analysis
First, I used pprof and Valgrind to detect leaks when running through the python interface, but the output contains a lot of warnings.
Second, I suspected memory fragmentation in the glibc memory pool:
- Calling malloc_trim to release unused memory: it's not helpful.
- Setting TCMALLOC_RELEASE_RATE=10.0 (max value): it's not helpful.
- Linking tcmalloc to paddle: because of our complicated dependencies and their link order, I hit a "free invalid pointer" error and failed to link.
Third, I suspected a python memory leak:
- Using gc.collect and gc.garbage to find uncollectable objects: there's nothing.
Fourth, I used mallinfo to trace memory consumption. Some memory is consumed on every batch, but I can't pin it on a particular operator: every operator allocates memory with new or malloc, either in our code or in std:: STL code.
Conclusion: Need help
Maybe memory fragmentation is the reason for the memory leak. We could use a malloc hook to manage our memory.
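For reference, the python-side check from the analysis (gc.collect plus gc.garbage) can be sketched roughly as follows. `find_uncollectable` is a hypothetical helper name, not something from the paddle code base. An empty `gc.garbage` only rules out uncollectable python objects; it says nothing about memory allocated by C++ code behind the python interface.

```python
import gc


def find_uncollectable():
    """Run a full collection and report what the collector could not free.

    Returns (unreachable, garbage): the number of unreachable objects
    found by this collection, and a copy of gc.garbage, the list of
    objects the collector found unreachable but could not free.
    """
    unreachable = gc.collect()
    return unreachable, list(gc.garbage)
```

A typical use is to call it every few batches and log both values. Since Python 3.4, reference cycles with finalizers are normally collectable, so `gc.garbage` is expected to stay empty unless `gc.set_debug(gc.DEBUG_SAVEALL)` is active.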
Reference:
valgrind + debug version python:
glibc malloc trace:
Diff is the memory consumed by operator.
Diff is the memory consumed by the executor in every batch.
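A per-operator or per-batch diff of this kind can be collected by reading a memory counter before and after each step. The following is a minimal sketch under stated assumptions: `MemoryDiffTracker` is a hypothetical name, and `sample` stands for whatever counter is in use (for example a value derived from mallinfo); it is not the instrumentation that produced the numbers above.

```python
from collections import defaultdict
from contextlib import contextmanager


class MemoryDiffTracker:
    """Accumulate before/after memory diffs under a name (hypothetical helper)."""

    def __init__(self, sample):
        # sample: zero-argument callable returning the current memory
        # reading in bytes; which counter it reads is up to the caller.
        self._sample = sample
        self.totals = defaultdict(int)

    @contextmanager
    def track(self, name):
        before = self._sample()
        try:
            yield
        finally:
            # Record the diff even if the tracked step raises.
            self.totals[name] += self._sample() - before
```

Wrapping each operator call in `with tracker.track(op_name):` and each batch in `with tracker.track("batch"):` gives both views. A diff that is nonzero for the batch but spread thinly over all operators would be consistent with fragmentation rather than a single leaking operator.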