Memory Leak #255
Comments
I have also been wondering lately if it's leaking. I wish we could use Valgrind for this.
Hmm, maybe it's easiest if we just start with the parts of core.jl that deal with memory allocation and deallocation and see if there are obvious mistakes. It would be nice if we could get some kind of test code that could tell us whether the memory leak was still happening, even if it's not minimal.
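A minimal sketch of that kind of leak check, assuming a placeholder `step!` function that runs one epoch of whatever workload reproduces the problem (none of these names come from TensorFlow.jl itself):

```julia
# Hypothetical harness (not from this thread): run the workload repeatedly,
# force a GC each time, and watch the process's resident memory. If maxrss
# keeps climbing even though Julia's garbage has been collected, the leak is
# most likely in the C library rather than in Julia objects.
function check_for_leak(step!; epochs = 20)
    readings = Float64[]
    for epoch in 1:epochs
        step!()                       # one epoch of the workload under test
        GC.gc()                       # drop collectable Julia garbage first
        rss_mb = Sys.maxrss() / 2^20  # peak resident set size, in MiB
        push!(readings, rss_mb)
        println("epoch $epoch: maxrss = $(round(rss_mb, digits = 1)) MiB")
    end
    return readings
end
```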
I tried a few changes on https://github.com/malmaud/TensorFlow.jl/tree/jmm/memory_leak; not sure if they will help.
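For reference, one common way Julia wrappers plug leaks of C-allocated memory is to register a finalizer on the wrapper object. The sketch below only illustrates that general pattern and is not necessarily what the jmm/memory_leak branch changes; the struct name, field name, and library path are placeholders, while TF_DeleteTensor is the TensorFlow C API call for releasing a tensor.

```julia
# Illustrative pattern only: free the underlying TF_Tensor* when the Julia
# wrapper becomes unreachable. Names here (RawTensor, ptr, libtensorflow)
# are assumptions, not TensorFlow.jl's actual definitions.
mutable struct RawTensor
    ptr::Ptr{Cvoid}
    function RawTensor(ptr::Ptr{Cvoid})
        t = new(ptr)
        finalizer(t) do obj
            # TF_DeleteTensor releases the tensor and its backing buffer
            ccall((:TF_DeleteTensor, "libtensorflow"), Cvoid, (Ptr{Cvoid},), obj.ptr)
        end
        return t
    end
end
```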
This is expressly what Valgrind is for. It is not minimal, and I'm not sure it is going to actually give useful information. Rust's TensorFlow bindings are looking at Valgrinding: tensorflow/rust#69. Failing that,
@malmaud I think you may have plugged it! Memory usage is stable after 20 epochs, which it definitely never was before. Nice work! It jumps up (significantly!) between epochs, but drops back down to a stable point, which it did not do before. Cheers!
@staticfloat I think we should leave this open until that branch gets merged.
Closed in #256
TensorFlow seems to be leaking memory, but I have not yet figured out where this is happening. It's not leaking Julia objects, because `whos()` can't account for the memory usage. A graph of the free memory in my system is shown here. You can see my system starting to swap out to disk around 20:00. I killed the Julia process around 20:44.

My best guess is that we're leaking memory within the TensorFlow C library inside my train loop. I've tried reproducing this with a smaller example like `examples/logistic.jl`, but of course it doesn't happen there. Using `gdb` to look at places where `mmap()` is being called, it's all either within Julia's array allocation routines during `feed_dict` construction, or within `Eigen` inside of TensorFlow.

I would post my code, but there's so much of it that it would be unfair to you. Do you have any general debugging tips for tracking something like this down?