
Investigate memory leak #2403

Closed
carlfm01 opened this issue Oct 7, 2019 · 31 comments · Fixed by #2448


carlfm01 commented Oct 7, 2019

  • Have I written custom code (as opposed to running examples on an unmodified clone of the repository): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Enterprise
  • TensorFlow version of the Nuget: v1.13.1-13-g174b4760eb
  • DeepSpeech version of the Nuget: v0.6.0-alpha.1-0-g90e7980
  • Python version: NA
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: NA
  • GPU model and memory: NA
  • Exact command to reproduce: NA. Using C# client
  • Nuget version: 0.6.0-alpha.1

Description:
Memory is not released when the model is destroyed while using the language model.

The issue was introduced in 0.6.0-alpha.1; 0.6.0-alpha.0 works just fine.

@lissyx the LAZY config change is not the source of the issue; the lazy change is #2385.

carlfm01 self-assigned this Oct 7, 2019
carlfm01 added the bug label Oct 7, 2019

lissyx commented Oct 7, 2019

The issue is coming from 0.6.0-alpha.1, 0.6.0-alpha.0 works just fine.

Thanks for that. Do you reproduce on Linux as well?


carlfm01 commented Oct 7, 2019

Thanks for that. Do you reproduce on Linux as well?

Not yet, I was testing the Nugets first


lissyx commented Oct 7, 2019

The only big changes in that window are indeed related to the LM: 31afc68 cc @reuben


reuben commented Oct 7, 2019

In your tests are you specifying an empty trie path and relying on the Scorer to create it dynamically? In other words, is Scorer::fill_dictionary being called?


carlfm01 commented Oct 7, 2019

In your tests are you specifying an empty trie path and relying on the Scorer to create it dynamically?

No, I'm using a trie file path, let me see without it.


carlfm01 commented Oct 7, 2019

Without the trie, it stays in the range of 2 GB to 4 GB on every run, with no noticeable difference in resource usage between the first run and the last.


lissyx commented Oct 7, 2019

So far I'm having a hard time reproducing the leak. @carlfm01, do you have specific code to share for reproducing it? Amounts of memory leaked?


lissyx commented Oct 7, 2019

@carlfm01 Your previous message mentioned growing from 200MB to 700MB over 20 iterations, which would mean we are losing 25MB per run.

So far, I can only account for ~2MB at best:

==22384== LEAK SUMMARY:
==22384==    definitely lost: 24 bytes in 1 blocks
==22384==    indirectly lost: 0 bytes in 0 blocks
==22384==      possibly lost: 332,124 bytes in 1,521 blocks
==22384==    still reachable: 2,655,432 bytes in 33,076 blocks
==22384==                       of which reachable via heuristic:
==22384==                         newarray           : 52,576 bytes in 196 blocks
==22384==         suppressed: 0 bytes in 0 blocks

Most of it is from TensorFlow itself, and it might be a false positive from valgrind.

The only items that would connect to the language model / decoder account for only a few bytes (10 occurrences, between 24 bytes and 32 bytes, so at worst it's < 320 bytes per run). Obviously, very far away from what you experience. I do fear this might be Windows-specific. Or at least, not reproducible on Linux / under valgrind.


carlfm01 commented Oct 7, 2019

do you have specific code to share for reproducing?

Just the console example with a for loop over the same file: https://gist.github.com/carlfm01/fd69a8ca2784837dabf9375d35258953#file-test-cs-L59
To see the memory usage I'm using the VS profiler (it gives poor detail on the unmanaged side).
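
Roughly, the reproduction loop is just create → decode → free in a cycle while watching process memory. A minimal C++ sketch of the same idea (my own illustration, not the gist's C# code; the exact DS_* signatures vary between 0.6 alphas, so treat them as assumptions):

#include <cstdio>
#include <vector>
#include "deepspeech.h"

int main(int argc, char** argv) {
  if (argc < 4) return 1;
  const char* model_path = argv[1];
  const char* lm_path    = argv[2];
  const char* trie_path  = argv[3];
  // 5 seconds of silence at 16 kHz, just enough to exercise the decoder.
  std::vector<short> audio(16000 * 5, 0);

  for (int i = 0; i < 20; ++i) {
    ModelState* ctx = nullptr;
    if (DS_CreateModel(model_path, 500 /* beam width */, &ctx) != 0) return 1;
    DS_EnableDecoderWithLM(ctx, lm_path, trie_path, 0.75f, 1.85f);
    char* text = DS_SpeechToText(ctx, audio.data(),
                                 static_cast<unsigned int>(audio.size()));
    DS_FreeString(text);
    DS_FreeModel(ctx);  // with the leak, resident memory grows after every iteration
    printf("iteration %d done\n", i);
  }
  return 0;
}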

Now I want to compile the console client for Linux to perform the same test, or is that exactly what you did?


carlfm01 commented Oct 8, 2019

Finally compiled and now testing:

Versions:

TensorFlow: v1.14.0-16-g3b4ce374f5
DeepSpeech: v0.6.0-alpha.8-16-gfb611ef

Valgrind report for 1 run:

==32502== LEAK SUMMARY:
==32502==    definitely lost: 24 bytes in 1 blocks
==32502==    indirectly lost: 214 bytes in 5 blocks
==32502==      possibly lost: 331,620 bytes in 1,500 blocks
==32502==    still reachable: 2,115,668 bytes in 39,762 blocks
==32502==                       of which reachable via heuristic:
==32502==                         stdstring          : 465,999 bytes in 11,883 blocks
==32502==                         newarray           : 42,880 bytes in 194 blocks
==32502==         suppressed: 0 bytes in 0 blocks

20 runs:

==45309==
==45309== LEAK SUMMARY:
==45309==    definitely lost: 7,520 bytes in 42 blocks
==45309==    indirectly lost: 7,959,270 bytes in 102,263 blocks
==45309==      possibly lost: 2,973,857 bytes in 34,726 blocks
==45309==    still reachable: 2,154,783 bytes in 40,201 blocks
==45309==                       of which reachable via heuristic:
==45309==                         stdstring          : 938,728 bytes in 17,033 blocks
==45309==                         newarray           : 208,224 bytes in 988 blocks

I do fear this might be Windows-specific.

Given the results, it looks like you are correct.

Now looking at kkm000/openfst#8


lissyx commented Oct 8, 2019

Now looking at kkm000/openfst#8

So it would mean ConstFst is not really doing mmap() on Windows, and thus we leak from there?


carlfm01 commented Oct 8, 2019

not really doing mmap() on windows

Instead it uses a custom implementation based on a buffer: https://github.com/kkm000/openfst/blob/989affd3043b6357e6047a395565c3e0d979c01f/src/lib/mapped-file.cc#L48

I'll compile and debug that file. I also want to test a few things with https://code.google.com/archive/p/mman-win32/ and see if we can get rid of the buffer implementation.
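
For comparison, here is a minimal RAII sketch of a real file mapping with the Win32 APIs (my own illustration, not the openfst or mman-win32 code). The point is that everything acquired in the constructor is released in the destructor, so if the destructor never runs, the whole mapping (or, in the buffer fallback, the heap copy of the file) stays resident:

#include <windows.h>

class Win32MappedFile {
 public:
  explicit Win32MappedFile(const char* path) {
    file_ = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file_ == INVALID_HANDLE_VALUE) return;
    mapping_ = CreateFileMappingA(file_, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (mapping_ == nullptr) return;
    data_ = MapViewOfFile(mapping_, FILE_MAP_READ, 0, 0, 0);
  }
  ~Win32MappedFile() {
    // Everything is released here; a destructor that never runs leaks it all.
    if (data_ != nullptr) UnmapViewOfFile(data_);
    if (mapping_ != nullptr) CloseHandle(mapping_);
    if (file_ != INVALID_HANDLE_VALUE) CloseHandle(file_);
  }
  const void* data() const { return data_; }
 private:
  HANDLE file_ = INVALID_HANDLE_VALUE;
  HANDLE mapping_ = nullptr;
  void* data_ = nullptr;
};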


carlfm01 commented Oct 9, 2019

The destructor of MappedFile is never called, so this line never executes:

operator delete(static_cast<char *>(region_.data) - region_.offset);

Before I found that the destructor is not called, I tried to replicate with the Python client on Windows, and it turns out that the Python client does not contain FreeModel. Why is that, @lissyx? Am I missing something?
Windows Python version: deepspeech==0.6.0a8


lissyx commented Oct 9, 2019

it turns out that the Python client does not contain FreeModel

It is handled by the wrapper:

def __del__(self):
    if self._impl:
        deepspeech.impl.FreeModel(self._impl)
        self._impl = None


lissyx commented Oct 9, 2019

The destructor of MappedFile is never called

Just to make sure, this is not only when using the Python code, this happens always?


carlfm01 commented Oct 9, 2019

Just to make sure, this is not only when using the Python code, this happens always?

Yes, always, with both the Python client and the C# client.


lissyx commented Oct 9, 2019

c# client.

Can you replicate with the C++ basic client? Just to see if the .NET bindings play a part in the equation.


lissyx commented Oct 9, 2019

MappedFile, as much as I can read of the Windows part, is all std::unique_ptr<> scoped; is it possible we are missing something at an upper level?


carlfm01 commented Oct 9, 2019

Can you replicate with the C++ basic client?

Yes, I'll test the basic C++ client; allow me some time to complete my builds and switch back to r1.14.


carlfm01 commented Oct 10, 2019

yes I'll test the basic C++ client,

Dealing with make: \bin\amd64\cl.exe: Command not found, I'll read the cluster examples again and try carefully.


lissyx commented Oct 14, 2019

yes I'll test the basic C++ client,

Dealing with make: \bin\amd64\cl.exe: Command not found, I'll read the cluster examples again and try carefully.

Have you been able to sort this out?

carlfm01 commented:

Hello @lissyx, unfortunately not; last week was a busy week working on TTS. I tried in short time windows but did not have any luck.

I'm back to it :)


carlfm01 commented Oct 16, 2019

I just realized I wasted my time: Bazel is not detecting changes inside header files :). I manually removed fst.obj under _objs and now I see the execution path printed (with this I was trying to see where mapped-file is allocated but not released).

Bazel version: 0.24.1

At this point I don't know which changes were actually applied for the tests, so I'm testing again... :/

Now about the C++ basic client:

Can you replicate with the C++ basic client?

Yes, it is eating the same amount of memory as the .NET client.

make: \bin\amd64\cl.exe: Command not found

I solved this by replacing:

TOOLCHAIN := '$(VCINSTALLDIR)\bin\amd64\'

with the full path to cl.exe in my VS installation, and then running vcvars64 before the make command.


lissyx commented Oct 16, 2019

with my full path to the cl.exe of my VS and then running vcvars64 before the make command

Right, this reminds me of things I had to do on TaskCluster. I assumed that on developer systems this would already be dealt with.

Can you replicate with the C++ basic client?

Yes, it is eating the same amount of memory as the .NET client.

Good, at least that confirms it's not coming from the bindings. Do you think you can investigate why the destructor is not called?

We have some code that triggers some lost-but-still-reachable memory under valgrind on Linux, and it deals with what calls this, so I'm wondering whether this is indeed the root cause and we are simply luckier on Linux / going through another path that frees it.


carlfm01 commented Oct 17, 2019

Do you think you can investigate why the destructor is not called?

Yes, I'm already trying to spot the issue, but due to my newbie eyes for C++ I'm not making any significant progress.

The only thing that seems wrong, apart from the destructor of MappedFile never being called, is that the destructor of ConstFstImpl is only called for the first run, then never again. I want to see if this also happens on Linux.

scoped, is it possible we are missing something at a upper level?

Above ConstFstImpl is ConstFst, which is used by PathTrie as FstType. I sort of feel the issue is coming from PathTrie and the usage of shared_ptr:

new_path->matcher_ = matcher_;
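
To illustrate my worry, a toy sketch (my own, not the DeepSpeech sources): every child copies the parent's shared_ptr, so the matcher (and the FST it references) is released only when the last trie node is destroyed, and a single node that is never destroyed keeps everything alive.

#include <iostream>
#include <memory>
#include <vector>

struct Matcher {
  ~Matcher() { std::cout << "matcher freed\n"; }
};

struct Node {
  std::shared_ptr<Matcher> matcher_;
  std::vector<Node*> children_;
  ~Node() {
    for (Node* child : children_) delete child;  // mirrors PathTrie::~PathTrie
  }
};

int main() {
  Node* root = new Node{std::make_shared<Matcher>(), {}};
  root->children_.push_back(new Node{root->matcher_, {}});  // like new_path->matcher_ = matcher_
  delete root;  // "matcher freed" prints only because the child is deleted too;
                // a node that is never destroyed keeps the matcher (and the FST) alive
  return 0;
}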

What do you think?


lissyx commented Oct 17, 2019

@carlfm01 Long shot, but doing this does not seem to trigger issues here:

diff --git a/native_client/ctcdecode/path_trie.cpp b/native_client/ctcdecode/path_trie.cpp
index 51f75ff..dee792d 100644
--- a/native_client/ctcdecode/path_trie.cpp
+++ b/native_client/ctcdecode/path_trie.cpp
@@ -33,6 +33,8 @@ PathTrie::~PathTrie() {
   for (auto child : children_) {
     delete child.second;
   }
+
+  matcher_ = nullptr;
 }
 
 PathTrie* PathTrie::get_path_trie(int new_char, int new_timestep, float cur_log_prob_c, bool reset) {

This should make sure that any PathTrie destruction frees the matching allocation of matcher_.


lissyx commented Oct 17, 2019

@carlfm01 I have more suspicions (and @reuben shares them) about dictionary_ there, which is a std::unique_ptr<> in native_client/ctcdecode/scorer.h and which we ->Copy(true) in native_client/ctcdecode/ctc_beam_search_decoder.cpp. This Copy() call triggers a new behind the scenes.
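
For context, the pattern in question looks roughly like this (a simplified sketch under my own assumptions, not the literal decoder code): OpenFst's Copy() hands back a newly allocated FST, so the caller has to own and eventually delete the copy, or it leaks on every decode.

#include <memory>
#include <fst/fstlib.h>

// Fst::Copy() allocates a new FST with `new`, so the caller owns the result.
void decode_with_dictionary(const fst::StdFst& dictionary) {
  // Leaky shape: the raw copy is never deleted once decoding finishes.
  // fst::StdFst* dict_copy = dictionary.Copy(true);

  // Non-leaky shape: give the copy an owner so it is released at scope exit.
  std::unique_ptr<fst::StdFst> dict_copy(dictionary.Copy(true));
  // ... run the beam search against *dict_copy ...
}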

lissyx pushed a commit to lissyx/STT that referenced this issue Oct 18, 2019

lissyx commented Oct 18, 2019

I'm keeping this open until @carlfm01 can confirm it is fixed.

carlfm01 commented:

Thanks @lissyx

Using:
TensorFlow: v1.14.0-16-g3b4ce374f5
DeepSpeech: v0.6.0-alpha.10-2-g469ddd2

First run 53 MB, last run 45 MB (.NET client).

I can confirm that the issue is fixed for the .NET client.

Testing with the Python and C++ clients; it looks like the C++ client is not releasing memory completely.


lissyx commented Oct 19, 2019

I'm going to close it then; we can still fix the C++ client, but the leak in the lib being fixed is the most important part :)

lissyx closed this as completed Oct 19, 2019