This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

Memory leak #19

Closed
michael-conrad opened this issue Aug 19, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@michael-conrad
Contributor

There appears to be a long-running memory leak, probably related to graphs. As training progresses, my Xorg memory consumption gradually increases. If I stop the training, the memory is released immediately.

I suspect it is related to graphs because of the following warning:

RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig = plt.figure(figsize=(16, 4))
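(For context, this is matplotlib's standard warning when more than figure.max_open_warning pyplot figures are open at once. A minimal sketch of the two usual remedies, not tied to this repository's code: close each figure explicitly once it has been logged, or raise the threshold via rcParams.)

import matplotlib
import matplotlib.pyplot as plt

# Preferred: close each figure as soon as it has been logged
fig = plt.figure(figsize=(16, 4))
# ... draw and log the figure ...
plt.close(fig)  # releases the figure's memory immediately

# Alternative: raise the warning threshold (does not fix the underlying leak)
matplotlib.rcParams["figure.max_open_warning"] = 50  # default is 20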
@Tomiinek
Owner

Tomiinek commented Aug 20, 2020

Hello, thank you for your observation!

I unfortunately cannot replicate the problem.
The code does not explicitly dispose of the created figures, which are passed into tensorboard's SummaryWriter. However, the documentation of SummaryWriter.add_figure(tag, figure, global_step=None, close=True, walltime=None) says that the call should automatically close the figure if close=True.
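For reference, a minimal sketch of that documented behaviour based on the signature above (the log directory and tag are made up for illustration):

from torch.utils.tensorboard import SummaryWriter
import matplotlib.pyplot as plt

sw = SummaryWriter(log_dir="logs")  # hypothetical log directory
fig = plt.figure(figsize=(16, 4))
# ... draw the spectrogram ...
sw.add_figure("Predicted/generated", fig, global_step=0, close=True)
# with close=True (the default), add_figure is documented to close the figure itself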

Can you please change the utils/logging.py file as follows and test whether it works?

... 

# log spectrograms
if hp.normalize_spectrogram:
    predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
    f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
    target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)

f = Logger._plot_spectrogram(predicted_spec)
Logger._sw.add_figure(f"Predicted/generated", f, eval_step)
plt.close(f)

f = Logger._plot_spectrogram(f_predicted_spec)
Logger._sw.add_figure(f"Predicted/forced", f, eval_step)
plt.close(f)

f = Logger._plot_spectrogram(target_spec)
Logger._sw.add_figure(f"Target/eval", f, eval_step) 
plt.close(f)

# log audio
waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
Logger._sw.add_audio(f"Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)  
waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
Logger._sw.add_audio(f"Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)              
        
# log alignment
alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
f = Logger._plot_alignment(alignment)
Logger._sw.add_figure(f"Alignment/eval", f, eval_step)          
plt.close(f)   
        
# log source text
utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
Logger._sw.add_text(f"Text/eval", utterance, eval_step)      
        
# log stop tokens
Logger._sw.add_figure(f"Stop/eval", Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy()), eval_step) 
        
...

Thank you very much.

@Tomiinek Tomiinek added the bug Something isn't working label Aug 20, 2020
@michael-conrad
Contributor Author

Sorry for the late response.

I added

plt.close("all")

as the last statement in both the evaluation reporting and the train reporting.

This seems to have solved the issue where the plots were causing the Xorg server to hold on to memory for figures that were never actually displayed on screen.
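(As a minimal sketch of that placement, with a made-up method name standing in for the reporting methods in utils/logging.py:)

import matplotlib.pyplot as plt

def log_training_report(step):
    # hypothetical stand-in for the train/evaluation reporting methods
    fig = plt.figure(figsize=(16, 4))
    # ... draw figures and pass them to the SummaryWriter ...
    plt.close("all")  # last statement: drop every figure pyplot is still tracking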

@michael-conrad
Contributor Author

Sorry for the late response.

I added

plt.close("all")

but then I suddenly started getting messages about things being freed outside the main thread, or something along those lines.

So I changed it to read (based on your instructions):

@staticmethod
    def evaluation(eval_step, losses, mcd, source_len, target_len, source, target, prediction_forced, prediction, stop_prediction, stop_target, alignment, classifier):
        """Log evaluation results.
        
        Arguments:
            eval_step -- number of the current evaluation step (i.e. epoch)
            losses (dictionary of {loss name, value})-- dictionary with values of batch losses
            mcd (float) -- evaluation Mel Cepstral Distortion
            source_len (tensor) -- number of characters of input utterances
            target_len (tensor) -- number of frames of ground-truth spectrograms
            source (tensor) -- input utterances
            target (tensor) -- ground-truth spectrograms
            prediction_forced (tensor) -- ground-truth-aligned spectrograms
            prediction (tensor) -- predicted spectrograms
            stop_prediction (tensor) -- predicted stop token probabilities
            stop_target (tensor) -- true stop token probabilities
            alignment (tensor) -- alignments (attention weights for each frame) of the last evaluation batch
            classifier (float) -- accuracy of the reversal classifier
        """  

        # log losses
        total_loss = sum(losses.values())
        Logger._sw.add_scalar(f'Eval/loss_total', total_loss, eval_step)
        for n, l in losses.items():
            Logger._sw.add_scalar(f'Eval/loss_{n}', l, eval_step) 

        # show random sample: spectrogram, stop token probability, alignment and audio
        idx = random.randint(0, alignment.size(0) - 1)
        predicted_spec = prediction[idx, :, :target_len[idx]].data.cpu().numpy()
        f_predicted_spec = prediction_forced[idx, :, :target_len[idx]].data.cpu().numpy()
        target_spec = target[idx, :, :target_len[idx]].data.cpu().numpy()  

        # log spectrograms
        if hp.normalize_spectrogram:
            predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
            f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
            target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)
        
        f = Logger._plot_spectrogram(predicted_spec)
        Logger._sw.add_figure(f"Predicted/generated", f, eval_step)
        plt.close(f)
        
        f = Logger._plot_spectrogram(f_predicted_spec)
        Logger._sw.add_figure(f"Predicted/forced", f, eval_step)
        plt.close(f)
        
        f = Logger._plot_spectrogram(target_spec)
        Logger._sw.add_figure(f"Target/eval", f, eval_step) 
        plt.close(f)
        
        # log audio
        waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
        Logger._sw.add_audio(f"Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)  
        waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
        Logger._sw.add_audio(f"Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)              
        
        # log alignment
        alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
        
        f = Logger._plot_alignment(alignment)
        Logger._sw.add_figure(f"Alignment/eval", f, eval_step)
        plt.close(f)                
        
        # log source text
        utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
        Logger._sw.add_text(f"Text/eval", utterance, eval_step)      
        
        # log stop tokens
        f = Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy())
        Logger._sw.add_figure(f"Stop/eval", f, eval_step) 
        plt.close(f)
        
        # log mel cepstral distortion
        Logger._sw.add_scalar(f'Eval/mcd', mcd, eval_step)
        
        # log reversal language classifier accuracy
        if hp.reversal_classifier:
            Logger._sw.add_scalar(f'Eval/classifier', classifier, eval_step)

So far so good. Six epochs into the resumed training, the Xorg memory is no longer increasing every training loop, and there are no crashes.

@michael-conrad
Contributor Author

ack, now I'm getting crashes:

Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop

@Tomiinek
Owner

I still cannot reproduce your issue 😥

This issue concerns something similar. It was solved with the following change:

I fixed issue of #5 by changing the backend of matplotlib from Tkinter(TkAgg) to PyQt5(Qt5Agg).
(See https://stackoverflow.com/questions/14694408/runtimeerror-main-thread-is-not-in-main-loop and http://matplotlib.1069221.n5.nabble.com/Matplotlib-Tk-and-multithreading-td40647.html )

Another way is probably to remove the plt.close(...) calls I suggested above and instead force garbage collection from time to time:

import gc
gc.collect()
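A rough sketch of what "from time to time" could mean in practice, with a made-up interval:

import gc

num_steps = 1000  # hypothetical training length
GC_EVERY = 100    # hypothetical interval between collections

for step in range(num_steps):
    # ... training / evaluation logging that creates figures ...
    if step % GC_EVERY == 0:
        gc.collect()  # reclaim figures that are no longer referenced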

Can you try it out and let me know, please?

@michael-conrad
Contributor Author

ok, I added PyQt5 to the environment and added the following to the main script:

import matplotlib
matplotlib.use("Qt5Agg")

And I'm resuming training now.

@michael-conrad
Contributor Author

ok, I added PyQt5 to the environment and added the following to the main script:

import matplotlib
matplotlib.use("Qt5Agg")

And I'm resuming training now.

With the plt.close(f) code, no crashes due to thread violations so far (5 hours of run time).

@Tomiinek
Owner

I am glad to hear that 🙂

@Tomiinek Tomiinek closed this as completed Sep 1, 2020