This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

Memory leak #19

Closed
michael-conrad opened this issue Aug 19, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@michael-conrad
Contributor

There appears to be a long-running memory leak, probably related to graphs. As training progresses, my Xorg memory consumption gradually increases. If I stop the training, the memory is released immediately.

I suspect it is related to graphs because of the following warning:

RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig = plt.figure(figsize=(16, 4))
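(For context, this is matplotlib's standard warning when more than figure.max_open_warning pyplot figures are open at once. A minimal sketch of the two usual remedies, not tied to this repository's code: close each figure explicitly once it has been logged, or raise the threshold via rcParams.)

import matplotlib
import matplotlib.pyplot as plt

# Preferred: close each figure as soon as it has been logged
fig = plt.figure(figsize=(16, 4))
# ... draw and log the figure ...
plt.close(fig)  # releases the figure's memory immediately

# Alternative: raise the warning threshold (does not fix the underlying leak)
matplotlib.rcParams["figure.max_open_warning"] = 50  # default is 20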
@Tomiinek
Owner

Tomiinek commented Aug 20, 2020

Hello, thank you for your observation!

I unfortunately cannot replicate the problem.
The code does not explicitly dispose of the created figures, which are passed into tensorboard's SummaryWriter. However, the documentation of SummaryWriter.add_figure(tag, figure, global_step=None, close=True, walltime=None) says that the call should automatically close the figure if close=True.
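For reference, a minimal sketch of that documented behaviour based on the signature above (the log directory and tag are made up for illustration):

from torch.utils.tensorboard import SummaryWriter
import matplotlib.pyplot as plt

sw = SummaryWriter(log_dir="logs")  # hypothetical log directory
fig = plt.figure(figsize=(16, 4))
# ... draw the spectrogram ...
sw.add_figure("Predicted/generated", fig, global_step=0, close=True)
# with close=True (the default), add_figure is documented to close the figure itself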

Can you please change the utils/logging.py file as follows and test whether it works?

... 

# log spectrograms
if hp.normalize_spectrogram:
    predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
    f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
    target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)

f = Logger._plot_spectrogram(predicted_spec)
Logger._sw.add_figure(f"Predicted/generated", f, eval_step)
plt.close(f)

f = Logger._plot_spectrogram(f_predicted_spec)
Logger._sw.add_figure(f"Predicted/forced", f, eval_step)
plt.close(f)

f = Logger._plot_spectrogram(target_spec)
Logger._sw.add_figure(f"Target/eval", f, eval_step) 
plt.close(f)

# log audio
waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
Logger._sw.add_audio(f"Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)  
waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
Logger._sw.add_audio(f"Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)              
        
# log alignment
alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
f = Logger._plot_alignment(alignment)
Logger._sw.add_figure(f"Alignment/eval", f, eval_step)          
plt.close(f)   
        
# log source text
utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
Logger._sw.add_text(f"Text/eval", utterance, eval_step)      
        
# log stop tokens
Logger._sw.add_figure(f"Stop/eval", Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy()), eval_step) 
        
...

Thank you very much.

@Tomiinek Tomiinek added the bug Something isn't working label Aug 20, 2020
@michael-conrad
Contributor Author

Sorry for the late response.

I added

plt.close("all")

as the last statement in both the evaluation reporting and the train reporting.

This seems to have solved the issue where the plots were causing the Xorg server to hold on to memory for figures that were never actually displayed on screen.
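(As a minimal sketch of that placement, with a made-up method name standing in for the reporting methods in utils/logging.py:)

import matplotlib.pyplot as plt

def log_training_report(step):
    # hypothetical stand-in for the train/evaluation reporting methods
    fig = plt.figure(figsize=(16, 4))
    # ... draw figures and pass them to the SummaryWriter ...
    plt.close("all")  # last statement: drop every figure pyplot is still tracking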

@michael-conrad
Contributor Author

Sorry for the late response.

I added

plt.close("all")

but then I suddenly started getting messages about things being freed outside the main thread, or something along those lines.

So I changed it to read (based on your instructions):

@staticmethod
    def evaluation(eval_step, losses, mcd, source_len, target_len, source, target, prediction_forced, prediction, stop_prediction, stop_target, alignment, classifier):
        """Log evaluation results.
        
        Arguments:
            eval_step -- number of the current evaluation step (i.e. epoch)
            losses (dictionary of {loss name, value})-- dictionary with values of batch losses
            mcd (float) -- evaluation Mel Cepstral Distortion
            source_len (tensor) -- number of characters of input utterances
            target_len (tensor) -- number of frames of ground-truth spectrograms
            source (tensor) -- input utterances
            target (tensor) -- ground-truth spectrograms
            prediction_forced (tensor) -- ground-truth-aligned spectrograms
            prediction (tensor) -- predicted spectrograms
            stop_prediction (tensor) -- predicted stop token probabilities
            stop_target (tensor) -- true stop token probabilities
            alignment (tensor) -- alignments (attention weights for each frame) of the last evaluation batch
            classifier (float) -- accuracy of the reversal classifier
        """  

        # log losses
        total_loss = sum(losses.values())
        Logger._sw.add_scalar(f'Eval/loss_total', total_loss, eval_step)
        for n, l in losses.items():
            Logger._sw.add_scalar(f'Eval/loss_{n}', l, eval_step) 

        # show random sample: spectrogram, stop token probability, alignment and audio
        idx = random.randint(0, alignment.size(0) - 1)
        predicted_spec = prediction[idx, :, :target_len[idx]].data.cpu().numpy()
        f_predicted_spec = prediction_forced[idx, :, :target_len[idx]].data.cpu().numpy()
        target_spec = target[idx, :, :target_len[idx]].data.cpu().numpy()  

        # log spectrograms
        if hp.normalize_spectrogram:
            predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
            f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
            target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)
        
        f = Logger._plot_spectrogram(predicted_spec)
        Logger._sw.add_figure(f"Predicted/generated", f, eval_step)
        plt.close(f)
        
        f = Logger._plot_spectrogram(f_predicted_spec)
        Logger._sw.add_figure(f"Predicted/forced", f, eval_step)
        plt.close(f)
        
        f = Logger._plot_spectrogram(target_spec)
        Logger._sw.add_figure(f"Target/eval", f, eval_step) 
        plt.close(f)
        
        # log audio
        waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
        Logger._sw.add_audio(f"Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)  
        waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
        Logger._sw.add_audio(f"Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)              
        
        # log alignment
        alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
        
        f = Logger._plot_alignment(alignment)
        Logger._sw.add_figure(f"Alignment/eval", f, eval_step)
        plt.close(f)                
        
        # log source text
        utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
        Logger._sw.add_text(f"Text/eval", utterance, eval_step)      
        
        # log stop tokens
        f = Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy())
        Logger._sw.add_figure(f"Stop/eval", f, eval_step) 
        plt.close(f)
        
        # log mel cepstral distortion
        Logger._sw.add_scalar(f'Eval/mcd', mcd, eval_step)
        
        # log reversal language classifier accuracy
        if hp.reversal_classifier:
            Logger._sw.add_scalar(f'Eval/classifier', classifier, eval_step)

So far so good. Six epochs into the resumed training, the Xorg memory is no longer increasing every training loop, and there are no crashes.

@michael-conrad
Contributor Author

ack, now I'm getting crashes:

Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop

@Tomiinek
Owner

I still cannot reproduce your issue 😥

This issue concerns something similar. It was solved with the following change:

I fixed issue of #5 by changing the backend of matplotlib from Tkinter(TkAgg) to PyQt5(Qt5Agg).
(See https://stackoverflow.com/questions/14694408/runtimeerror-main-thread-is-not-in-main-loop and http://matplotlib.1069221.n5.nabble.com/Matplotlib-Tk-and-multithreading-td40647.html )

Another way is probably to remove the plt.close(...) calls I suggested above and instead force garbage collection from time to time:

import gc
gc.collect()
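A rough sketch of what "from time to time" could mean in practice, with a made-up interval:

import gc

num_steps = 1000  # hypothetical training length
GC_EVERY = 100    # hypothetical interval between collections

for step in range(num_steps):
    # ... training / evaluation logging that creates figures ...
    if step % GC_EVERY == 0:
        gc.collect()  # reclaim figures that are no longer referenced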

Can you try it out and let me know, please?

@michael-conrad
Contributor Author

ok, I added PyQt5 to the environment and added the following to the main script:

import matplotlib
matplotlib.use("Qt5Agg")

And I'm resuming training now.

@michael-conrad
Contributor Author

ok, I added PyQt5 to the environment and added the following to the main script:

import matplotlib
matplotlib.use("Qt5Agg")

And I'm resuming training now.

With the plt.close(f) code, no crashes due to thread violations so far (5 hours of run time).

@Tomiinek
Owner

I am glad to hear that 🙂

@Tomiinek Tomiinek closed this as completed Sep 1, 2020