Memory leak #19
Comments
Hello, thank you for your observation! I unfortunately cannot replicate the problem. Can you please change the ...
# log spectrograms
if hp.normalize_spectrogram:
    predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
    f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
    target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)
f = Logger._plot_spectrogram(predicted_spec)
Logger._sw.add_figure(f"Predicted/generated", f, eval_step)
plt.close(f)
f = Logger._plot_spectrogram(f_predicted_spec)
Logger._sw.add_figure(f"Predicted/forced", f, eval_step)
plt.close(f)
f = Logger._plot_spectrogram(target_spec)
Logger._sw.add_figure(f"Target/eval", f, eval_step)
plt.close(f)
# log audio
waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
Logger._sw.add_audio(f"Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)
waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
Logger._sw.add_audio(f"Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)
# log alignment
alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
f = Logger._plot_alignment(alignment)
Logger._sw.add_figure(f"Alignment/eval", f, eval_step)
plt.close(f)
# log source text
utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
Logger._sw.add_text(f"Text/eval", utterance, eval_step)
# log stop tokens
Logger._sw.add_figure(f"Stop/eval", Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy()), eval_step)
... Thank you very much.
Sorry for the late response. I added plt.close("all") as the last statement in both the evaluation reporting and the train reporting. This seems to have solved the issue where the plots were causing the Xorg server to reserve memory for plots that were never displayed on screen.
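The idea behind that fix, closing every figure once it has been handed to the logger, can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: `FakeWriter` and `log_figure` are hypothetical names standing in for the summary writer and the `Logger` helpers, and the `Agg` backend is used here only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt


class FakeWriter:
    """Hypothetical stand-in for a TensorBoard-style summary writer."""

    def __init__(self):
        self.logged = []

    def add_figure(self, tag, figure, global_step):
        # A real writer would render the figure; here we only record the call.
        self.logged.append((tag, global_step))


def log_figure(writer, tag, values, step):
    """Plot values, pass the figure to the writer, then always close it."""
    fig, ax = plt.subplots()
    try:
        ax.plot(values)
        writer.add_figure(tag, fig, step)
    finally:
        plt.close(fig)  # release the figure even if logging raises


writer = FakeWriter()
for step in range(3):
    log_figure(writer, "Stop/eval", [0.1, 0.5, 0.9], step)

# Figure numbers that pyplot still tracks; an ever-growing list here
# would point to figures that are created but never closed.
open_figures = plt.get_fignums()
```

Checking `plt.get_fignums()` after a reporting step is a cheap way to see whether any figure escaped being closed.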
Well, I suddenly started getting messages about freeing things up outside the main thread, or something like that. So I changed it to read (based on your instructions):
@staticmethod
def evaluation(eval_step, losses, mcd, source_len, target_len, source, target, prediction_forced, prediction, stop_prediction, stop_target, alignment, classifier):
    """Log evaluation results.

    Arguments:
        eval_step -- number of the current evaluation step (i.e. epoch)
        losses (dictionary of {loss name, value}) -- dictionary with values of batch losses
        mcd (float) -- evaluation Mel Cepstral Distortion
        source_len (tensor) -- number of characters of input utterances
        target_len (tensor) -- number of frames of ground-truth spectrograms
        source (tensor) -- input utterances
        target (tensor) -- ground-truth spectrograms
        prediction_forced (tensor) -- ground-truth-aligned spectrograms
        prediction (tensor) -- predicted spectrograms
        stop_prediction (tensor) -- predicted stop token probabilities
        stop_target (tensor) -- true stop token probabilities
        alignment (tensor) -- alignments (attention weights for each frame) of the last evaluation batch
        classifier (float) -- accuracy of the reversal classifier
    """
    # log losses
    total_loss = sum(losses.values())
    Logger._sw.add_scalar(f'Eval/loss_total', total_loss, eval_step)
    for n, l in losses.items():
        Logger._sw.add_scalar(f'Eval/loss_{n}', l, eval_step)
    # show random sample: spectrogram, stop token probability, alignment and audio
    idx = random.randint(0, alignment.size(0) - 1)
    predicted_spec = prediction[idx, :, :target_len[idx]].data.cpu().numpy()
    f_predicted_spec = prediction_forced[idx, :, :target_len[idx]].data.cpu().numpy()
    target_spec = target[idx, :, :target_len[idx]].data.cpu().numpy()
    # log spectrograms
    if hp.normalize_spectrogram:
        predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
        f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
        target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)
    f = Logger._plot_spectrogram(predicted_spec)
    Logger._sw.add_figure(f"Predicted/generated", f, eval_step)
    plt.close(f)
    f = Logger._plot_spectrogram(f_predicted_spec)
    Logger._sw.add_figure(f"Predicted/forced", f, eval_step)
    plt.close(f)
    f = Logger._plot_spectrogram(target_spec)
    Logger._sw.add_figure(f"Target/eval", f, eval_step)
    plt.close(f)
    # log audio
    waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
    Logger._sw.add_audio(f"Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)
    waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
    Logger._sw.add_audio(f"Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)
    # log alignment
    alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
    f = Logger._plot_alignment(alignment)
    Logger._sw.add_figure(f"Alignment/eval", f, eval_step)
    plt.close(f)
    # log source text
    utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
    Logger._sw.add_text(f"Text/eval", utterance, eval_step)
    # log stop tokens
    f = Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy())
    Logger._sw.add_figure(f"Stop/eval", f, eval_step)
    plt.close(f)
    # log mel cepstral distortion
    Logger._sw.add_scalar(f'Eval/mcd', mcd, eval_step)
    # log reversal language classifier accuracy
    if hp.reversal_classifier:
        Logger._sw.add_scalar(f'Eval/classifier', classifier, eval_step)
So far so good. I'm at 6 epochs on the resumed training and the Xorg memory is no longer increasing with every training loop. And no crashes.
Ack, now I'm getting crashes:
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
  File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
I still cannot reproduce your issue 😥 This issue concerns something similar. It solves the problem with these changes, i.e.:
Another way is probably to remove the
import gc
gc.collect()
Can you try it out and let me know, please?
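For context on the `gc.collect()` suggestion: in CPython, objects caught in a reference cycle are not freed when the last name pointing to them is deleted, only when the cyclic garbage collector runs. The sketch below shows that behaviour with a hypothetical `Node` class. Whether such cycles are what keeps the plot resources alive in this issue is an assumption that the thread does not confirm.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point to another Node."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the automatic collector out of the way for the demo
try:
    a, b = Node(), Node()
    a.other, b.other = b, a      # build a reference cycle
    probe = weakref.ref(a)       # lets us observe whether `a` still exists

    del a, b                     # no names left, but the cycle keeps both alive
    alive_before = probe() is not None

    gc.collect()                 # force a full collection of unreachable cycles
    alive_after = probe() is not None
finally:
    gc.enable()
```

Here `alive_before` is true and `alive_after` is false: the objects are only reclaimed once the collector has run.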
OK, I added PyQt5 to the environment and added the following to the main script:
import matplotlib
matplotlib.use("Qt5Agg")
And I'm resuming training now.
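The backend switch above moves matplotlib away from the Tk-based backend that produced the `tkinter` errors. A related option, shown here only as a sketch, is the non-interactive `Agg` backend, which renders off-screen and opens no GUI window at all. Using `Agg` instead of `Qt5Agg` is an assumption of this example, not something the thread tested.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering, chosen before any figure exists
import matplotlib.pyplot as plt

backend = matplotlib.get_backend()

# Render a tiny dummy "spectrogram" entirely in memory.
fig, ax = plt.subplots()
ax.imshow([[0.0, 1.0], [1.0, 0.0]], aspect="auto")
fig.canvas.draw()
width, height = fig.canvas.get_width_height()

plt.close(fig)
remaining = plt.get_fignums()
```

Because nothing is ever shown on screen, this setup needs neither Tk nor Qt, which can be convenient for training runs that only write figures to a log.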
With the plt.close(f) code, there have been no crashes from thread violations so far (5 hours of run time).
I am glad to hear that 🙂
There appears to be a long-running memory leak, probably related to graphs. As the training progresses, my Xorg memory consumption gradually increases. If I stop the training, the memory is instantly released.
I suspect it is related to graphs because of the following warning: