
track_epoch_end_reduce_metrics and memory consumption #7498

Closed
Demirrr opened this issue May 12, 2021 · 4 comments
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments


Demirrr commented May 12, 2021

🐛 Bug

track_epoch_end_reduce_metrics leads to an increase in memory consumption between epochs.

  1. Memory consumption starts increasing at the very beginning of an epoch.
  2. Memory consumption peaks at the very end of the epoch.
  3. After the epoch is completed, the memory consumption drops significantly.
  4. The cycle repeats from 1 (see the sketch below).
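
A minimal sketch of the suspected pattern (plain PyTorch, not Lightning internals, assuming per-step outputs are held for an epoch-end reduction): references pile up during the epoch, peak at the reduction, and are released once it finishes.

import torch

outputs = []                              # per-step results kept for an epoch-end reduction
for step in range(1000):                  # one "epoch"
    out = torch.randn(256, 256)           # stand-in for a per-step tensor
    outputs.append(out)                   # memory grows while `outputs` holds references

epoch_mean = torch.stack(outputs).mean()  # peak: every step's tensor is alive at once
outputs.clear()                           # references drop and the memory is freed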
Demirrr added the bug and help wanted labels May 12, 2021

Demirrr commented May 12, 2021

Sorry, my bad. I can't seem to reproduce this error. Although the memory consumption behaviour still persists, I am not quite sure whether the aforementioned method causes this issue.


awaelchli commented May 13, 2021

Do you return objects from your step methods, or do you have callback methods for on_*_epoch_end implemented?

PRs #7339 and #7338 should improve the situation :)
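
For context, a hedged sketch of the difference being asked about (illustrative names, not taken from this issue; it assumes the fragment sits inside a pl.LightningModule with a loss_function defined): anything extra returned from training_step is kept until the *_epoch_end hooks run and therefore accumulates over the whole epoch.

def training_step(self, batch, batch_idx):
    x_batch, y_batch = batch
    y_hat = self(x_batch)
    loss = self.loss_function(y_hat, y_batch)

    # Memory-heavy: predictions for every batch would stay alive until epoch end.
    # return {'loss': loss, 'preds': y_hat}

    # Lighter: log a reduced scalar and return only what Lightning needs.
    self.log('train_loss', loss, on_epoch=True)
    return {'loss': loss}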


Demirrr commented May 15, 2021

Nope, I do not return any objects from the step methods or from on_*_epoch_end, as shown below :(

import pytorch_lightning as pl
import torch
from torch import nn
from torch.nn import functional as F
from pytorch_lightning.metrics.functional import accuracy
from typing import List, Any, Tuple
from torch.nn.init import xavier_normal_


class BaseKGE(pl.LightningModule):
    def __init__(self, learning_rate=.1):
        super().__init__()
        self.name = 'Not init'
        self.learning_rate = learning_rate

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)

    def loss_function(self, y_hat, y):
        # self.loss is expected to be set by the subclass (e.g. an nn.BCELoss instance).
        return self.loss(y_hat, y)

    def forward_triples(self, *args, **kwargs):
        raise ValueError(f'MODEL:{self.name} does not have forward_triples function')

    def forward_k_vs_all(self, *args, **kwargs):
        raise ValueError(f'MODEL:{self.name} does not have forward_k_vs_all function')

    def forward(self, x):
        if len(x) == 3:
            h, r, t = x[0], x[1], x[2]
            return self.forward_triples(h, r, t)
        elif len(x) == 2:
            h, y = x[0], x[1]
            # Note that y can be relation or tail entity.
            return self.forward_k_vs_all(h, y)
        else:
            raise ValueError('Not valid input')

    def training_step(self, batch, batch_idx):
        x_batch, y_batch = batch
        train_loss = self.loss_function(self.forward(x_batch), y_batch)
        return {'loss': train_loss}

    #def training_epoch_end(self, outputs) -> None:
    #    """ DBpedia debugging removed."""
    #    #avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
    #    #self.log('avg_loss', avg_loss, on_epoch=False, prog_bar=True)

Thanks for the tip! I reckon that the two PRs are closely related to my issue.


Borda commented May 18, 2021

I think it is quite expected that the outputs accumulated during each epoch take some memory and, after everything is computed at the end, the memory drops back to normal... 🐰
cc: @SkafteNicki
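
One way to keep an epoch-end aggregation cheap when it is actually wanted, sketched against the public API only (TinyModule and its layer are illustrative, not from this issue): detach the collected losses before stacking so no autograd graphs stay alive, which is in the spirit of #7339 and #7338.

import torch
from torch import nn
import pytorch_lightning as pl


class TinyModule(pl.LightningModule):  # hypothetical minimal module
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self(x), y)
        return {'loss': loss}

    def training_epoch_end(self, outputs):
        # `outputs` holds the dict returned by training_step for every batch;
        # detach before stacking so no graphs are kept alive (harmless if
        # Lightning has already detached them).
        avg_loss = torch.stack([x['loss'].detach() for x in outputs]).mean()
        self.log('avg_loss', avg_loss, prog_bar=True)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)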

Borda closed this as completed May 18, 2021