
[fix] torch.inference_mode in place of torch.no_grad #3188

Merged: 2 commits into aimhubio:main on Jul 14, 2024

Conversation

@pdumin (Contributor) commented on Jul 14, 2024

Changed the inference context to torch.inference_mode() in the test stage.

Set the device automatically with torch.cuda.is_available().
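
For reference, a minimal sketch of the idea (not the exact diff; names like model and test_loader are assumed from the example, not quoted from it):

import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Test stage: inference_mode() disables autograd and tensor version tracking,
# making it a slightly cheaper (and stricter) replacement for no_grad().
model.eval()
correct = 0
total = 0
with torch.inference_mode():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Test accuracy: {100 * correct / total:.2f}%')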

@CLAassistant commented on Jul 14, 2024

CLA assistant check: all committers have signed the CLA.


@mihran113 (Contributor) left a comment


Hey @pdumin! Thanks a lot for opening the PR. Everything looks good 🎉 Proceeding to merge.

@mihran113 merged commit 756f41a into aimhubio:main on Jul 14, 2024
1 check passed
@pdumin (Contributor, Author) commented on Jul 15, 2024

@mihran113 Does it make sense to rewrite the training and validation loops in the PyTorch example? Currently, losses and metrics are accumulated somewhat illogically, only every 30 batch iterations. This could be fixed by tracking the metrics and the weight/gradient distributions after every batch, which seems more natural.

I mean replace this:

for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 30 == 0:
            logging.info(
                'Epoch [{}/{}], Step [{}/{}], ' 'Loss: {:.4f}'.format(
                    epoch + 1, num_epochs, i + 1, total_step, loss.item()
                )
            )

            # aim - Track model loss function
            correct = 0
            total = 0
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            acc = 100 * correct / total

            # aim - Track metrics
            items = {'accuracy': acc, 'loss': loss}
            aim_run.track(items, epoch=epoch, context={'subset': 'train'})

            # aim - Track weights and gradients distributions
            track_params_dists(model, aim_run)
            track_gradients_dists(model, aim_run)

with this:

for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # aim - Track model loss function
        correct = 0
        total = 0
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        acc = 100 * correct / total

        # aim - Track metrics
        items = {'accuracy': acc, 'loss': loss}
        aim_run.track(items, epoch=epoch, context={'subset': 'train'})

        # aim - Track weights and gradients distributions
        track_params_dists(model, aim_run)
        track_gradients_dists(model, aim_run)

There is also no validation step; I could add one as well.
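
A possible validation step, as a rough sketch only (it would run inside the epoch loop after the training loop; val_loader is hypothetical and not part of the current example, the other names reuse the example's conventions):

# Validation after each epoch, under the same inference context as the test stage.
model.eval()
val_loss = 0.0
correct = 0
total = 0
with torch.inference_mode():
    for images, labels in val_loader:  # val_loader is assumed, not in the existing example
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # Assuming criterion uses mean reduction: accumulate the per-sample loss sum
        # so the epoch average is exact regardless of the last batch size.
        val_loss += criterion(outputs, labels).item() * labels.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

# aim - Track validation metrics under a separate context
items = {'accuracy': 100 * correct / total, 'loss': val_loss / total}
aim_run.track(items, epoch=epoch, context={'subset': 'val'})
model.train()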
