Unintuitive reduction of mini-batch loss for NLLLoss #9882

Open
timonbimon opened this issue Jul 26, 2018 · 1 comment
Labels
module: docs (Related to our documentation, both in docs/ and docblocks)
module: loss (Problem is related to loss function)
module: nn (Related to torch.nn)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

timonbimon commented Jul 26, 2018

I find the reduction method chosen for NLLLoss quite unintuitive.

[Screenshot of the NLLLoss docs formula: with class weights, the reduced loss is `sum_n(l_n) / sum_n(w_{y_n})` with `l_n = -w_{y_n} * x_{n, y_n}`, i.e. the weighted per-sample losses are divided by the sum of the target weights rather than by the batch size.]

This introduces a weird interdependence between the chosen class weights and the chosen batch size (and more: the influence of the class weights depends on which ground-truth classes are present in the mini-batch).
Extreme case with the current implementation: with a batch size of one, it does not matter which class weights I choose; my net will always see the same gradients.

In other words, I would expect `F.nll_loss(..., reduce=True) == torch.mean(F.nll_loss(..., reduce=False))`, but this does not hold when non-uniform class weights are used (see the sketch below).
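
A minimal sketch of the mismatch, with made-up tensor values and class weights; the newer `reduction='mean'`/`reduction='none'` arguments stand in for the older `reduce=True`/`reduce=False` used above:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: 4 samples, 3 classes, deliberately uneven class weights.
log_probs = F.log_softmax(torch.randn(4, 3), dim=1)
targets = torch.tensor([0, 1, 2, 1])
weights = torch.tensor([1.0, 2.0, 5.0])

# Current behavior: the weighted per-sample losses are divided by the
# sum of the weights of the targets present in the batch, not by the
# batch size.
reduced = F.nll_loss(log_probs, targets, weight=weights, reduction='mean')

# What one might expect instead: a plain mean over the per-sample losses.
per_sample = F.nll_loss(log_probs, targets, weight=weights, reduction='none')
plain_mean = per_sample.mean()

print(reduced)                                    # sum(w_y * l) / sum(w_y)
print(plain_mean)                                 # sum(w_y * l) / batch_size
print(per_sample.sum() / weights[targets].sum())  # matches `reduced`
```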

The documentation of CrossEntropyLoss also says the following:

[Screenshot of the CrossEntropyLoss docs: "The losses are averaged across observations for each minibatch."]

The sentence "The losses are averaged across observations for each minibatch." in particular is very misleading under the current implementation if you are using class weights.

I can only guess that this implementation was chosen so that the loss value does not change when you change the class weights (which makes runs with different class weights more comparable if you are only looking at the loss), but it comes at the cost of a very unintuitive treatment of class weights that, in my opinion, is not worth it.
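
A small sketch of that trade-off, again with made-up values: dividing by the sum of the target weights makes the reduced loss invariant to a global rescaling of the class weights, whereas a plain mean over the per-sample losses scales with them.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up batch: 8 samples, 3 classes.
log_probs = F.log_softmax(torch.randn(8, 3), dim=1)
targets = torch.randint(0, 3, (8,))
w = torch.tensor([1.0, 2.0, 5.0])

# Current reduction: dividing by sum(w_y) cancels a global rescaling of
# the class weights, so the reduced loss value stays the same.
a = F.nll_loss(log_probs, targets, weight=w, reduction='mean')
b = F.nll_loss(log_probs, targets, weight=10 * w, reduction='mean')
print(torch.allclose(a, b))          # True

# A plain mean over the per-sample losses scales with the weights instead.
pa = F.nll_loss(log_probs, targets, weight=w, reduction='none').mean()
pb = F.nll_loss(log_probs, targets, weight=10 * w, reduction='none').mean()
print(torch.allclose(pb, 10 * pa))   # True
```

The batch-size-one extreme case mentioned above falls out of the same normalization: with a single sample, the weighted loss divided by the weight of its target is just the unweighted per-sample loss, whatever weight is chosen.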

cc @jlin27 @mruberry @albanD @jbschlosser

@zou3519 zou3519 added the todo label Jul 30, 2018
li-roy (Contributor) commented Aug 8, 2018

We're currently in the process of improving the docs. We're also looking into a weighted average as an option for the loss in the future.

@heitorschueroff heitorschueroff added the module: docs, module: loss, and triaged labels and removed the todo label Jan 26, 2021
@mruberry mruberry added the module: nn Related to torch.nn label Jan 26, 2021