Unintuitive reduction of mini-batch loss for NLLLoss #9882
Labels
module: docs
Related to our documentation, both in docs/ and docblocks
module: loss
Problem is related to loss function
module: nn
Related to torch.nn
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
I find the reduction method that was chosen for NLLLoss quite unintuitive.
This introduces a weird interdependence between the chosen class weights and the chosen batch size (and more: the influence of the class weights depends on which ground-truth classes are present in the mini-batch), since, as far as I can tell, the reduced loss is divided by the sum of the weights of the targets in the batch rather than by the batch size.
Extreme case with the current implementation: with a batch size of one, it does not matter which class weights I choose; my net will always see the same gradients (see the sketch below).
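A minimal sketch of that extreme case, assuming a recent PyTorch where the old `reduce=True` is spelled `reduction='mean'`; the class count and weight values are made up for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 3, requires_grad=True)   # batch size 1, 3 (made-up) classes
target = torch.tensor([1])

# With the default reduction, the weighted sum of the per-sample losses is
# divided by the sum of the weights of the targets in the batch, so with a
# single sample the class weight cancels out completely.
for w in (torch.tensor([1.0, 1.0, 1.0]), torch.tensor([1.0, 10.0, 1.0])):
    logits.grad = None
    loss = F.nll_loss(F.log_softmax(logits, dim=1), target, weight=w)
    loss.backward()
    print(loss.item(), logits.grad)
# both iterations print the same loss and the same gradient
```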
In other words: I would expect
F.nll_loss(..., reduce=True) == torch.mean(F.nll_loss(..., reduce=False))
but this does not hold true when using different class weights.
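A minimal sketch of the mismatch, again assuming a recent PyTorch where `reduce=True`/`reduce=False` are spelled `reduction='mean'`/`reduction='none'` (the tensors and weights are made up purely for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(4, 3), dim=1)   # batch of 4, 3 classes
target = torch.tensor([0, 1, 1, 2])
weight = torch.tensor([1.0, 5.0, 1.0])

reduced = F.nll_loss(log_probs, target, weight=weight)                       # reduce=True
per_sample = F.nll_loss(log_probs, target, weight=weight, reduction='none')  # reduce=False

print(reduced)                                   # sum(w_i * l_i) / sum(w_i)
print(per_sample.mean())                         # sum(w_i * l_i) / N  -- what I expected
print(per_sample.sum() / weight[target].sum())   # reproduces the reduced value
```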
The documentation of CrossEntropyLoss also says that "The losses are averaged across observations for each minibatch." That sentence in particular is very misleading with the current implementation if you are using class weights.
I can only guess that this implementation was chosen so that the loss value doesn't change when you change the class weights (which makes multiple runs with different class weights more comparable when you are only looking at the loss). But it comes at the cost of a very unintuitive treatment of class weights that, in my opinion, is not worth it.
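One way to read that guess, sketched with made-up tensors: under the current reduction, rescaling all class weights by a constant factor leaves the loss value unchanged, whereas a plain mean over the weighted per-sample losses would scale with the weights.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(4, 3), dim=1)
target = torch.tensor([0, 1, 1, 2])
base = torch.tensor([1.0, 5.0, 1.0])

# Current reduction: numerator and denominator both scale by 10, so the value is stable.
print(F.nll_loss(log_probs, target, weight=base))
print(F.nll_loss(log_probs, target, weight=10 * base))

# A plain mean of the weighted per-sample losses would scale by the same factor.
print(F.nll_loss(log_probs, target, weight=base, reduction='none').mean())
print(F.nll_loss(log_probs, target, weight=10 * base, reduction='none').mean())
```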
cc @jlin27 @mruberry @albanD @jbschlosser