
Does loss dict keep the same order in different processes? #309

Closed
yelantf opened this issue Dec 29, 2018 · 4 comments

Comments

@yelantf (Contributor) commented Dec 29, 2018

❓ Questions and Help

In maskrcnn_benchmark.engine.trainer, the function reduce_loss_dict reduces all losses in loss_dict onto rank 0.

# relevant imports in maskrcnn_benchmark/engine/trainer.py:
#   import torch
#   import torch.distributed as dist
#   from maskrcnn_benchmark.utils.comm import get_world_size
def reduce_loss_dict(loss_dict):
    """
    Reduce the loss dictionary from all processes so that process with rank
    0 has the averaged results. Returns a dict with the same fields as
    loss_dict, after reduction.
    """
    world_size = get_world_size()
    if world_size < 2:
        return loss_dict
    with torch.no_grad():
        loss_names = []
        all_losses = []
        for k, v in loss_dict.items():
            loss_names.append(k)
            all_losses.append(v)
        all_losses = torch.stack(all_losses, dim=0)
        dist.reduce(all_losses, dst=0)
        if dist.get_rank() == 0:
            # only main process gets accumulated, so only divide by
            # world_size in this case
            all_losses /= world_size
        reduced_losses = {k: v for k, v in zip(loss_names, all_losses)}
    return reduced_losses

It uses loss_dict.items(), and I think the key order of a plain dictionary is not guaranteed to be the same across processes. Would it be better to use an OrderedDict?
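
To illustrate the concern, here is a hypothetical example (the loss names and values below are made up, not taken from the repo): if two processes happen to build loss_dict with the keys in different orders, dist.reduce sums the stacked tensors position-wise, so values from different losses get added together and attributed to the wrong names.

# Hypothetical illustration of the ordering hazard (loss names are made up).
# Suppose two processes stack their losses in different key orders:
rank0_order = ["loss_cls", "loss_box"]   # stacked as [0.5, 1.0] on rank 0
rank1_order = ["loss_box", "loss_cls"]   # stacked as [1.2, 0.4] on rank 1
# dist.reduce then sums the stacked tensors element-wise:
#   position 0: 0.5 (loss_cls on rank 0) + 1.2 (loss_box on rank 1) = 1.7
#   position 1: 1.0 (loss_box on rank 0) + 0.4 (loss_cls on rank 1) = 1.4
# Rank 0 would log loss_cls = 1.7 / 2 and loss_box = 1.4 / 2,
# silently mixing values from different losses.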

@fmassa (Contributor) commented Dec 29, 2018

Yes, an OrderedDict would be safer.
Another possibility would be to sort loss_names and use that order to build all_losses.
Can you send a PR with the change?
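
A minimal sketch of the sorted-keys approach (an illustration only, not necessarily the exact change that later landed in #310), assuming torch.distributed is already initialized:

import torch
import torch.distributed as dist

def reduce_loss_dict_sorted(loss_dict):
    """Sketch: iterate the keys in sorted order so every process stacks
    the losses identically before the reduce."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    if world_size < 2:
        return loss_dict
    with torch.no_grad():
        loss_names = sorted(loss_dict.keys())
        all_losses = torch.stack([loss_dict[k] for k in loss_names], dim=0)
        dist.reduce(all_losses, dst=0)
        if dist.get_rank() == 0:
            # only rank 0 holds the accumulated sum, so only it averages
            all_losses /= world_size
        return {k: v for k, v in zip(loss_names, all_losses)}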

@yelantf (Contributor, author) commented Dec 30, 2018

Yes, I can do that.

@yelantf (Contributor, author) commented Dec 30, 2018

By the way, the losses really did get mismatched during reduction. The issue became apparent after I added several new loss items to the original code: once I sorted the keys, the values of those losses in the log changed significantly. Fortunately, this only affects the logging; the training itself remains correct.

fmassa pushed a commit that referenced this issue Dec 30, 2018

@fmassa (Contributor) commented Dec 30, 2018

Thanks a lot for fixing it in #310!

fmassa closed this as completed Dec 30, 2018
BobZhangHT added a commit to BobZhangHT/maskrcnn-benchmark that referenced this issue Jan 2, 2019
nprasad2021 pushed a commit to nprasad2021/maskrcnn-benchmark that referenced this issue Jan 29, 2019