-
I am in favour of adding the trainer flag, as it's a flag that users touch quite a bit in general PyTorch. Models like iGPT will not run in DDP mode without it. Just to add fuel to the discussion, I do wonder about the long-term future of this particular flag, as I know that at least FairScale is trying to not use it.

I vote against the general kwargs approach however (sending kwargs through the trainer into the plugins/all functions). I think that adds additional confusion that I'd rather not deal with (I think the HF datasets do a similar thing; it makes it tricky to trace what params are going where without digging into the code or checking the docs). We should make an explicit argument instead.
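For illustration, a rough sketch of the two API shapes being debated; both Trainer signatures below are hypothetical and not part of the actual Trainer API:

```python
from pytorch_lightning import Trainer

# Hypothetical: generic kwargs forwarded from the Trainer into the plugins.
# Hard to trace where `find_unused_parameters` actually ends up without
# reading the source or docs.
trainer = Trainer(accelerator="ddp", ddp_kwargs={"find_unused_parameters": False})

# Hypothetical: an explicit, documented Trainer argument.
# Discoverable in the signature, but adds yet another Trainer flag.
trainer = Trainer(accelerator="ddp", find_unused_parameters=False)
```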
-
Should it be a property of the model instead? Just as ...
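For context, a model-level property could look something like the sketch below; the attribute name is made up for illustration and is not an existing LightningModule API:

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    # Hypothetical attribute: the trainer/DDP plugin would read this when
    # wrapping the model in DistributedDataParallel. Not an actual Lightning API.
    ddp_find_unused_parameters = True

    def forward(self, x):
        ...
```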
-
It looks like this is defaulted to True again, but it definitely should be defaulted to False. It's only if you're doing something weird that you really need this, and it causes a performance penalty, so please revert to defaulting it to False.

I can tell you as a heavy user of this flag that making it a model property is a good idea, because the flag is only required by certain model architectures: it has nothing to do with the training setup and everything to do with how the model code was written. Therefore making it a model property makes sense. Put another way, given a specific implementation of a model, it will either always need this flag for distributed training or never need it, so it can be determined by the person implementing the model and hard-coded.
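To illustrate why this is architecture-dependent, here is a minimal toy model (made up for this example) in which one branch's parameters receive no gradient on some forward passes; wrapping such a model in DistributedDataParallel requires find_unused_parameters=True, while a plain feed-forward model does not:

```python
import torch
import torch.nn as nn


class BranchyModel(nn.Module):
    """Toy model: the `aux` head is only used for some inputs, so its
    parameters are 'unused' in the other iterations' backward passes."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)
        self.head = nn.Linear(16, 4)
        self.aux = nn.Linear(16, 4)  # only exercised on some batches

    def forward(self, x, use_aux: bool = False):
        h = self.backbone(x)
        return self.aux(h) if use_aux else self.head(h)


# Under DDP, whenever `use_aux=False`, the parameters of `self.aux` get no
# gradient, so the reducer must be told to look for unused parameters, e.g.:
# model = nn.parallel.DistributedDataParallel(
#     BranchyModel().cuda(), find_unused_parameters=True
# )
```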
-
One example is the find_unused_parameters=True/False flag in the Trainer. Several users have recently asked how to set this flag, mainly because the default of this parameter changed between 1.1 and 1.2, and some users are now forced to toggle it.
The current way is to change it like so:
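A sketch of the plugin-based approach, assuming the DDPPlugin from pytorch_lightning.plugins; the exact import path and Trainer arguments depend on the Lightning version:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Extra kwargs given to DDPPlugin are forwarded to DistributedDataParallel.
trainer = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
```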
Some users have expressed a desire to set it directly in the Trainer.
Advantages:
Disadvantages: