[BREAKING][misc] feat: change micro_batch_size to micro_batch_size_per_gpu #136
Conversation
It's not a good idea to break users' API. We should introduce the per-GPU batch size in addition to the original parameter. When both of them are present, we should assert false. When the old parameter is used, we should print a warning.
Sure. We should set the default value of micro_batch_size_per_gpu to null to implement that. Moreover, for the examples we provide, I think we should simply use micro_batch_size_per_gpu rather than the original micro_batch_size.
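The deprecation scheme discussed above can be sketched as a small config check. This is an illustrative helper, not verl's actual implementation; the parameter names follow the PR, but the function name and config access are assumptions.

```python
# Hypothetical sketch of the backward-compatibility check discussed above:
# micro_batch_size_per_gpu defaults to None (null), setting both keys is an
# error, and the deprecated global key is normalized with a warning.
import warnings


def resolve_micro_batch_size(config: dict, n_gpus: int) -> int:
    old = config.get("micro_batch_size")          # deprecated, global
    new = config.get("micro_batch_size_per_gpu")  # preferred, per-GPU

    # Setting both is ambiguous, so fail loudly.
    assert not (old is not None and new is not None), \
        "Set only one of micro_batch_size / micro_batch_size_per_gpu"

    if old is not None:
        warnings.warn(
            "micro_batch_size is deprecated; use micro_batch_size_per_gpu",
            DeprecationWarning,
        )
        return old // n_gpus  # normalize the old global value per GPU

    return new  # already per-GPU; used as-is
```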
…r_gpu (volcengine#136)

## Summary

This PR changes all the micro_batch_size to micro_batch_size_per_gpu.

**The core logic of setting batch size:**
- **All algorithmic metrics** (train batch size, PPO mini batch size) are global (from the perspective of the single controller) and will be normalized in each Worker.
- **All performance-related parameters** (micro batch size, max token length in dynamic batch size) are local parameters, representing the data size per GPU (i.e., per Worker).

## Main Changes

1. Change the scripts and config, and delete the normalization for micro_batch_size
2. Fix CI for SFT
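The global-vs-local convention above can be sketched as follows. All names here are illustrative assumptions, not verl's actual API: algorithmic sizes arrive global and are divided by world size inside each Worker, while performance knobs like micro_batch_size_per_gpu are used as-is.

```python
# Minimal sketch (assumed names) of the batch-size convention the PR adopts.
def worker_batch_sizes(train_batch_size: int,
                       ppo_mini_batch_size: int,
                       micro_batch_size_per_gpu: int,
                       world_size: int) -> dict:
    return {
        # Algorithmic metrics: global, normalized per Worker.
        "train_batch_size_per_gpu": train_batch_size // world_size,
        "ppo_mini_batch_size_per_gpu": ppo_mini_batch_size // world_size,
        # Performance parameter: already per-GPU, no division.
        "micro_batch_size_per_gpu": micro_batch_size_per_gpu,
    }
```

For example, with a global train batch size of 1024 on 8 GPUs, each Worker sees 128 samples, while the micro batch size per GPU stays whatever the user set.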