feat/refactor partition strategy #13
Conversation
merge upstream/develop
merge upstream develop
…t-0.x-activation-ckpt feat(isp.py): isp communicator support 0.x activation ckpt
feat(model/linear.py): update FeedForward class to internlm2
configs/7B_sft.py
1. size: int, the size of weight parallel.
2. overlap: bool, enable/disable all_gather/reduce_scatter communication overlap, defaults to False.
3. memory_pool: bool, enable/disable memory pool, defaults to False.
"""
parallel = dict(
    zero1=dict(size=8, fsdp=False),
Since we are already using our own wp, should we hide the fsdp interface and leave it out of the example config?
That works too.
Updated in 62a665d.
    pipeline=dict(size=1, interleaved_overlap=True),
    sequence_parallel=False,
    weight=dict(size=1, overlap=True, memory_pool=True),
)
Do we need to add an example with WP enabled?
And a test case.
The current config 7B_sft.py already uses wp; as for a test case, we can add one.
Oh, what 鹏哥 means is adding an example with wp size greater than 1.
Added in 62a665d.
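For reference, a hedged sketch of what such a wp size > 1 example might look like, using only the config fields visible in this diff (the size values are illustrative assumptions, not the values added in 62a665d):

```python
parallel = dict(
    zero1=dict(size=8, fsdp=False),
    pipeline=dict(size=1, interleaved_overlap=True),
    sequence_parallel=False,
    # Weight parallel enabled: shard weights across 4 ranks, with
    # all_gather/reduce_scatter overlap and the memory pool turned on.
    weight=dict(size=4, overlap=True, memory_pool=True),
)
```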
module_shapes: Dict[str, torch.Size] = None


class MemoryPool:
@huangting4201 @mwiacx If we upgrade to the latest PyTorch version (assuming the release that uses the VMM API moves from an RC to an official release), can the memory pool be dropped?
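For readers following the thread, a minimal sketch of the buffer-reuse idea a memory pool provides (names and structure are illustrative assumptions, not this PR's implementation). The open question above is whether PyTorch's newer virtual-memory-management allocator would make such manual pooling unnecessary.

```python
import torch
from typing import Dict, Tuple

class SimpleMemoryPool:
    """Illustrative sketch: cache tensors by (shape, dtype) and hand the
    same storage back out each step, instead of re-allocating the
    all-gather destination buffers and churning the allocator."""

    def __init__(self, device: str = "cuda"):
        self.device = device
        self._buffers: Dict[Tuple[tuple, torch.dtype], torch.Tensor] = {}

    def get_buffer(self, shape: tuple, dtype: torch.dtype) -> torch.Tensor:
        key = (tuple(shape), dtype)
        if key not in self._buffers:
            # Allocate once; later calls with the same key reuse this storage.
            self._buffers[key] = torch.empty(shape, dtype=dtype, device=self.device)
        return self._buffers[key]
```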
expert_parallel_size (int): Size of expert parallel.
"""

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.rank_num_per_group = self.tensor_parallel_size * self.pipeline_parallel_size
    self.num_group = self.world_size // self.rank_num_per_group
    self.num_tensor_parallel_group = self.world_size // self.tensor_parallel_size
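For concreteness, a small worked example of the grouping arithmetic above, with assumed sizes:

```python
# Assumed sizes for illustration only.
world_size = 8
tensor_parallel_size = 2
pipeline_parallel_size = 2

rank_num_per_group = tensor_parallel_size * pipeline_parallel_size  # 2 * 2 = 4
num_group = world_size // rank_num_per_group                        # 8 // 4 = 2
# Note that pp does not enter the next expression; that is what the
# question below is about.
num_tensor_parallel_group = world_size // tensor_parallel_size      # 8 // 2 = 4
```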
Why doesn't pp need to be taken into account here anymore?
def _get_expert_parallel_ranks(self):
    """
    Create expert and data parallel groups
    Example: world_size = 8, model_parallel_size = 2, expert_parallel_size = 2
    Example: world_size = 8, tensor_parallel_size = 2, expert_parallel_size = 2
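To make the docstring's example concrete, a hedged sketch of one common EP grouping scheme (DeepSpeed-style; not necessarily what this PR implements, which is what the question below asks about):

```python
# Assumed: world_size = 8, tensor_parallel_size = 2, expert_parallel_size = 2.
world_size, tp, ep = 8, 2, 2

# Each tensor-parallel rank index owns a strided set of global ranks.
dp_groups = [list(range(tp_rank, world_size, tp)) for tp_rank in range(tp)]
# dp_groups == [[0, 2, 4, 6], [1, 3, 5, 7]]

# Chunk each data-parallel group into expert-parallel groups of size ep.
ep_groups = [g[i : i + ep] for g in dp_groups for i in range(0, len(g), ep)]
# ep_groups == [[0, 2], [4, 6], [1, 3], [5, 7]]
```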
What grouping scheme does EP follow here?
)


class ISPLinear(ColumnParallelLinear):
Is the dedicated Linear for ISP there just to attach the communicator?
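For context, a hypothetical sketch of the pattern the question refers to. It subclasses torch.nn.Linear so the sketch runs standalone, whereas the PR subclasses ColumnParallelLinear; register_communicator and all_gather_weight are assumed names, not this repo's API.

```python
import torch

class ISPLinearSketch(torch.nn.Linear):
    """Illustrative only: a Linear variant whose sole addition is a hook
    where an ISP communicator can all-gather the sharded weight before
    the matmul."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._communicator = None  # hypothetical communicator slot

    def register_communicator(self, communicator):
        self._communicator = communicator

    def forward(self, x):
        weight = self.weight
        if self._communicator is not None:
            # Gather the full weight from the weight-parallel group first.
            weight = self._communicator.all_gather_weight(weight)
        return torch.nn.functional.linear(x, weight, self.bias)
```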
Motivation
Refactor the partition strategy for data (sequence), weights, gradients, and optimizer states (os).