[Custom Device]add run_check support for custom device #56318
Your PR was submitted successfully. Thank you for contributing to this open-source project!
if use_custom is True:
    import os

    os.environ['PADDLE_DISTRI_BACKEND'] = "xccl"
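The distributed module selects its backend by reading the environment variable PADDLE_DISTRI_BACKEND, whose default value is auto. Once a custom device is detected here, the backend is set to xccl manually so that the wrong backend is not chosen.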
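A minimal sketch of the behaviour this comment describes, assuming only what the comment states: the backend is read from `PADDLE_DISTRI_BACKEND` and falls back to `auto`. The helper name `select_backend_env` is hypothetical and is not part of Paddle; it works on a plain mapping so it runs without Paddle installed.

```python
import os


def select_backend_env(use_custom, env=None):
    """Pin the backend to "xccl" when a custom device is in use.

    `env` defaults to os.environ; pass a dict to avoid touching the process.
    Returns the value a later reader of the variable would see.
    """
    env = os.environ if env is None else env
    if use_custom:
        # Set explicitly so the "auto" default cannot resolve to another backend.
        env["PADDLE_DISTRI_BACKEND"] = "xccl"
    # "auto" is the default value mentioned in the review comment.
    return env.get("PADDLE_DISTRI_BACKEND", "auto")
```

Passing a dict, for example `select_backend_env(True, {})`, keeps the sketch free of side effects; passing nothing mutates the real process environment, which is what the PR code does.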
python/paddle/distributed/spawn.py
Outdated
    elif 'npu' in device:
        return core.get_custom_device_count('npu')
    elif 'mlu' in device:
        return core.get_custom_device_count('mlu')
Please change this to an approach that works for every custom device. Do not rely on string matching, which only supports the npu and mlu hardware types.
done
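One way to satisfy this request, sketched under assumptions: look the device type up in the list of registered custom device types instead of hard-coding names. The helper `custom_device_count` and its parameters are hypothetical. In Paddle the two inputs would presumably come from `core.get_all_custom_device_type()` and `core.get_custom_device_count`, both of which appear elsewhere in this PR, but this is not necessarily the code that was merged.

```python
def custom_device_count(device, custom_types, count_fn):
    """Return the device count if `device` is a registered custom type.

    device:       a device string such as "npu" or "npu:0"
    custom_types: names of the registered custom device types
    count_fn:     callable mapping a type name to its device count
    Returns None when the device is not a registered custom type.
    """
    # Drop an optional ":<index>" suffix to get the bare type name.
    device_type = device.split(":", 1)[0]
    if device_type in custom_types:
        return count_fn(device_type)
    return None
```

Because the type names are data rather than literals, a newly registered hardware type needs no further code change.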
@@ -126,6 +130,8 @@ def _get_default_backend():
        return 'bkcl'
    elif 'cpu' in device:
        return 'gloo'
    elif 'npu' or 'mlu' in device:
        return 'xccl'
Same as above: please change this to support every hardware type registered through custom device, and do not rely on string matching.
done
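A side note on the quoted hunk: in Python, `'npu' or 'mlu' in device` parses as `'npu' or ('mlu' in device)`, and a non-empty string is truthy, so that branch is taken for any device that reaches it. The sketch below demonstrates this and shows a membership test against the registered types instead. Both function names are hypothetical, and only the `cpu` and custom-device branches are modelled; the real `_get_default_backend` handles more device kinds.

```python
def buggy_is_custom(device):
    # Parsed as: 'npu' or ('mlu' in device). 'npu' is truthy, so always True.
    return bool('npu' or 'mlu' in device)


def default_backend(device, custom_types):
    """Hypothetical helper: choose a backend from the device type."""
    device_type = device.split(":", 1)[0]
    if device_type == "cpu":
        return "gloo"
    if device_type in custom_types:
        # Any registered custom device type maps to xccl.
        return "xccl"
    return None  # other device kinds are outside this sketch
```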
python/paddle/utils/install_check.py
Outdated
    if paddle.is_compiled_with_cuda():
        use_cuda = _is_cuda_available()
    elif paddle.is_compiled_with_xpu():
        use_xpu = _is_xpu_available()
    elif len(paddle.framework.core.get_all_custom_device_type()) == 1:
        use_custom = _is_custom_device_available()
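The condition on line 259 should be > 0, because more than one custom device may be registered. Also, the logic implemented in _is_custom_device_available is the same as the elif condition on line 259, so the check is duplicated and one of the two can be removed.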
done
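A rough sketch of the selection order after this change, assuming the single `> 0` check replaces both the `== 1` test and the duplicate check inside `_is_custom_device_available`. The function `pick_run_check_target` is hypothetical, its boolean arguments stand in for `paddle.is_compiled_with_cuda()` and `paddle.is_compiled_with_xpu()`, and the `"cpu"` fallback is an assumption that is not shown in the quoted diff.

```python
def pick_run_check_target(compiled_with_cuda, compiled_with_xpu, custom_types):
    """Return which kind of device the install check would target."""
    if compiled_with_cuda:
        return "cuda"
    elif compiled_with_xpu:
        return "xpu"
    elif len(custom_types) > 0:
        # One check covers both "exactly one" and "several" registered types.
        return "custom"
    return "cpu"  # assumed fallback, not taken from the PR
```

With `== 1`, a machine that registers two custom device types would skip the custom branch entirely; `> 0` keeps it on that branch.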
custom_device_name
)
)
)
By default this only runs on device[0]. Please check whether more than one device is registered and, if so, add a warning message saying that only device[0] is checked.
done
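The requested warning could look roughly like the sketch below. The helper `select_custom_device` and the message wording are hypothetical; the PR's actual message is not shown in this conversation.

```python
import warnings


def select_custom_device(custom_types):
    """Return the first registered custom device type, or None if there is none.

    Emits a warning when several types are registered, since only the
    first one is checked.
    """
    if not custom_types:
        return None
    if len(custom_types) > 1:
        warnings.warn(
            "Multiple custom device types are registered "
            f"({', '.join(custom_types)}); only '{custom_types[0]}' is checked."
        )
    return custom_types[0]
```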
LGTM
PR types
New features
PR changes
Others
Description
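Adds run_check support for Custom Device. Currently only a single kind of Custom Device is supported (for example, when only the Ascend NPU is installed). The test results are as follows: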