[AutoParallel] Support multi machine case for the visualize tool #59179
Merged: From00 merged 63 commits into PaddlePaddle:develop from AndSonder:support_multi_multimachine on Nov 25, 2023
…to add_profiler
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
AndSonder force-pushed the support_multi_multimachine branch from 34feef0 to 0bff0af on November 21, 2023 06:04
21 tasks
From00 approved these changes on Nov 25, 2023
LGTM
SecretXV pushed a commit to SecretXV/Paddle that referenced this pull request on Nov 28, 2023
…dlePaddle#59179)
* merge from openvino master
* add InterpreterRunTime() to record interpreter's run time
* add profiler helper static to produce json file
* add color map and support perfetto format
* recover codes
* control include env for gpu_timer.h
* fix logic for profiler_helper_static.py
* fix build error
* fix build error
* recover thirdparty
* add flag control: not support new ir now
* set auto_parallel_profiler flag to false
* fix
* add auto_parallel_profiler as command parameter
* fix value name
* support gettimeofday for win env
* fix win build error
* fix win build error
* use job_type_to_id
* Fixed repeatedly timing the same stream
* add step line for timeline
* add step timeline and fix logic when job overlap
* update time record logic
* fix bug when start profile start from none zero step
* fix note
* remove FLAGS_auto_parallel_profiler
* use run config instead FLAGS_auto_parallelxx
* fix color map logic
* fix color map logic
* fix bug when log step does not start from 0
* fix
* fix
* don't use set_enable_auto_parallel_profiler
* fix bug
* disable auto_parallel_profiler when not open flag by command line
* fix bug
* remove resettime
* fix build bug
* fix
* remove set enable
* fix build error
* fix build error
* fix build error
* fix ci error
* fix
* fix run error
* fix
* fix
* fix calculate_stream_timer logic
* remove fluid head
* fix build error
* set default value for enable_job_schedule_profiler
* support multi machine
* fix load dir logic
PR types: Others
PR changes: Others
Description
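PR #58313 implemented visualization of the pipeline-parallel timeline in static graph mode. This PR adds multi-machine support to the tool.
In multi-machine mode, the user must manually copy the data from all machines into a single directory and organize it in the format below. The example uses a test environment with 2 machines and 2 GPUs.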
Structure of the log folder:
When running the tool, add the --multi_machine option. The resulting visualization looks like this:
Depends on PR:
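The manual copy step described above (collecting each machine's logs into one directory) could be scripted roughly as follows. This is only a sketch: the folder layout the tool actually expects is not reproduced on this page, so the `machine_<index>` subdirectory names and the `gather_machine_logs` helper are hypothetical and would need to be adapted to the real layout.

```python
import shutil
from pathlib import Path


def gather_machine_logs(machine_log_dirs, merged_dir):
    """Copy each machine's log directory into merged_dir/machine_<index>.

    NOTE: the "machine_<index>" naming is an assumption made for this
    sketch; it is not the layout required by the visualize tool.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)

    copied = []
    for index, src in enumerate(machine_log_dirs):
        src = Path(src)
        if not src.is_dir():
            raise FileNotFoundError(f"log directory not found: {src}")
        dest = merged / f"machine_{index}"
        # dirs_exist_ok requires Python 3.8+; re-running overwrites old copies.
        shutil.copytree(src, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied
```

It assumes the logs from every machine have already been made reachable on one host (for example via a shared filesystem or a prior scp), since the PR states that the copy itself is left to the user.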