-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapt CI tests to use compiled_rmsnorm #451
Conversation
[ghstack-poisoned]
ghstack-source-id: 357f6104859f6d2d9372622a8284330502a4db77 Pull Request resolved: #451
@@ -54,7 +54,7 @@ def build_test_list(): | |||
"--experimental.pipeline_parallel_split_points layers.4", | |||
"--experimental.pipeline_parallel_schedule 1f1b", | |||
"--training.data_parallel_degree 1", | |||
"--model.norm_type rmsnorm", # fused_rmsnorm crashes with PP | |||
"--model.norm_type rmsnorm", # compiled_rmsnorm / fused_rmsnorm crashes with PP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious what is the reason that compiled RMSNorm is incompatible with PP
test_runner.py
Outdated
@@ -257,7 +253,7 @@ def build_test_list(): | |||
"--experimental.pipeline_parallel_degree 4", | |||
"--experimental.pipeline_parallel_split_points layers.1,layers.2,layers.3,layers.4,layers.5,layers.6,layers.7", | |||
"--experimental.pipeline_parallel_schedule interleaved_1f1b", | |||
"--model.norm_type rmsnorm", # fused_rmsnorm throws cuda context error with pp | |||
"--model.norm_type rmsnorm", # compiled_rmsnorm / fused_rmsnorm throws cuda context error with pp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. is this cuda context error the same reason as above for "crashes with PP"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It seems nobody understands why the error happens at this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is the same error, maybe we should make the commit uniform?
[ghstack-poisoned]
ghstack-source-id: fa38b2261667b182e0b05699d2f04772ec5852d3 Pull Request resolved: #451
a437389
to
a757a84
Compare
ghstack-source-id: fa38b2261667b182e0b05699d2f04772ec5852d3 Pull Request resolved: #451
see updates in #535 |
Stack from ghstack (oldest at bottom):