[Data] Implement accurate memory accounting for all-to-all operations #50290

Merged: 42 commits merged on Feb 15, 2025

@bveeramani (Member) commented Feb 6, 2025

Why are these changes needed?

AllToAllOperator and ZipOperator don't implement accurate memory accounting. As a result, if a plan contains either of these operators, the streaming executor falls back to the legacy scheduling algorithm.
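
The fallback described above can be pictured with a small, self-contained sketch. This is not Ray's actual code: the `Operator` and `MapLikeOperator` classes, the `choose_scheduler` function, and the `"resource_aware"` / `"legacy"` labels are hypothetical stand-ins. Only the method name `implements_accurate_memory_accounting` comes from this PR's diff. The assumption is that the executor uses the newer scheduling path only when every operator in the plan reports its memory usage accurately.

```python
from typing import List


class Operator:
    """Hypothetical stand-in for a physical operator (not Ray's real class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def implements_accurate_memory_accounting(self) -> bool:
        # Assumed conservative default: the operator does not report usage.
        return False


class MapLikeOperator(Operator):
    """Hypothetical operator that does report its memory usage accurately."""

    def implements_accurate_memory_accounting(self) -> bool:
        return True


def choose_scheduler(plan: List[Operator]) -> str:
    """Pick the newer path only if every operator accounts for memory."""
    if all(op.implements_accurate_memory_accounting() for op in plan):
        return "resource_aware"  # hypothetical label for the newer algorithm
    return "legacy"  # a single non-reporting operator forces the fallback


plan = [MapLikeOperator("read"), Operator("all_to_all"), MapLikeOperator("write")]
print(choose_scheduler(plan))  # prints "legacy"
```

In this sketch, a single operator without accurate accounting is enough to send the whole plan down the legacy path, which is the behavior this PR aims to remove for all-to-all and zip operators.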

Related issue number

Fixes #48104

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani marked this pull request as draft February 6, 2025 18:40
bveeramani and others added 13 commits February 10, 2025 15:05
@bveeramani bveeramani added the "go" label (add ONLY when ready to merge, run all tests) Feb 14, 2025
@bveeramani bveeramani marked this pull request as ready for review February 14, 2025 23:51
@bveeramani bveeramani requested a review from a team as a code owner February 14, 2025 23:51
Comment on lines +167 to +168
def implements_accurate_memory_accounting(self):
    return True
Member Author

I'll remove this method altogether in a follow-up PR. I wanted to keep this PR relatively small.

@bveeramani bveeramani enabled auto-merge (squash) February 15, 2025 00:30
@bveeramani bveeramani merged commit f0942dd into master Feb 15, 2025
6 checks passed
@bveeramani bveeramani deleted the legacy-schedule branch February 15, 2025 00:38
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 20, 2025
…ray-project#50290)

`AllToAllOperator` and `ZipOperator` don't implement accurate memory. As
a result, if a plan contains either of these operators, the streaming
executors falls back to the legacy scheduling algorithm.

Fixes ray-project#48104

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: 400Ping <43886578+400Ping@users.noreply.github.com>
israbbani pushed a commit that referenced this pull request Feb 25, 2025
…#50290)

`AllToAllOperator` and `ZipOperator` don't implement accurate memory. As
a result, if a plan contains either of these operators, the streaming
executors falls back to the legacy scheduling algorithm.

Fixes #48104

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
…ray-project#50290)

`AllToAllOperator` and `ZipOperator` don't implement accurate memory. As
a result, if a plan contains either of these operators, the streaming
executors falls back to the legacy scheduling algorithm.

Fixes ray-project#48104

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Labels
go (add ONLY when ready to merge, run all tests)
Development

Successfully merging this pull request may close these issues.

[Data] Ray Data runs out of disk while writing Parquet files
3 participants