CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky #48288
Comments
new and flaky test
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6705#0192d3e2-fd18-45b5-a512-a9b4bf2ce350
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6705#0192d438-fbe9-469b-8250-a76386293294
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6719#0192d59a-e22c-46b1-b7e4-dd945cfad604
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is consistently_failing. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6721#0192d6ab-d499-4e83-b956-7d7c70e5e524
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6985#019341af-ab36-4334-adf0-886d76717731
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6990#01934300-8454-4620-bd26-2f12482a56e4
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/6992#01934433-3e29-487f-b3c9-26c7bfaf026a
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is consistently_failing. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/7008#01934806-7501-4352-84cf-c1835fbd8d80
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
…ng down (#48808)

Each compiled graph starts a monitor thread to tear down the DAG upon detecting an error in one of the workers' task loops. Currently, during driver shutdown, this thread can live past the lifetime of the C++ CoreWorker. This causes a silent process exit when the thread later tries to call on the CoreWorker but it has already been destructed. To prevent this from happening, this fix joins the monitor thread *before* destructing the CoreWorker.

## Related issue number

Closes #48288.

---------

Signed-off-by: Stephanie Wang <smwang@cs.washington.edu>
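
The ordering described in the commit message above (stop and join the monitor thread first, then destroy the object it calls into) can be illustrated with a minimal pure-Python sketch. This is not Ray's code: `FakeCoreWorker`, `MonitoredGraph`, and `shutdown` are hypothetical names, and the fake worker raises an exception on use-after-destroy where the real C++ CoreWorker would, per the commit message, cause a silent process exit.

```python
import threading
import time


class FakeCoreWorker:
    """Hypothetical stand-in for a native object such as the C++ CoreWorker."""

    def __init__(self):
        self.alive = True

    def ping(self):
        # The real failure is a silent process exit; here we raise so it is visible.
        if not self.alive:
            raise RuntimeError("CoreWorker used after destruction")
        return "ok"

    def destroy(self):
        self.alive = False


class MonitoredGraph:
    """Hypothetical graph that owns a monitor thread polling the worker."""

    def __init__(self, worker, poll_interval_s=0.01):
        self._worker = worker
        self._poll_interval_s = poll_interval_s
        self._stop = threading.Event()
        self.errors = []
        self._monitor = threading.Thread(target=self._run, daemon=True)
        self._monitor.start()

    def _run(self):
        # Poll until asked to stop; record any use-after-destroy error.
        while not self._stop.is_set():
            try:
                self._worker.ping()
            except RuntimeError as exc:
                self.errors.append(exc)
                return
            self._stop.wait(self._poll_interval_s)

    def teardown(self):
        # Signal the monitor loop to exit, then wait until it has exited.
        self._stop.set()
        self._monitor.join()

    def monitor_alive(self):
        return self._monitor.is_alive()


def shutdown(graph, worker):
    # Join the monitor thread *before* destroying the object it calls into.
    graph.teardown()
    worker.destroy()


if __name__ == "__main__":
    worker = FakeCoreWorker()
    graph = MonitoredGraph(worker)
    time.sleep(0.05)  # let the monitor poll a few times
    shutdown(graph, worker)
    print("monitor alive:", graph.monitor_alive(), "errors:", graph.errors)
```

Swapping the two calls in `shutdown` would leave a window in which the monitor loop can still call `ping()` on a destroyed worker, which is the kind of race the commit message attributes the flakiness to.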
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is flaky. Recent failures: DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/7018#01934b1e-eadf-4208-91fc-08e323b940a0
CI test linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/6696#0192d1c2-1479-41d6-bf43-5727365f667f
- https://buildkite.com/ray-project/postmerge/builds/6696#0192d187-63be-42ea-9a3e-4e2ff54b9f96
DataCaseName-linux://python/ray/dag:tests/experimental/test_mocked_nccl_dag-END
Managed by OSS Test Policy