[Core][deprecate run_on_all_workers 1/n] set worker's sys.path through JobConfig._py_driver_sys_path #31383

Merged
merged 7 commits into from
Jan 6, 2023

scv119 (Contributor) commented Jan 1, 2023

Why are these changes needed?

Today we use run_on_all_workers to set the worker's sys.path, but run_on_all_workers provides only weak ordering guarantees (e.g., there is no guarantee the function has run on a worker before that worker starts executing tasks) and will be deprecated.

Instead, we should use JobConfig._py_driver_sys_path to set the worker's sys.path: the worker adds these paths to its sys.path on startup.

Note that JobConfig.code_search_path serves a similar purpose; however, it goes through a different code path (load_code_from_local), behaves differently, and introduced bugs that caused test failures (#17605 (comment)).
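For illustration, the startup step described above (a worker adding the driver-provided paths to its own sys.path before running any task) can be sketched in plain Python. `JobConfigSketch` and `apply_driver_sys_path` are hypothetical stand-ins rather than Ray's actual classes, and prepending the paths while skipping entries that are already present is an assumption of this sketch, not necessarily Ray's exact policy.

```python
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobConfigSketch:
    # Hypothetical stand-in for JobConfig; only the field relevant here.
    py_driver_sys_path: List[str] = field(default_factory=list)


def apply_driver_sys_path(
    job_config: JobConfigSketch, path_list: Optional[List[str]] = None
) -> List[str]:
    """Prepend the driver's paths to path_list (sys.path by default).

    Meant to run once at worker startup, before any task code is imported.
    Entries already present are skipped, so repeated calls are idempotent.
    Returns the paths that were actually added, in their original order.
    """
    if path_list is None:
        path_list = sys.path
    added = []
    # Walk the driver's paths in reverse so that repeated insert(0, ...)
    # leaves them at the front in their original relative order.
    for p in reversed(job_config.py_driver_sys_path):
        if p not in path_list:
            path_list.insert(0, p)
            added.append(p)
    added.reverse()
    return added
```

Because the paths come from the config the worker already holds at startup, they are in place before the first task's code is imported, which is exactly the ordering that an asynchronously delivered run_on_all_workers callback cannot promise.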

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

scv119 changed the title [Core][deprecate run_on_all_workers 1/n] set sys.path through JobConfig.code_search_path [Core][deprecate run_on_all_workers 1/n] set worker's sys.path through JobConfig.code_search_path Jan 1, 2023
scv119 marked this pull request as ready for review January 1, 2023 06:15
scv119 changed the title [Core][deprecate run_on_all_workers 1/n] set worker's sys.path through JobConfig.code_search_path [Core][deprecate run_on_all_workers 1/n] set worker's sys.path through JobConfig._py_driver_sys_path Jan 3, 2023
```diff
@@ -2027,6 +2027,29 @@ def connect(
         runtime_env.pop("excludes", None)
         job_config.set_runtime_env(runtime_env)
 
     if mode == SCRIPT_MODE:
```
Member

Can you help me understand why this block needed to move before core worker initialization?

Contributor Author

There are two relevant types of workers: drivers, and task/actor execution workers.
The job_config flows from drivers -> raylet (worker_pool) -> executor workers. Both hops happen implicitly through the CoreWorker constructor, worker.core_worker = ray._raylet.CoreWorker: the driver sends its job_config to the raylet, and executor workers receive the job_config from the raylet.

That's why we need to change it before core_worker initialization; otherwise the job_config has already been sent to the raylet.
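The ordering constraint can be illustrated with a toy model in which the job config is copied at the moment the driver's core worker is constructed, so any edit made afterwards never reaches executor workers. `ToyJobConfig`, `ToyRaylet`, and `connect_driver` are hypothetical names for this sketch only; in Ray the hand-off happens inside the `ray._raylet.CoreWorker` constructor, and the executor side is expected to change with #31375.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyJobConfig:
    py_driver_sys_path: List[str] = field(default_factory=list)


class ToyRaylet:
    """Hypothetical stand-in for the raylet's worker pool."""

    def __init__(self) -> None:
        self.job_config: Optional[ToyJobConfig] = None

    def register_driver(self, job_config: ToyJobConfig) -> None:
        # Deep-copy to model serialization over RPC: later mutations on the
        # driver's object are not visible here.
        self.job_config = copy.deepcopy(job_config)

    def start_executor(self) -> List[str]:
        # An executor worker gets its config from the raylet and returns the
        # paths it would add to its sys.path.
        if self.job_config is None:
            raise RuntimeError("no driver has registered a job config")
        return list(self.job_config.py_driver_sys_path)


def connect_driver(raylet: ToyRaylet, job_config: ToyJobConfig) -> None:
    """Models core worker construction on the driver, which ships the config."""
    raylet.register_driver(job_config)


if __name__ == "__main__":
    raylet = ToyRaylet()
    cfg = ToyJobConfig()
    cfg.py_driver_sys_path.append("/driver/dir")  # before connect: propagated
    connect_driver(raylet, cfg)
    cfg.py_driver_sys_path.append("/too/late")  # after connect: never seen
    print(raylet.start_executor())  # prints ['/driver/dir']
```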

Contributor Author

BTW, how executor workers get the job_config will change soon with #31375.

scv119 added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) Jan 3, 2023

fishbone (Contributor) commented Jan 3, 2023

Thanks for fixing this. As synced offline, my only concern is that code_search_path and the newly introduced field do similar work. Maybe we should try to unify them.
One way is to introduce something like enable_load_code_from_local so that we get these two features:

  1. a way to load code from local instead of reading it from GCS
  2. a unified way to set up the code search path

Not a very strong opinion here if this requires a lot of work.

fishbone (Contributor) commented Jan 3, 2023

BTW, could you also delete the import thread, which is no longer useful after this change?

scv119 (Contributor Author) commented Jan 4, 2023

> Btw, could you also delete import thread which is not useful anymore with this work.

The deletion will be done in #30895, since @rkooo567 has some concerns about the deprecation logistics.

scv119 (Contributor Author) commented Jan 4, 2023

> Thanks for fixing this. As synced offline, my only concern is that we make the code_search_path and the new introduced ones doing similar works. Maybe we should try to unify them.
> One way is to introduce something like enable_load_code_from_local so that we make these two features:
>   1. we have a way to load from local instead of reading from GCS
>   2. we have a unified way to setup code search path
> Not very strong opinion here if this require a lot of work.

I prototyped it and found it a bit more confusing... I'd propose we stick with the current implementation, since ultimately we will replace it with runtime env.

scv119 merged commit 6c22e59 into ray-project:master Jan 6, 2023
c21 pushed a commit to c21/ray that referenced this pull request Jan 6, 2023
…h JobConfig._py_driver_sys_path (ray-project#31383)

scv119 added a commit that referenced this pull request Jan 9, 2023
…precation of run_function_on_all_workers (#31528)

Warn users that run_function_on_all_workers will be deleted in Ray 2.4. Ray core no longer uses this function after #31383.
Signed-off-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Cade Daniel <edacih@gmail.com>
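A call-time warning like the one described in #31528 is commonly built on Python's standard `warnings` module. The decorator below is a minimal sketch under that assumption; `deprecated` and `run_function_on_all_workers_sketch` are hypothetical names, and neither the mechanism nor the message text is taken from Ray's actual change.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(removal_version: str) -> Callable[[F], F]:
    """Decorator that emits a DeprecationWarning every time the function is called."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed "
                f"in Ray {removal_version}.",
                DeprecationWarning,
                # Attribute the warning to the caller, not to this wrapper.
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


@deprecated("2.4")
def run_function_on_all_workers_sketch(function: Callable[[], Any]) -> Any:
    # Hypothetical placeholder body: simply calls the function locally.
    return function()
```

One design note: with Python's default filters, DeprecationWarning is only displayed when triggered from code in `__main__`, so a warning aimed at library users may be easy to miss unless the category or filters are chosen with that in mind.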
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…h JobConfig._py_driver_sys_path (#31383)

AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…h JobConfig._py_driver_sys_path (#31383)

AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…precation of run_function_on_all_workers (#31528)

nidabdella pushed a commit to tweag/ray that referenced this pull request Jan 24, 2023
…h JobConfig._py_driver_sys_path (ray-project#31383)

Signed-off-by: mohamed <mohamed.nidabdella@tweag.io>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…h JobConfig._py_driver_sys_path (ray-project#31383)

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
scv119 added a commit that referenced this pull request Jun 8, 2023
…_on_all_workers (#30895)

This function is deprecated. The only use of run_function_on_all_workers in Ray core has been replaced by #31383.
Delete it to make our worker prestart change simpler.
This PR depends on #31383 and #31528.
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…_on_all_workers (ray-project#30895)

This function is deprecated. The only use of run_funcion_on_all_workers in ray core has been replaced by ray-project#31383
Delete it to make our worker prestart change simpler.
This PR depends on ray-project#31383 ray-project#31528

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
rkooo567 pushed a commit that referenced this pull request Mar 1, 2024
In #31383 we added _py_driver_sys_path to all workers, making them search the driver's path. This lets workers find modules located next to the driver file. However, it does not work in these cases:

  • If there is a working_dir. The code should be searched for in the working_dir, not in the driver's dir. E.g., library code in the driver's dir can change after job submission but before a worker starts, making runs inconsistent across workers.
  • If the worker is on a different node than the driver. The code would be searched for in "the driver's dir as it exists on the driver's node", but on a different node. If a stale code file happens to exist on the worker node, the stale code is used.
  • (Already handled) If it's the dashboard, client mode, or interactive mode.

This PR addresses the first issue and leaves a TODO for the second.

Part of #42863
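The rule this commit describes can be sketched as a small pure function that decides which driver-side directories to add to _py_driver_sys_path. `DriverContext` and `driver_paths_to_ship` are hypothetical names; this is a simplified model rather than Ray's connect() logic, and treating client and dashboard drivers as shipping nothing is a simplification chosen for the sketch. The multi-node stale-code case is deliberately left open, mirroring the TODO.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DriverContext:
    """Hypothetical summary of the driver's situation at connect() time."""

    script_path: Optional[str]  # None in interactive mode (no script file)
    cwd: str
    has_working_dir: bool = False
    client_mode: bool = False
    dashboard: bool = False


def driver_paths_to_ship(ctx: DriverContext) -> List[str]:
    """Return driver-side directories to add to workers' sys.path.

    Simplified model of the cases listed in the commit message; not Ray's code.
    """
    # With a working_dir, workers should import from the uploaded working_dir,
    # not from the driver's local dir, which may change after submission.
    if ctx.has_working_dir:
        return []
    # Modeled as "already handled": ship nothing for client/dashboard drivers.
    if ctx.client_mode or ctx.dashboard:
        return []
    paths: List[str] = []
    # Interactive mode has no script, so only the cwd is considered.
    if ctx.script_path is not None:
        paths.append(os.path.dirname(os.path.abspath(ctx.script_path)))
    cwd = os.path.abspath(ctx.cwd)
    if cwd not in paths:
        paths.append(cwd)
    # Open issue (mirrors the TODO): these paths are resolved on the worker's
    # node, which may hold stale copies of the driver's files.
    return paths
```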
5 participants