Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document unpickle on scheduler changes #8304

Open
crusaderky opened this issue Oct 25, 2023 · 0 comments
Open

Document unpickle on scheduler changes #8304

crusaderky opened this issue Oct 25, 2023 · 0 comments
Labels
documentation Improve or add to documentation

Comments

@crusaderky
Copy link
Collaborator

This paragraph has become obsolete after #7564 and it's already caused confusion: https://dask.discourse.group/t/scheduler-importing-modules-meant-for-the-workers/2281

Cross Language Specialization
-----------------------------
The Client and Workers must agree on a language-specific serialization format.
In the standard ``dask.distributed`` client and worker objects this ends up
being the following::
bytes = cloudpickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
obj = cloudpickle.loads(bytes)
This varies between Python 2 and 3 and so your client and workers must match
their Python versions and software environments.
However, the Scheduler never uses the language-specific serialization and
instead only deals with MsgPack. If the client sends a pickled function up to
the scheduler the scheduler will not unpack function but will instead keep it
as bytes. Eventually those bytes will be sent to a worker, which will then
unpack the bytes into a proper Python function. Because the Scheduler never
unpacks language-specific serialized bytes it may be in a different language.
**The client and workers must share the same language and software environment,
the scheduler may differ.**
This has a few advantages:
1. The Scheduler is protected from unpickling unsafe code
2. We could conceivably implement workers and clients for other languages
(like R or Julia) and reuse the Python scheduler. The worker and client
code is fairly simple and much easier to reimplement than the scheduler,
which is complex.
3. The scheduler might some day be rewritten in more heavily optimized C or Go

CC @fjetter

@crusaderky crusaderky added documentation Improve or add to documentation and removed needs triage labels Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improve or add to documentation
Projects
None yet
Development

No branches or pull requests

1 participant