Covalent dispatcher requires all of a workflow's package dependencies. #748
Comments
The encoded-workflow-processing-exp branch (work in progress but basically complete
While a proper design doc is forthcoming, the basic idea is that all data related to the workflow is now decoded and processed in the
If one instead replaces the "workflow_executor" line with
then one gets the following output:
In this case, the dispatcher has halted workflow execution just before postprocessing. One can then postprocess the workflow "offline" (without the Covalent server running) by calling
|
Hey @cjao, thanks for the detailed issue. Is there any way to relax the assumption that the client fetching the result has the same environment? Ideally we want those to be different (at least just to look at the result). I suspect we can already get the "return" electron/type by post-processing while preprocessing happens and just set the result to that. Maybe I am missing something. Thoughts, @kessler-frost? |
Hi @santoshkumarradha , the workflow is constructed by the client, so the client environment is necessarily able to understand the type of the decoded result. Perhaps you are referring to the possibility that the client environment changes after workflow submission. In that case, the client can still view a string or JSON representation of the result (if the result is JSON-serializable); results are encoded as |
Indeed, it's on the client side, but it need not be the same client nor have the same environment. If there is a string representation, that's good, but I may be missing something: don't we need post-processing to happen to know what a lattice result is, even to get the string representation? If so, where does this post-processing happen for the string? |
True, but that's just the node result. Workflow results require post-processing, i.e. if the lattice returns a combination of electrons or operations on electrons (a subset of the final child nodes), we need to return that as the result. The value to return is currently not known until post-processing happens.
On Fri, Jul 1, 2022 at 6:20 PM Casey Jao ***@***.***> wrote:
The string representation is stored in the TransportableObject during
serialization.
--
Santosh Kumar Radha
Agnostiq Inc.
|
The string representation of the workflow result is indeed computed during post-processing. However, the post-processing doesn't have to take place client-side. The only requirement is that the post-processing environment satisfies the dependencies of the workflow. For instance, one can perform the post-processing on a different computer; internally, post-processing simply becomes another task that can be run using any executor. |
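A minimal sketch of that idea, assuming post-processing amounts to replaying the lattice body with each electron call replaced by its stored output. The names `workflow`, `postprocess`, and `replay_electron`, and the thread pool standing in for an executor, are hypothetical and only for illustration; none of them are Covalent APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def workflow(electron, x):
    # Lattice body: the return value combines two electron outputs.
    a = electron("double", x)
    b = electron("square", x)
    return a + b


def postprocess(workflow_fn, node_outputs, *args):
    """Replay the lattice body, feeding stored node outputs in call order."""
    stored = iter(node_outputs)

    def replay_electron(_name, *_inputs):
        # No task runs here; the recorded output is returned instead.
        return next(stored)

    return workflow_fn(replay_electron, *args)


# Outputs recorded earlier for the "double" and "square" nodes with x = 3.
node_outputs = [6, 9]

# Post-processing is submitted like any other task (thread pool as a stand-in executor).
with ThreadPoolExecutor(max_workers=1) as pool:
    result = pool.submit(postprocess, workflow, node_outputs, 3).result()

print(result)
```

The only environment that needs the workflow's packages is the one where `postprocess` actually runs; the caller just receives the finished result.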
@santoshkumarradha So, to the best of my understanding, the only assumption on the client side is that it should have sufficient packages installed to understand each node's result and the final lattice's result; it is not necessary for the client to have all the packages installed that were used during the actual execution, even if it is doing post-processing on the client-side (assuming there are no non-electron executions happening inside the lattice definition). Is that what you were asking or am I misunderstanding something? |
To elaborate on @kessler-frost's reply: There are several types of dependencies:
Each class of deps is dealt with differently.
In
The first two steps are now moved outside the server process and can be run using any executor whose environment satisfies the dependencies. As for the last point, the PR modifies the SDK so that the server can reconstruct a Lattice purely using |
Design doc
Terms
Rationale
The basic design of Covalent already allows the dispatcher to manage a workflow without being fully aware of the workflow data; the essential information is the input-output relationships between tasks, in other words, the structure of the transport graph and not the raw contents of each task. To actually process a workflow without requiring its dependencies, the Covalent server must observe two basic principles:
Presently, the Covalent server violates the first principle in several ways:
It also unpickles data during several steps of the workflow processing pipeline:
The proposed change moves all unpickling and pickling out of the server process. In addition, the server process only handles data types defined by Covalent:
Main modifications
Assuming that a workflow doesn't use the
Some Implications
json.loads(lattice.transport_graph.serialize_to_json()) returns the transport graph in node-link form, with the additional benefit that all metadata objects are in their JSON representations. Thus one can save and load the data using
Status
|
This issue goes hand in hand with the import dependencies issue #674.
The current implementation of the Covalent dispatcher requires that the dispatcher's environment satisfy all of a workflow's package dependencies. However, different workflows may have conflicting dependencies. Worse, some of their dependencies might conflict with Covalent's own. For instance, our own Quantum Chemistry tutorial is incompatible with the version of numpy required by Covalent. In addition, in an on-prem deployment with a central Covalent server, it is impractical, not to mention bad practice, for each end user to modify the server's environment; the server should ideally run in a lean and fixed environment.
To avoid this limitation, Covalent must process a workflow without unpickling any workflow-specific data. These include:
These must remain serialized in the server process. In particular, the server process must never try to directly execute any code submitted by the user, which it currently does during post-processing or building sublattice graphs.
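As a rough illustration of keeping data serialized in the server process, here is a minimal sketch using only the standard library. `OpaquePayload` and `server_forward` are hypothetical names and not Covalent's `TransportableObject` or any other Covalent API; the sketch only assumes that the client stores a string form alongside the pickled bytes, so the server can log or display a value without ever unpickling it.

```python
import base64
import json
import pickle


class OpaquePayload:
    """Hypothetical wrapper: pickled bytes plus a string form captured at serialization."""

    def __init__(self, obj):
        # Client side: pickle once and record str(obj) while the type is importable.
        self.blob = base64.b64encode(pickle.dumps(obj)).decode("ascii")
        self.text = str(obj)

    def to_json(self):
        return json.dumps({"blob": self.blob, "text": self.text})

    @classmethod
    def from_json(cls, data):
        fields = json.loads(data)
        payload = cls.__new__(cls)  # bypass __init__: nothing is pickled or unpickled
        payload.blob = fields["blob"]
        payload.text = fields["text"]
        return payload

    def load(self):
        # Call this only in an environment that has the workflow's dependencies.
        return pickle.loads(base64.b64decode(self.blob))


def server_forward(message):
    """Server side: read the string form and re-emit the payload, never calling load()."""
    payload = OpaquePayload.from_json(message)
    print(f"forwarding result: {payload.text}")
    return payload.to_json()


wire = OpaquePayload({"rows": 3, "columns": ["a", "b"]}).to_json()
forwarded = server_forward(wire)
restored = OpaquePayload.from_json(forwarded).load()
```

Because `server_forward` touches only JSON strings, it would keep working even if the pickled object's class were not importable in the server's environment.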
covalent-server, covalent-client and install Covalent in both.
covalent-server with the --no-cluster option.
covalent-client, install pandas.
The following snippet fails when run from covalent-client:
covalent logs:
Indeed, the Covalent server currently handles the raw (unserialized) workflow data in several places, and covalent-server doesn't have pandas.