[Bug] Workflow sandbox issues with Protobuf #688
Comments
Ideally we can replicate so we can help debug the problem. It is interesting that the exact same Python code does different things in different environments. It may be a protobuf version issue or an order of operations issue, but a standalone replication may be needed for us to confirm the issue.
Looking at the traceback, it's a bit of an interesting case where we catch … I can't really tell the problem without a replication. Can you confirm the exact protobuf version (both library and protoc generator) and keep reducing the k8s-only replication until it's very small and standalone?
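For comparing the failing and working environments, a small script along these lines could be run in both places (locally and inside the k8s pod) and the output diffed. This is only a sketch: the package list is an assumption (in particular `grpcio-tools` is a guess at how the `_pb2` files were generated), and it reports installed distribution versions only, not the `protoc` version that produced the generated code.

```python
import platform
from importlib import metadata


def collect_versions(packages=("temporalio", "protobuf", "grpcio-tools")):
    """Return the interpreter version plus the installed version of each package.

    The default package names are assumptions; adjust them to whatever the
    project actually depends on.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep missing packages visible rather than failing the whole report.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running the same file in each environment gives two short listings that are easy to compare line by line.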
Here are my versions:
Here is an example of a workflow file:

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    from temporalio.v1.lorem_pb2 import (
        LoremWorkflowRequest,
        LoremWorkflowResponse,
    )

@workflow.defn(name="lorem-workflow")
class LoremWorkflow:
    @workflow.run
    async def run(self, req: LoremWorkflowRequest) -> LoremWorkflowResponse:
        return LoremWorkflowResponse(text="i am response")

Could it be an issue with my Protobuf definitions and the fact that I use maps (but again, locally, all good)?

message LoremWorkflowRequest {
  map<string, bool> random = 1;
}
To confirm, the exact same code from Python start to end fails in one of your environments but not another? May need to see the full standalone code (i.e. how you are registering/running worker). There may be an operations order issue.
I cannot know to be honest without replicating and debugging. The code you show above should work fine in every environment. I wonder if there are Python version differences or something else in the one environment where this happens that could help us figure out how to reliably replicate? I think the main goal is to confirm the exact same small code that fails in one environment but works in another. Even something as simple as the order something is imported may affect it which is why exact code matters.
Yep, I understand that providing insights without a reproducible example is almost impossible. However, I figured out how to make my workflow work. I had my Protobuf imports under the unsafe context manager:

with workflow.unsafe.imports_passed_through():
    from temporalio.v1.lorem_pb2 import (
        LoremWorkflowRequest,
        LoremWorkflowResponse,
    )

I changed that to the following, and now I see no exceptions:

from temporalio.v1.lorem_pb2 import (
    LoremWorkflowRequest,
    LoremWorkflowResponse,
)

What is the right way?
The first way, with the imports passed through, is usually better for models because with the second way you're re-importing the models on every single workflow run, which costs memory and CPU. Having said that, we automatically mark any import beneath …
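To illustrate the re-import cost in isolation, here is a stdlib-only sketch. It is an analogy using plain `importlib`, not the sandbox's actual implementation: the module name `demo_models_for_reimport` and its `Request` class are hypothetical. It shows that importing a module afresh re-executes its body and yields a distinct class object, while a cached import (roughly what passing a module through gives you) reuses the existing one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_models_for_reimport"  # hypothetical module, created on the fly


def import_fresh(name):
    """Import `name` as if for the first time by dropping the cached module."""
    sys.modules.pop(name, None)
    return importlib.import_module(name)


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Write a tiny stand-in for a generated models module.
        Path(tmp, f"{MODULE_NAME}.py").write_text("class Request:\n    pass\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            first = import_fresh(MODULE_NAME)
            second = import_fresh(MODULE_NAME)  # module body runs again
            cached = importlib.import_module(MODULE_NAME)  # served from sys.modules
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    return {
        # Each fresh import builds its own copy of the class.
        "reimport_creates_new_class": first.Request is not second.Request,
        # A cached import hands back the same module and class.
        "cached_import_shares_class": cached.Request is second.Request,
        # Instances of one copy are not instances of the other copy.
        "instance_matches_other_copy": isinstance(first.Request(), second.Request),
    }


if __name__ == "__main__":
    for key, value in run_demo().items():
        print(f"{key}: {value}")
```

The duplicated class objects are the memory and CPU overhead mentioned above; whether they also explain the Kubernetes-only failure is not something this sketch can confirm.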
What are you really trying to do?
Hey!
I'm doing nothing special. I have one workflow and one activity. I use Protobuf messages as input parameters for both cases. I just expect to be able to execute the workflow and wait for it to be completed.
Describe the bug
The issue happens only in the Kubernetes environment.
Minimal Reproduction
Unfortunately, I don't have steps for reproduction. Everything works perfectly in my and my colleague's local environments.
I would love to hear ideas or proposals on what to look into to understand the root cause. The only thing I can state is that if I add the following import, everything works:
My other Protobuf imports (the result of code generation from Protobuf definitions) are already under unsafe context manager.
Environment/Versions
Additional context