Transient S3Upload failure: KeyError('endpoint_resolver') #3925
Comments
This appears to be a boto3 issue: client creation is not thread safe. See boto/boto3#801. We could consider using …
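A minimal sketch of one common workaround, assuming the race really is in client creation on boto3's shared default session (as boto/boto3#801 suggests): give each thread its own session and client, built lazily and cached in thread-local storage. `PerThreadClient` and `factory` are hypothetical names for illustration, not part of Prefect or boto3.

```python
import threading
from typing import Any, Callable


class PerThreadClient:
    """Lazily build and cache one client per thread.

    Assumption: ``factory`` creates its own session each time, for example
    ``lambda: boto3.session.Session().client("s3")``, so client creation
    never touches boto3's shared default session from several threads.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        # threading.local gives every thread its own attribute namespace.
        self._local = threading.local()

    def get(self) -> Any:
        client = getattr(self._local, "client", None)
        if client is None:
            # First call on this thread: build a client no other thread shares.
            client = self._factory()
            self._local.client = client
        return client
```

For example, `s3 = PerThreadClient(lambda: boto3.session.Session().client("s3"))` and then `s3.get().put_object(Bucket=..., Key=..., Body=...)` inside each worker. An alternative with the same intent is to create one client up front in the main thread and pass it to the workers, since boto3 documents low-level clients (unlike sessions) as thread safe.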
Created a PR to fix this issue: #3981. For anyone affected by this issue in the meantime, I wrote a workaround for our project that has worked well for the past week: creating a new task that inherits from … Sample code:
FYI, the PR is here: #3981. Don't need to pass in …
Description
Occasionally, a single mapped instance of an `S3Upload` task fails with an unexpected `KeyError('endpoint_resolver')`, which causes the flow to fail. Every time I have encountered this error, the flow succeeded when it was restarted, without any changes to the code or to any aspect of the Prefect setup.

Full stacktrace:
My personal experience with this error is with a flow that creates around 25 mapped instances of the `S3Upload` task, each of which uploads one file to the same S3 bucket using the same credentials. This flow runs every day and hits the error roughly once every four or five runs; with about 25 uploads per run, that means the `S3Upload` task fails around 1% of the time. I think I can rule out transient connection issues as the root cause, because the `S3Upload` tasks all run within a minute or so of each other and only one task ever fails. Curiously, every time I have seen this failure, it has occurred on the mapped task with index 1.

Expected Behavior
That all of the `S3Upload` tasks execute reliably without encountering this error. Or, failing that, I would expect all of the `S3Upload` tasks to fail for this reason if there really is some logical or structural problem with the flow.

Reproduction
Unfortunately, I am unable to replicate this error reliably. Details of my flow and setup are included below for reference. My best suggestion for replicating it would be to create a flow that maps over ~25 items, so that that many instances of the `S3Upload` task execute, and then run that flow repeatedly.

Flow logic:
The following lines are in `config.toml`:

Environment
Output from running `prefect diagnostics`:

(In the next couple of weeks, our team plans to upgrade these flows to Prefect version >=0.14. If the update changes anything, I'll update this issue.)
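To probe the thread-safety hypothesis from the comments outside Prefect, a small stress harness can mimic the ~25 near-simultaneous uploads described under Reproduction: it creates clients from many threads at once, round after round, and counts the exceptions by type. This is a hypothetical helper (`stress_client_creation` is not part of Prefect or boto3), and whether it actually triggers `KeyError('endpoint_resolver')` likely depends on the installed boto3/botocore versions and on timing.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def stress_client_creation(
    factory: Callable[[], Any], workers: int = 25, rounds: int = 20
) -> Counter:
    """Call ``factory`` from ``workers`` threads at once, ``rounds`` times.

    Returns a Counter mapping exception type names to how often they were
    raised; an empty Counter means every call succeeded.
    """
    failures: Counter = Counter()
    for _ in range(rounds):
        # A barrier releases all threads together to maximise overlap.
        barrier = threading.Barrier(workers)

        def call() -> None:
            barrier.wait()
            factory()

        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(call) for _ in range(workers)]
            for fut in futures:
                exc = fut.exception()  # blocks until this call finishes
                if exc is not None:
                    failures[type(exc).__name__] += 1
    return failures
```

For example, `stress_client_creation(lambda: boto3.client("s3", region_name="us-east-1"))` goes through boto3's shared default session, while `stress_client_creation(lambda: boto3.session.Session().client("s3", region_name="us-east-1"))` creates a separate session per call. If the race is what the comments suggest, only the first variant should ever report `KeyError`.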