[TT-1928] improve startup retry logic #1540
Merged
With this change, a startup retry no longer renames the container. Instead, it deletes the old container. A new name creates connection problems for containers that depend on the restarted one, because they do not know that they need to connect to it under the new name.
This happened here: the startup retry worked, but the CL node couldn't find the new container.
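To see why a new name breaks dependents, here is a minimal sketch. It assumes, for illustration only, that dependents reach a container purely by name (modeled as a hypothetical `network` map) and that the old retry appended a numeric suffix to the name; neither the map nor the suffix format comes from the library itself.

```go
package main

import "fmt"

// network is a hypothetical stand-in for name-based resolution on a Docker
// network: dependents look containers up by name only.
type network map[string]string // container name -> state

// retryWithNewName models the old behaviour (the suffix format is hypothetical):
// the failed container stays behind and the retry runs under a new name.
func retryWithNewName(n network, name string, attempt int) string {
	newName := fmt.Sprintf("%s-%d", name, attempt)
	n[newName] = "running"
	return newName
}

// retryWithSameName models the new behaviour: the failed container is deleted
// first, so the retry can reuse the original name.
func retryWithSameName(n network, name string) string {
	delete(n, name)
	n[name] = "running"
	return name
}

func main() {
	const dependency = "node" // the name baked into a dependent's configuration

	before := network{dependency: "failed"}
	retryWithNewName(before, dependency, 1)
	fmt.Println(before[dependency]) // failed: the healthy container is "node-1"

	after := network{dependency: "failed"}
	retryWithSameName(after, dependency)
	fmt.Println(after[dependency]) // running
}
```

The dependent never learns about `node-1`, so it keeps resolving the failed container; keeping the name stable avoids having to propagate a new name to every dependent.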
Below is a summarization created by an LLM (gpt-4-0125-preview). Be mindful of hallucinations and verify accuracy.
Why
The changes improve the container retry mechanisms in our Docker library by adding more robust error handling and a cleanup step before a container start is retried. Specifically, they address scenarios where a container fails to start because of image or platform compatibility issues, ensuring that a fresh attempt is made instead of reusing the problematic container.
What
- Added the `context` import, `container` from `github/docker/docker/api/types`, and `errors` from `github/pkg/errors` to support the new error handling and container removal logic.
- Updated the `NaiveRetrier` and `LinuxPlatformImageRetrier` functions to remove the existing container before retrying to start a new one. This involves setting `req.Reuse` to `false` and calling a new `removeContainer` function if an error occurs during container start.
- Added a `removeContainer` function that attempts to remove a container by its name using the Docker provider. It handles errors gracefully: if the container does not exist, the error is ignored, so the function only intervenes when there is something to clean up.
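The retry flow described in the list can be sketched as follows. This is a hypothetical, stdlib-only model, not the library's code: `Provider`, `Request`, `retryStart`, and `ErrNoSuchContainer` are illustrative stand-ins for the Docker provider, the container request, the retriers, and Docker's "no such container" error, and the `removeContainer` signature shown here is assumed. `Request.Reuse` assumes reuse means an existing container with the same name is picked up again instead of a new one being created.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoSuchContainer is a hypothetical stand-in for Docker's "no such container" error.
var ErrNoSuchContainer = errors.New("no such container")

// Request is a hypothetical, trimmed-down container request.
// Reuse is assumed to mean "pick up an existing container with this name".
type Request struct {
	Name  string
	Reuse bool
}

// Provider is a hypothetical in-memory provider keyed by container name.
type Provider struct {
	running   map[string]bool // name -> started successfully?
	failFirst int             // how many create attempts should fail
}

// Start creates a container named req.Name. With Reuse set, an existing
// container of that name is picked up as-is, even if it never started.
func (p *Provider) Start(req Request) error {
	if ok, exists := p.running[req.Name]; exists {
		if req.Reuse {
			if !ok {
				return fmt.Errorf("reused container %q is not running", req.Name)
			}
			return nil
		}
		return fmt.Errorf("name %q is already in use", req.Name)
	}
	p.running[req.Name] = p.failFirst == 0
	if p.failFirst > 0 {
		p.failFirst--
		return fmt.Errorf("container %q failed to start", req.Name)
	}
	return nil
}

// Remove deletes the container with the given name.
func (p *Provider) Remove(name string) error {
	if _, exists := p.running[name]; !exists {
		return ErrNoSuchContainer
	}
	delete(p.running, name)
	return nil
}

// removeContainer removes a container by name, treating "not found" as success,
// so it is safe to call whether or not the failed attempt left anything behind.
func removeContainer(p *Provider, name string) error {
	if err := p.Remove(name); err != nil && !errors.Is(err, ErrNoSuchContainer) {
		return err
	}
	return nil
}

// retryStart sketches a retrier: on failure, disable reuse, remove the
// leftover container, and try again under the same name.
func retryStart(p *Provider, req Request, attempts int) error {
	err := p.Start(req)
	for i := 1; i < attempts && err != nil; i++ {
		req.Reuse = false // never reattach to the container that just failed
		if rmErr := removeContainer(p, req.Name); rmErr != nil {
			return fmt.Errorf("cleanup before retry: %w", rmErr)
		}
		err = p.Start(req)
	}
	return err
}

func main() {
	p := &Provider{running: map[string]bool{}, failFirst: 1}
	err := retryStart(p, Request{Name: "node", Reuse: true}, 2)
	fmt.Println(err, p.running["node"]) // <nil> true
}
```

Setting `Reuse` to `false` matters in this model because, with reuse still enabled, the retry would pick up the container that just failed instead of creating a fresh one; removing it first also frees the name so the retry does not hit a name conflict.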