added worker error on timeout if restart fails #56
Fixes #51
Issue
The worker attempts to start the services in the layer as soon as it has all the necessary pieces of config.
If that fails, for whatever reason, it currently reports 'active' and that's it.
Any follow-up event, unless it signals config changes, will not even attempt to restart the services.
Solution
The worker will retry starting the services for 15 minutes (default, configurable). If they still fail to start, it raises an exception, letting the charm go into error state, and writes logs that tell the user what went wrong and that the cause is probably an external service the workload depends on to start, such as S3.
The idea is that Juju will retry the hook and, as those external services come up, the issue will eventually resolve itself.
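A minimal sketch of the retry-then-raise behaviour described above, not the actual worker code. The names `restart_with_retry`, `WorkerRestartError`, and the `restart` callable (assumed to return `True` once the services are up) are hypothetical; only the 15-minute default comes from this PR. The `sleep` and `clock` parameters exist so the loop can be exercised without waiting.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class WorkerRestartError(RuntimeError):
    """Hypothetical error raised when the services do not start in time."""


def restart_with_retry(
    restart: Callable[[], bool],
    timeout: float = 15 * 60,  # 15 minutes, the default named in this PR
    interval: float = 5.0,  # assumed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `restart` until it succeeds or `timeout` seconds have passed.

    Returns the number of attempts made on success. Raises
    WorkerRestartError on timeout, so that the hook fails and the
    charm goes into error state.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if restart():
            return attempts
        if clock() >= deadline:
            logger.error(
                "services failed to start after %d attempts over %s seconds; "
                "an external dependency (such as S3) may be unavailable",
                attempts,
                timeout,
            )
            raise WorkerRestartError(
                f"services did not start within {timeout} seconds"
            )
        sleep(interval)
```

Letting the exception propagate out of the hook is what hands the retry over to Juju once the timeout is exhausted.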
Rejected alternatives:
resurrect, or something similar to wake yourself up with custom events periodically until the check finally passes
Context
https://discourse.charmhub.io/t/its-probably-ok-for-a-unit-to-go-into-error-state/13022/30
Testing Instructions