worker won't start if s3 is not configured on pebble-ready, and will never be started #51

Closed
PietroPasotti opened this issue Aug 15, 2024 · 0 comments · Fixed by #56

Comments

@PietroPasotti
Contributor

Bug Description

If the worker app comes up (pebble-ready) before S3 has been integrated and is ready, the worker fails to start its Pebble service:

x7557f17e0dc0>: Failed to establish a new connection: [Errno 111] Connection refused'))
unit-tempo-worker-0: 12:54:34 INFO juju.worker.uniter.operation ran "tempo-pebble-ready" hook (via hook dispatching script: dispatch)
unit-tempo-worker-0: 12:54:37 ERROR unit.tempo-worker/0.juju-log tempo-cluster:29: failed to (re)start worker job: cannot perform the following tasks:
- Start service "tempo" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2024-08-15T10:54:37Z INFO Service "tempo" has never been started.
----- Logs from task 1 -----
2024-08-15T10:54:37Z INFO Most recent service output:
    level=info ts=2024-08-15T10:54:37.832596944Z caller=main.go:225 msg="initialising OpenTracing tracer"
    level=info ts=2024-08-15T10:54:37.833923905Z caller=main.go:118 msg="Starting Tempo" version="(version=, branch=, revision=2225623e72362fa29496b1f3fcba337c4a982687)"
    level=error ts=2024-08-15T10:54:37.835451845Z caller=main.go:121 msg="error running Tempo" err="failed to init module services: error initialising module: store: failed to create store: unexpected error from ListObjects on tempo: Get \"http://minio-0.minio-endpoints.minio.svc.cluster.local:9000/tempo/?location=\": dial tcp 10.1.232.179:9000: connect: connection refused"
2024-08-15T10:54:37Z ERROR cannot start service: exited quickly with code 1
-----
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-tempo-worker-0/charm/venv/cosl/coordinated_workers/worker.py", line 311, in restart
    self._container.restart(self._name)
  File "/var/lib/juju/agents/unit-tempo-worker-0/charm/venv/ops/model.py", line 2274, in restart
    self._pebble.restart_services(service_names)
  File "/var/lib/juju/agents/unit-tempo-worker-0/charm/venv/ops/pebble.py", line 2201, in restart_services
    return self._services_action('restart', services, timeout, delay)
  File "/var/lib/juju/agents/unit-tempo-worker-0/charm/venv/ops/pebble.py", line 2226, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "tempo" (cannot start service: exited quickly with code 1)
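
The log shows Tempo exiting because its object-store endpoint refuses the connection. One possible mitigation is to probe the S3 endpoint before (re)starting the service and to stay in a waiting state while it is unreachable. The following is a minimal sketch under assumptions: `s3_endpoint_reachable` is a hypothetical helper, not part of `cosl` or `ops`, and a successful TCP connection says nothing about whether the bucket exists or the credentials are valid.

```python
import socket
from urllib.parse import urlparse


def s3_endpoint_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds.

    Hypothetical helper: probing S3 before starting Tempo is an assumption,
    not something the worker charm is known to do.
    """
    parsed = urlparse(endpoint)
    host = parsed.hostname
    if host is None:
        # Not a URL with a host component; treat as unreachable.
        return False
    # Fall back to the scheme's default port when the URL does not give one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", DNS failures and timeouts alike.
        return False
```

For the endpoint in the log, a call would look like `s3_endpoint_reachable("http://minio-0.minio-endpoints.minio.svc.cluster.local:9000")`. On its own this only narrows the race: if S3 is not reachable yet, the worker still needs some later event to try the start again.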

When S3 eventually joins and a bucket becomes ready, the worker does not react to it.
As a result, the Pebble service stays down, and the worker never attempts to restart it until the next config change.
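
The underlying problem is that the start is attempted once and the failure is never revisited. A common charm pattern is an idempotent reconcile step that runs on every event and retries the start while the service is down. The code below is a self-contained simulation of that pattern, not the fix merged in #56. `FakeService`, `Worker` and `StartError` are hypothetical stand-ins for the Pebble service, the worker charm and `ops.pebble.ChangeError`. It also assumes that the coordinator's S3 settings reach the worker as an event on the cluster relation; the failing hook above ran on `tempo-cluster`.

```python
from dataclasses import dataclass, field


class StartError(Exception):
    """Hypothetical stand-in for ops.pebble.ChangeError."""


@dataclass
class FakeService:
    """Hypothetical stand-in for the Pebble-managed "tempo" service."""

    running: bool = False

    def start(self, s3_ready: bool) -> None:
        # Mimic Tempo exiting with code 1 when its object store is unreachable.
        if not s3_ready:
            raise StartError("cannot start service: exited quickly with code 1")
        self.running = True


@dataclass
class Worker:
    """Hypothetical worker that reconciles on every event, not only pebble-ready."""

    service: FakeService = field(default_factory=FakeService)
    s3_ready: bool = False

    def on_event(self, event_name: str) -> bool:
        # The event name is informative only: every event triggers a reconcile.
        return self._reconcile()

    def _reconcile(self) -> bool:
        # Idempotent: nothing to do if the service is already running.
        if self.service.running:
            return True
        try:
            self.service.start(self.s3_ready)
        except StartError:
            # Leave the service down; the next event retries instead of the
            # worker giving up. A real charm would also set a waiting status.
            return False
        return True


if __name__ == "__main__":
    worker = Worker()
    print(worker.on_event("pebble-ready"))  # False: S3 not ready yet
    worker.s3_ready = True  # assumed: coordinator publishes S3 config
    print(worker.on_event("tempo-cluster-relation-changed"))  # True: retried
```

In an actual `ops` charm, the `try` block would wrap the existing `self._container.restart(self._name)` call and catch `ops.pebble.ChangeError`, as seen in the traceback. The design choice is that a failed start is treated as transient state rather than as a terminal outcome.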

To Reproduce

juju deploy tempo-worker-k8s worker
juju deploy tempo-coordinator-k8s coord
juju relate worker coord

Wait for active/idle, then:

juju deploy some-s3-charm s3
juju relate s3 coord

Environment

No response

Relevant log output

No response

Additional context

No response
