Repo timeout locks full reconciliation cycle #148

Closed
kuzm1ch opened this issue Jun 2, 2021 · 4 comments · Fixed by #153

kuzm1ch commented Jun 2, 2021

Reflector-version: 0.6.0

```
Image:         ghcr.io/fluxcd/image-reflector-controller:v0.6.0
Image ID:      docker-pullable://ghcr.io/fluxcd/image-reflector-controller@sha256:3f2f7e14681e92165b6bcd86fb28cf87bda60628535ab7613a3d3095426cf7b4
Ports:         8080/TCP, 9440/TCP
Host Ports:    0/TCP, 0/TCP
Args:
  --events-addr=http://notification-controller/
  --watch-all-namespaces=true
  --log-level=info
  --log-encoding=json
  --enable-leader-election
```

Please look at the timestamps below: 15:41:18, then a gap of roughly 2.5 minutes with no reconciliation, then 15:43:52 and 15:43:52.

{"level":"info","ts":"2021-06-02T15:41:18.217Z","logger":"controller-runtime.manager.controller.imagerepository","msg":"reconciliation finished in 114.811428ms, next run in 1m0s","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageRepository","name":"*********","namespace":"flux-system"}
{"level":"error","ts":"2021-06-02T15:43:52.355Z","logger":"controller-runtime.manager.controller.imagerepository","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageRepository","name":"*********","namespace":"flux-system","error":"Get \"https://git.something.com:5050/v2/\": dial tcp **********:5050: i/o timeout"}
{"level":"info","ts":"2021-06-02T15:43:52.535Z","logger":"controller-runtime.manager.controller.imagerepository","msg":"reconciliation finished in 179.660488ms, next run in 1m0s","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageRepository","name":"********","namespace":"flux-system"}
  1. ImageRepository reconciliation stops for all repos when one of them times out.
    In this case, a single broken image repo (even a temporarily broken one) breaks all CI/CD.

  2. Another problem (a smaller one; the first point is more relevant): by default, the timeout equals the interval. In theory, an interval of 15 minutes (a quite plausible scenario) can block all reconciliation for 15 minutes.
    Could the default be 1 minute to prevent such cases?

kuzm1ch changed the title from "Repo timeout locks full reconciliation cycle for all image repos" to "Repo timeout locks full reconciliation cycle" on Jun 2, 2021
stefanprodan (Member) commented Jun 2, 2021

To solve this we need to set the concurrent flag, so that the controller can process resources in parallel. Implementation example here: https://github.com/fluxcd/kustomize-controller/blob/main/main.go#L78

kuzm1ch (Author) commented Jun 2, 2021

@stefanprodan thanks, the concurrent flag will solve it. I'll try it.

What do you think about the second point? A few broken image repos with a large interval can still block all reconciliation.
Or maybe there is a specific reason why the default timeout equals the interval?

stefanprodan (Member) commented

You can't try it, it needs to be implemented first.

squaremo added the "good first issue" (Good for newcomers) label on Jun 8, 2021
makkes (Member) commented Jun 23, 2021

fyi: I started work on the fix.

makkes pushed a commit to makkes/image-reflector-controller that referenced this issue Jun 23, 2021
Default for both, the ImageRepository and the ImagePolicy controllers
is 4 workers.

closes fluxcd#148
makkes pushed a commit to makkes/image-reflector-controller that referenced this issue Jun 23, 2021
Default for both, the ImageRepository and the ImagePolicy controllers
is 4 workers.

closes fluxcd#148

Signed-off-by: Max Jonas Werner <mail@makk.es>