Warehouse does not reconcile #2836
Comments
@wmiller112 first, my apologies. I really thought we'd found it and nailed it in #2544 and #2754. I know you don't suspect rate limiting at this point (nor do I), but I'm still curious whether you observe any change in this behavior when you're subscribed to just one repo. In particular, if you subscribe only to the Git repo, does the situation improve? Also, just to get more data: is there any sharding involved here?
It may be an idea for us to bake an image with pprof enabled.
It's the same for a Warehouse with only a Git subscription. I've built and am running the controller image with pprof, and I'm able to collect profiles/traces, but I am not exactly sure what to look for. I also added some additional logging around the requeue interval and confirmed the requeue is being set to the defined interval when the reconcile does run. Edit: Ah, I see the debug README that was added in the PR instructs collecting a heap profile.
The things typically of interest are the following endpoints:
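For reference, a minimal sketch of what those endpoints usually are, assuming the image exposes Go's standard net/http/pprof handler (the exact listen address depends on how pprof was wired into the build):

```
/debug/pprof/heap        # heap (memory) profile
/debug/pprof/profile     # CPU profile (30s by default)
/debug/pprof/trace       # execution trace
/debug/pprof/goroutine   # goroutine dump
```

These can be fetched with curl or `go tool pprof` against whatever port the handler listens on.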
Some of these profiles can hypothetically contain slightly sensitive data (e.g., memory addresses or minimal operational details). If you want to share them, you can do so via mail.
My current working theory, based on the traces @wmiller112 shared in private, is that the concurrency configuration of our reconcilers is currently set to the default (which equals 1). Things are not being picked up in time because the controller is busy gathering the latest image build information for another Warehouse (which can be quite an expensive task depending on the number of images you have). I have shared a branch with him (https://github.com/hiddeco/kargo/tree/allow-concurrency-config) that allows configuring the number of concurrent reconciliations, to see if this proves me right. If it does, we should create a proper solution for this and think about the default configuration we want to ship with.
Would it be possible for you to send me another set of profiles with this configuration enabled?
After some more debugging through private channels, the reason for this can be summarized as follows: the Warehouse reconciler runs with a concurrency of 1 by default, so a single slow reconciliation (for example, discovering image build information for a subscription that matches many tags) delays every other Warehouse well past its configured interval.
To address this, we should take multiple actions, including making the reconciler concurrency configurable and reconsidering the default we ship with.
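For context only (not Kargo's actual code), here is a minimal sketch of how a controller-runtime based reconciler can be given more than one worker. `MaxConcurrentReconciles` is real controller-runtime API; the function name, the API import path, and the worker count are placeholders/assumptions:

```go
package example

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	kargoapi "github.com/akuity/kargo/api/v1alpha1" // assumed import path for Kargo's API types
)

// setupWarehouseController registers a Warehouse reconciler with a configurable
// number of workers. controller-runtime defaults MaxConcurrentReconciles to 1,
// so a single slow reconciliation blocks every other Warehouse handled by the
// same controller until it finishes.
func setupWarehouseController(mgr ctrl.Manager, r reconcile.Reconciler, workers int) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kargoapi.Warehouse{}).
		WithOptions(controller.Options{
			MaxConcurrentReconciles: workers, // value is purely illustrative
		}).
		Complete(r)
}
```

Whatever the mechanism ends up being (flag, env var, Helm value), the important part is that one expensive Warehouse no longer starves the rest of the queue.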
Hi, I am on 1.0.3 and I am observing similar behavior. I have 1 Project, 1 Warehouse, and 1 Stage, and it does not discover new Docker images automatically. If I press the refresh button, it will find them, but otherwise it does not do anything.

apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: testing-playground-warehouse
  namespace: app
spec:
  interval: "1m0s"
  subscriptions:
  - image:
      repoURL: registry.gitlab.com/xxxxxxx/testing-playground-service
      strictSemvers: false
I deleted the Project, Warehouse, and Stage and recreated them, and now it works fine. I will try to do some stuff to get it stuck again.
Just a note: I am observing this behavior again. I still have just 1 Project, 1 Stage, and 1 Warehouse. I don't see any errors except:
So if it were a rate limit, I would expect to see some errors in the pod logs. I don't really have reproducible steps yet; the only thing I was doing is changing the Stage manifest (testing different ...). Let me know if you think this is a completely different issue, and I can open a new one.
@ddeath you're looking at API server logs when you should be looking at controller logs.
@krancour I checked all pods (kargo-api, kargo-controller, kargo-management-controller, kargo-webhooks-server) and the only errors there were the ones I mentioned above.
@ddeath, you'll notice @hiddeco and @wmiller112 had a lot of back and forth over the course of two issues to figure out what was going on here. @hiddeco has summarized the findings nicely above. You and I also had some recent discussion about what sort of things can make Warehouse reconciliation slow (and therefore become a bottleneck). Please try to use this information to determine whether you may have one or more poorly configured Warehouses that are taking a long time to reconcile and blocking others from reconciling at regular intervals. Enabling debug or even trace-level logging may help as well.
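As a rough illustration of what "poorly configured" can mean in practice: an image subscription that matches a huge, unbounded set of tags forces the controller to examine far more data per reconcile than one that is narrowed down. A sketch of a more constrained subscription might look like the following; field names other than repoURL and strictSemvers are my recollection of the Kargo docs and should be double-checked, and the registry URL is a placeholder:

```yaml
subscriptions:
- image:
    repoURL: registry.example.com/team/service
    # Narrowing the set of tags considered keeps each reconcile cheap.
    semverConstraint: ">=1.0.0 <2.0.0"
    allowTags: '^v?\d+\.\d+\.\d+$'
    strictSemvers: true
```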
Yeah, I am slowly doing that. However, as I noted, I have only 1 Warehouse with only 1 subscription (with a private token, not a public one, so rate limits will be higher), and I am not seeing any reconcile happening for a long time (30 minutes). So just reporting my findings 🤷
What are its details? Is it a subscription you'd expect to perform poorly based on our other discussion?
Checklist
I've pasted the output of kargo version.

Description
This is a continuation of a previous issue. I continue to experience Warehouses not respecting spec.interval or reconciling when changed. I've observed the same behavior across ~20 different projects. It does not seem related to the Git/image repo, because when the reconcile does finally trigger, it executes in a matter of seconds, as can be seen in the screenshot of logs. The reconcile occurs at a fairly regular interval of roughly 15-20 minutes regardless of what the interval is set to, anywhere from 2m to 10m. Triggering a refresh via the GUI adds the annotation to the Warehouse, but even this does not trigger a reconcile; it eventually reconciles on the same interval.
Screenshots
Steps to Reproduce
I'd expect that, immediately after clicking reconcile, the modified Warehouse would be seen by the controller and reconciled, triggering a freight fetch.
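For completeness, the UI refresh effectively just sets a refresh annotation on the Warehouse, so a rough command-line equivalent might look like the sketch below. The annotation key is my assumption of Kargo's refresh annotation and the resource name/namespace are placeholders; verify both against the docs:

```sh
# Assumed annotation key; <warehouse-name> and <project-namespace> are placeholders.
kubectl annotate warehouse <warehouse-name> \
  -n <project-namespace> \
  kargo.akuity.io/refresh="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --overwrite
```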
Version