[cert-manager] invalid peer certificate: UnknownIssuer #509
Comments
Similar issue occurred again around I deleted the secret and deleted / recreated the TemporalClusterClient and connection worked fine. I have multiple @alexandrevilain Any suggestions on how to troubleshoot this? I'm not familiar with how the operator handles these client certs. |
Some additional details, for one of the certs that was not rotated I added an annotation to the I then deleted the secret and changed the tag again, and the operator reconciled again (with no errors) and this time the secret was recreated with the new correct TLS cert. Digging in (warning, I haven't touched Go in a long time) I see that the reconciler is using
In this case an empty function is passed to the callback so based on the docs it's probably never updating the secret if it already exists. On a second note I'm not sure when the reconciler runs for the |
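The behaviour described in the comment above can be illustrated with a minimal, self-contained sketch. It assumes the helper in question follows the usual create-or-update contract, where the callback is the only place an already existing object gets changed. The `secret` type, the in-memory `store`, and `createOrUpdate` are hypothetical stand-ins written for this illustration; they are not the operator's code or any library's API.

```go
package main

import (
	"bytes"
	"fmt"
)

// secret is a hypothetical stand-in for a Kubernetes Secret; only Data matters here.
type secret struct {
	Name string
	Data map[string][]byte
}

// store is a toy in-memory replacement for the API server (an assumption for this sketch).
type store map[string]*secret

func cloneSecret(in *secret) *secret {
	out := &secret{Name: in.Name, Data: make(map[string][]byte, len(in.Data))}
	for k, v := range in.Data {
		out.Data[k] = append([]byte(nil), v...)
	}
	return out
}

func equalData(a, b map[string][]byte) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || !bytes.Equal(v, w) {
			return false
		}
	}
	return true
}

// createOrUpdate creates the object if it is missing. If it already exists,
// the stored object is handed to mutate, and the result reports whether
// mutate changed anything. The desired object is NOT applied on this path.
func createOrUpdate(s store, desired *secret, mutate func(existing *secret)) string {
	existing, ok := s[desired.Name]
	if !ok {
		s[desired.Name] = cloneSecret(desired)
		return "created"
	}
	before := cloneSecret(existing)
	mutate(existing)
	if equalData(before.Data, existing.Data) {
		return "unchanged"
	}
	return "updated"
}

func main() {
	s := store{}
	oldCert := &secret{Name: "client-tls", Data: map[string][]byte{"tls.crt": []byte("old")}}
	newCert := &secret{Name: "client-tls", Data: map[string][]byte{"tls.crt": []byte("new")}}

	// An empty callback: the first call creates, the second one changes nothing.
	noop := func(*secret) {}
	fmt.Println(createOrUpdate(s, oldCert, noop))
	fmt.Println(createOrUpdate(s, newCert, noop))
	fmt.Println(string(s["client-tls"].Data["tls.crt"]))

	// A callback that copies the desired data onto the existing object.
	copyData := func(existing *secret) { existing.Data = cloneSecret(newCert).Data }
	fmt.Println(createOrUpdate(s, newCert, copyData))
	fmt.Println(string(s["client-tls"].Data["tls.crt"]))
}
```

Under these assumptions, a renewed certificate only reaches an existing secret when the callback copies it over, which would be consistent with the secret being recreated correctly only after it was deleted.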
I am also facing the exact same issue that the secret in the client namespace does not get updated when a cert is being renewed in cluster namespace. Can you share the expected time that this bug will be fixed? We tried this operator out and plan to use it, but it won't work unless this bug is fixed. If it won't be fixed soon, can you please share some details on the following so that we can try to fix it ourselves while waiting for the official fix?
|
Hi! Thanks for reporting this issue! I'm digging a bit more into it to find what's causing it. |
I've got a cron job set up to copy the secret between namespaces for now. Not elegant but works as long as you have a decent overlap between when the cert is rotated and when it expires. |
Thanks, @plaisted! We are also looking for a potential workaround and will share our solution once it's working for us. |
While testing the mTLS feature, we found another issue. With mTLS enabled and cert-manager as the provider, everything still works after the worker certificate and the frontend-intermediate-ca-certificate are renewed, but the system worker stops working right after the root-ca-certificate is renewed. I double-checked the logs against the renewal time: the failure started right after the root-ca-certificate was renewed. Once it fails, even restarting the worker pod does not fix it; so far the only thing that recovers it is deleting the certificates and having them reissued. Sometimes, after a long time, restarting the worker pod fixes it. I believe the admintool pod sees a similar issue.
@plaisted are you seeing the same? @alexandrevilain please let me know if you need more information or logs. It is easy to reproduce and happens consistently.
The configuration for certificate renewal is as below
mTLS:
The logs show that the worker fails:
{"level":"warn","ts":"2023-10-28T01:54:45.515Z","msg":"Failed to poll for task.","service":"worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-history-scanner-taskqueue-0","WorkerID":"1@temporaldevmtls-worker-b94f46888-q6wpn@","WorkerType":"WorkflowWorker","Error":"last connection error: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"Root CA certificate\")"","logging-call-at":"internal_worker_base.go:308"} |
@nchen622 Yea, I've noticed the same (or similar issues). I haven't gotten a chance to dig in yet. For the root CA rotation to work I believe you'd need something like:
That would obviously take a decent amount of coordination. I do see |
@plaisted Thanks for the notes. I thought the operator and cert-manager would handle this automatically. Maybe not fully? The strange thing is that all the internode communication still seemed fine after the root-ca-certificate was renewed. I wonder what the differences are, other than that they have their own intermediate-ca-certificates. |
Reopening as this issue has more than one subject. |
@nchen622 I thought so as well, but seems like cert-manager may not handle it (see here suggesting a similar approach to what I mentioned, but not implemented in cert-manager). I haven't read up on it much so maybe there is a way, but seems like in current setup if you truly rotate the root CA / secret you'll have downtime. It may be best to disable root CA rotation or at least have a warning about downtime etc. I think I'll be disabling the root CA rotation and manually trigger rotation when appropriate in a maintenance window unless someone has an automated solution. |
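The failure mode discussed in this part of the thread can be reproduced with Go's standard library alone. The sketch below is a simplification and rests on several assumptions: there are no intermediate CAs, the keys are ECDSA rather than RSA, and the helper names (`issue`, `verify`, `isUnknownAuthority`) are made up for the illustration. It shows that a leaf signed by a renewed root fails verification against a trust pool that only holds the old root, and passes once the pool holds both. Whether trusting both roots during an overlap window is the approach the commenters had in mind is an assumption.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// issue creates a certificate for cn. With a nil parent the result is a
// self-signed CA; otherwise it is a leaf signed by parent/parentKey.
func issue(cn string, serial int64, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	if parent == nil {
		tmpl.IsCA = true
		tmpl.BasicConstraintsValid = true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verify checks leaf against a trust pool built from the given roots.
func verify(leaf *x509.Certificate, roots ...*x509.Certificate) error {
	pool := x509.NewCertPool()
	for _, r := range roots {
		pool.AddCert(r)
	}
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

// isUnknownAuthority reports whether err is an x509 "unknown authority" error.
func isUnknownAuthority(err error) bool {
	var uae x509.UnknownAuthorityError
	return errors.As(err, &uae)
}

func main() {
	// Two roots with the same subject but different keys, as after a renewal
	// that regenerates the private key.
	oldRoot, _, err := issue("Root CA certificate", 1, nil, nil)
	if err != nil {
		panic(err)
	}
	newRoot, newKey, err := issue("Root CA certificate", 2, nil, nil)
	if err != nil {
		panic(err)
	}
	leaf, _, err := issue("worker", 100, newRoot, newKey)
	if err != nil {
		panic(err)
	}

	// A peer that still trusts only the old root rejects the new leaf.
	err = verify(leaf, oldRoot)
	fmt.Println("old root only, unknown authority:", isUnknownAuthority(err))

	// A peer that trusts both roots accepts it.
	fmt.Println("old and new root, error:", verify(leaf, oldRoot, newRoot))
}
```

This matches the shape of the logged error (a candidate authority with the right name whose key does not verify the signature), and it suggests why any peer that has not yet picked up the new root keeps failing until its trust bundle is refreshed.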
@plaisted Thanks for the suggestion. We also decided to make the root CA rotation cycle long and put a manual process around it. |
@alexandrevilain Do you have an estimate on when the 0.16.0 release will be available? We want to utilize the feature that the client secret is auto synced |
@alexandrevilain I tested with the 0.16.1 operator version and observed that the secret created by temporalclusterclient is not synced in the worker namespace after the certificate is renewed. Both the secrets i.e. in temporal cluster namespace and in worker namespace differ in content, and we observe bad/invalid certificate errors. Could you please check? Below is the manifest that I used -
Below is the mTLS configuration in the Temporal cluster manifest. We set short durations to test the secret sync on certificate expiration and renewal.
Our worker logs once the certs get expired (and secret is not updated on renewal) -
Also, I don't see any errors in the temporal-operator logs. |
@alexandrevilain Could you please take a look at the issue? Is there anything that I am missing? Thanks. |
Hi @vinimona ! Thanks for reporting this. |
@alexandrevilain Thanks for responding. I deleted operator pod, then it reconciled the cluster client and secret in worker namespace was synced. And when the certs were renewed after the given duration, it went to the same state. I again deleted operator pod and it synced the secrets. So it appears it is a watch issue. Could you please prioritize this? |
Please note that I'm maintaining this in my free time.
Thanks for the information, I found the issue. The operator was watching for |
@alexandrevilain I still see the issue where the certs are rotated and everything basically crashloops or fails requests due to expired certs. I have to delete the cert secrets and also the deployments to let the operator reconcile it properly. Is this one of the issues here? It looks like here was for only CAs-- it's for all certs it seems. I can make a new issue if this wasn't it also |
Hi @ElanHasson! Let's open another issue, it will be easier to follow :) Thanks! |
I have a cluster with mtls set up as shown below. I originally had the certificate durations much shorter but updated to the length below.
Later I attempted to connect via a client using a certificate from the secret created via TemporalClusterClient and got invalid peer certificate: UnknownIssuer. I then: TemporalClusterClient and used the new secret and still had the same issue.
I'm not sure if this was due to me updating the certificate durations on TemporalCluster or some other issue, but somehow the certificates were not properly renewed. I'll keep experimenting with this but opened the issue in case others have similar problems.