-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature/Certificate] Restart node when certificate updated #152
base: master
Are you sure you want to change the base?
[Feature/Certificate] Restart node when certificate updated #152
Conversation
Have you tried out the certificate auto-reload feature that NiFi now supports out of the box? It might be worth testing out that workflow, before getting the operator to do the cert management via this PR? |
I never heard about that so yes I'll try to use the feature before going further with this PR. |
If cert-manager automatically renews certificates and the configured Secrets get updated, then Kubernetes will automatically update the volumes mounted to the running Pods: https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically If this is the case, then we might be able to take advantage of that feature of NiFi automatically. There's one condition where Kubernetes will not automatically update Secrets mounted as volumes in Pods when a mount nifikop/pkg/resources/nifi/pod.go Lines 308 to 342 in 8874319
|
We chatted about this a little bit afterwards. This auto-reload feature works for the NiFi Jetty web server only. It doesn't apply to any However, a less disruptive approach to node restarts would be to have the operator watch the Certificates its creating, detect renewals, and restart (e.g. disable/re-enable) all controller services and processors with |
The problem is not on the dataflow side but on the pure NiFi side. When the certificate has expired, NiFi does not automatically reload it. So when we try to access the interface, we are blocked because the certificate is no longer valid. We have to force NiFi to reload the new certificate by restarting it. Maybe our way of solving the problem or the problem itself is not the right one. |
If i'm not mistaken, accessing the UI is what the |
8dc1fe9
to
2c33359
Compare
I have recreated the problem. I set a |
I dug around trying to figure out why it wasn't working. So it seems that the function can't be used unless you know a method to delete this timestamp directory (I didn't find it). |
Thank you for spending the time to track this down. That makes sense. It looks like Jetty has a unit test that specifically handles symlink targets. I wonder if it's in a newer version of Jetty? More details here jetty/jetty.project#5042 |
I've talked to the folks at NiFi and @michael81877, and it doesn't seem like this is expected behavior (link to discussion). I have opened a ticket at NiFi to have this fixed. Now the question is what to do with the PR. We can keep the feature but make it optional to have a system that would work in NiFi versions without the fix, or we drop it and hope that on the NiFi side the fix is done for the next release and that it fixes the problem. |
Yeah perhaps we can still pursue node restarts and make that a flag as a part of the SSL config. Then it wouldn't matter what NiFi version the operator is controlling. And then starting from whatever version of NiFi has the fix, the restarts are no longer necessary. |
8f077e4
to
b452e9c
Compare
b452e9c
to
f8c34ad
Compare
@erdrix @michael81877 ready for another review 😄 |
What's in this PR?
This PR aims at creating a new field in the struct NodeState called CertificateExpireDate, and then automate the Nifi cluster rolling upgrade when the certificate is renewed (handle by the cert manager).
The idea is that we will give the field CertificateExpireDate the value of the current certificate renewal date, during each reconcile we will check if the CertificateExpireDate field has the same value as the current certificate renewal date, if no, we will do a rolling upgrade of the cluster and then assign the new renewal date to each node.
Why?
When the certificate handle by the cert manager was renewed it was necessary to restart the Nifi nodes manually to be able to access the UI.
Additional context
Checklist
ToDo