Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watch directories for certificate hot-reload #4159

Merged
merged 3 commits into from
Jan 16, 2023

Conversation

tsaarni
Copy link
Contributor

@tsaarni tsaarni commented Jan 13, 2023

Which problem is this PR solving?

This PR fixes the certificate hot-reload on Kubernetes.

Resolves #3968

Short description of the changes

This PR moves the inotify watch from certificate files to monitoring the parent directory instead, as also recommended here by the fsnotify library.

Without this change, certificate hot-reload only works if the content of certificate or key files is overwritten. That approach is rarely used for update since it is not atomic and it risks files being read in the middle of the update. Therefore more often the files are replaced by a rename operation, by swapping the old file with a new one. It allows the file to be completely written and closed before exposing it to the application. However, in this update scheme, the old file gets deleted and the current inotify watch on file level is automatically terminated. No reload happens and warning message like following was printed into log

{"msg":"Certificate has been removed, using the last known version","certificate":"/certs/server-ca.pem"}

Example use case:

When Kubelet writes certificate and key files on Kubernetes Secret volume, the target files are symbolic links to files in a different directory - a hidden subdirectory in a same level as the files themselves. During update, that directory is swapped with a new one, while the symbolic links remains the same. This guarantees atomic swap for both certificate and key files at once. It also means that any rename event received at the parent directory level might indicate that the files were replaced, even if name of the renamed file was not any of the files being monitored. Therefore this PR proposes checking the hashes of the files to detect changes.

With this change, following update schemes will work:

  • Replace file content in-place (old behaviour: e.g. cat newfile > cert.pem)
  • Update the files by making any rename operation in the parent directory (Kubernetes secret volume update scheme: mv ..data_tmp ..data)
  • Update the files by renaming the target files (e.g. mv cert.pem.tmp cert.pem)

Signed-off-by: Tero Saarni tero.saarni@est.tech

@tsaarni tsaarni requested a review from a team as a code owner January 13, 2023 16:53
@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 13, 2023

Hi @pavolloffay, I noticed that you did the certificate reload implementation few years back. It would be greatly appreciated if you could take a look at this PR! Thank you 🙏

@tsaarni tsaarni force-pushed the cert-hot-reload-on-kubernetes branch from 52b2d2b to 0c8af96 Compare January 14, 2023 06:15
@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 14, 2023

I've corrected a formatting mistake. @yurishkuro could you please re-trigger the tests. Thank you!

@codecov
Copy link

codecov bot commented Jan 14, 2023

Codecov Report

Base: 97.12% // Head: 97.09% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (a5bb77e) compared to base (c104062).
Patch coverage: 96.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4159      +/-   ##
==========================================
- Coverage   97.12%   97.09%   -0.04%     
==========================================
  Files         296      296              
  Lines       17452    17493      +41     
==========================================
+ Hits        16950    16984      +34     
- Misses        404      409       +5     
- Partials       98      100       +2     
Flag Coverage Δ
unittests 97.09% <96.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/config/tlscfg/cert_watcher.go 92.59% <96.00%> (-4.22%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 14, 2023

Coverage results were accidentally decreased, hopefully fixed now with the latest commit. Not sure about CIT OpenSearch failure though. It looks to me there might have been similar errors before with that same job, maybe a fluke?

@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 14, 2023

Yes apparently CIT OpenSearch failure was a fluke. However coverage was still under target %. I think one test was missing even before: replace CA certs with bad ones during update. I added that now. Besides that one, the missed lines are error branches that might be difficult to produce without stubbing the library calls.

@yurishkuro
Copy link
Member

Yeah OS tests have been failing quite often recently. I'll rerun them till they pass.

Just curious, why was it necessary to regenerate test certificates?

@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 14, 2023

Yeah OS tests have been failing quite often recently. I'll rerun them till they pass.

Ok, thank you!

Just curious, why was it necessary to regenerate test certificates?

I wanted to have second server certificate for new test case TestReload_kubernetes_secret_update() and therefore I modified gen-certs.sh to generate that (example-server-2-key.pem and example-server-2-cert.pem). When running that script, it regenerated the old test certs too. It is not completely necessary, it can be avoided as well.

BTW as a shameless plug, I've created a tool to generate test certs with Go API to avoid storing test certs and keys in the first place. But since it is just maintained by me alone, I assume Jaeger does not want that as a new dependency :-)

@yurishkuro
Copy link
Member

I merged some fixes for opensearch integration test, please rebase, hopefully it should clear the failure

Instead of adding inotify watches on files, which mostly get removed during
certificate renewal, add watches for the parent directories and consider
inotify events only as indication that some of the files might have changed.

The change is required for the most typical atomic file update schemes,
for example for the scheme used by Kubernetes to update secret volumes.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
@tsaarni tsaarni force-pushed the cert-hot-reload-on-kubernetes branch from 8e0380b to 07dde79 Compare January 16, 2023 18:04
@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 16, 2023

@yurishkuro I've now rebased, as well as removed the test data changes that were not necessary and squashed commits.

@yurishkuro
Copy link
Member

@tsaarni I added a couple commits to DRY-up some of the code, but was not able to push changed to your fork. Do you want to cherry-pick the two extra commits from #4165? Otherwise I could merge that new PR directly.

@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 16, 2023

@yurishkuro It is OK to me if you push #4165 instead of this PR. Thank you!

Signed-off-by: Yuri Shkuro <github@ysh.us>
Signed-off-by: Yuri Shkuro <github@ysh.us>
@tsaarni
Copy link
Contributor Author

tsaarni commented Jan 16, 2023

@yurishkuro It is OK to me if you push #4165 instead of this PR. Thank you!

I ended up doing cherry-pick here as well but either way is OK!

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yurishkuro yurishkuro enabled auto-merge (squash) January 16, 2023 20:14
@yurishkuro yurishkuro merged commit 3bcac80 into jaegertracing:main Jan 16, 2023
shubbham1215 pushed a commit to shubbham1215/jaeger that referenced this pull request Feb 22, 2023
## Which problem is this PR solving?
This PR fixes the certificate hot-reload on Kubernetes.

Resolves jaegertracing#3968

## Short description of the changes
This PR moves the inotify watch from certificate files to monitoring the
parent directory instead, as also recommended
[here](https://github.com/fsnotify/fsnotify/blob/c6f5cfa163edb0f1bb78be3a77053ee14c48a3ce/backend_inotify.go#L246-L254)
by the [fsnotify](https://github.com/fsnotify/fsnotify) library.

Without this change, certificate hot-reload only works if the content of
certificate or key files is overwritten. That approach is rarely used
for update since it is not atomic and it risks files being read in the
middle of the update. Therefore more often the files are replaced by a
rename operation, by swapping the old file with a new one. It allows the
file to be completely written and closed before exposing it to the
application. However, in this update scheme, the old file gets deleted
and the current inotify watch on file level is automatically terminated.
No reload happens and warning message like following was printed into
log

```json
{"msg":"Certificate has been removed, using the last known version","certificate":"/certs/server-ca.pem"}
```

Example use case:

When Kubelet writes certificate and key files on Kubernetes Secret
volume, the target files are symbolic links to files in a different
directory - a hidden subdirectory in a same level as the files
themselves. During update, that directory is swapped with a new one,
while the symbolic links remains the same. This guarantees atomic swap
for both certificate and key files at once. It also means that any
rename event received at the parent directory level might indicate that
the files were replaced, even if name of the renamed file was not any of
the files being monitored. Therefore this PR proposes checking the
hashes of the files to detect changes.

With this change, following update schemes will work:
* Replace file content in-place (old behaviour: e.g. `cat newfile >
cert.pem`)
* Update the files by making any rename operation in the parent
directory (Kubernetes secret volume update scheme: `mv ..data_tmp
..data`)
* Update the files by renaming the target files (e.g. `mv cert.pem.tmp
cert.pem`)

Signed-off-by: Tero Saarni <tero.saarni@est.tech>

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
shubbham1215 pushed a commit to shubbham1215/jaeger that referenced this pull request Mar 5, 2023
## Which problem is this PR solving?
This PR fixes the certificate hot-reload on Kubernetes.

Resolves jaegertracing#3968

## Short description of the changes
This PR moves the inotify watch from certificate files to monitoring the
parent directory instead, as also recommended
[here](https://github.com/fsnotify/fsnotify/blob/c6f5cfa163edb0f1bb78be3a77053ee14c48a3ce/backend_inotify.go#L246-L254)
by the [fsnotify](https://github.com/fsnotify/fsnotify) library.

Without this change, certificate hot-reload only works if the content of
certificate or key files is overwritten. That approach is rarely used
for update since it is not atomic and it risks files being read in the
middle of the update. Therefore more often the files are replaced by a
rename operation, by swapping the old file with a new one. It allows the
file to be completely written and closed before exposing it to the
application. However, in this update scheme, the old file gets deleted
and the current inotify watch on file level is automatically terminated.
No reload happens and warning message like following was printed into
log

```json
{"msg":"Certificate has been removed, using the last known version","certificate":"/certs/server-ca.pem"}
```

Example use case:

When Kubelet writes certificate and key files on Kubernetes Secret
volume, the target files are symbolic links to files in a different
directory - a hidden subdirectory in a same level as the files
themselves. During update, that directory is swapped with a new one,
while the symbolic links remains the same. This guarantees atomic swap
for both certificate and key files at once. It also means that any
rename event received at the parent directory level might indicate that
the files were replaced, even if name of the renamed file was not any of
the files being monitored. Therefore this PR proposes checking the
hashes of the files to detect changes.

With this change, following update schemes will work:
* Replace file content in-place (old behaviour: e.g. `cat newfile >
cert.pem`)
* Update the files by making any rename operation in the parent
directory (Kubernetes secret volume update scheme: `mv ..data_tmp
..data`)
* Update the files by renaming the target files (e.g. `mv cert.pem.tmp
cert.pem`)

Signed-off-by: Tero Saarni <tero.saarni@est.tech>

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Signed-off-by: shubbham1215 <sawaikershubham@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Jaeger is not reloading the renewed certificates from filesystem even during connection setup
2 participants