-
Notifications
You must be signed in to change notification settings - Fork 818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix an error on SSL config reload (plus some cleanup) #9334
Conversation
ywkaras
commented
Jan 26, 2023
•
edited
Loading
edited
This PR replaces #8790 , which was laid waste by zwoop's merciless purge of personal branches in the main repo. The TSSslSecretGet() function is now compatible with the rest of the TS API (and bell bottoms and stagflation). |
5b34b1e
to
3b07b73
Compare
iocore/net/SSLUtils.cc
Outdated
@@ -1053,8 +1053,8 @@ SSLPrivateKeyHandler(SSL_CTX *ctx, const SSLConfigParams *params, const char *ke | |||
scoped_BIO bio(BIO_new_mem_buf(secret_data, secret_data_len)); | |||
pkey = PEM_read_bio_PrivateKey(bio.get(), nullptr, nullptr, nullptr); | |||
if (nullptr == pkey) { | |||
Debug("ssl_load", "failed to load server private key from %s", | |||
(!keyPath || keyPath[0] == '\0') ? "[empty key path]" : keyPath); | |||
SSLError("failed to load server private key (%.*s) from %s", secret_data_len < 50 ? secret_data_len : 50, secret_data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure when to use SSLError instead of Debug, but do the battle on another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors go into diags.log, so it's things that the operations people should see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not joining the battle. I'm saying focus on fixing the API issue on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yahoo is not currently using TSSslSecretGet(). For us, the main purpose of this change was to fix ERRORs in diags.log that we were seeing on config reload. The change to the API was just part of the effort to prevent future problems as well as the one we were seeing. Many problems happen first (or only) in prod, and will go unnoticed if they are only seen in debug trace rather than in diags.log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason to do the two things at the same time on one PR.
This line was changed to Debug about two months ago for some reasons. Changing it back to SSLError would need a discussion. I don't want to keep the API problematic until the discussion ends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the size matters. There are at least two distinguishable fixes. And you are delaying one of the fixes by starting the discussion here where we might be able to avoid making an incompatible change on 10.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's a different line of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that additional SSLError log replaces all of the changed SSLError to Debug calls. SSLPrivateKeyHandler
is only called once and it will reach that additional SSLError when it returns false. And now with the SSLError called after SSLPrivateKeyHandler
return false, it can print the cert name that wasn't available inside the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Serris.
Oh no, we released 9.2.0 without this fix. This means 9.2.0 has a problematic TS API and we can't fix it until 10.0. |
Do we have a caution about the problem on the doc? We should probably discourage users from using the API. |
@zwoop What do you think about releasing 9.2.1 soon with this technically an incompatible change? |
3b07b73
to
263a148
Compare
@@ -105,7 +105,7 @@ | |||
' client_cert: {0}/../signed-bar.pem'.format(ts.Variables.SSLDir), | |||
' client_key: {0}/../signed-bar.key'.format(ts.Variables.SSLDir), | |||
'- fqdn: bob.*.com', | |||
' client_cert: {0}/../combo-signed-foo.pem'.format(ts.Variables.SSLDir), | |||
' client_cert: {0}/combo-signed-foo.pem'.format(ts.Variables.SSLDir), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is in the same directory as other files. Why just this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I made that change and the test passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that combo-signed-foo.pem is the only cert or key listed in sni.yaml that is actually needed:
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$ ls
bin cache config log plugin runtime ssl storage
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$ ls ssl
combo-signed-foo.pem server.pem signed2-foo.pem signed-bar.pem signed-foo.pem
server.key signed2-bar.pem signed-bar.key signed-foo.key
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$ cat config/sni.yaml
sni:
- fqdn: bob.bar.com
client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-bar.pem
client_key: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-bar.key
- fqdn: bob.*.com
client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/combo-signed-foo.pem
- fqdn: "*bar.com"
client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed2-bar.pem
client_key: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-bar.key
- fqdn: "foo.com"
client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed2-foo.pem
client_key: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-foo.key
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, when I remove my change, the test still works. I don't remember why I made it.
This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData, in order to be able to cleanly return a copy of the secret data.
1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret.
263a148
to
6223bcb
Compare
* Fix an error on SSL config reload (plus some cleanup). This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData, in order to be able to cleanly return a copy of the secret data. * YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740) 1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret. --------- Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
Is this something we want for 9.2.1 ? |
I would recommend backporting, I'll do if you want, in case there are any conflicts. |
It will change the prototype of one TS API function. But the current prototype for that function (which returns a pointer to data that's modified by multiple threads) is inherently crashy to use (hopefully no one's using it). |
We talked about it on weekly bug scrub. The API was newly added on 9.2.0. It's now (before many people start using it) or 10.0. Fixing it on 9.2.2 doesn't make sense. |
* Fix an error on SSL config reload (plus some cleanup). This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData, in order to be able to cleanly return a copy of the secret data. * YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740) 1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret. --------- Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com> (cherry picked from commit 0c2488c) Conflicts: src/traffic_server/InkAPI.cc
* Fix an error on SSL config reload (plus some cleanup). This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. * YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740) 1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret. --------- Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com> (cherry picked from commit 0c2488c) Conflicts: src/traffic_server/InkAPI.cc
…#9647) * Fix an error on SSL config reload (plus some cleanup). (#9334) * Fix an error on SSL config reload (plus some cleanup). This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. * YTSATS-4067: Fix deadlock with secret_map_mutex (#740) 1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret. --------- Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com> (cherry picked from commit 0c2488c) Conflicts: src/traffic_server/InkAPI.cc * Remove TSSslSecretXxx TS API functions. * Restore version of Au test tls_keepalive not needing ssl_secret_load_test.so.
…#9334) (apache#9647) * Fix an error on SSL config reload (plus some cleanup). (apache#9334) * Fix an error on SSL config reload (plus some cleanup). This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. * YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740) 1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret. --------- Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com> (cherry picked from commit 0c2488c) Conflicts: src/traffic_server/InkAPI.cc * Remove TSSslSecretXxx TS API functions. * Restore version of Au test tls_keepalive not needing ssl_secret_load_test.so. (cherry picked from commit 79d0577)
* asf/master: (623 commits) records.config to records.yaml (apache#9264) Updates the release roadmap, adjusting for delays (apache#9360) Upgrades master branch to use clang-format v15.0.7 (apache#9355) Disable merging on GitHub (apache#9354) Clang-format 15.0.7 is finicky, and does not like these old school array inits (apache#9356) Enable merging for 10-Dev merge (apache#9353) Fix an error on SSL config reload (plus some cleanup). (apache#9334) Cleanup of legacy, makes newer clang-format crash right now (apache#9350) Update the roadmap / branch management doc page (apache#9340) Proxy Protocol out fixes (apache#9341) Memory leaks with storing configuration filenames (apache#9324) s3_auth autest: convert from gold file to file contains (apache#9337) Make 204 cacheable again (apache#9333) Add param to forward headers from the auth server to the origin (apache#9271) s3_auth: Fix assertion failure of TSActionCancel (apache#9329) Added http connect Autest with proxy verifier (apache#9315) s3_auth: Schedule reloading config event on TASK thread (apache#9328) Register ET_UDP thread type even if no UDP threads are requested (apache#9314) Don't send response body on status 204 No Content (apache#9330) Limit the serching range of static table by the first letter of header name (apache#9298) ...