Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an error on SSL config reload (plus some cleanup) #9334

Merged
merged 2 commits into from
Feb 1, 2023

Conversation

ywkaras
Copy link
Contributor

@ywkaras ywkaras commented Jan 26, 2023

Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot.

@ywkaras
Copy link
Contributor Author

ywkaras commented Jan 26, 2023

This PR replaces #8790 , which was laid waste by zwoop's merciless purge of personal branches in the main repo.

The TSSslSecretGet() function is now compatible with the rest of the TS API (and bell bottoms and stagflation).

@@ -1053,8 +1053,8 @@ SSLPrivateKeyHandler(SSL_CTX *ctx, const SSLConfigParams *params, const char *ke
scoped_BIO bio(BIO_new_mem_buf(secret_data, secret_data_len));
pkey = PEM_read_bio_PrivateKey(bio.get(), nullptr, nullptr, nullptr);
if (nullptr == pkey) {
Debug("ssl_load", "failed to load server private key from %s",
(!keyPath || keyPath[0] == '\0') ? "[empty key path]" : keyPath);
SSLError("failed to load server private key (%.*s) from %s", secret_data_len < 50 ? secret_data_len : 50, secret_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when to use SSLError instead of Debug, but do the battle on another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors go into diags.log, so it's things that the operations people should see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not joining the battle. I'm saying focus on fixing the API issue on this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yahoo is not currently using TSSslSecretGet(). For us, the main purpose of this change was to fix ERRORs in diags.log that we were seeing on config reload. The change to the API was just part of the effort to prevent future problems as well as the one we were seeing. Many problems happen first (or only) in prod, and will go unnoticed if they are only seen in debug trace rather than in diags.log.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to do the two things at the same time on one PR.

This line was changed to Debug about two months ago for some reasons. Changing it back to SSLError would need a discussion. I don't want to keep the API problematic until the discussion ends.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the size matters. There are at least two distinguishable fixes. And you are delaying one of the fixes by starting the discussion here where we might be able to avoid making an incompatible change on 10.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's a different line of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that additional SSLError log replaces all of the changed SSLError to Debug calls. SSLPrivateKeyHandler is only called once and it will reach that additional SSLError when it returns false. And now with the SSLError called after SSLPrivateKeyHandler return false, it can print the cert name that wasn't available inside the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Serris.

@maskit maskit added this to the 10.0.0 milestone Jan 26, 2023
@maskit
Copy link
Member

maskit commented Jan 26, 2023

Oh no, we released 9.2.0 without this fix. This means 9.2.0 has a problematic TS API and we can't fix it until 10.0.

@maskit
Copy link
Member

maskit commented Jan 26, 2023

Do we have a caution about the problem on the doc? We should probably discourage users from using the API.

@maskit
Copy link
Member

maskit commented Jan 26, 2023

@zwoop What do you think about releasing 9.2.1 soon with this technically an incompatible change?

@ywkaras ywkaras force-pushed the os_pkey_cnf_reload branch from 3b07b73 to 263a148 Compare January 31, 2023 02:57
@@ -105,7 +105,7 @@
' client_cert: {0}/../signed-bar.pem'.format(ts.Variables.SSLDir),
' client_key: {0}/../signed-bar.key'.format(ts.Variables.SSLDir),
'- fqdn: bob.*.com',
' client_cert: {0}/../combo-signed-foo.pem'.format(ts.Variables.SSLDir),
' client_cert: {0}/combo-signed-foo.pem'.format(ts.Variables.SSLDir),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in the same directory as other files. Why just this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I made that change and the test passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that combo-signed-foo.pem is the only cert or key listed in sni.yaml that is actually needed:

wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$ ls
bin  cache  config  log  plugin  runtime  ssl  storage
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$ ls ssl
combo-signed-foo.pem  server.pem       signed2-foo.pem  signed-bar.pem  signed-foo.pem
server.key            signed2-bar.pem  signed-bar.key   signed-foo.key
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$ cat config/sni.yaml 
sni:
- fqdn: bob.bar.com
  client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-bar.pem
  client_key: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-bar.key
- fqdn: bob.*.com
  client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/combo-signed-foo.pem
- fqdn: "*bar.com"
  client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed2-bar.pem
  client_key: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-bar.key
- fqdn: "foo.com"
  client_cert: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed2-foo.pem
  client_key: /home/wkaras/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts/ssl/../signed-foo.key
wkaras ~/REPOS/YTS/tests/_sandbox/tls_client_cert2_plugin/ts
O$

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, when I remove my change, the test still works. I don't remember why I made it.

ywkaras and others added 2 commits January 31, 2023 22:51
This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData,
in order to be able to cleanly return a copy of the secret data.
1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.
@ywkaras ywkaras force-pushed the os_pkey_cnf_reload branch from 263a148 to 6223bcb Compare January 31, 2023 22:53
@ywkaras ywkaras merged commit 0c2488c into apache:master Feb 1, 2023
moleksy pushed a commit to moleksy/trafficserver that referenced this pull request Feb 8, 2023
* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData,
in order to be able to cleanly return a copy of the secret data.

* YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
@zwoop
Copy link
Contributor

zwoop commented Feb 22, 2023

Is this something we want for 9.2.1 ?

@ywkaras
Copy link
Contributor Author

ywkaras commented Feb 22, 2023

I would recommend backporting, I'll do if you want, in case there are any conflicts.

@ywkaras
Copy link
Contributor Author

ywkaras commented Feb 22, 2023

It will change the prototype of one TS API function. But the current prototype for that function (which returns a pointer to data that's modified by multiple threads) is inherently crashy to use (hopefully no one's using it).

@maskit
Copy link
Member

maskit commented Feb 22, 2023

We talked about it on weekly bug scrub. The API was newly added on 9.2.0. It's now (before many people start using it) or 10.0. Fixing it on 9.2.2 doesn't make sense.

ywkaras added a commit to ywkaras/trafficserver that referenced this pull request Apr 26, 2023
* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData,
in order to be able to cleanly return a copy of the secret data.

* YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc
ywkaras added a commit to ywkaras/trafficserver that referenced this pull request Apr 26, 2023
* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot.

* YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc
zwoop pushed a commit that referenced this pull request May 30, 2023
…#9647)

* Fix an error on SSL config reload (plus some cleanup). (#9334)

* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot.

* YTSATS-4067: Fix deadlock with secret_map_mutex (#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc

* Remove TSSslSecretXxx TS API functions.

* Restore version of Au test tls_keepalive not needing ssl_secret_load_test.so.
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Jul 24, 2023
…#9334) (apache#9647)

* Fix an error on SSL config reload (plus some cleanup). (apache#9334)

* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot.

* YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc

* Remove TSSslSecretXxx TS API functions.

* Restore version of Au test tls_keepalive not needing ssl_secret_load_test.so.

(cherry picked from commit 79d0577)
cmcfarlen pushed a commit to cmcfarlen/trafficserver that referenced this pull request Jun 3, 2024
* asf/master: (623 commits)
  records.config to records.yaml (apache#9264)
  Updates the release roadmap, adjusting for delays (apache#9360)
  Upgrades master branch to use clang-format v15.0.7 (apache#9355)
  Disable merging on GitHub (apache#9354)
  Clang-format 15.0.7 is finicky, and does not like these old school array inits (apache#9356)
  Enable merging for 10-Dev merge (apache#9353)
  Fix an error on SSL config reload (plus some cleanup). (apache#9334)
  Cleanup of legacy, makes newer clang-format crash right now (apache#9350)
  Update the roadmap / branch management doc page (apache#9340)
  Proxy Protocol out fixes (apache#9341)
  Memory leaks with storing configuration filenames (apache#9324)
  s3_auth autest: convert from gold file to file contains (apache#9337)
  Make 204 cacheable again (apache#9333)
  Add param to forward headers from the auth server to the origin (apache#9271)
  s3_auth: Fix assertion failure of TSActionCancel (apache#9329)
  Added http connect Autest with proxy verifier (apache#9315)
  s3_auth: Schedule reloading config event on TASK thread (apache#9328)
  Register ET_UDP thread type even if no UDP threads are requested (apache#9314)
  Don't send response body on status 204 No Content (apache#9330)
  Limit the serching range of static table by the first letter of header name (apache#9298)
  ...
@bryancall bryancall mentioned this pull request Aug 14, 2024
91 tasks
@bryancall bryancall changed the title Fix an error on SSL config reload (plus some cleanup). Fix an error on SSL config reload (plus some cleanup) Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants