Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9.2.x: Fix an error on SSL config reload (plus some cleanup). (#9334) #9647

Merged
merged 3 commits into from
May 30, 2023

Conversation

ywkaras
Copy link
Contributor

@ywkaras ywkaras commented Apr 26, 2023

  • Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot.

  1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
  2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock.
  3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret.


Co-authored-by: Brian Neradt brian.neradt@verizonmedia.com
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc

* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot.

* YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc
@ywkaras ywkaras self-assigned this Apr 26, 2023
@ywkaras ywkaras added this to the 9.2.1 milestone Apr 26, 2023
@ywkaras ywkaras marked this pull request as ready for review April 27, 2023 00:05
@ywkaras ywkaras removed request for maskit and bryancall April 27, 2023 00:05
@ywkaras ywkaras force-pushed the os_pkey_cnf_reload_9.2.x branch 2 times, most recently from 351b3d6 to 1e4db4d Compare April 28, 2023 01:35
@ywkaras ywkaras force-pushed the os_pkey_cnf_reload_9.2.x branch from 1e4db4d to e52afe4 Compare April 28, 2023 01:43
@apache apache deleted a comment from zwoop Apr 28, 2023
@maskit
Copy link
Member

maskit commented Apr 28, 2023

PR for master. #9334
Thread on dev@ https://lists.apache.org/thread/t4wysyq55c6nt3oxg3skybb8t4klj4n2

Trying to summarize the changes on this PR. Does this list look correct?

  • Removes TSSslSecretSet, TSSslSecretGet, and TSSslSecretUpdate
  • Removes TS_LIFECYCLE_SSL_SECRET_HOOK
  • Keep the value of TS_LIFECYCLE_LAST_HOOK for compatibility (?)
  • Fixes thread unsafety due to returning pointers for secret data in unorderd_map

Removing ssl_secret_load_test.cc makes sense because it uses the removed APIs, but why some other tests are removed as well?

@bneradt bneradt changed the title Fix an error on SSL config reload (plus some cleanup). (#9334) 9.2.x: Fix an error on SSL config reload (plus some cleanup). (#9334) May 1, 2023
@ywkaras
Copy link
Contributor Author

ywkaras commented May 2, 2023

PR for master. #9334 Thread on dev@ https://lists.apache.org/thread/t4wysyq55c6nt3oxg3skybb8t4klj4n2

Trying to summarize the changes on this PR. Does this list look correct?

  • Removes TSSslSecretSet, TSSslSecretGet, and TSSslSecretUpdate
  • Removes TS_LIFECYCLE_SSL_SECRET_HOOK
  • Keep the value of TS_LIFECYCLE_LAST_HOOK for compatibility (?)
  • Fixes thread unsafety due to returning pointers for secret data in unorderd_map

Removing ssl_secret_load_test.cc makes sense because it uses the removed APIs, but why some other tests are removed as well?

Yes, that's a good summary. I preserved the value of TS_LIFECYCLE_LAST_HOOK just as a precaution. All of the deleted Au tests used ssl_secret_load_test.

@maskit
Copy link
Member

maskit commented May 2, 2023

All of the deleted Au tests used ssl_secret_load_test.

tls_keepalive.test.py has a version that doesn't use ssl_secret_load_test.
https://github.com/apache/trafficserver/pull/6609/files#diff-f30d972fa55aef83b333bf6dbbb2ee074eece4b37a25c03aa721cff90bb3f580

@ywkaras
Copy link
Contributor Author

ywkaras commented May 2, 2023

All of the deleted Au tests used ssl_secret_load_test.

tls_keepalive.test.py has a version that doesn't use ssl_secret_load_test. https://github.com/apache/trafficserver/pull/6609/files#diff-f30d972fa55aef83b333bf6dbbb2ee074eece4b37a25c03aa721cff90bb3f580

There are a lot of compatibility issues with that version of the test:

wkaras ~/REPOS/TS/tests/_sandbox/tls_keepalive/ts/log
O$ ls
traffic.out
wkaras ~/REPOS/TS/tests/_sandbox/tls_keepalive/ts/log
O$ cat traffic.out
[May  2 22:18:04.605] traffic_server FATAL: **** Found a legacy config file (/home/wkaras/REPOS/TS/tests/_sandbox/tls_keepalive/ts/config/records.config). Please remove it and migrate to the new YAML format before continuing. ****
Fatal: **** Found a legacy config file (/home/wkaras/REPOS/TS/tests/_sandbox/tls_keepalive/ts/config/records.config). Please remove it and migrate to the new YAML format before continuing. ****
wkaras ~/REPOS/TS/tests/_sandbox/tls_keepalive/ts/log
O$

@maskit
Copy link
Member

maskit commented May 3, 2023

That doesn't look like an incompatibility from the test. The diff is just 3 lines.

$ git diff 7dbb6cb1881aa344edb1ab59dcbe3342171ea0a7^..master tests/gold_tests/tls/tls_keepalive.test.py
diff --git a/tests/gold_tests/tls/tls_keepalive.test.py b/tests/gold_tests/tls/tls_keepalive.test.py
index ac5142478..6694aacba 100644
--- a/tests/gold_tests/tls/tls_keepalive.test.py
+++ b/tests/gold_tests/tls/tls_keepalive.test.py
@@ -27,7 +27,7 @@ Test.SkipUnless(
     Condition.HasCurlFeature('http2')
 )
 
-ts = Test.MakeATSProcess("ts", select_ports=True, enable_tls=True)
+ts = Test.MakeATSProcess("ts", enable_tls=True)
 server = Test.MakeOriginServer("server")
 request_header = {"headers": "GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n", "timestamp": "1469733493.993", "body": ""}
 # desired response form the origin server
@@ -40,7 +40,7 @@ ts.addSSLfile("ssl/server.key")
 ts.Disk.records_config.update({
     'proxy.config.ssl.server.cert.path': '{0}'.format(ts.Variables.SSLDir),
     'proxy.config.ssl.server.private_key.path': '{0}'.format(ts.Variables.SSLDir),
-    'proxy.config.ssl.TLSv1_3': 0,
+    'proxy.config.ssl.TLSv1_3.enabled': 0,
     'proxy.config.exec_thread.autoconfig.scale': 1.0,
     'proxy.config.log.max_secs_per_buffer': 1
 })
@@ -66,7 +66,7 @@ logging:
 '''.split("\n")
 )
 
-Test.PrepareTestPlugin(os.path.join(Test.Variables.AtsTestPluginsDir, 'ssl_hook_test.so'), ts, '-preaccept=1')
+Test.PrepareTestPlugin(os.path.join(Test.Variables.AtsTestPluginsDir, 'ssl_secret_load_test.so'), ts)
 
 tr = Test.AddTestRun("Test two HTTP/1.1 requests over one TLS connection")
 tr.Processes.Default.StartBefore(server)

@ywkaras
Copy link
Contributor Author

ywkaras commented May 3, 2023

Sorry, my mistake.

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. @zwoop

@ywkaras
Copy link
Contributor Author

ywkaras commented May 13, 2023

@zwoop I was assuming you'd merge this at the appropriate time. Was I supposed to merge it?

@zwoop zwoop merged commit 79d0577 into apache:9.2.x May 30, 2023
JosiahWI pushed a commit to JosiahWI/trafficserver that referenced this pull request Jul 19, 2023
The conflicts all have to do with
apache#9647 which we are
intentionally not merging in.

Conflicts:
      doc/developer-guide/api/functions/TSSslSecret.en.rst
      include/ts/apidefs.h.in
      include/ts/ts.h
      iocore/net/SSLSecret.cc
      iocore/net/SSLUtils.cc
      proxy/InkAPIInternal.h
      src/traffic_server/InkAPI.cc
      tests/gold_tests/tls/tls_check_cert_select_plugin.test.py
      tests/gold_tests/tls/tls_client_cert2_plugin.test.py
      tests/gold_tests/tls/tls_client_cert_override_plugin.test.py
      tests/gold_tests/tls/tls_client_cert_plugin.test.py
JosiahWI pushed a commit to JosiahWI/trafficserver that referenced this pull request Jul 19, 2023
Merge ASF 9.2.x into Edge 9.1.x

The conflicts all have to do with
apache#9647 which we are
intentionally not merging in.

    Conflicts:
          doc/developer-guide/api/functions/TSSslSecret.en.rst
          include/ts/apidefs.h.in
          include/ts/ts.h
          iocore/net/SSLSecret.cc
          iocore/net/SSLUtils.cc
          proxy/InkAPIInternal.h
          src/traffic_server/InkAPI.cc
          tests/gold_tests/tls/tls_check_cert_select_plugin.test.py
          tests/gold_tests/tls/tls_client_cert2_plugin.test.py
          tests/gold_tests/tls/tls_client_cert_override_plugin.test.py
          tests/gold_tests/tls/tls_client_cert_plugin.test.py
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Jul 24, 2023
…#9334) (apache#9647)

* Fix an error on SSL config reload (plus some cleanup). (apache#9334)

* Fix an error on SSL config reload (plus some cleanup).

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot.

* YTSATS-4067: Fix deadlock with secret_map_mutex (apache#740)

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret.
2. loadSecret dispatched to Continations that registered for the
   TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's
   lock.
3. In the meantime, those Continuations could call setSecret which would
   try to grab the secret_map_mutex. If this Continuation did this while
   holding the lock that step 2 is waiting upon, then there will be a
   deadlock between the Continuation lock and the secret_map_mutex
   between the two threads.

This patch avoids the deadlock by releasing the secret_map_mutex lock
before calling loadSecret. It also updates the secret_map when loading
secrets from a file in loadSecret.

---------

Co-authored-by: Brian Neradt <brian.neradt@verizonmedia.com>
(cherry picked from commit 0c2488c)

Conflicts:
src/traffic_server/InkAPI.cc

* Remove TSSslSecretXxx TS API functions.

* Restore version of Au test tls_keepalive not needing ssl_secret_load_test.so.

(cherry picked from commit 79d0577)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants