Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Support using SSL on worker endpoints. #14128

Merged
merged 15 commits into from Nov 15, 2022
Merged

Support using SSL on worker endpoints. #14128

merged 15 commits into from Nov 15, 2022

Conversation

ghost
Copy link

@ghost ghost commented Oct 11, 2022

Pull Request Checklist

Signed-off-by: Tuomas Ojamies tuomas.ojamies@gmail.com

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@ghost ghost self-requested a review as a code owner October 11, 2022 07:10
@clokep clokep changed the title Fix missing SSL support in worker endpoints. Support using SSL on worker endpoints. Oct 11, 2022
@reivilibre
Copy link
Contributor

Do you mind if I ask for your motivation? I'm struggling to see a situation in which TLS support on the replication listener will help; the traffic isn't authenticated so if it goes over an untrusted network then you're still open to having untrusted parties sending things to your workers through the replication port, even with TLS. (Maybe with mTLS you could avoid this, but that's starting to get quite messy I suspect.)

In cases where the workers are on different machines, our approach has been to route the traffic through a VPN of some kind — I think we use tinc on matrix.org but WireGuard would also be suitable for example. This both provides encryption and authentication since you can't have random machines connecting to your VPN.

@clokep
Copy link
Member

clokep commented Oct 12, 2022

the traffic isn't authenticated

You can use a shared secret, see worker_replication_secret.

(I agree with the rest of your statement, however.)

@reivilibre
Copy link
Contributor

the traffic isn't authenticated

You can use a shared secret, see worker_replication_secret.

Ah hm, I wasn't aware of that. Looks like the passage in the docs that talks about never exposing your replication to the outside world has also been updated since my last memorised version: https://matrix-org.github.io/synapse/develop/workers.html#shared-configuration

@ghost
Copy link
Author

ghost commented Oct 12, 2022

Do you mind if I ask for your motivation?

I plan to use Synapse in environment that has high security standards and requires that endpoints are encrypted. I need to have TLS enabled in the generic worker listener to pass audits. Same applies for the replication listener, unless I keep it and all workers on the same host. But my goals is to enable horizontal scalability so I will want to spread out workers on multiple hosts and will need the encryption. Adding new network layers to work around this, instead of just supporting TLS is not a good option for me.

More philosophically, these endpoints can already be configured with the TLS flag but now it is ignored. With these fairly small changes we get the option to secure these endpoints with encryption. Which is a good idea to do for better security.

# This has same logic as synapse/app/homeserver.py where we choose between
# listen_ssl and listen_tcp based on the tls configuration flag in listener
# config.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we abstract this out so that we can call something like:

self._listen_something(listener_config, root_resource)

This can then do the decision of TLS or not and possibly create the site (alternately, pass the site instead of root_resource in).

(Name to be determined! 😄 )

@clokep
Copy link
Member

clokep commented Oct 12, 2022

@tuojamie This seems to have a couple of failing tests?

@ghost
Copy link
Author

ghost commented Oct 12, 2022

@tuojamie This seems to have a couple of failing tests?

It is the new tls parameter in InstanceLocationConfig. It is probably better if it was optional for backwards compatibility.

I look into fixing this and abstracting the tls logic soon.

@dklimpel
Copy link
Contributor

I am thinking about:

  • Which TLS certificate is used (server/listener)?
  • Which rules apply for validation of the certificate (client)?
  • Are self sign certificates valid?

@ghost
Copy link
Author

ghost commented Oct 28, 2022

I am thinking about:

  • Which TLS certificate is used (server/listener)?
  • Which rules apply for validation of the certificate (client)?
  • Are self sign certificates valid?

In this PR we are enabling Synapse Workers to respect the existing configuration options. Depending how Main and Workers are deployed, and their certificates are configured we may have gaps along your question but I am not sure if PR like this should try to address them. Those feel more like generic TLS configuration topics that need deeper look to answer.

@ghost
Copy link
Author

ghost commented Oct 28, 2022

@clokep could you take a look on the failing tests? The SonarCloud failure does not seem to be related to my change. (Edit: SonarCloud failures went away already.)

There are also few failures on complement tests that seem odd. Do you know about these, and how they might link to these changes?

@clokep clokep self-requested a review October 28, 2022 11:58
@clokep
Copy link
Member

clokep commented Oct 28, 2022

@tuojamie Looks like this needs some types fixed / updated.

There are also few failures on complement tests that seem odd. Do you know about these, and how they might link to these changes?

Some of the complement tests are flaky, searching this repo for the test name should bring up any known ones!

synapse/app/_base.py Outdated Show resolved Hide resolved
synapse/app/_base.py Outdated Show resolved Hide resolved
synapse/app/_base.py Outdated Show resolved Hide resolved
@@ -140,37 +136,15 @@ def _listener_http(
else:
root_resource = OptionsResource()

site = SynapseSite(
"synapse.access.%s.%s" % ("https" if tls else "http", site_tag),
site_tag,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

site_tag seems unused except for here, so I think we can remove declaring it above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used here still:

handler_cls, config = load_module(
resmodule,
("listeners", site_tag, "additional_resources", "<%s>" % (path,)),
)

Tuomas Ojamies and others added 4 commits October 31, 2022 18:15
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
@clokep clokep self-requested a review October 31, 2022 18:06
@ghost
Copy link
Author

ghost commented Nov 11, 2022

@clokep Hi, before this gets stale - is there something I should be doing to get this over the line?

@richvdh richvdh requested a review from a team November 14, 2022 09:56
@reivilibre reivilibre requested review from reivilibre and removed request for a team November 15, 2022 12:35
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good; sorry for delaying it!

@DMRobertson
Copy link
Contributor

mypy checks seem to be failing on develop after this was merged in, see https://github.com/matrix-org/synapse/actions/runs/3470613461/jobs/5799032715

root_resource: Resource,
version_string: str,
max_request_body_size: int,
context_factory: IOpenSSLContextFactory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be Optional[IOpenSSLContextFactory], I think.

DMRobertson pushed a commit that referenced this pull request Nov 15, 2022
* Fix typechecking errors introduced in #14128

* Changelog

* Correct annotations

so that context_factory works if you don't use TLS
bradtgmurray added a commit to beeper/synapse-legacy-fork that referenced this pull request Nov 22, 2022
Synapse 1.72.0 (2022-11-22)
===========================

Please note that Synapse now only supports PostgreSQL 11+, because PostgreSQL 10 has reached end-of-life, c.f. our [Deprecation Policy](https://github.com/matrix-org/synapse/blob/develop/docs/deprecation_policy.md).

Bugfixes
--------

- Update forgotten references to legacy metrics in the included Grafana dashboard. ([\matrix-org#14477](matrix-org#14477))

Synapse 1.72.0rc1 (2022-11-16)
==============================

Features
--------

- Add experimental support for [MSC3912](matrix-org/matrix-spec-proposals#3912): Relation-based redactions. ([\matrix-org#14260](matrix-org#14260))
- Build Debian packages for Ubuntu 22.10 (Kinetic Kudu). ([\matrix-org#14396](matrix-org#14396))
- Add an [Admin API](https://matrix-org.github.io/synapse/latest/usage/administration/admin_api/index.html) endpoint for user lookup based on third-party ID (3PID). Contributed by @ashfame. ([\matrix-org#14405](matrix-org#14405))
- Faster joins: include heroes' membership events in the partial join response, for rooms without a name or canonical alias. ([\matrix-org#14442](matrix-org#14442))

Bugfixes
--------

- Faster joins: do not block creation of or queries for room aliases during the resync. ([\matrix-org#14292](matrix-org#14292))
- Fix a bug introduced in Synapse 1.64.0rc1 which could cause log spam when fetching events from other homeservers. ([\matrix-org#14347](matrix-org#14347))
- Fix a bug introduced in 1.66 which would not send certain pushrules to clients. Contributed by Nico. ([\matrix-org#14356](matrix-org#14356))
- Fix a bug introduced in v1.71.0rc1 where the power level event was incorrectly created during initial room creation. ([\matrix-org#14361](matrix-org#14361))
- Fix the refresh token endpoint to be under /r0 and /v3 instead of /v1. Contributed by Tulir @ Beeper. ([\matrix-org#14364](matrix-org#14364))
- Fix a long-standing bug where Synapse would raise an error when encountering an unrecognised field in a `/sync` filter, instead of ignoring it for forward compatibility. ([\matrix-org#14369](matrix-org#14369))
- Fix a background database update, introduced in Synapse 1.64.0, which could cause poor database performance. ([\matrix-org#14374](matrix-org#14374))
- Fix PostgreSQL sometimes using table scans for queries against the `event_search` table, taking a long time and a large amount of IO. ([\matrix-org#14409](matrix-org#14409))
- Fix rendering of some HTML templates (including emails). Introduced in v1.71.0. ([\matrix-org#14448](matrix-org#14448))
- Fix a bug introduced in Synapse 1.70.0 where the background updates to add non-thread unique indexes on receipts could fail when upgrading from 1.67.0 or earlier. ([\matrix-org#14453](matrix-org#14453))

Updates to the Docker image
---------------------------

- Add all Stream Writer worker types to `configure_workers_and_start.py`. ([\matrix-org#14197](matrix-org#14197))
- Remove references to legacy worker types in the multi-worker Dockerfile. ([\matrix-org#14294](matrix-org#14294))

Improved Documentation
----------------------

- Upload documentation PRs to Netlify. ([\matrix-org#12947](matrix-org#12947), [\matrix-org#14370](matrix-org#14370))
- Add addtional TURN server configuration example based on [eturnal](https://github.com/processone/eturnal) and adjust general TURN server doc structure. ([\matrix-org#14293](matrix-org#14293))
- Add example on how to load balance /sync requests. Contributed by [aceArt](https://aceart.de). ([\matrix-org#14297](matrix-org#14297))
- Edit sample Nginx reverse proxy configuration to use HTTP/1.1. Contributed by Brad Jones. ([\matrix-org#14414](matrix-org#14414))

Deprecations and Removals
-------------------------

- Remove support for PostgreSQL 10. ([\matrix-org#14392](matrix-org#14392), [\matrix-org#14397](matrix-org#14397))

Internal Changes
----------------

- Run unit tests against Python 3.11. ([\matrix-org#13812](matrix-org#13812))
- Add TLS support for generic worker endpoints. ([\matrix-org#14128](matrix-org#14128), [\matrix-org#14455](matrix-org#14455))
- Switch to a maintained action for installing Rust in CI. ([\matrix-org#14313](matrix-org#14313))
- Add override ability to `complement.sh` command line script to request certain types of workers. ([\matrix-org#14324](matrix-org#14324))
- Enabling testing of [MSC3874](matrix-org/matrix-spec-proposals#3874) (filtering of `/messages` by relation type) in complement. ([\matrix-org#14339](matrix-org#14339))
- Concisely log a failure to resolve state due to missing `prev_events`. ([\matrix-org#14346](matrix-org#14346))
- Use a maintained Github action to install Rust. ([\matrix-org#14351](matrix-org#14351))
- Cleanup old worker datastore classes. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#14375](matrix-org#14375))
- Test against PostgreSQL 15 in CI. ([\matrix-org#14394](matrix-org#14394))
- Remove unreachable code. ([\matrix-org#14410](matrix-org#14410))
- Clean-up event persistence code. ([\matrix-org#14411](matrix-org#14411))
- Update docstring to clarify that `get_partial_state_events_batch` does not just give you completely arbitrary partial-state events. ([\matrix-org#14417](matrix-org#14417))
- Fix mypy errors introduced by bumping the locked version of `attrs` and `gitpython`. ([\matrix-org#14433](matrix-org#14433))
- Make Dependabot only bump Rust deps in the lock file. ([\matrix-org#14434](matrix-org#14434))
- Fix an incorrect stub return type for `PushRuleEvaluator.run`. ([\matrix-org#14451](matrix-org#14451))
- Improve performance of `/context` in large rooms. ([\matrix-org#14461](matrix-org#14461))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants