server: Three ports to rule them all #30828

bdarnell · 2018-10-01T14:46:22Z

The vision of using cmux to combine all cockroach traffic onto a single port never quite materialized. The HTTP interface has been on a separate port due to performance interactions, but now even if those were resolved we'd probably want to keep it on a separate port so it can use a browser-friendly certificate and other TLS settings.

More importantly, #30821 has shown that combining GRPC and pgwire on the same port is an unnecessary risk. The GRPC interface is only used internally (with a few exceptions for some CLI commands), and the pgwire interface is only used by clients. If these were separate ports, they could have separate firewall rules to provide an extra layer of defense against a bug in the GRPC layer. This is especially important in a hosted environment where the pgwire port may be exposed relatively broadly.

We should add crdb_internal SQL functions for any remaining RPCs that are used by the CLI so that it only interacts with the server over the pgwire protocol. Then we can add an option to run GRPC and pgwire on two separate ports (with the web interface on a third port). We'll still need to support the combined pgwire/grpc port for backwards compatibility, but this will give operators who want it the ability to firewall off the inter-node grpc traffic.

cc @mberhault

The text was updated successfully, but these errors were encountered:

keith-mcclellan · 2019-07-29T17:12:18Z

This is something that's being requested by at least one customer. Ideally, this will allow them to use a different TLS config for SQL connections where they want to host a cluster cert.

petermattis · 2019-07-29T17:43:15Z

@bdarnell Are you working on this?

bdarnell · 2019-07-29T17:43:42Z

No, I'm not working on it.

petermattis · 2019-07-29T17:47:56Z

We should add crdb_internal SQL functions for any remaining RPCs that are used by the CLI so that it only interacts with the server over the pgwire protocol.

I believe this is by far the largest piece of work here. We use the GRPC interface for decommissioning, quitting, bootstrapping, and various debug requests. I'm not clear on why moving these requests over to a SQL interface would be necessary. I'm probably missing something obvious. Can you elaborate?

bdarnell · 2019-07-29T17:52:51Z

The idea is that in MSO, we'd disallow end-user access to the GRPC port completely. Users would only be able to access the SQL and web interfaces. Of course, in MSO we'd also want to disallow many of the operations that are currently grpc-only (decommissioning, quitting, etc), so maybe we don't actually need to move everything over. I'm not sure what's left on the GRPC port that really needs moving over to the SQL or HTTP ports.

petermattis · 2019-07-29T17:58:34Z

I'm not sure what's left on the GRPC port that really needs moving over to the SQL or HTTP ports.

Leaving the admin commands alone (decommissioning, bootstrapping, quitting, haproxy), I think the remaining items are debug related: debug gossip-values, debug tsdump, debug zip. I suspect those could be left alone.

Did you have any thoughts about the UX behind specifying an additional port? --port should specify the pgwire port. We could then add a --grpc-port. How to specify that --port should be pgwire only? How to specify the certs to use?

bdarnell · 2019-07-29T18:00:09Z

Yeah, I was thinking we'd add --grpc-port, but hadn't gotten much beyond that (I guess you'd also use the grpc ports in the --join flag?). The certs would be handled by naming conventions in the certs-dir (for which we already have precedent as we have a number of different variations already).

knz · 2019-07-31T20:35:43Z

IMHO the existing --port flag (really, --listen-addr, as --port has been deprecated since 19.1) should remain dedicated to gRPC.

Then we'd have --sql-port (or --listen-addr-sql or similar) to split SQL away. If --sql-port is not specified, its value would default to that of --port.

As to the RPC endpoints. My proposal would be to keep them defined as RPCs in protobufs (and leave them reachable via RPC) and auto-generate SQL built-in functions for all of them, as well as go wrapper functions that would translate a call into a SQL call, given a sql connection.

This patch introduces the ability to split off the SQL server into a separate port, using the new command-line flag `--sql-addr`. **Motivation** This is a long- and oft-requested feature, aimed at facilitating deployments in professional networks that use firewalls to fence off "internal" (server-side) traffic from external (client) traffic. **Usage** How it works (simplified): - the flag `--sql-addr` indicates on which host/port to listen to for SQL connections. - it is possible to specify `--sql-addr` to be equal to `--listen-addr`, in which case both will share a single TCP connection (internally: using `cmux`). - in fact, the default for `--sql-addr` is to be equal to `--listen-addr`, for compatibility with previous versions of CockroachDB. - when `--sql-addr` and `--listen-addr` are different, then the server does not accept SQL connections any more on the `--listen-addr` address. Note (advanced): the computation of defaults is performed separately for the host and port part of the flag. The logic is the same as that used for `--http-addr`, using default port 5432 (postgres). For example, `--sql-addr=localhost` (without port number) is equivalent to `--sql-addr=localhost:5432`. **Design notes** This is one of two possible design directions that were considered: A. this implementation, where the opt-in flag splits the SQL server entirely onto a different port. *The original port becomes unable to accept SQL connections.* B. the opt-in flag *adds* a SQL server listening onto an additional address. *The original port remains able to accept SQL connections.* The option A is somewhat simpler to implement and also sufficient to answer the use cases from issues cockroachdb#5816 and cockroachdb#30828. However, for backward compatibility with pre-19.2 clients, we cannot both implement option A and make `--sql-addr=:5432` the new default configuration. This would force every client to change the port number they use to connect. That's why Option B may be more interesting for a next iteration. It offers the opportunity to let CockroachDB 19.2 *always* listen to a separate port, presumably the same as postgres, by default. This would smoothen the path to adoption by existing pg clients a little further. In this mode, Pre-19.2 clients would not be affected as the main crdb port (26257) would still accept SQL clients. Release note (cli change): CockroachDB now recognizes a flag `--sql-addr` which makes it possible to accept connections by clients on a separate TCP address and/or port number from the one used for intra-cluster (node-node) connections. This is aimed to enable firewalling client traffic from server traffic.

39305: server,config,cli: offer a separate port for SQL clients r=knz a=knz (First two commits from #39452) Fixes #30828. Fixes #5816. his patch introduces the ability to split off the SQL server into a separate port, using the new command-line flag `--sql-addr`. The remainder of this commit message details: - the motivation for the change - some usage documentation for users of the new feature - how to introduce this feature in existing clusters - reading recommendations to reviewers of this change - release note **Motivation** This is a long- and oft-requested feature, aimed at facilitating deployments in professional networks that use firewalls to fence off "internal" (server-side) traffic from external (client) traffic. **Usage** How it works (simplified): - the flag `--sql-addr` indicates on which host/port to listen to for SQL connections. - it is possible to specify `--sql-addr` to be equal to `--listen-addr`, in which case both will share a single TCP connection (internally: using `cmux`). - in fact, the default for `--sql-addr` is to be equal to `--listen-addr`, for compatibility with previous versions of CockroachDB. - when `--sql-addr` and `--listen-addr` are different, then the server does not accept SQL connections any more on the `--listen-addr` address. - the flag `--advertise-sql-addr` complements `--sql-addr` in the same way as `--advertise-addr` complements `--listen-addr`. In addition, the output of `cockroach node status` (and the contents of `crdb_internal.gossip_nodes`) is extended to display the SQL address. Note (intermediate): the new flags enables both using separate ports on the same host address (e.g. listening on 127.0.0.1 with separate ports for SQL and RPC), and also using the same port on separate host addresses (e.g. listening on 127.0.0.1:26257 for SQL, and 192.168.2.123:26257 for RPC). Both use cases are legitimate and have seen demand in the wild. Note (advanced): the computation of defaults is performed separately for the host and port part of the flag. The logic is the same as that used for `--http-addr`, using default port 26257. For example, `--sql-addr=localhost` (without port number) is equivalent to `--sql-addr=localhost:26257`. Note (advanced): the computation of defaults for `--advertise-sql-addr` is a bit non-trivial as it pulls its address and port part separately from `--sql-addr`, `--listen-addr` and `--advertise-addr`. Effort was made to ensure sane defaults when some flags but not all are omitted. This is best explained by examples, see the unit tests in flags_test.go for details. **Upgrading a cluster to use separate ports** If a cluster is already online and the need arises to split the ports, the flag can be introduced as follows: - when keeping port 26257 for SQL: 1. restart all nodes in a rolling fashion, updating the `--join` flag on each node to add the new RPC address. 2. restart all nodes in a rolling fashion, adding `--sql-addr=:26257` and setting `--listen-addr=xxx` to the new RPC address. The previous address with port 26257 can also be removed from `--join` in the same step. - when keeping port 26257 for RPC, introducing a new SQL addr/port: 1. if load balancers are in use, extend the load balancer config to also attempt connections on the new SQL address. If load balancers are not in use, temporarily add one that accepts clients on the old addr/port and redirects the connection on both the new addr/port and the old. 2. restart all nodes in a rolling fashion, updating the `--sql-addr` flag. **Review notes** This change was constructed as follows: 1. introducing new fields in `base.Config` 2. implementing the address validation logic in `addr_validation.go` and corresponding unit tests. 3. picking up the new fields to listen separately in `(s *Server) startListenRPCAndSQL()` 4. adding the command line flag parsing and default logic in `flags.go` and corresponding unit tests. 5. manually testing that indeed the server can listen separately on separate ports. 6. to ensure that most unit tests also exercise the split ports, make TestServer/TestCluster set the new flags `SplitListenSQL` 7. extend the TestServer interface to make the SQL address available alongside the RPC address. 8. update all the tests using a SQL connection to use the SQL address 9. update the CLI test logic from `cli_test.go` to configure the `--host` flag to the RPC or SQL address depending on the command being invoked. At this point all tests except for `TestZip` would pass successfully. The remainder of the work is for the benefit of `cockroach zip`, which needs to discover the reachable address of every node in a cluster from the address of just 1 of them, doing so by inspecting all the node descriptors. `cockroach zip` is also peculiar in that it uses both the RPC and SQL interfaces. 1. equip the node descriptor with a field for the SQL address and ensure it gets populated. 2. make the zip logic fetch the SQL address of the primary node by a Node() status request over RPC. 3. for the loop over all nodes, use the SQL address in each node's descriptor for SQL queries separately from the RPC logic. Release note (cli change): CockroachDB now recognizes a flag `--sql-addr` which makes it possible to accept connections by clients on a separate TCP address and/or port number from the one used for intra-cluster (node-node) connections. This is aimed to enable firewalling client traffic from server traffic. Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>

bdarnell added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-security A-kv-server Relating to the KV-level RPC server A-security labels Oct 1, 2018

bdarnell added this to the 2.2 milestone Oct 1, 2018

bdarnell self-assigned this Oct 1, 2018

petermattis removed this from the 2.2 milestone Oct 5, 2018

bdarnell removed their assignment Jul 29, 2019

knz mentioned this issue Aug 4, 2019

server,config,cli: offer a separate port for SQL clients #39305

Merged

knz self-assigned this Aug 5, 2019

craig bot closed this as completed in #39305 Aug 8, 2019

keith-mcclellan mentioned this issue May 28, 2020

cockroach start Doc Update cockroachdb/docs#7436

Closed

tcolgate mentioned this issue Jul 31, 2020

Unsplit the OTLP receiver protocols, use a shared port open-telemetry/opentelemetry-collector#1256

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server: Three ports to rule them all #30828

server: Three ports to rule them all #30828

bdarnell commented Oct 1, 2018

keith-mcclellan commented Jul 29, 2019

petermattis commented Jul 29, 2019

bdarnell commented Jul 29, 2019

petermattis commented Jul 29, 2019

bdarnell commented Jul 29, 2019

petermattis commented Jul 29, 2019

bdarnell commented Jul 29, 2019

knz commented Jul 31, 2019

server: Three ports to rule them all #30828

server: Three ports to rule them all #30828

Comments

bdarnell commented Oct 1, 2018

keith-mcclellan commented Jul 29, 2019

petermattis commented Jul 29, 2019

bdarnell commented Jul 29, 2019

petermattis commented Jul 29, 2019

bdarnell commented Jul 29, 2019

petermattis commented Jul 29, 2019

bdarnell commented Jul 29, 2019

knz commented Jul 31, 2019