Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: Three ports to rule them all #30828

Closed
bdarnell opened this issue Oct 1, 2018 · 8 comments · Fixed by #39305
Closed

server: Three ports to rule them all #30828

bdarnell opened this issue Oct 1, 2018 · 8 comments · Fixed by #39305
Assignees
Labels
A-kv-security A-kv-server Relating to the KV-level RPC server A-security C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@bdarnell
Copy link
Contributor

bdarnell commented Oct 1, 2018

The vision of using cmux to combine all cockroach traffic onto a single port never quite materialized. The HTTP interface has been on a separate port due to performance interactions, but now even if those were resolved we'd probably want to keep it on a separate port so it can use a browser-friendly certificate and other TLS settings.

More importantly, #30821 has shown that combining GRPC and pgwire on the same port is an unnecessary risk. The GRPC interface is only used internally (with a few exceptions for some CLI commands), and the pgwire interface is only used by clients. If these were separate ports, they could have separate firewall rules to provide an extra layer of defense against a bug in the GRPC layer. This is especially important in a hosted environment where the pgwire port may be exposed relatively broadly.

We should add crdb_internal SQL functions for any remaining RPCs that are used by the CLI so that it only interacts with the server over the pgwire protocol. Then we can add an option to run GRPC and pgwire on two separate ports (with the web interface on a third port). We'll still need to support the combined pgwire/grpc port for backwards compatibility, but this will give operators who want it the ability to firewall off the inter-node grpc traffic.

cc @mberhault

@bdarnell bdarnell added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-security A-kv-server Relating to the KV-level RPC server A-security labels Oct 1, 2018
@bdarnell bdarnell added this to the 2.2 milestone Oct 1, 2018
@bdarnell bdarnell self-assigned this Oct 1, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
@keith-mcclellan
Copy link
Contributor

This is something that's being requested by at least one customer. Ideally, this will allow them to use a different TLS config for SQL connections where they want to host a cluster cert.

@petermattis
Copy link
Collaborator

@bdarnell Are you working on this?

@bdarnell
Copy link
Contributor Author

No, I'm not working on it.

@bdarnell bdarnell removed their assignment Jul 29, 2019
@petermattis
Copy link
Collaborator

We should add crdb_internal SQL functions for any remaining RPCs that are used by the CLI so that it only interacts with the server over the pgwire protocol.

I believe this is by far the largest piece of work here. We use the GRPC interface for decommissioning, quitting, bootstrapping, and various debug requests. I'm not clear on why moving these requests over to a SQL interface would be necessary. I'm probably missing something obvious. Can you elaborate?

@bdarnell
Copy link
Contributor Author

The idea is that in MSO, we'd disallow end-user access to the GRPC port completely. Users would only be able to access the SQL and web interfaces. Of course, in MSO we'd also want to disallow many of the operations that are currently grpc-only (decommissioning, quitting, etc), so maybe we don't actually need to move everything over. I'm not sure what's left on the GRPC port that really needs moving over to the SQL or HTTP ports.

@petermattis
Copy link
Collaborator

I'm not sure what's left on the GRPC port that really needs moving over to the SQL or HTTP ports.

Leaving the admin commands alone (decommissioning, bootstrapping, quitting, haproxy), I think the remaining items are debug related: debug gossip-values, debug tsdump, debug zip. I suspect those could be left alone.

Did you have any thoughts about the UX behind specifying an additional port? --port should specify the pgwire port. We could then add a --grpc-port. How to specify that --port should be pgwire only? How to specify the certs to use?

@bdarnell
Copy link
Contributor Author

Yeah, I was thinking we'd add --grpc-port, but hadn't gotten much beyond that (I guess you'd also use the grpc ports in the --join flag?). The certs would be handled by naming conventions in the certs-dir (for which we already have precedent as we have a number of different variations already).

@knz
Copy link
Contributor

knz commented Jul 31, 2019

IMHO the existing --port flag (really, --listen-addr, as --port has been deprecated since 19.1) should remain dedicated to gRPC.

Then we'd have --sql-port (or --listen-addr-sql or similar) to split SQL away. If --sql-port is not specified, its value would default to that of --port.

As to the RPC endpoints. My proposal would be to keep them defined as RPCs in protobufs (and leave them reachable via RPC) and auto-generate SQL built-in functions for all of them, as well as go wrapper functions that would translate a call into a SQL call, given a sql connection.

@knz knz self-assigned this Aug 5, 2019
knz added a commit to knz/cockroach that referenced this issue Aug 5, 2019
This patch introduces the ability to split off the SQL server into a
separate port, using the new command-line flag `--sql-addr`.

**Motivation**

This is a long- and oft-requested feature, aimed at facilitating
deployments in professional networks that use firewalls to fence off
"internal" (server-side) traffic from external (client) traffic.

**Usage**

How it works (simplified):

- the flag `--sql-addr` indicates on which host/port to listen to for
  SQL connections.
- it is possible to specify `--sql-addr` to be equal to
  `--listen-addr`, in which case both will share a single TCP
  connection (internally: using `cmux`).
- in fact, the default for `--sql-addr` is to be equal to
  `--listen-addr`, for compatibility with previous versions
  of CockroachDB.
- when `--sql-addr` and `--listen-addr` are different, then
  the server does not accept SQL connections any more on the
  `--listen-addr` address.

Note (advanced): the computation of defaults is performed separately
for the host and port part of the flag. The logic is the same as that
used for `--http-addr`, using default port 5432 (postgres). For
example, `--sql-addr=localhost` (without port number) is equivalent to
`--sql-addr=localhost:5432`.

**Design notes**

This is one of two possible design directions that were considered:

A. this implementation, where the opt-in flag splits the SQL server
   entirely onto a different port. *The original port becomes unable
   to accept SQL connections.*

B. the opt-in flag *adds* a SQL server listening onto an additional
   address. *The original port remains able to accept SQL
   connections.*

The option A is somewhat simpler to implement and also sufficient
to answer the use cases from issues cockroachdb#5816 and cockroachdb#30828.

However, for backward compatibility with pre-19.2 clients, we cannot
both implement option A and make `--sql-addr=:5432` the new default
configuration. This would force every client to change the port number
they use to connect.

That's why Option B may be more interesting for a next iteration. It
offers the opportunity to let CockroachDB 19.2 *always* listen to a
separate port, presumably the same as postgres, by default. This would
smoothen the path to adoption by existing pg clients a little
further. In this mode, Pre-19.2 clients would not be affected as the
main crdb port (26257) would still accept SQL clients.

Release note (cli change): CockroachDB now recognizes a flag
`--sql-addr` which makes it possible to accept connections by clients
on a separate TCP address and/or port number from the one used
for intra-cluster (node-node) connections. This is aimed to enable
firewalling client traffic from server traffic.
craig bot pushed a commit that referenced this issue Aug 8, 2019
39305: server,config,cli: offer a separate port for SQL clients r=knz a=knz

(First two commits from #39452)
Fixes #30828.
Fixes #5816.

his patch introduces the ability to split off the SQL server into a
separate port, using the new command-line flag `--sql-addr`.

The remainder of this commit message details:
- the motivation for the change
- some usage documentation for users of the new feature
- how to introduce this feature in existing clusters
- reading recommendations to reviewers of this change
- release note

**Motivation**

This is a long- and oft-requested feature, aimed at facilitating
deployments in professional networks that use firewalls to fence off
"internal" (server-side) traffic from external (client) traffic.

**Usage**

How it works (simplified):

- the flag `--sql-addr` indicates on which host/port to listen to for
  SQL connections.
- it is possible to specify `--sql-addr` to be equal to
  `--listen-addr`, in which case both will share a single TCP
  connection (internally: using `cmux`).
- in fact, the default for `--sql-addr` is to be equal to
  `--listen-addr`, for compatibility with previous versions
  of CockroachDB.
- when `--sql-addr` and `--listen-addr` are different, then
  the server does not accept SQL connections any more on the
  `--listen-addr` address.
- the flag `--advertise-sql-addr` complements `--sql-addr` in
  the same way as `--advertise-addr` complements `--listen-addr`.

In addition, the output of `cockroach node status` (and the contents
of `crdb_internal.gossip_nodes`) is extended to display the SQL
address.

Note (intermediate): the new flags enables both using separate ports
on the same host address (e.g. listening on 127.0.0.1 with separate
ports for SQL and RPC), and also using the same port on separate
host addresses (e.g. listening on 127.0.0.1:26257 for SQL, and
192.168.2.123:26257 for RPC). Both use cases are legitimate
and have seen demand in the wild.

Note (advanced): the computation of defaults is performed separately
for the host and port part of the flag. The logic is the same as that
used for `--http-addr`, using default port 26257. For example,
`--sql-addr=localhost` (without port number) is equivalent to
`--sql-addr=localhost:26257`.

Note (advanced): the computation of defaults for
`--advertise-sql-addr` is a bit non-trivial as it pulls its address
and port part separately from `--sql-addr`, `--listen-addr` and
`--advertise-addr`. Effort was made to ensure sane defaults when some
flags but not all are omitted. This is best explained by examples, see
the unit tests in flags_test.go for details.

**Upgrading a cluster to use separate ports**

If a cluster is already online and the need arises to split the ports,
the flag can be introduced as follows:

- when keeping port 26257 for SQL:

  1. restart all nodes in a rolling fashion, updating the `--join`
     flag on each node to add the new RPC address.

  2. restart all nodes in a rolling fashion, adding
     `--sql-addr=:26257` and setting `--listen-addr=xxx` to the new
     RPC address. The previous address with port 26257 can also be
     removed from `--join` in the same step.

- when keeping port 26257 for RPC, introducing a new SQL addr/port:

  1. if load balancers are in use, extend the load balancer
     config to also attempt connections on the new SQL address.
	 If load balancers are not in use, temporarily add one
	 that accepts clients on the old addr/port and redirects the
	 connection on both the new addr/port and the old.

  2. restart all nodes in a rolling fashion, updating the `--sql-addr`
     flag.

**Review notes**

This change was constructed as follows:

1. introducing new fields in `base.Config`
2. implementing the address validation logic
   in `addr_validation.go` and corresponding unit tests.
3. picking up the new fields to listen separately
   in `(s *Server) startListenRPCAndSQL()`
4. adding the command line flag parsing and default
   logic in `flags.go` and corresponding unit tests.
5. manually testing that indeed the server can
   listen separately on separate ports.
6. to ensure that most unit tests also exercise
   the split ports, make TestServer/TestCluster
   set the new flags `SplitListenSQL`
7. extend the TestServer interface to make the SQL
   address available alongside the RPC address.
8. update all the tests  using a SQL connection
   to use the SQL address
9. update the CLI test logic from `cli_test.go`
   to configure the `--host` flag to the RPC
   or SQL address depending on the command
   being invoked.

At this point all tests except for `TestZip` would pass
successfully. The remainder of the work is for the benefit of
`cockroach zip`, which needs to discover the reachable address of
every node in a cluster from the address of just 1 of them, doing so
by inspecting all the node descriptors. `cockroach zip` is also
peculiar in that it uses both the RPC and SQL interfaces.

1. equip the node descriptor with a field for the SQL address
   and ensure it gets populated.
2. make the zip logic fetch the SQL address of the primary
   node by a Node() status request over RPC.
3. for the loop over all nodes, use the SQL address
   in each node's descriptor for SQL queries separately
   from the RPC logic.

Release note (cli change): CockroachDB now recognizes a flag
`--sql-addr` which makes it possible to accept connections by clients
on a separate TCP address and/or port number from the one used
for intra-cluster (node-node) connections. This is aimed to enable
firewalling client traffic from server traffic.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig craig bot closed this as completed in #39305 Aug 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-security A-kv-server Relating to the KV-level RPC server A-security C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants