[devdocs] Add dev doc describing gateways configuration
daniel-goldstein committed Apr 26, 2024
1 parent 2d41c6f commit a4746e2
35 changes: 35 additions & 0 deletions dev-docs/services/gateways.md
# Overview of the Batch Control Plane External and Internal Load Balancers

Traffic flows into the Kubernetes cluster through one of two LoadBalancers: `gateway`,
which receives traffic from the internet, and `internal-gateway`, which manages traffic
from batch workers to the services in Kubernetes.

Between them, the two gateways receive all traffic destined for services in the cluster.
Each acts as a reverse proxy: it routes requests to the appropriate service, manages TLS,
sometimes performs authorization checks, and enforces rate limits. Our reverse proxy
of choice is [Envoy](https://www.envoyproxy.io/).

The general routing rules for the gateways are as follows (Kubernetes DNS provides addresses
for `Service`s in the form of `<service>.<namespace>.svc.cluster.local`):

### Gateway
- `<service>.hail.is/<path> => <service>.default.svc.cluster.local/<path>`
- `internal.hail.is/<dev-or-pr>/<service>/<path> => <service>.<dev-or-pr>.svc.cluster.local/<dev-or-pr>/<service>/<path>`[^1]

[^1]: At the time of writing, developers cannot sign in to PR namespaces through the
browser because PR namespaces are not assigned a callback URL for the GCP/Azure OAuth flows.


### Internal Gateway
- `<service>.hail/<path> => <service>.default.svc.cluster.local/<path>`
- `internal.hail/<dev-or-pr>/<service>/<path> => <service>.<dev-or-pr>.svc.cluster.local/<dev-or-pr>/<service>/<path>`
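
The routing rules for both gateways can be illustrated with a minimal Python sketch. This is not the gateways' actual Envoy configuration; it only maps a request URL to the in-cluster address each gateway would forward it to. It assumes the request path, including any `<dev-or-pr>/<service>` prefix, is forwarded unchanged. The function name `upstream_url`, the `GATEWAY_DOMAINS` table, and the example namespace `alice` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical lookup table: keys mirror the `gateway` and `internal-gateway`
# Services, values are the domain suffixes each one serves per the rules above.
GATEWAY_DOMAINS = {
    "gateway": "hail.is",         # traffic from the internet
    "internal-gateway": "hail",   # traffic from batch workers
}


def upstream_url(gateway: str, url: str) -> Optional[str]:
    """Return the in-cluster address a request would be routed to,
    or None if no routing rule matches. Illustration only."""
    suffix = "." + GATEWAY_DOMAINS[gateway]
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path
    if not host.endswith(suffix):
        return None
    subdomain = host[: -len(suffix)]
    if not subdomain or "." in subdomain:
        return None
    if subdomain == "internal":
        # internal.<domain>/<dev-or-pr>/<service>/<path>: the namespace and
        # service come from the first two path segments; the path is kept as-is.
        segments = path.lstrip("/").split("/", 2)
        if len(segments) < 2 or not all(segments[:2]):
            return None
        namespace, service = segments[0], segments[1]
        return f"{service}.{namespace}.svc.cluster.local{path}"
    # <service>.<domain>/<path> goes to the production (`default`) namespace.
    return f"{subdomain}.default.svc.cluster.local{path}"
```

For example, `upstream_url("gateway", "https://internal.hail.is/alice/batch/")` returns `"batch.alice.svc.cluster.local/alice/batch/"`.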

For Envoy to properly pool connections to K8s services, it needs to know
which "clusters" (Envoy's term for upstream services) exist at any point in time. This list is static for
production services, but test/PR namespaces are ephemeral and are
created and destroyed by CI many times per day. To notify the gateways
of new namespaces and services, CI tracks which namespaces are active and periodically
updates a K8s `ConfigMap` with fresh Envoy configuration. Using the
[Envoy xDS API](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/dynamic_configuration#xds-configuration-api-overview),
the gateways can dynamically load this new configuration as it changes, without dropping existing traffic.
You can see CI's current view of the cluster's namespaces and services at [ci.hail.is/namespaces](https://ci.hail.is/namespaces).
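
The kind of configuration CI could write into the `ConfigMap` can be sketched in Python. This is a hypothetical illustration, not CI's actual code. It assumes the gateways read the `ConfigMap` as a mounted file through Envoy's filesystem-based xDS, which expects a `resources` list of typed resources. The helper names, the `<namespace>-<service>` cluster naming, and port 443 are assumptions, and TLS and timeout settings are omitted. Each service becomes an Envoy v3 `Cluster` of type `STRICT_DNS`, so Envoy resolves the same `<service>.<namespace>.svc.cluster.local` names used in the routing rules.

```python
import json
from typing import Dict, List

# Assumption: the port services listen on; not taken from the real config.
SERVICE_PORT = 443


def envoy_cluster(name: str, service: str, namespace: str, port: int = SERVICE_PORT) -> dict:
    """Build an Envoy v3 Cluster that reaches a K8s Service via cluster DNS."""
    return {
        "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
        "name": name,
        "type": "STRICT_DNS",
        "load_assignment": {
            "cluster_name": name,
            "endpoints": [{
                "lb_endpoints": [{
                    "endpoint": {
                        "address": {
                            "socket_address": {
                                "address": f"{service}.{namespace}.svc.cluster.local",
                                "port_value": port,
                            }
                        }
                    }
                }]
            }],
        },
    }


def cds_document(production: List[str], active: Dict[str, List[str]]) -> str:
    """Render clusters for the static production services plus every service
    in each currently active dev/PR namespace (hypothetical naming scheme)."""
    clusters = [envoy_cluster(svc, svc, "default") for svc in sorted(production)]
    for namespace in sorted(active):
        for svc in sorted(active[namespace]):
            clusters.append(envoy_cluster(f"{namespace}-{svc}", svc, namespace))
    return json.dumps({"resources": clusters}, indent=2)
```

Production clusters appear in every render because their list is static. Only the per-namespace entries change as CI creates and deletes namespaces.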
