Add initial security best practices documentation #8952

Merged
merged 1 commit into from
Feb 17, 2021
196 changes: 165 additions & 31 deletions content/en/docs/ops/best-practices/security/index.md
@@ -7,36 +7,170 @@ owner: istio/wg-security-maintainers
test: no
---

This section provides some deployment guidelines to help keep a service mesh secure.

## Use namespaces for isolation
Review comment (Contributor):

I wonder if there should be an anti-patterns section (things not to do) and one of the items should be not relying too much on namespaces for isolation?


If there are multiple service operators (a.k.a. [SREs](https://en.wikipedia.org/wiki/Site_reliability_engineering))
deploying different services in a medium- or large-size cluster, we recommend creating a separate
[Kubernetes namespace](https://kubernetes.io/docs/tasks/administer-cluster/namespaces-walkthrough/) for each SRE team to isolate their access.
For example, you can create a `team1-ns` namespace for `team1` and a `team2-ns` namespace for `team2`, so
that neither team can access the other's services.

Let us consider a three-tier application with three services: `photo-frontend`,
`photo-backend`, and `datastore`. The photo SRE team manages the
`photo-frontend` and `photo-backend` services while the datastore SRE team
manages the `datastore` service. The `photo-frontend` service can access
`photo-backend`, and the `photo-backend` service can access `datastore`.
However, the `photo-frontend` service cannot access `datastore`.

In this scenario, a cluster administrator creates two namespaces:
`photo-ns` and `datastore-ns`. The administrator has
access to all namespaces and each team only has access to its own namespace.
The photo SRE team creates two service accounts to run `photo-frontend` and
`photo-backend` respectively in the `photo-ns` namespace. The datastore SRE
team creates one service account to run the `datastore` service in the
`datastore-ns` namespace. Moreover, we need to enforce the service access
control in [Istio Mixer](https://istio.io/v1.6/docs/reference/config/policy-and-telemetry/) such that
`photo-frontend` cannot access `datastore`.

In this setup, Kubernetes isolates each operator's privileges for managing the services.
Istio manages certificates and keys in all namespaces
and enforces different access control rules to the services.
Istio security features provide strong identity, powerful policy, transparent TLS encryption, and authentication, authorization and audit (AAA) tools to protect your services and data.
However, to fully make use of these features securely, care must be taken to follow best practices. It is recommended to review the [Security overview](/docs/concepts/security/) before proceeding.

## Mutual TLS

Istio will [automatically](/docs/ops/configuration/traffic-management/tls-configuration/#auto-mtls) encrypt traffic using [Mutual TLS](/docs/concepts/security/#mutual-tls-authentication) whenever possible.
However, proxies are configured in [permissive mode](/docs/concepts/security/#permissive-mode) by default, meaning they will accept both mutual TLS and plaintext traffic.

While this is required for incremental adoption or for allowing traffic from clients without an Istio sidecar, it also weakens the security posture.
It is recommended to [migrate to strict mode](/docs/tasks/security/authentication/mtls-migration/) when possible, to enforce that mutual TLS is used.
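
As an illustration of strict mode, the following is a minimal sketch of a mesh-wide `PeerAuthentication` policy. It assumes the root namespace is the default `istio-system`; applying it to a different namespace would scope the policy to that namespace only.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  # Assumption: istio-system is the mesh root namespace.
  namespace: istio-system
spec:
  mtls:
    # Reject plaintext traffic; only mutual TLS is accepted.
    mode: STRICT
```

Enabling this before all clients have sidecars will break their traffic, which is why the migration guide suggests moving namespace by namespace.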

Mutual TLS alone is not always enough to fully secure traffic, however, as it provides only authentication, not authorization.
This means that anyone with a valid certificate can still access a service.

To fully lock down traffic, it is recommended to configure [authorization policies](/docs/tasks/security/authorization/).
These allow creating fine-grained policies to allow or deny traffic. For example, you can allow only requests from the `app` namespace to access the `hello-world` service.
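
A minimal sketch of the `hello-world` example might look like the following. The policy name, the `default` namespace, and the `app: hello-world` label are assumptions made for illustration.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: hello-world-allow-app   # hypothetical name
  namespace: default            # assumed namespace of the hello-world workload
spec:
  selector:
    matchLabels:
      app: hello-world          # assumed workload label
  action: ALLOW
  rules:
  - from:
    - source:
        # Source namespace is derived from the peer certificate,
        # so this match only works when mutual TLS is in use.
        namespaces: ["app"]
```

Because an `ALLOW` policy is present for the workload, requests that match none of its rules are denied.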

Review comment (Contributor):

After mutual TLS, I think it'd be good to have a section for using the beta security policies.

We should cover the limitations of ALLOW policies / describe when it's safer to use DENY.

We should also describe how authentication policies are decoupled from authorization policies and make it clear that authentication policies without corresponding authorization policies are just security theater.

@yangminzhu @myidpt @incfly

Reply (Member Author):

I would need help with "We should cover the limitations of ALLOW policies / describe when it's safer to use DENY.", if one of the people mentioned above could take this on?

I could take on the latter part if needed

## Understand traffic capture limitations

The Istio sidecar works by capturing both inbound traffic and outbound traffic and directing them through the sidecar proxy.

However, not *all* traffic is captured:

* Redirection only handles TCP-based traffic. Any UDP or ICMP packets will not be captured or modified.
* Inbound capture is disabled on many [ports used by the sidecar](/docs/ops/deployment/requirements/#ports-used-by-istio) as well as port 22. This list can be expanded by options like `traffic.sidecar.istio.io/excludeInboundPorts`.
* Outbound capture may similarly be reduced through settings like `traffic.sidecar.istio.io/excludeOutboundPorts` or other means.
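
To make the last two bullets concrete, here is a sketch of a workload that opts specific ports out of capture using pod annotations. The Deployment name, image, and port numbers are hypothetical; only the two annotation keys come from the text above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical workload
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Inbound traffic to 8081 reaches the application directly,
        # without passing through the sidecar or its policies.
        traffic.sidecar.istio.io/excludeInboundPorts: "8081"
        # Outbound traffic to port 3306 leaves the pod without
        # passing through the sidecar.
        traffic.sidecar.istio.io/excludeOutboundPorts: "3306"
    spec:
      containers:
      - name: app
        image: example.com/example-app:latest   # hypothetical image
        ports:
        - containerPort: 8080
```

Any port excluded this way is outside the reach of Istio authentication and authorization policies.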

Review comment (Contributor):

I wonder if there should be a stronger link between this section and the immediately following NetworkPolicy section? You could explain which of the issues raised here can be addressed with k8s NetworkPolicy


In general, there is only a minimal security boundary between an application and its sidecar proxy. The sidecar is configured on a per-pod basis, and both run in the same network and process namespaces.
As such, the application may have the ability to remove redirection rules and remove, alter, terminate, or replace the sidecar proxy.
This allows a pod to intentionally bypass its sidecar for outbound traffic or intentionally allow inbound traffic to bypass its sidecar.

As a result, it is not secure to rely on all traffic being captured unconditionally by Istio.
Instead, the security boundary is that a client may not bypass *another* pod's sidecar.

For example, if I run the `reviews` application on port `9080`, I can assume that all traffic from the `productpage` application will be captured by the sidecar proxy,
where Istio authentication and authorization policies may apply.

### Defense in depth with `NetworkPolicy`

To further secure traffic, Istio policies can be layered with Kubernetes [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/).
This enables a strong [defense in depth](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)) strategy that can be used to further strengthen the security of your mesh.

For example, you may choose to only allow traffic to port `9080` of our `reviews` application.
In the event of a compromised pod or security vulnerability in the cluster, this may limit or stop an attacker's progress.
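
A sketch of such a policy is shown below. The policy name, the `default` namespace, and the `app: reviews` label are assumptions, and the policy is only enforced if the cluster's network plugin supports `NetworkPolicy`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: reviews-allow-9080       # hypothetical name
  namespace: default             # assumed namespace of reviews
spec:
  podSelector:
    matchLabels:
      app: reviews               # assumed pod label
  policyTypes:
  - Ingress
  ingress:
  # Only TCP traffic to port 9080 is admitted; all other inbound
  # traffic to the selected pods is dropped.
  - ports:
    - protocol: TCP
      port: 9080
```

A policy this narrow also blocks other inbound ports on the pod, such as those used for metrics scraping, so a real deployment may need additional rules.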

### Securing egress traffic

A common misconception is that options like [`outboundTrafficPolicy: REGISTRY_ONLY`](/docs/tasks/traffic-management/egress/egress-control/#envoy-passthrough-to-external-services) act as a security policy preventing all access to undeclared services.
However, this is not a strong security boundary as mentioned above, and should be considered best-effort.

While this is useful to prevent accidental dependencies, if you want to secure egress traffic and enforce that all outbound traffic goes through a proxy, you should instead rely on an [Egress Gateway](/docs/tasks/traffic-management/egress/egress-gateway/).
When combined with a [Network Policy](/docs/tasks/traffic-management/egress/egress-gateway/#apply-kubernetes-network-policies), you can enforce that all traffic, or some subset, goes through the egress gateway.
This ensures that even if a client accidentally or maliciously bypasses their sidecar, the request will be blocked.
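
One way to sketch the network policy side of this is shown below. The `test-egress` namespace and policy name are hypothetical, the egress gateway is assumed to run in `istio-system`, and the `kubernetes.io/metadata.name` namespace label is assumed to be present (recent Kubernetes versions set it automatically; otherwise the namespaces need to be labeled manually).

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-istio-system-and-dns   # hypothetical name
  namespace: test-egress                       # hypothetical application namespace
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  # Allow DNS lookups against kube-system.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow traffic to istio-system, where the control plane and the
  # egress gateway are assumed to run. All other egress is dropped.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: istio-system
```

With this in place, a pod that bypasses its sidecar still cannot reach external hosts directly, because the only permitted destinations are DNS and `istio-system`.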

## Configure TLS verification in Destination Rule when using TLS origination

Istio offers the ability to [originate TLS](/docs/tasks/traffic-management/egress/egress-tls-origination/) from the sidecar proxy.
This enables applications that send plaintext HTTP traffic to be transparently "upgraded" to HTTPS.

Care must be taken when configuring the `DestinationRule`'s `tls` setting to specify the `caCertificates` field.
When this is not set, the server's certificate will not be verified.

Review comment (Contributor):

Out of scope, but it feels like the API should require either caCertificates or something like insecureSkipVerify: true

Reply (Member Author):

That's the plan, but it's challenging to change due to backwards compat.


For example:

{{< text yaml >}}
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: google-tls
spec:
  host: google.com
  trafficPolicy:
    tls:
      mode: SIMPLE
      caCertificates: /etc/ssl/certs/ca-certificates.crt
{{< /text >}}

## Gateways

When running an Istio [gateway](/docs/tasks/traffic-management/ingress/), there are a few resources involved:

* `Gateway`s, which control the ports and TLS settings for the gateway.
* `VirtualService`s, which control the routing logic. These are associated with `Gateway`s by direct reference in the `gateways` field and a mutual agreement on the `hosts` field in the `Gateway` and `VirtualService`.

### Restrict `Gateway` creation privileges

It is recommended to restrict creation of Gateway resources to trusted cluster administrators. This can be achieved by [Kubernetes RBAC policies](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) or tools like [Open Policy Agent](https://www.openpolicyagent.org/).
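
A sketch of the RBAC approach follows. Kubernetes RBAC is additive, so the restriction comes from granting write access on `gateways` only to a trusted group and not including those verbs in any other role. The role name and the `mesh-admins` group are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: istio-gateway-admin      # hypothetical name
rules:
- apiGroups: ["networking.istio.io"]
  resources: ["gateways"]
  verbs: ["create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: istio-gateway-admin      # hypothetical name
subjects:
- kind: Group
  name: mesh-admins              # hypothetical group of trusted administrators
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: istio-gateway-admin
  apiGroup: rbac.authorization.k8s.io
```

Roles given to application teams would then list `virtualservices` but omit `gateways`.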

Review comment (Contributor):

May be worth mentioning that OPA/Gatekeeper can also be used to enforce some of these policies, like restricting hosts

Reply (Member Author):

I may be misunderstanding, but doesn't this line already recommend OPA?

Reply (Contributor):

I mean that it mentions OPA as a way to restrict Gateway creation to administrators, but it might be worth saying that OPA could be used for more of these recommendations too

Reply (Member Author):

Ah, got it. Agreed 👍


### Avoid overly broad `hosts` configurations

When possible, avoid overly broad `hosts` settings in `Gateway`.

For example, this configuration will allow any `VirtualService` to bind to the `Gateway`, potentially exposing unexpected domains:

{{< text yaml >}}
servers:
- port:
    number: 80
    name: http
    protocol: HTTP
  hosts:
  - "*"
{{< /text >}}

This should be locked down to allow only specific domains or specific namespaces:

{{< text yaml >}}
servers:
- port:
    number: 80
    name: http
    protocol: HTTP
  hosts:
  - "foo.example.com" # Allow only VirtualServices that are for foo.example.com
  - "default/bar.example.com" # Allow only VirtualServices in the default namespace that are for bar.example.com
  - "route-namespace/*" # Allow only VirtualServices in the route-namespace namespace for any host
{{< /text >}}

### Isolate sensitive services

It may be desired to enforce stricter physical isolation for sensitive services. For example, you may want to run a
[dedicated gateway instance](/docs/setup/install/istioctl/#configure-gateways) for a sensitive `payments.example.com`, while utilizing a single
shared gateway instance for less sensitive domains like `blog.example.com` and `store.example.com`.

Review comment (Contributor):

May be helpful to offer example reasons for why someone might want to do this
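
A dedicated gateway for `payments.example.com` could be sketched roughly as follows: an `IstioOperator` overlay that adds a second ingress gateway deployment, and a `Gateway` resource that selects only that deployment. The gateway name, its `istio` label value, and the `payments-credential` secret are hypothetical.

```yaml
# Installation overlay (applied with istioctl): adds a second gateway deployment.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway         # shared gateway for blog/store
      enabled: true
    - name: payments-ingressgateway      # hypothetical dedicated gateway
      enabled: true
      label:
        istio: payments-ingressgateway   # hypothetical selector label
---
# Gateway resource bound only to the dedicated deployment.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: payments-gateway                 # hypothetical name
  namespace: istio-system
spec:
  selector:
    istio: payments-ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: payments-credential   # hypothetical TLS secret
    hosts:
    - "payments.example.com"
```

Because the selector matches only the dedicated deployment, traffic for the less sensitive domains never shares proxy processes with the payments traffic.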

## Protocol detection

Istio will [automatically determine the protocol](/docs/ops/configuration/traffic-management/protocol-selection/#automatic-protocol-selection) of traffic it sees.
To avoid accidental or intentional misdetection, which may result in unexpected traffic behavior, it is recommended to [explicitly declare the protocol](/docs/ops/configuration/traffic-management/protocol-selection/#explicit-protocol-selection) where possible.
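
As a sketch, a protocol can be declared on a Kubernetes `Service` either through the port name prefix or through the `appProtocol` field. The service name, label, and port numbers here are assumptions for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: reviews                # assumed service name
spec:
  selector:
    app: reviews               # assumed pod label
  ports:
  # Protocol declared through the port name: <protocol>[-<suffix>].
  - name: http-web
    port: 9080
    targetPort: 9080
  # Protocol declared through appProtocol (hypothetical second port;
  # the field requires a Kubernetes version that supports it).
  - name: metrics
    port: 9090
    appProtocol: http
```

With either form, the proxy treats the port as HTTP without sniffing the first bytes of the connection.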

## CNI

In order to transparently capture all traffic, Istio relies on `iptables` rules configured by the `istio-init` `initContainer`.
This adds a [requirement](/docs/ops/deployment/requirements/) for the `NET_ADMIN` and `NET_RAW` [capabilities](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container) to be available to the pod.

To reduce privileges granted to pods, Istio offers a [CNI plugin](/docs/setup/additional-setup/cni/) which removes this requirement.
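
A minimal sketch of enabling the plugin at install time through an `IstioOperator` overlay is shown below. Platform-specific settings that some clusters need, such as CNI binary and configuration directories, are omitted here.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    cni:
      # Installs the CNI plugin, which sets up traffic redirection
      # in place of the istio-init container.
      enabled: true
```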

{{< warning >}}
The Istio CNI plugin is currently an alpha feature.
{{< /warning >}}

Review comment (Contributor):

@stewartbutler @mandarjog @justinpettit heads up, this will put pressure on carrying istio/cni into beta

## Use hardened docker images

Istio's default docker images, including those run by the control plane, gateway, and sidecar proxies, are based on `ubuntu`.
These images include tools such as `bash` and `curl`, trading convenience for an increased attack surface.

Istio also offers a smaller image based on [distroless images](/docs/ops/configuration/security/harden-docker-images/) that reduces the dependencies in the image.
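
As a rough sketch, the distroless variant can be selected by overriding the image tag in an `IstioOperator` overlay. The version number below is an assumption chosen for illustration; substitute the release actually being installed.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  # Assumption: 1.9.0 is a placeholder version; the -distroless suffix
  # selects the smaller image variant.
  tag: 1.9.0-distroless
```

Since these images lack a shell, debugging steps that rely on `kubectl exec` into the proxy container will no longer work.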

{{< warning >}}
Distroless images are currently an alpha feature.
{{< /warning >}}

Review comment (Contributor):

@howardjohn @sdake can we take this to beta?

Reply (Member Author):

opened a tracker in istio/enhancements#22

Review comment (Contributor):

Please add:

  1. lockdown Istiod debug API (and describe the functionality you will lose by doing so)
  2. locking down istiod insecure port
  3. warning about the limitations of Envoy Admin API - an attacker can extract information that can inform subsequent attacks (not ideal) and can quitquitquit the local sidecar (which isn't a big deal, you're just dos'ing yourself). also reference Admin endpoint security envoyproxy/envoy#2763

## Release and security policy

In order to ensure your cluster has the latest security patches for known vulnerabilities, it is important to stay on the latest patch release of Istio and ensure that you are on a [supported release](/about/supported-releases) that is still receiving security patches.

## Avoid alpha and experimental features

Each Istio feature and API is assigned a [feature status](/about/feature-stages/) that defines its stability, deprecation policy, and security policy.

Because alpha and experimental features do not have as strong security guarantees, it is recommended to avoid them whenever possible.

Review comment (Contributor):

Perhaps make this stronger. We do not create security releases for vulnerabilities in experimental features. We decide on a case by case basis whether to fix vulnerabilities in alpha features (basically, if we think there is broad adoption of an alpha feature, we'll patch vulnerabilities following principles of responsible disclosure)


To determine the feature status of features in use in your cluster, consult the [Istio features](/about/feature-stages/#istio-features) list.

<!-- In the future, we should document the `istioctl` command to check this when available. -->

## Configure third party service account tokens

@@ -66,4 +200,4 @@ To determine if your cluster supports third party tokens, look for the `TokenReq
}
{{< /text >}}

While most cloud providers support this feature now, many local development tools and custom installations may not. To enable this feature, please refer to the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection).
While most cloud providers now support this feature, many local development tools and custom installations running Kubernetes versions prior to 1.20 may not. To enable this feature, please refer to the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection).