Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admin endpoint security #2763

Open
htuch opened this issue Mar 8, 2018 · 32 comments
Open

Admin endpoint security #2763

htuch opened this issue Mar 8, 2018 · 32 comments

Comments

@htuch
Copy link
Member

htuch commented Mar 8, 2018

The admin endpoint today is unsecured (no authentication or TLS), with the assumption that it is only available to localhost or accessible on a trusted network. Ideally:

  • We want to be able to restrict access to only trusted IPs, client certificates and ensure we have transport security.
  • We want to have some ability to distinguish roles and access to the admin console, i.e. distinct identities might be allowed to operate /quitquitquit vs. stats monitoring.

Beyond just security, there's also the question of what the admin console is. Is it just a curlable utility, an interactive web console or is it a first-class API intended for programatic use? Should it offer gRPC endpoints (in particular as we are moving towards a proto definition of its contents in places such as #2172). Answers to this affect the framing of security considerations.

Opening this issue to start the design discussion here.

@mattklein123
Copy link
Member

A few initial points are in the comment here: envoyproxy/data-plane-api#523 (comment)

@htuch
Copy link
Member Author

htuch commented Mar 9, 2018

One thing I wonder about is whether we should be offering a traditional web interface at all. Here's one alternative design; implement only gRPC or REST endpoints, have folks build out Javascript client side interfaces which can be served from a listener via Envoy's direct response (https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-route-direct-response) mechanism. This removes Envoy from being in the business of worrying about XSS/CSRF/other web security concerns, while providing the convenience of browser admin capability.

@mattklein123
Copy link
Member

For reference the web stuff was only recently added by @jmarantz. I objected to this slightly at the time because realistically I think the only production use for the admin endpoint is either curl or other automated tools, but it didn't seem like a big deal to me to serve the landing page in HTML so I didn't worry about it that much.

My overall view of things right now is that the admin endpoint is not secure it all and must be accessed only over a trusted network. In the future I think we should do the following things:

  1. Generally provide only gRPC/REST endpoints codified in the data-plane-api repo (already tracked by various issues).
  2. Promote the admin endpoint to a fuller listener config allowing for both TLS/mTLS and inline RBAC security on a per-endpoint basis (once we have the inline RBAC filter). This will allow operators to configure things as they like it.
  3. Per @htuch we should consider having a "secure by default" admin config which locks admin down to localhost only and then requires operators to open up individual endpoints (.e.g., /stats) as they see fit.

In general I think that worrying about things like XSS/CSRF/etc. is kind of a waste of time for this. I will defer to @jmarantz who knows substantially more about this on what we should do on that front security-wise (hopefully optimizing for realistic usage scenarios).

@mattklein123
Copy link
Member

P.S., it would be great if someone in the community who is passionate about admin security might want to own this. There will be a non-trivial amount of work here to get to where we want to ultimately be.

@jmarantz
Copy link
Contributor

jmarantz commented Mar 10, 2018

One clarification: the http handlers for mutating operations were pre-existing. The change added a web home-page with proper escaping, and reduced XSS through proper http content types, and AFAIK added no additional exposure.

I agree that more restricted access by default would help.

@mattklein123
Copy link
Member

One clarification: the http handlers for mutating operations were pre-existing. The change added a web home-page with proper escaping, and reduced XSS through proper http content types, and AFAIK added no additional exposure.

Sorry I spoke incorrectly. Before your change we had no HTML. I know basically nothing about web security. My real point was that if the existence of HTML is going to cause security consternation, I don't think it's worth maintaining because IMHO the use of the HTML endpoints is not going to happen in production. Assuming there is no additional exposure, than it's fine by me. I just wanted to point out that we should be careful about the HTML stuff if that is going to cause us additional maintenance burden.

@ofek
Copy link
Contributor

ofek commented Mar 12, 2018

Additionally, I would strongly recommend Envoy have a separate listener/endpoint that only serves /stats with optional basic auth.

@ofek
Copy link
Contributor

ofek commented Mar 13, 2018

@DataDog is recommending https://gist.github.com/ofek/6051508cd0dfa98fc6c13153b647c6f8 until this is solved.

Idea courtesy of @ggreenway
Config courtesy of @bndw (with this modification from @htuch)

htuch pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 13, 2018
Fixes envoyproxy/envoy#2769
References envoyproxy/envoy#2763

Signed-off-by: Matt Klein <mklein@lyft.com>
@taiki45
Copy link
Member

taiki45 commented Mar 14, 2018

I vote to disabling web admin interface. Alternatively:

  • Move admin operations like /cpuprofiler or /logging to runtime configuration flags or simple Unix domain socket commands like haproxy one. This allows us to manage permissions via traditional file system permissions.
  • Promote pull-based endpoints like /stats to gRPC/REST API and let them have fuller listener config.

In additon, these features might be disabled by default. All of programatic use goes gRPC/REST API to supoprt user extendability, and rest of admin interface gets on FS permissions.

Personally, I like Envoy's curl-able interface so I prefer Unix domain socket commands which is similar to the current web admin interface. I have a little passion to move admin operations from web interface to Unix domain socket one.

@ofek
Copy link
Contributor

ofek commented Mar 14, 2018

What's the difference between a "pull-based endpoint" like /stats and a REST API version?

@jmarantz
Copy link
Contributor

jmarantz commented Mar 14, 2018

I think it makes sense to control access via configuration, including disabling the http admin interface completely or restricting it to an IP, with separate controls for read-access vs POSTed mutations.

@taiki45
Copy link
Member

taiki45 commented Mar 14, 2018

What's the difference between a "pull-based endpoint" like /stats and a REST API version?

The main difference that I thought is the API one will have a dedicated listener and will be properly schema controlled. For example, /stats already has a json format for programmatic access.

@taiki45
Copy link
Member

taiki45 commented Mar 14, 2018

It's not a strong objection but, to say source IP base restriction or local loopback binding, we want more detailed permission control in some deployment cases: developers can login a host in which Envoy runs but do not want to allow them to take admin operations of the Envoy instance, but want to allow only administrators to do that operations for easy debbuging.

@htuch
Copy link
Member Author

htuch commented Dec 31, 2019

@justincely I think if we had the admin port as a listener, it would be reasonable to add a simple HTTP filter to block this; it's probably entirely doable with something like the RBAC filter today (although probably more complicated than you'd want from a UX perspective).

I don't think anyone is working on this right now, so go ahead. I would recommend moving to admin listener as the first step here.

@mattklein123
Copy link
Member

+1 let's start by just making the admin listener a real listener.

@cstrahan
Copy link
Contributor

cstrahan commented Jan 2, 2020

I think if we had the admin port as a listener, it would be reasonable to add a simple HTTP filter to block this

@justincely and I were talking about this earlier today. I'll be working on this shortly -- just now getting a sense of what all needs to be done to get us there. As I work on this, feel free to assign this issue to me whenever you feel confident in my ability to deliver.

@ofek
Copy link
Contributor

ofek commented Jan 2, 2020

Hello all! For those of us providing monitoring solutions based on /stats, could someone please briefly explain what would need to be changed config-wise to retain access to that endpoint once the proposed feature lands?

@mattklein123
Copy link
Member

Hello all! For those of us providing monitoring solutions based on /stats, could someone please briefly explain what would need to be changed config-wise to retain access to that endpoint once the proposed feature lands?

I think the default behavior is likely to be the same as it is today (fully open), but we will allow for real listener configuration including the RBAC filter, etc. so that certain endpoints can be blocked. It's possible that eventually we would change the default posture but I'm not sure this would happen in the initial version.

@ofek
Copy link
Contributor

ofek commented Jan 3, 2020

Excellent, thanks!

Shikugawa pushed a commit to Shikugawa/envoy that referenced this issue Mar 28, 2020
Signed-off-by: gargnupur <gargnupur@google.com>
@cstrahan
Copy link
Contributor

cstrahan commented May 7, 2020

A proposal to secure the admin endpoint

I was chatting with @mattklein123 a couple weeks ago about securing the admin endpoint, under the assumption that we'd want some way to allow users to specify arbitrary filters (e.g. RBAC).

I would like to propose that we allow specifying a Listener config in the Admin message, and deprecate the Admin fields that can be taken directly from the Listener (e.g. address details).

The AdminFilter would be made a first class filter (registered with just like the other http filters), but we would validate that the AdminFilter is only used within the Admin config, and that the filter is specified last in the filter chain.

I would appreciate feedback from both Envoy users and developers; would this approach work for you?

/cc @justincely

@mattklein123
Copy link
Member

I would appreciate feedback from both Envoy users and developers; would this approach work for you?

Yes I think this SGTM. Thank you for working on this! cc @envoyproxy/security-team

@htuch
Copy link
Member Author

htuch commented Jan 11, 2021

For posterity, I'd like to note some outcome of a recent discussion around why admin endpoint is sometimes opened more widely than it should. Some users are making use of the Prometheus stats endpoint (https://www.envoyproxy.io/docs/envoy/latest/operations/admin.html?highlight=prometheus#get--stats-prometheus) for exposing out Envoy stats. Unfortunately, due to the lack of fine-grained access control (or any for that matter), we end up exposing the entire endpoint out to the network.

We would probably have some reasonable wins for security by having a separate stats and admin endpoint, but ultimately, making this a first class listener would provide RBAC and per-route granularity.

@geo-y
Copy link

geo-y commented Feb 14, 2021

I can use one light-weight web server backend to access the admin interface back with basic authentication.
Here is my example:

docker-compose.yml(key part):

services:
  h2o:
    image: fukata/h2o-php:latest
    volumes:
      - <path>/h2o/:/etc/h2o/ext/
      - <path>/html/:/var/www/
    command: ["h2o", "-m", "master", "-c", "/etc/h2o/ext/h2o.conf"]
    restart: on-failure
  envoy:
    image: envoyproxy/envoy-alpine-dev:latest
    volumes:
      - <path>/envoy/:/etc/envoy/ext/:ro
      - <path>/cert/:/etc/crts/:ro
    ports:
      - "80:8000"
      - "443:8443"
    depends_on:
      - h2o
    command: /usr/local/bin/envoy -c /etc/envoy/ext/front_v3.yaml
    restart: on-failure

envoy(https://github.com/envoyproxy/examples/blob/main/front-proxy/envoy.yaml):

###
            virtual_hosts:
            - name: envoy_admin1
              domains:
              - "<admin-host1>"
              - "<admin-host2>"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: h2o1
###
  clusters:
  - name: h2o1
    connect_timeout: 2s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: h2o1
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: h2o
                port_value: 80
###

h2o.conf:

hosts:
  "<admin-host>":
    listen:
      port: 80
    paths:
      "/":
        mruby.handler: |
          require "htpasswd.rb"
          Htpasswd.new("/etc/h2o/ext/htpass", "realm-name")
        proxy.reverse.url: http://envoy:<admin-port>
        proxy.preserve-host: ON

The "htpasswd" file manages the admin user and password:

htpasswd ./htpass admin-username

@tpetkov-VMW
Copy link
Contributor

In order to have secure settings by default, we have a local patch that adds a list of explicitly allowed endpoints directly into the bootstrap configuration. There are a few things to consider around the design, but the implementation is pretty straight forward and would give some guarantees that nobody can, for example, do /quitquitquit .
Would this be helpful/desired?

@maxres-ch
Copy link

I was wondering if there's any movement on this issue? We'd really like to see it as a configurable thing.

@jmarantz
Copy link
Contributor

#11367 is stale and needs to be re-started with a dev ready to push it forward.

See also #32346 which just merged, and is somewhat related.

@ira-gordin-sap
Copy link

Hi, what happens with this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.