
[Question] - (route) metadata vs dynamic metadata #12556

Closed
jeremybaumont opened this issue Aug 8, 2020 · 10 comments · Fixed by #13269
Labels
question Questions that are neither investigations, bugs, nor enhancements

Comments

@jeremybaumont

Context

We were reviewing the changelog of v1.15.0 and noted some changes that could improve our rate limiting setup. We decided to spike on the new feature and make sure we understood it correctly:

  • ratelimit: added support for use of dynamic metadata dynamic_metadata as a ratelimit action.

When we first read that line and the linked documentation, we thought it would make our life much easier. Let me explain succinctly what we mean by that.

We map the path of a request to an operation identifier (for example, if a request is a PUT to the "/admin/user" path, we want to associate it with the operation identifier "create-user"), and then we use this operation identifier in Envoy's rate limit filter.

Configuring that is quite a "pasodoble". We temporarily add a header carrying the operation identifier on the route that matches the path. This header is matched by the rate limit action (header matcher) configuration and passed as a descriptor key in the gRPC call to the rate limit service. To add and remove the header that contains the operation identifier, we use HTTP Lua filters on the request. So when we saw that change, we thought we could simplify our configuration. Joy was in our hearts.
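The "pasodoble" above can be sketched roughly as follows; the header name x-operation-id and the path-matching logic are hypothetical illustrations, not our actual configuration:

```yaml
# Lua filter that temporarily tags the request with its operation identifier.
http_filters:
  - name: envoy.filters.http.lua
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
      inline_code: |
        function envoy_on_request(request_handle)
          -- hypothetical mapping: PUT /admin/user -> "create-user"
          local path = request_handle:headers():get(":path")
          if path == "/admin/user" then
            request_handle:headers():add("x-operation-id", "create-user")
          end
        end

# ...and in the route configuration, a header matcher action turns the
# temporary header into the "operation" descriptor entry:
rate_limits:
  - actions:
      - request_headers:
          header_name: "x-operation-id"
          descriptor_key: "operation"
```

The extra add-then-remove choreography around that header is exactly what we were hoping to retire.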

We thought we could set metadata at the route level with filter_metadata, and use this metadata to fill the operation descriptor key with the dynamic_metadata rate limit action, like the following:

    virtual_hosts:
      - name: backend
        domains:
          - "*"
        rate_limits:
          - actions:
              - dynamic_metadata:
                  descriptor_key: "operation"
                  metadata_key:
                    key: "envoy.filters.http.ratelimit"
                    path:
                      - key: "descriptor"
                      - key: "operation"
          - actions:
              - request_headers:
                  header_name: "x-auth-client"
                  descriptor_key: "client"
        routes:
          - name: example-operation-1
            match:
              prefix: "/anything/shouldRateLimit"
            metadata:
              filter_metadata:
                envoy.filters.http.ratelimit:
                  descriptor:
                    operation: "operation-1"
            route:
              cluster: "httpbin"
              include_vh_rate_limits: true
          - name: default-route
            match:
              prefix: "/"
            route:
              cluster: httpbin
              include_vh_rate_limits: false

We tested this rate limit configuration approach on a vanilla Envoy setup, eliminating anything related to our internal setup. We used an Envoy use case examples repository to "learn Envoy's core feature set and how to configure it by hand". You can find the Envoy configuration corresponding to the extract above, and a small docker-compose repository to reproduce what we mean.

To our surprise, the descriptor key "operation" was not filled when the Envoy rate limit filter called the rate limit service mock.

In a previous troubleshooting investigation, we had learned how to set up remote gdb debugging of an Envoy binary with debug symbols in our local docker-compose world. We decided to have a look at what was happening in the Envoy code base in a live debugging session.

In source/common/router/router_ratelimit.cc, we could display the configured values of the rate limit action in metadata_key_ and the expected filter_metadata in dynamic_metadata. We observed that no metadata was present in dynamic_metadata (it was empty). Something was wrong with how we had configured the filter_metadata.

Our next troubleshooting step was to look at the integration tests in the Envoy code base and how they configure dynamic metadata. We find this is often a good place to look, since the tests use YAML configuration samples. We could not find anything obvious we were doing wrong, even literally copying and pasting the YAML used there. We think those integration tests tricked us: they use filter_metadata, but we believe they should not; more on this later.

Our last procrastination, before asking for help publicly, was to review the Envoy Slack channel history and prior GitHub issues. That's the minimum you can do; you don't want to waste people's time. That's when we found a crucial clue to this story. We found a discussion in the Envoy Slack that referred to a GitHub issue. A user had a problem where metadata set via filter_metadata in a route configuration could not be found by the access log filter. Something almost identical to our scenario, only with the access log filter in place of the rate limit filter. The comments on that GitHub issue in particular triggered the idea that route metadata is potentially different from dynamic metadata.

We think that the route metadata set by filter_metadata is different "data" from dynamic metadata. They are different abstractions. We understand that route metadata is used, for example, by HTTP Lua filters. We knew that for sure because we were already using it in our approach that adds and removes the operation header with Lua filters. If we are correct, the documentation is not really clear about the distinction (and we are not alone in being confused). We found the best explanation resides in the HTTP Lua filter section: it is one of the rare places where you can access both route metadata and dynamic metadata, so the separation is flagrant.
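The separation is easiest to see from a Lua filter, which can touch both kinds of metadata; a minimal sketch (the key and value names are illustrative):

```yaml
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        -- route metadata: static, read from the matched route's
        -- metadata.filter_metadata entry for this filter
        local op = request_handle:metadata():get("operation")
        -- dynamic metadata: per-request state on StreamInfo, writable by filters
        request_handle:streamInfo():dynamicMetadata():set(
          "example.namespace", "operation", op or "unknown")
      end
```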

But the change adds support for dynamic metadata as a rate limit action, not route metadata. Fair enough; so how can we set dynamic metadata for a route, we asked ourselves. From Dio's comments in the GitHub issue, we understand there are several ways: implement an Envoy filter that matches a route and uses the StreamInfo API, use the header-to-metadata filter (but that would mean we would again need a header, which is not a big win for us; remember we are trying to get rid of that), or use HTTP Lua filters directly (same, not a big win).
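For completeness, the header-to-metadata option mentioned above would look roughly like this (the header and key names are illustrative); it converts a header into dynamic metadata, so it still requires the header we are trying to eliminate:

```yaml
- name: envoy.filters.http.header_to_metadata
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.header_to_metadata.v3.Config
    request_rules:
      - header: x-operation-id
        on_header_present:
          metadata_namespace: envoy.filters.http.ratelimit
          key: operation
          type: STRING
        remove: true   # strip the header once it has become dynamic metadata
```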

At this point, we were deeply confused about dynamic metadata, and even more confused by the change: why would somebody add support for dynamic metadata as a rate limit action when, in our mind, most of the actions in Envoy are driven by an HTTP route? That did not make sense. We were obviously biased. We decided to take a step back, have a little walk, and brew some coffee. We asked ourselves: what is dynamic metadata, really? For which purposes do people use dynamic metadata?

We looked at the PR that brought the change and found that it originated from a proposal. When we read the description, we understood the intent of the change. The users wanted to pass information directly from the dynamic metadata to the rate limit action. They wanted to avoid using a Lua filter that reads the dynamic metadata and appends it to a forged header to be matched by a rate limit action. The key element there is that they already have the information in the dynamic metadata. No mention of a route setting dynamic metadata.

Dynamic metadata is actually a great idea (you folks rock). We found an advanced section in the documentation that completed our understanding. We understand that filters can emit dynamic metadata that is consumed by other filters. Take, for example, the network Postgres filter. It sets the table names and the SQL operations seen in SQL statements as dynamic metadata. That is fantastic: now you can, for example, have a rate limit action that finds the table name in the dynamic metadata, and rate limit based on this descriptor key.
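As an illustration of that pattern, a dynamic_metadata rate limit action could key off metadata emitted by another filter; the namespace and path below are our assumptions for the sketch, not the Postgres filter's documented output schema:

```yaml
rate_limits:
  - actions:
      - dynamic_metadata:
          descriptor_key: "table"
          metadata_key:
            # assumed namespace/path; check the emitting filter's docs
            key: "envoy.filters.network.postgres_proxy"
            path:
              - key: "table_name"
```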

Question

Sorry for the long context setup, but we think it helps to understand where we come from. Our question is more or less to ask you to confirm or refute our understanding, detailed in the previous paragraphs, of what dynamic metadata is, and of the difference between (route) metadata and dynamic metadata.

@mattklein123 added the question label Aug 10, 2020
@mattklein123
Member

cc @clarakosi @Pchelolo

@Pchelolo
Contributor

Hi @jeremybaumont

You are correct in your understanding: dynamic_metadata != filter_metadata. They reuse the same structure internally, hence the confusing integration test. In my understanding, the difference is that filter_metadata is configured statically, while dynamic_metadata can be set/modified within each request context by the filters.

Our original goal was to implement support for dynamic_metadata since, in our use case, we get the descriptors out of a JWT token, and the JWT Authentication filter sets the JWT payload into the dynamic metadata.
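That JWT use case can be sketched like this (the provider name and claim are illustrative): the jwt_authn filter writes the verified payload into dynamic metadata, and the dynamic_metadata action lifts a claim out of it as a descriptor:

```yaml
http_filters:
  - name: envoy.filters.http.jwt_authn
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        example_provider:
          issuer: https://issuer.example.com   # illustrative
          payload_in_metadata: "jwt_payload"   # payload -> dynamic metadata

# ...and a rate limit action reading a claim from that metadata:
rate_limits:
  - actions:
      - dynamic_metadata:
          descriptor_key: "client"
          metadata_key:
            key: "envoy.filters.http.jwt_authn"
            path:
              - key: "jwt_payload"
              - key: "sub"
```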

In your case, what prevents you from configuring the rate_limits at the route level instead of the virtual host level? That way you would be able to just use the GenericKey action and configure the operation id directly, without passing it around via the metadata. Although this has the downside of being much more verbose.

It seems pretty easy to implement a filter_metadata action type, by implementing a new RateLimitAction, filter_metadata, following the existing MetadataKey schema:

RateLimitAction.proto:

    // The following descriptor entry is appended when the filter metadata contains a key value:
    //
    // .. code-block:: cpp
    //
    //   ("<descriptor_key>", "<value_queried_from_metadata>")
    message FilterMetaData {
      // The key to use in the descriptor entry.
      string descriptor_key = 1 [(validate.rules).string = {min_bytes: 1}];

      // Metadata struct that defines the key and path to retrieve the string value.
      type.metadata.v3.MetadataKey metadata_key = 2 [(validate.rules).message = {required: true}];

      // An optional value to use if *metadata_key* is empty. If not set and
      // no value is present under the metadata_key then no descriptor is generated.
      string default_value = 3;
    }
    oneof action_specifier {
      ...
      FilterMetaData filter_metadata = 8;
    }

Usage example:

actions:
  - filter_metadata:
      descriptor_key: "…"
      metadata_key:
        key: "…"
        path: []

This would make it possible to pass information about routes to a virtual-host level rate-limit configuration.

@jeremybaumont
Author

jeremybaumont commented Aug 11, 2020

Hey @Pchelolo

Thank you for the quick answer, we really appreciate it 😄

what prevents you from configuring the rate_limits on the route level instead of the virtual host level?
That way you would be able to just use the GenericKey action and configure the operation-id directly without passing it around via the metadata?

Yes, we considered that at the beginning. The problem with the GenericKey rate limit action is that we cannot choose the descriptor key name. We would need to ask the rate limit service team to support the mapping between the descriptor keys "generic_key" and "operation" (in the gRPC service, but also in the rate limit policy definitions). It could also be a bit confusing when troubleshooting issues if you are not aware of the mapping. Nevertheless, I agree that it is not the end of the world and that it is a valid approach.

We could add functionality to the GenericKey rate limit action to allow setting the descriptor key, defaulting to "generic_key" for backward compatibility.
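Sketched as configuration, that proposed extension would read like this (later Envoy versions did in fact grow a descriptor_key field on GenericKey):

```yaml
rate_limits:
  - actions:
      - generic_key:
          descriptor_key: "operation"      # proposed; defaults to "generic_key"
          descriptor_value: "create-user"
```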

It seems pretty easy to implement a filter_metadata action type, by implementing a new RateLimitAction, filter_metadata, following the existing MetadataKey schema.

Yes, that is a good idea, I agree. I am a wannabe C++ programmer 😄 but this could be a good way to learn; it seems feasible. I'll pursue this option.

@andrascz
Contributor

I am trying to do the same as the OP and struggled for a couple of days to achieve it by setting the route metadata and using the dynamic metadata action for rate limiting. Thank you @jeremybaumont for writing this long story about it and pointing me toward the Lua filter way. 👍

Also I would like to add that implementing a FilterMetadataAction as @Pchelolo suggested would simplify this use case even more by eliminating the Lua filter.

For us, route-specific actions would be really verbose and brittle, as we have many VirtualServices in Istio which should produce the same composite descriptor structure: API+path+method+clientIP, where API is specific to the VirtualService. So for every route we would need to provide the same actions list, where the only difference is the API value.

@draeron

draeron commented Dec 1, 2020

I've been trying to set up an override limit, but I can't seem to find any place where it is possible to configure dynamic metadata, even statically. I don't quite understand why you're forced to pass through this very new and under-documented dynamic metadata for something that should be more easily configurable.

This is what my config looks like. Maybe I missed something...

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: auto
          stat_prefix: envoy_ratelimit_http
          http_filters:
            - name: envoy.filters.http.ratelimit
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
                domain: default
                timeout: 0.02s # 20 ms
                rate_limited_as_resource_exhausted: true
                failure_mode_deny: true
                enable_x_ratelimit_headers: DRAFT_VERSION_03
                rate_limit_service:
                  transport_api_version: v3
                  grpc_service:
                    envoy_grpc:
                      cluster_name: ratelimit
            - name: envoy.filters.http.router
          route_config:
            name: default_route
            # config.route.v3.RouteConfiguration
            virtual_hosts:
            # config.route.v3.VirtualHost
            - name: default
              domains: ["*"]
              routes:
              # config.route.v3.Route
              - match:
                  prefix: "/"
                metadata:
                  filter_metadata:
                    envoy.filters.http.ratelimit.override:
                      limit:
                        unit: HOUR
                        requests_per_unit: 42
                route: # config.route.v3.RouteAction
                  cluster: upstream
                  rate_limits:
                  # config.route.v3.RateLimit
                  - actions: # config.route.v3.RateLimit.Action[]
                    - destination_cluster: {}
                    - remote_address: {}
                    - generic_key:
                        descriptor_key: hardcoded
                        descriptor_value: hard-value
                  # config.route.v3.RateLimit
                  - actions:
                    - destination_cluster: {}
                    - generic_key:
                        descriptor_key: max_rate
                        descriptor_value: 'true'
                    limit:
                      dynamic_metadata:
                        metadata_key:
                          key: envoy.filters.http.ratelimit.override
                          path:
                          - key: limit
  clusters:
  - name: upstream
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: hello-secret
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: hello-secret
                port_value: 8080

  - name: ratelimit
    type: STRICT_DNS
    connect_timeout: 1s
    lb_policy: ROUND_ROBIN
    protocol_selection: USE_CONFIGURED_PROTOCOL
    http2_protocol_options: {}
    load_assignment:
      cluster_name: ratelimit
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: hermes
                    port_value: 9090

@andrascz
Contributor

andrascz commented Dec 1, 2020

You are defining your limits in the route metadata section, but that is different from dynamic_metadata. You need a Lua filter before your rate limit filter to fetch the route metadata and inject it as dynamic metadata for the request.

Something similar to the following. We are using it to provide input for actions, but the concept is the same. It is a workaround until the generic MetadataAction is released:

        - name: envoy.filters.http.lua
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
            inline_code: |
              function envoy_on_request(request_handle)
                  local api_group = request_handle:metadata():get("api_group");
                  api_group = (api_group == nil) and 'undefined' or api_group;
                  request_handle:logDebug(" api_group filter metadata: " .. api_group);
                  request_handle:streamInfo():dynamicMetadata():set("envoy.rate_limit", "api_group", api_group);
              end

Probably the override configuration could be enhanced similarly to how the generic MetadataAction was introduced in favor of the DynamicMetadataAction.

@draeron

draeron commented Dec 1, 2020

Thanks a lot! I succeeded in making it work by adding this to my filters before my rate limit filter; pasting it for posterity. Might I suggest a new feature: either a new filter type to set some dynamic metadata, or enabling access to filter_metadata in the limit override?

- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        request_handle:streamInfo():dynamicMetadata():set("envoy.filters.http.ratelimit.override", "limit", {
          unit = "HOUR",
          requests_per_unit = 42
        })
      end

@andrascz
Contributor

andrascz commented Dec 1, 2020

Great that the workaround works for you!

to enable access to filter_metadata in the limit override

Probably the override configuration could be enhanced similarly to how the generic MetadataAction was introduced in favor of the DynamicMetadataAction.

My proposal is exactly the suggestion you made, just in a bit more formalized form.
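For reference, the generic metadata action this thread converged on later landed in Envoy as config.route.v3.RateLimit.Action.MetaData, where a source field selects between dynamic metadata and the matched route's filter_metadata; a sketch (descriptor and path names illustrative):

```yaml
rate_limits:
  - actions:
      - metadata:
          descriptor_key: "operation"
          metadata_key:
            key: "envoy.filters.http.ratelimit"
            path:
              - key: "descriptor"
              - key: "operation"
          source: ROUTE_ENTRY   # read route filter_metadata instead of dynamic metadata
```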

@mauro770

This post is probably one of the most useful and well-explained GitHub issues I have read in a long time. Thank you all, and especially @jeremybaumont for sharing it with the world.

@pjjwpc

pjjwpc commented Nov 24, 2023

Thanks a lot! I succeeded in making it work by adding this to my filters before my rate limit filter; pasting it for posterity. Might I suggest a new feature: either a new filter type to set some dynamic metadata, or enabling access to filter_metadata in the limit override?

- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        request_handle:streamInfo():dynamicMetadata():set("envoy.filters.http.ratelimit.override", "limit", {
          unit = "HOUR",
          requests_per_unit = 42
        })
      end

How do we set a different requests_per_unit for different services?
