feat: preview_persisted_queries w/opt-in safelisting #3347

Merged
merged 20 commits into from
Jul 14, 2023

Contributor

@EverlastingBugstopper EverlastingBugstopper commented Jun 30, 2023

Persisted Queries w/opt-in safelisting

⚠️ This is an Enterprise feature of the Apollo Router. It requires an organization with a GraphOS Enterprise plan and the feature to be enabled for your account.

If your organization doesn't currently have an Enterprise plan, you can test out this functionality by signing up for a free Enterprise trial and reaching out to enable the feature for your account.

Overview

The persisted queries feature lets you pre-register operations so that clients can send an operation ID over the wire and have the router execute the associated operation. Each registered operation defines the exact shape of a GraphQL operation that the router expects clients to send. In its simplest form, Persisted Queries (PQs) can be used like Automatic Persisted Queries (APQs), with one key difference: a client can never register a PQ by sending an operation body. Registering persisted operations also lets you lock down the router, either by logging unregistered operations or by rejecting them outright.

Main Configurations

  • Unregistered operation monitoring
    • Your router can allow all GraphQL operations, while emitting structured traces containing unregistered operation bodies.
  • Operation safelisting
    • Reject unregistered operations
    • Require all operations to be sent as an ID

Usage

preview_persisted_queries:
  enabled: true

This enables additive PQs.

The router requires APOLLO_KEY and APOLLO_GRAPH_REF to start up properly (to fetch the license key and the persisted queries themselves), and the graph variant must be linked to a persisted query list. The feature is currently only available in preview and has to be enabled for a graph.

The router will not start up until all persisted queries have been read into a std::collections::HashMap<String, String> that maps each ID to its operation body. Additionally, the bodies alone are stored in a std::collections::HashSet.
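As a rough illustration of the two collections described above, here is a minimal sketch. The type and method names (`PersistedQueryStore`, `from_entries`, `body_for_id`, `contains_body`) are hypothetical and are not taken from the router's code; only the use of `HashMap<String, String>` and `HashSet` comes from the description.

```rust
use std::collections::{HashMap, HashSet};

/// In-memory view of a persisted query list (hypothetical type name).
#[derive(Debug, Default)]
pub struct PersistedQueryStore {
    /// Operation ID -> operation body.
    by_id: HashMap<String, String>,
    /// Just the bodies, used to check freeform operations.
    bodies: HashSet<String>,
}

impl PersistedQueryStore {
    /// Builds both collections from (id, body) pairs.
    pub fn from_entries<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (String, String)>,
    {
        let mut store = Self::default();
        for (id, body) in entries {
            store.bodies.insert(body.clone());
            store.by_id.insert(id, body);
        }
        store
    }

    /// Returns the body registered under `id`, if any.
    pub fn body_for_id(&self, id: &str) -> Option<&str> {
        self.by_id.get(id).map(String::as_str)
    }

    /// Returns true if `body` exactly matches a registered operation body.
    pub fn contains_body(&self, body: &str) -> bool {
        self.bodies.contains(body)
    }
}
```

Keeping a separate set of bodies means a freeform operation can be checked without scanning the map's values.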

After the router starts, persisted queries can be sent over the wire like so:

curl http://localhost:4000/ -X POST --json \
'{"extensions":{"persistedQuery":{"version":1,"sha256Hash":"dc67510fb4289672bea757e862d6b00e83db5d3cbbcfb15260601b6f29bb2b8f"}}}'
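For clients that build this request in code, the same JSON body could be produced along these lines. This is only a sketch: `persisted_query_payload` is a hypothetical helper, and restricting IDs to ASCII alphanumerics is an assumption made here so that no JSON escaping is needed.

```rust
/// Builds the JSON body for a persisted query request, mirroring the curl
/// example. Returns `None` for an empty ID or one containing anything other
/// than ASCII letters and digits (an assumption of this sketch, made to
/// avoid JSON escaping).
pub fn persisted_query_payload(id: &str) -> Option<String> {
    if id.is_empty() || !id.chars().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    // Doubled braces are literal braces inside `format!`.
    Some(format!(
        r#"{{"extensions":{{"persistedQuery":{{"version":1,"sha256Hash":"{}"}}}}}}"#,
        id
    ))
}
```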
  2. ./examples/persisted-queries/safelist_pq_log_only.yaml
preview_persisted_queries:
  enabled: true
  log_unpersisted_queries: true

Starting the router with this configuration logs freeform GraphQL operations that do not match a persisted query.

  3. ./examples/persisted-queries/safelist_pq.yaml
preview_persisted_queries:
  enabled: true
  safelist:
    enabled: true
apq:
  enabled: false

Starting the router with this configuration requires every operation sent over the wire to match either the ID (O(1) retrieval from HashMap) or the body (O(1) retrieval from HashSet) of a persisted query. APQ is enabled by default and is incompatible with the persisted queries feature: clients are not allowed to register their own persisted queries, which must be pre-published. APQ must therefore be disabled for the router to start properly, and an error is returned if it is not explicitly disabled in router.yaml.
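The startup rule for APQ can be pictured as a small validation step. The sketch below is illustrative only: `check_apq_compat` and its error text are hypothetical, and it assumes the rule applies when safelisting is on (the basic example earlier does not disable APQ).

```rust
/// Checks the APQ rule at startup. `apq_enabled` is `None` when `apq.enabled`
/// is absent from router.yaml, in which case APQ defaults to enabled.
pub fn check_apq_compat(
    safelist_enabled: bool,
    apq_enabled: Option<bool>,
) -> Result<(), String> {
    // APQ is on unless it was explicitly turned off.
    let apq_on = apq_enabled.unwrap_or(true);
    if safelist_enabled && apq_on {
        return Err(
            "APQ must be explicitly disabled when persisted query safelisting is enabled"
                .to_string(),
        );
    }
    Ok(())
}
```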

  4. ./examples/persisted-queries/safelist_pq_require_id.yaml
preview_persisted_queries:
  enabled: true
  safelist:
    enabled: true
    require_id: true
apq:
  enabled: false

This configuration is a stricter version of safelisting that rejects all freeform GraphQL requests, even if they match the body of a persisted query.
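Taken together, the configurations above amount to a small decision table. The following sketch shows one way to express it. All names (`PqConfig`, `Incoming`, `Decision`, `decide`) are hypothetical, and treating an unknown ID as a rejection in every mode is an assumption of this sketch rather than something stated above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical mirror of the `preview_persisted_queries` options shown above.
pub struct PqConfig {
    pub log_unpersisted_queries: bool,
    pub safelist_enabled: bool,
    pub require_id: bool,
}

/// What the client sent: a persisted query ID or a freeform operation body.
pub enum Incoming<'a> {
    Id(&'a str),
    Body(&'a str),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Run the operation body.
    Execute(String),
    /// Run the operation body and log it as unregistered.
    ExecuteAndLog(String),
    /// Refuse the request.
    Reject,
}

pub fn decide(
    cfg: &PqConfig,
    by_id: &HashMap<String, String>,
    bodies: &HashSet<String>,
    incoming: Incoming<'_>,
) -> Decision {
    match incoming {
        // Assumption: an unknown ID is rejected in every mode.
        Incoming::Id(id) => match by_id.get(id) {
            Some(body) => Decision::Execute(body.clone()),
            None => Decision::Reject,
        },
        Incoming::Body(body) => {
            if cfg.safelist_enabled && cfg.require_id {
                // Strict safelisting: freeform bodies are never accepted.
                Decision::Reject
            } else if bodies.contains(body) {
                // The body matches a persisted query exactly.
                Decision::Execute(body.to_string())
            } else if cfg.safelist_enabled {
                // Safelisting: unregistered bodies are refused.
                Decision::Reject
            } else if cfg.log_unpersisted_queries {
                // Monitoring: allow the operation but record it.
                Decision::ExecuteAndLog(body.to_string())
            } else {
                Decision::Execute(body.to_string())
            }
        }
    }
}
```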

router-perf bot commented Jun 30, 2023

CI performance tests

  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • const - Basic stress test that runs with a constant number of users
  • no-graphos - Basic stress test, no GraphOS.
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • xxlarge-request - Stress test with 100 MB request payload
  • reload - Reload test over a long period of time at a constant rate of users
  • xlarge-request - Stress test with 10 MB request payload
  • step - Basic stress test that steps up the number of users over time
  • large-request - Stress test with a 1 MB request payload

@EverlastingBugstopper EverlastingBugstopper requested a review from a team as a code owner July 12, 2023 15:00
Contributor

@o0Ignition0o o0Ignition0o left a comment

Preapproving, I still have two questions, one around an assert! and the other one about the snapshot.

Great work! 🚀

@@ -71,7 +71,7 @@ mod router_factory;
 pub mod services;
 pub(crate) mod spec;
 mod state_machine;
-mod test_harness;
+pub mod test_harness;
Contributor

is this intentional? iirc we'd rather pub use (like at line 92 for example)

Contributor Author

@EverlastingBugstopper EverlastingBugstopper Jul 13, 2023

I import the mocks with use crate::test_harness::mocks::persisted_queries::*; and kind of like that structure. I don't want to expose individual mock structs for something specific to the PQ layer.

Comment on lines 10 to 13
"message": "couldn't find mock for query {\"query\":\"{computer(id:\\\"Computer1\\\"){id errorField}}\"}",
"extensions": {
"code": "FETCH_ERROR"
}
Contributor

I don't think this is what is expected.
This often means the query planner was updated and the mock was written for {errorField id}

⚠️ i think this needs to be addressed before merging

Contributor Author

agree, not sure how this made it in - thought i fixed that up.

Contributor

If possible can we make the test panic if it detects a missing mock?

Contributor Author

@EverlastingBugstopper EverlastingBugstopper Jul 13, 2023

what do you mean? this file is just around because of the router-bridge update and a local test i ran w/this file accidentally committed.

Contributor

If the snapshot contains couldn't find mock for query it should hard fail the test.
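
A guard of the kind suggested in this thread could be as small as the sketch below. `assert_no_missing_mock` is a hypothetical helper, not something in the router's test harness; it only assumes the snapshot is available as a string before it is compared.

```rust
/// Panics if a serialized snapshot contains the missing-mock marker, so a
/// stale mock fails the test instead of being recorded in the snapshot.
pub fn assert_no_missing_mock(snapshot: &str) {
    const MARKER: &str = "couldn't find mock for query";
    assert!(
        !snapshot.contains(MARKER),
        "snapshot contains a missing-mock error; update the mock to match the query plan"
    );
}
```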

@BrynCooke
Contributor

I'm going to fix the remaining test failures, make some minor modifications and then merge this.

@o0Ignition0o
Contributor

I'm going to bypass branch protections and merge this so we can cut an alpha release today, for testing purposes.

@o0Ignition0o o0Ignition0o merged commit cffd6a4 into dev Jul 14, 2023
@o0Ignition0o o0Ignition0o deleted the avery/persisted-queries branch July 14, 2023 18:26
@glasser
Member

glasser commented Jul 14, 2023

(The only branch protection bypassed was the Performance Tests.)

o0Ignition0o pushed a commit that referenced this pull request Jul 14, 2023
> **Note**
>
> When approved, this PR will merge into **the `1.25.0-alpha.0` branch**
which will — upon being approved itself — merge into `main`.
>
> **Things to review in this PR**:
> - Changelog correctness (There is a preview below, but it is not
necessarily the most up to date. See the _Files Changed_ for the true
reality.)
>  - Version bumps
> - That it targets the right release branch (`1.25.0-alpha.0` in this
case!).
>
---
## 🚀 Features

### feat: `preview_persisted_queries` w/opt-in safelisting ([PR
#3347](#3347))

> ⚠️ **This is an [Enterprise
feature](https://www.apollographql.com/blog/platform/evaluating-apollo-router-understanding-free-and-open-vs-commercial-features/)
of the Apollo Router.** It requires an organization with a [GraphOS
Enterprise plan](https://www.apollographql.com/pricing/) and the feature
to be enabled for your account.
>
> If your organization _doesn't_ currently have an Enterprise plan, you
can test out this functionality by signing up for a free [Enterprise
trial](https://www.apollographql.com/docs/graphos/org/plans/#enterprise-trials)
and reaching out to enable the feature for your account.

#### Overview

The persisted queries feature allows you to pre-register operations,
allowing clients to send an operation ID over the wire and execute the
associated operation. Each operation defines the exact shape of a
GraphQL operation that the router expects clients to send. In its
simplest form, Persisted Queries (PQ’s) can be used like Automatic
Persisted Queries (APQ’s) with one key difference: sending an operation
body is never allowed for a PQ. Registering persisted operations allows
locking down the router to log unregistered operations, or to reject
them outright.

#### Main Configurations

* **Unregistered operation monitoring**
  * Your router can allow all GraphQL operations, while emitting
    structured traces containing unregistered operation bodies.
* **Operation safelisting**
  * Reject unregistered operations
  * Require all operations to be sent as an ID

#### Usage

```yaml title="router.yaml"
preview_persisted_queries:
  enabled: true
```

This enables additive PQs.

Requires `APOLLO_KEY` and `APOLLO_GRAPH_REF` to start up properly (to
fetch the license key and the persisted queries themselves), and the
graph variant must be linked to a persisted query list. This is only
available in preview right now and has to be enabled for a graph.

To create a persisted query list and link it to your graph, see our
[mock
docs](https://docs.google.com/document/d/16EcmcbjmwLfDfAhpMWdF9bHPG8kZ38htXKL-ozVPOUQ/edit#heading=h.r8r7mfcvvw4f),
which walk you through enabling the preview feature for your graph,
creating a persisted query list, and publishing operations to it from
Rover.

The router will not start up until all persisted queries have been read
into a `std::collections::HashMap<String, String>` mapping ID to their
body. Additionally, just the bodies are stored in a
`std::collections::HashSet`.

After the router starts, persisted queries can be sent over the wire
like so:

```sh
curl http://localhost:4000/ -X POST --json \
'{"extensions":{"persistedQuery":{"version":1,"sha256Hash":"dc67510fb4289672bea757e862d6b00e83db5d3cbbcfb15260601b6f29bb2b8f"}}}'
```

2)
[./examples/persisted-queries/safelist_pq_log_only.yaml](https://github.com/apollographql/router/raw/avery/persisted-queries/examples/persisted-queries/safelist_pq_log_only.yaml)

```yaml title="router.yaml"
preview_persisted_queries:
  enabled: true
  log_unpersisted_queries: true
```

Starting the router with this configuration logs freeform GraphQL
operations that do not match a persisted query.

3)
[./examples/persisted-queries/safelist_pq.yaml](https://github.com/apollographql/router/raw/avery/persisted-queries/examples/persisted-queries/safelist_pq.yaml)

```yaml title="router.yaml"
preview_persisted_queries:
  enabled: true
  safelist:
    enabled: true
apq:
  enabled: false
```

Starting the router with this configuration will require all operations
sent over the wire to match either the ID (O(1) retrieval from
`HashMap`) or the body (O(1) retrieval from `HashSet`). APQ is enabled
by default, and is incompatible with the persisted queries feature
(clients are not allowed to register their own persisted queries, they
must be pre-published), therefore it must be disabled to start properly.
An error is returned if APQ is not explicitly disabled in `router.yaml`.

4)
[./examples/persisted-queries/safelist_pq_require_id.yaml](https://github.com/apollographql/router/raw/avery/persisted-queries/examples/persisted-queries/safelist_pq_require_id.yaml)

```yaml title="router.yaml"
preview_persisted_queries:
  enabled: true
  safelist:
    enabled: true
    require_id: true
apq:
  enabled: false
```

This configuration is a stricter version of safelisting that rejects all
freeform GraphQL requests, even if they match the body of a persisted
query.

By [@EverlastingBugstopper](https://github.com/EverlastingBugstopper) in
#3347

## 🐛 Fixes

### Enforce default buckets for metrics ([PR
#3432](#3432))

When `telemetry.metrics.common` was not configured, the default buckets
were wrong and no buckets were set at all. With this fix, the following
buckets are set by default: [0.001, 0.005, 0.015, 0.05, 0.1, 0.2,
0.3, 0.4, 0.5, 1.0, 5.0, 10.0]

By [@bnjjj](https://github.com/bnjjj) in
#3432

## 🛠 Maintenance

### Coprocessor: Set a default pool idle timeout duration. ([PR
#3434](#3434))

An idle pool timeout duration that is too high can sometimes trigger
situations in which an HTTP request cannot complete (see [this
comment](hyperium/hyper#2136 (comment))
for more information).

This changeset sets a default timeout duration of 5 seconds, which we
may make configurable eventually.

By [@o0Ignition0o](https://github.com/o0Ignition0o) in
#3434
@BrynCooke BrynCooke mentioned this pull request Jul 19, 2023
BrynCooke added a commit that referenced this pull request Jul 20, 2023
> **Note**
>
> When approved, this PR will merge into **the `1.25.0` branch** which
will — upon being approved itself — merge into `main`.
>
> **Things to review in this PR**:
> - Changelog correctness (There is a preview below, but it is not
necessarily the most up to date. See the _Files Changed_ for the true
reality.)
>  - Version bumps
>  - That it targets the right release branch (`1.25.0` in this case!).
>
---
## 🚀 Features

### Persisted Queries w/opt-in safelisting (preview) ([PR
#3347](#3347))

> ⚠️ **Persisted queries is an [Enterprise
feature](https://www.apollographql.com/blog/platform/evaluating-apollo-router-understanding-free-and-open-vs-commercial-features/)
of the Apollo Router.** It requires an organization with a [GraphOS
Enterprise plan](https://www.apollographql.com/pricing/) and the feature
to be enabled for your account.
>
> If your organization _doesn't_ currently have an Enterprise plan, you
can test out this functionality by signing up for a free [Enterprise
trial](https://www.apollographql.com/docs/graphos/org/plans/#enterprise-trials)
and reaching out to enable the feature for your account.

Persisted Queries gives you the tools to prevent unwanted traffic from
reaching your graph.

It has two modes of operation:
* **Unregistered operation monitoring**
  * Your router can allow all GraphQL operations, while emitting
    structured traces containing unregistered operation bodies.
* **Operation safelisting**
  * Reject unregistered operations
  * Require all operations to be sent as an ID

Unlike automatic persisted queries (APQ), the ability to create a
safelist of operations allows you to prevent a malicious actor from
constructing a free-format query that could overload your subgraph
services.

For more information on how to register queries and configure your
router see the [Persisted Query
documentation](https://www.apollographql.com/docs/graphos/routing/persisted-queries).

By [@EverlastingBugstopper](https://github.com/EverlastingBugstopper) in
#3347

## 🐛 Fixes

### Fix Prometheus statistics issues with _total_total names ([Issue
#3443](#3443))

When producing Prometheus statistics, the otel crate (0.19.0) now
automatically appends `_total`, which is unhelpful.

This fix removes `_total_total` from our statistics. However, counter
metrics will still have `_total` appended to them if they did not have
it already.

By [@garypen](https://github.com/garypen) in
#3471

### Enforce default buckets for metrics ([PR
#3432](#3432))

When `telemetry.metrics.common` was not configured, no default metrics
buckets were configured.
With this fix, the following buckets are set by default: `[0.001, 0.005,
0.015, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 5.0, 10.0]`

By [@bnjjj](https://github.com/bnjjj) in
#3432

## 📃 Configuration

### Add `subscription.enabled` field to enable subscription support
([Issue #3428](#3428))

`enabled` is now required in `subscription` configuration. Example:

```yaml
subscription:
  enabled: true
  mode:
    passthrough:
      all:
        path: /ws
```

By [@bnjjj](https://github.com/bnjjj) in
#3450

### Add option to disable reuse of query fragments ([Issue
#3452](#3452))

A new option has been added to the Router to allow disabling of the
reuse of query fragments. This is useful for debugging purposes.
```yaml
supergraph:
  experimental_reuse_query_fragments: false
```

The default value depends on the version of federation.

By [@BrynCooke](https://github.com/BrynCooke) in
#3453

## 🛠 Maintenance

### Coprocessor: Set a default pool idle timeout duration. ([PR
#3434](#3434))

The default idle pool timeout duration in Hyper can sometimes trigger
situations in which an HTTP request cannot complete (see [this
comment](hyperium/hyper#2136 (comment))
for more information).

This changeset sets a default timeout duration of 5 seconds.

By [@o0Ignition0o](https://github.com/o0Ignition0o) in
#3434

---------

Co-authored-by: bryn <bryn@apollographql.com>
Co-authored-by: Chandrika Srinivasan <chandrikas@users.noreply.github.com>
4 participants