prep release: v1.27.0-alpha.0 #3564

Merged
merged 2 commits into from
Aug 11, 2023

Conversation

o0Ignition0o
Contributor

Note

When approved, this PR will merge into the 1.27.0-alpha.0 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily the most up to date. See the Files Changed for the true reality.)
  • Version bumps
  • That it targets the right release branch (1.27.0-alpha.0 in this case!).

🚀 Features

add a metric tracking coprocessor latency (Issue #2924)

Introduces a new metric for the router:

apollo.router.operations.coprocessor.duration

It has one attribute:

coprocessor.stage: string (RouterRequest, RouterResponse, SubgraphRequest, SubgraphResponse)

It is a histogram metric tracking the time spent calling into the coprocessor.
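
As a rough illustration of what such a histogram records, the sketch below times a call and files the elapsed duration under a `coprocessor.stage` label. The `Stage` enum, the `DurationHistogram` type, and the `record_call` helper are hypothetical names used only for this example; this is not the router's metrics code.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// The four values listed for the `coprocessor.stage` attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Stage {
    RouterRequest,
    RouterResponse,
    SubgraphRequest,
    SubgraphResponse,
}

/// Hypothetical stand-in for a histogram: raw duration samples grouped by stage.
#[derive(Default)]
struct DurationHistogram {
    samples: HashMap<Stage, Vec<Duration>>,
}

impl DurationHistogram {
    /// Runs `call`, measures how long it took, and stores the sample under `stage`.
    fn record_call<T>(&mut self, stage: Stage, call: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let output = call();
        self.samples.entry(stage).or_default().push(start.elapsed());
        output
    }

    /// Number of samples recorded so far for `stage`.
    fn count(&self, stage: Stage) -> usize {
        self.samples.get(&stage).map_or(0, Vec::len)
    }
}
```

Wrapping the call in a closure keeps the timing logic in one place, so every stage is measured the same way.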

By @Geal in #3513

Configure AWS sigv4 authentication for subgraph requests (PR #3365)

Secure your router to subgraph communication on AWS using Signature Version 4 (Sigv4)!
This changeset provides you with a way to set up hardcoded credentials, as well as a default provider chain.
We recommend using the default provider chain configuration.

Full usage example:

    authentication:
      subgraph:
        all: # configuration that will apply to all subgraphs
          aws_sig_v4:
            default_chain:
              profile_name: "my-test-profile" # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile
              region: "us-east-1" # https://docs.aws.amazon.com/general/latest/gr/rande.html
              service_name: "lambda" # https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-services-that-work-with-iam.html
              assume_role: # https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html
                role_arn: "test-arn"
                session_name: "test-session"
                external_id: "test-id"
        subgraphs:
          products:
            aws_sig_v4:
              hardcoded: # Not recommended, prefer using default_chain as shown above
                access_key_id: "my-access-key"
                secret_access_key: "my-secret-access-key"
                region: "us-east-1"
                service_name: "vpc-lattice-svcs" # "s3", "lambda" etc.

The full documentation can be found in the router documentation.

By @o0Ignition0o and @BlenderDude in #3365

Helm: add init containers to deployment (Issue #3248)

This is a new option when deploying the router with Helm: init containers run and perform any necessary tasks before the router container starts.

By @laszlorostas in #3444

🐛 Fixes

Require the main (GraphQL) route to shut down before other routes (Issue #3521)

This changes router execution so that there is more control over the sequencing of server shutdown. In particular, it modifies how routes are shut down so that the main (GraphQL) route is shut down before the other routes. Prior to this change, all routes shut down in parallel, which meant that, for example, health checks stopped responding prematurely.

This is particularly undesirable when the router is executing in Kubernetes, since continuing to report live/ready checks during shutdown is a requirement.
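
The ordering described above can be sketched with plain threads and channels. This is only an illustration under simplifying assumptions: `Route`, `spawn_route`, and `shutdown_in_order` are hypothetical names, and the router itself uses async servers rather than one thread per route.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical handle to a running route: a stop signal plus its worker thread.
struct Route {
    stop: Sender<()>,
    worker: JoinHandle<()>,
}

/// Starts a worker that runs until it receives the stop signal,
/// then reports its name on `log` so the shutdown order can be observed.
fn spawn_route(name: &'static str, log: Sender<&'static str>) -> Route {
    let (stop, stopped) = channel::<()>();
    let worker = thread::spawn(move || {
        // Block until asked to stop (or until the stop sender is dropped).
        let _ = stopped.recv();
        let _ = log.send(name);
    });
    Route { stop, worker }
}

/// Stops the main route and waits for it to finish before stopping the others,
/// so routes such as health checks keep answering while the main route drains.
fn shutdown_in_order(main: Route, others: Vec<Route>) {
    let _ = main.stop.send(());
    main.worker.join().expect("main route worker panicked");
    for route in others {
        let _ = route.stop.send(());
        route.worker.join().expect("route worker panicked");
    }
}
```

Joining the main worker before signalling the others is what enforces the sequence; sending all stop signals first would reproduce the old parallel behaviour.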

By @garypen in #3557

Fix redis reconnections (Issue #3045)

The reconnection policy was using an exponential backoff delay with a maximum number of attempts. Once that maximum was reached, reconnection was never tried again (there was no baseline retry). This change adds infinite retries with a maximum delay of 2 seconds, and a timeout of 1 millisecond on Redis commands, so that the router can continue serving requests in the meantime.
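
A minimal sketch of a capped exponential backoff with no attempt limit, to make the new policy concrete. The 2 second cap comes from the description above; `BASE_DELAY` and the `reconnect_delay` function are assumptions for illustration, not the router's actual values or code.

```rust
use std::time::Duration;

/// Upper bound on the delay between reconnection attempts (2 seconds, as described above).
const MAX_DELAY: Duration = Duration::from_secs(2);

/// Hypothetical delay before the first retry; the real value is not stated in this changelog.
const BASE_DELAY: Duration = Duration::from_millis(100);

/// Delay before retry number `attempt` (0-based): doubles each time, capped at `MAX_DELAY`.
/// There is no attempt limit, so a caller can keep retrying indefinitely.
fn reconnect_delay(attempt: u32) -> Duration {
    // 2^attempt, falling back to the largest factor if the power would overflow.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}
```

Because the delay is capped rather than the attempt count, a long Redis outage costs at most one retry every 2 seconds instead of ending reconnection for good.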

This commit contains additional fixes:

  • release the lock on the in memory cache while waiting for redis, to let the in memory cache serve other requests
  • add a custom serializer for SubSelectionKey: this type is used as key in a HashMap, which is converted to a JSON object, and object keys must be strings, so a specific serializer is needed instead of the derived one
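
To illustrate the second point, here is a sketch of turning a structured map key into a single string and back, which is what a JSON object key requires. `CompositeKey` and its fields are hypothetical; the real `SubSelectionKey` and its serializer may look different.

```rust
use std::collections::HashMap;
use std::fmt;
use std::str::FromStr;

/// Hypothetical composite key; the real `SubSelectionKey` fields may differ.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CompositeKey {
    path: String,
    subselection: String,
}

/// Encodes the key as one string: "<byte length of path>:<path><subselection>".
impl fmt::Display for CompositeKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}{}", self.path.len(), self.path, self.subselection)
    }
}

/// Decodes the string form produced by the `Display` implementation.
impl FromStr for CompositeKey {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (len, rest) = s.split_once(':').ok_or("missing length prefix")?;
        let len = len
            .parse::<usize>()
            .map_err(|e| format!("invalid length prefix: {e}"))?;
        // Guards against lengths past the end or in the middle of a character.
        if !rest.is_char_boundary(len) {
            return Err("length prefix does not match the payload".to_string());
        }
        let (path, subselection) = rest.split_at(len);
        Ok(CompositeKey {
            path: path.to_string(),
            subselection: subselection.to_string(),
        })
    }
}

/// Converts a map with structured keys into one with string keys.
fn stringify_keys<V>(map: HashMap<CompositeKey, V>) -> HashMap<String, V> {
    map.into_iter().map(|(k, v)| (k.to_string(), v)).collect()
}
```

The length prefix keeps the encoding unambiguous even when the path itself contains the separator character.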

By @Geal in #3509

Close the subscription when a new schema has been detected during hot reload (Issue #3320)

Router hot reloads on schema updates didn't close running subscriptions, which could leave them running against out-of-date query plans.
This changeset allows the router to signal clients that a SUBSCRIPTION_SCHEMA_RELOAD happened, and close the running subscription, so the clients can subscribe again:

{
  "errors": [
    {
      "message": "subscription has been closed due to a schema reload",
      "extensions": {
        "code": "SUBSCRIPTION_SCHEMA_RELOAD"
      }
    }
  ]
}
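
On the client side, reacting to this payload might look like the sketch below. It assumes the client has already parsed the `errors` array into a simple struct; `GraphQlError`, `NextStep`, and `next_step` are hypothetical names, and only the `SUBSCRIPTION_SCHEMA_RELOAD` code comes from the payload above.

```rust
/// Minimal model of one entry in a GraphQL `errors` array.
struct GraphQlError {
    message: String,
    /// Value of `extensions.code`, when present.
    code: Option<String>,
}

/// Error code sent when a schema reload closes a subscription.
const SCHEMA_RELOAD_CODE: &str = "SUBSCRIPTION_SCHEMA_RELOAD";

/// What a client might do after a subscription terminates with errors.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// The subscription was closed by a schema reload: subscribe again.
    Resubscribe,
    /// Any other failure: surface the messages to the caller.
    ReportErrors(Vec<String>),
}

/// Decides how to react to the errors that ended a subscription.
fn next_step(errors: &[GraphQlError]) -> NextStep {
    let reloaded = errors
        .iter()
        .any(|error| error.code.as_deref() == Some(SCHEMA_RELOAD_CODE));
    if reloaded {
        NextStep::Resubscribe
    } else {
        NextStep::ReportErrors(errors.iter().map(|e| e.message.clone()).collect())
    }
}
```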

By @bnjjj in #3341

Fix: handle ping/pong websocket messages before the Ack message is received. (PR #3562)

Websocket servers will sometimes send Ping() messages before they Ack the connection initialization. This changeset allows the router to send Pong() messages while it keeps waiting until either CONNECTION_ACK_TIMEOUT elapses or the server successfully Acks the websocket connection start.
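
The handshake behaviour can be sketched as a small loop over incoming messages. This is a synchronous simplification with hypothetical names (`ServerMessage`, `AckOutcome`, `wait_for_ack`); the router's websocket code is async and is not reproduced here. Note that in this sketch the deadline is only checked when a message arrives.

```rust
use std::time::{Duration, Instant};

/// Simplified view of what the server can send during connection setup.
#[derive(Debug, PartialEq)]
enum ServerMessage {
    Ping,
    ConnectionAck,
    Other,
}

/// How the wait for the acknowledgement ended.
#[derive(Debug, PartialEq)]
enum AckOutcome {
    Acked,
    TimedOut,
    Closed,
}

/// Consumes messages until the server acks, the timeout elapses,
/// or the stream ends. Every `Ping` seen before the ack is answered
/// by calling `send_pong` instead of being treated as an error.
fn wait_for_ack(
    incoming: impl IntoIterator<Item = ServerMessage>,
    mut send_pong: impl FnMut(),
    ack_timeout: Duration,
) -> AckOutcome {
    let deadline = Instant::now() + ack_timeout;
    for message in incoming {
        if Instant::now() >= deadline {
            return AckOutcome::TimedOut;
        }
        match message {
            ServerMessage::Ping => send_pong(),
            ServerMessage::ConnectionAck => return AckOutcome::Acked,
            ServerMessage::Other => {}
        }
    }
    AckOutcome::Closed
}
```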

By @o0Ignition0o in #3562

Fix the error count for subscription requests for apollo telemetry (PR #3500)

Count subscription requests only if the feature is enabled.

The router would previously count subscription requests regardless of whether the feature is enabled or not. This changeset will only count subscription requests if the feature has been enabled.

By @bnjjj in #3500

🛠 Maintenance

Update datadog-subgraph npm dependencies (PR #3560)

This changeset updates the dd-trace dependency and the nodeJS version of the example Dockerfile.

By @o0Ignition0o in #3560

Remove some panic! calls from the pq code. (PR #3527)

Replace a few panic! calls with expect() in the persisted query code for code clarity.
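
For readers unfamiliar with the pattern, this is the general shape of such a change. The functions and the manifest map are hypothetical and are not taken from the persisted query code.

```rust
use std::collections::HashMap;

/// Before: an explicit match whose failure arm calls `panic!`.
fn lookup_with_panic<'a>(manifest: &'a HashMap<String, String>, id: &str) -> &'a str {
    match manifest.get(id) {
        Some(body) => body.as_str(),
        None => panic!("persisted query id should be present in the manifest"),
    }
}

/// After: `expect` states the same invariant in a single call.
/// It still panics if the invariant is broken; only the code is shorter.
fn lookup_with_expect<'a>(manifest: &'a HashMap<String, String>, id: &str) -> &'a str {
    manifest
        .get(id)
        .expect("persisted query id should be present in the manifest")
        .as_str()
}
```

Both versions behave the same at runtime, so this kind of change is about readability rather than error handling.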

By @BrynCooke in #3527

Add a warning if we think istio-proxy injection is causing problems (Issue #3533)

We have encountered situations where the injection of istio-proxy in a router pod (executing in Kubernetes) causes networking errors during uplink retrieval.

The root cause is that the router is executing and attempting to retrieve uplink schemas while the istio-proxy is simultaneously modifying network configuration.

This new warning message directs users to information which should help them to configure their Kubernetes cluster or pod to avoid this problem.

By @garypen in #3545

Add a message to the logs indicating when custom plugins are detected and there is a possibility that log entries may be silenced (Issue #3526)

Since #3477, users who have created custom plugins no longer see their log entries.
This is because the default logging filter now restricts log entries to those that are in the apollo module.

Users that have custom plugins will need to configure the logging filter to include their modules, but they may not realise this.

Now, if a custom plugin is detected then a message will be logged to the console indicating that the logging filter may need to be configured.
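
To show why custom plugin logs disappear under a module-based filter, here is a sketch of prefix matching on module paths. The `is_logged` function and the `apollo_router` and `my_plugin` prefixes are assumptions for illustration; the router's actual filter syntax is described in its logging documentation.

```rust
/// Returns true when a log entry's module path is one of the allowed modules
/// or a submodule of one (for example `apollo_router::plugins`).
fn is_logged(module_path: &str, allowed_prefixes: &[&str]) -> bool {
    allowed_prefixes.iter().any(|prefix| {
        module_path == *prefix
            || module_path
                .strip_prefix(*prefix)
                .map_or(false, |rest| rest.starts_with("::"))
    })
}
```

With only `apollo_router` allowed, an entry from `my_plugin::auth` is dropped; adding `my_plugin` to the allowed prefixes lets it through again.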

By @BrynCooke in #3540

Parent based sampling tests (PR #3136)

This adds tests for OpenTelemetry sampling defined either in the configuration or in headers carried by the request.

By @Geal in #3136

📚 Documentation

Document the Redis URL format (Issue #3534)

The Redis client used in the Router follows a convention on Redis server URLs to indicate TLS, cluster or sentinel usage.
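
A sketch of how such a convention can be read off the URL scheme. The scheme names below (`redis`, `rediss`, and the `-cluster` and `-sentinel` suffixes) are an assumption based on a common Redis client convention; check the router documentation for the authoritative list. `Topology`, `RedisUrlKind`, and `classify_redis_url` are hypothetical names.

```rust
/// Deployment style implied by the URL scheme.
#[derive(Debug, PartialEq)]
enum Topology {
    Centralized,
    Cluster,
    Sentinel,
}

/// What the scheme says about the connection.
#[derive(Debug, PartialEq)]
struct RedisUrlKind {
    tls: bool,
    topology: Topology,
}

/// Classifies a Redis URL by its scheme; returns `None` for unknown schemes.
/// The scheme list is assumed, not taken from the router source.
fn classify_redis_url(url: &str) -> Option<RedisUrlKind> {
    let (scheme, _rest) = url.split_once("://")?;
    let (tls, topology) = match scheme {
        "redis" => (false, Topology::Centralized),
        "rediss" => (true, Topology::Centralized),
        "redis-cluster" => (false, Topology::Cluster),
        "rediss-cluster" => (true, Topology::Cluster),
        "redis-sentinel" => (false, Topology::Sentinel),
        "rediss-sentinel" => (true, Topology::Sentinel),
        _ => return None,
    };
    Some(RedisUrlKind { tls, topology })
}
```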

By @Geal in #3556

Document the request lifecycle (PR #3391)

This adds in-depth documentation of:

  • the entire request lifecycle
  • which services exist in the router
  • the request and response types they use
  • where plugins can attach themselves

By @Geal and @Meschreiber in #3391

Document TLS termination and subgraph override (Issue #3100)

TLS termination was added in #2614 but never documented, and subgraph certificate override was added in #2008 but the documentation was missing some details on self signed certificates.

By @Geal in #3436

self is immutable in the Plugin trait's methods (Issue #3539)

The Native customizations section of the docs wrongly showed Plugin service methods taking &mut self, whereas they take &self.
The doc is now up to date.
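
Since the methods take `&self`, a plugin that needs to change state has to use interior mutability. The sketch below uses a simplified, hypothetical `Hook` trait rather than the real `Plugin` trait, purely to show the pattern.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Simplified, hypothetical stand-in for a plugin trait whose hooks take `&self`.
trait Hook {
    fn on_request(&self, operation_name: &str) -> String;
}

/// Counts requests without needing `&mut self`, thanks to an atomic counter.
struct RequestCounter {
    seen: AtomicU64,
}

impl Hook for RequestCounter {
    fn on_request(&self, operation_name: &str) -> String {
        // `fetch_add` returns the previous value, so add 1 for the current count.
        let count = self.seen.fetch_add(1, Ordering::Relaxed) + 1;
        format!("{operation_name} #{count}")
    }
}
```

An atomic is enough for a counter; richer state would typically sit behind a `Mutex` or `RwLock` instead.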

By @Geal in #3555

@o0Ignition0o o0Ignition0o requested a review from a team as a code owner August 11, 2023 11:39
@o0Ignition0o o0Ignition0o enabled auto-merge (squash) August 11, 2023 11:39
@router-perf

router-perf bot commented Aug 11, 2023

CI performance tests

  • xxlarge-request - Stress test with 100 MB request payload
  • step - Basic stress test that steps up the number of users over time
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • large-request - Stress test with a 1 MB request payload
  • const - Basic stress test that runs with a constant number of users
  • no-graphos - Basic stress test, no GraphOS.
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • xlarge-request - Stress test with 10 MB request payload
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity

@o0Ignition0o o0Ignition0o merged commit 83586e5 into 1.27.0-alpha.0 Aug 11, 2023
1 check passed
@o0Ignition0o o0Ignition0o deleted the prep-1.27.0-alpha.0 branch August 11, 2023 12:09
@abernix abernix added release and removed release labels Sep 7, 2023
3 participants