add a warning if we think istio-proxy injection is causing problems #3545

Merged Aug 9, 2023 (12 commits)
9 changes: 9 additions & 0 deletions .changesets/maint_garypen_3533_istio_warn.md
@@ -0,0 +1,9 @@
### Add a warning if we think istio-proxy injection is causing problems ([Issue #3533](https://github.com/apollographql/router/issues/3533))

We have encountered situations where injecting istio-proxy into a router pod (running in Kubernetes) causes networking errors during uplink retrieval.

The root cause is that the router starts and attempts to retrieve uplink schemas while istio-proxy is still modifying the pod's network configuration.

The new warning message directs users to information that should help them configure their Kubernetes cluster or pod to avoid this problem.

By [@garypen](https://github.com/garypen) in https://github.com/apollographql/router/pull/3545
23 changes: 22 additions & 1 deletion apollo-router/src/uplink/mod.rs
@@ -1,3 +1,4 @@
use std::error::Error as stdError;
use std::fmt::Debug;
use std::time::Duration;
use std::time::Instant;
@@ -359,7 +360,27 @@ where
Query: graphql_client::GraphQLQuery,
{
    let client = reqwest::Client::builder().timeout(timeout).build()?;
    let res = client.post(url).json(request_body).send().await?;
    // It is possible that istio-proxy is re-configuring networking beneath us. If it is, we'll see an error something like this:
    // level: "ERROR"
    // message: "fetch failed from all endpoints"
    // target: "apollo_router::router::event::schema"
    // timestamp: "2023-08-01T10:40:28.831196Z"
    // That's deeply confusing and very hard to debug. Let's try to help by printing out a helpful error message here
    let res = client
        .post(url)
        .json(request_body)
        .send()
        .await
        .map_err(|e| {
            if let Some(hyper_err) = e.source() {
                if let Some(os_err) = hyper_err.source() {
                    if os_err.to_string().contains("tcp connect error: Cannot assign requested address (os error 99)") {
                        tracing::warn!("If your router is executing within a kubernetes pod, this failure may be caused by istio-proxy injection. See https://github.com/apollographql/router/issues/3533 for more details about how to solve this");
                    }
                }
            }
            e
        })?;
    tracing::debug!("uplink response {:?}", res);
    let response_body: graphql_client::Response<Query::ResponseData> = res.json().await?;
    Ok(response_body)
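The `map_err` closure in this hunk looks exactly two `source()` levels below the `reqwest` error. As a minimal sketch of the same idea, and not what the PR itself does, the check could instead walk the whole `source()` chain so it does not depend on how many wrapper layers sit above the OS error. The names `ISTIO_HINT`, `chain_contains`, and `Layer` are hypothetical and exist only for this illustration; `Layer` stands in for the real reqwest/hyper error types, and only the standard library is used.

```rust
use std::error::Error;
use std::fmt;

/// OS-level message fragment to look for (taken from the string matched in the diff).
const ISTIO_HINT: &str = "Cannot assign requested address (os error 99)";

/// Returns true if `err`, or any error reachable through `source()`,
/// has a display message containing `needle`.
fn chain_contains(err: &(dyn Error + 'static), needle: &str) -> bool {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if e.to_string().contains(needle) {
            return true;
        }
        // Step one level down the chain; `None` ends the loop.
        current = e.source();
    }
    false
}

/// Hypothetical wrapper error used to imitate nested client/transport/OS errors.
#[derive(Debug)]
struct Layer {
    msg: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl Layer {
    fn new(msg: &str, source: Option<Layer>) -> Self {
        Layer {
            msg: msg.to_string(),
            source: source.map(|s| Box::new(s) as Box<dyn Error + 'static>),
        }
    }
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}
```

With a helper like this, the closure body would reduce to a single `if chain_contains(&e, ISTIO_HINT) { tracing::warn!(...) }`. The trade-off is that a substring match anywhere in the chain is slightly broader than the fixed two-level check in the merged code.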
5 changes: 5 additions & 0 deletions docs/source/containerization/kubernetes.mdx
@@ -254,3 +254,8 @@ If you had a router running on your localhost, with default health-check configu

curl "http://localhost:8088/health"

## Using `istio` with the router

The [Istio service mesh](https://istio.io/) is a popular choice for enhanced traffic routing within Kubernetes.

We have encountered an [issue](https://github.com/apollographql/router/issues/3533) with `istio-proxy` pod injection. The router can start executing while Istio is still reconfiguring networking for the router pod. This is an issue with Istio, not the router, and the fix is to follow the Istio advice documented [here](https://istio.io/latest/docs/ops/common-problems/injection/#pod-or-containers-start-with-network-issues-if-istio-proxy-is-not-ready).