Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodeJS SDK lambda client does not throw when reaching socketTimeout or Agent.timeout / NAT timeout #3710

Closed
kuhe opened this issue Jun 17, 2022 · 6 comments
Assignees
Labels
bug This issue is a bug. closed-for-staleness p2 This is a standard priority issue workaround-available This issue has a work around available.

Comments

@kuhe
Copy link
Contributor

kuhe commented Jun 17, 2022

Describe the bug

Discovered while testing #3624

const { NodeHttpHandler } = require("@aws-sdk/node-http-handler");
const { Agent } = require("http");

const agent = new Agent({
  keepAlive: true,
  timeout: 400 * 1000,
});

const lambdaClient = new LambdaClient({
  region: "us-east-1",
  logger: console,
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 8000,
    socketTimeout: 400 * 1000,
    httpAgent: agent,
  }),
});

When the SDK Client should time out upon breaching the node http Agent timeout setting or the NodeHttpHandler socketTimeout setting, instead nothing happens and the promise continues to be awaited.

Expected Behavior

When breaching the Agent's timeout value or the NodeHttpHandler's socketTimeout value, I would expect the SDK client to throw a timeout error to be handled by the client's caller.

Current Behavior

It may be restricted to this Lambda NAT timeout case, but the request from client.send(...) continues to be awaited.

Reproduction Steps

AWS resource stack:

  • VPC (default is ok)
  • private subnet with NAT gateway
  • Lambda Invoker
    • place within private subnet and configure timeout to be greater than 420 seconds.
  • Lambda B, returns after 360s to hit the 350s NAT gateway timeout
    • do not configure VPC, but configure timeout to be greater than 420 seconds.

Code for calling B from Invoker (modify function names as needed):

const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
const { NodeHttpHandler } = require("@aws-sdk/node-http-handler");
const { Agent } = require("http");

const agent = new Agent({
  keepAlive: true,
  // keepAliveMsecs: 999,
  // maxSockets: 100,
  // maxTotalSockets: 100,
  // maxFreeSockets: 256,
  timeout: 400 * 1000,
  // scheduling: "fifo",
});

const lambdaClient = new LambdaClient({
  region: "us-east-1",
  logger: console,
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 8000,
    socketTimeout: 400 * 1000,
    httpAgent: agent,
  }),
});

exports.handler = async function handler(event, context, callback) {
  let counter = 1;
  const timer = setInterval(() => {
    console.log(new Date(), "waiting...", counter++);
  }, 1000);

  const invokeParamsB = {
    FunctionName: "tcp-keepalive-test-B",
    Payload: "",
  };
  const commandB = new InvokeCommand(invokeParamsB);

  // should throw upon breach of timeout settings, but continues await until Lambda's own timeout.
  const dataB = await lambdaClient.send(commandB); 
  console.log(new TextDecoder("utf-8").decode(dataB.Payload));

  clearTimeout(timer);
};

Possible Solution

Race promise to throw timeout?

Additional Information/Context

This is not a solution for the timeout failing to throw, but incidental workaround to the timeout issue.

Making an SDK client request to a shorter lambda (or perhaps making any client request at all) first before calling the long-running lambda makes it succeed despite taking longer time overall.

Succeeds (!?)

  const invokeParamsA = {
    FunctionName: "tcp-keepalive-test-A",
    Payload: "",
  };
  const commandA = new InvokeCommand(invokeParamsA);

  const invokeParamsB = {
    FunctionName: "tcp-keepalive-test-B",
    Payload: "",
  };
  const commandB = new InvokeCommand(invokeParamsB);

  const dataA = await lambdaClient.send(commandA);
  console.log(new TextDecoder("utf-8").decode(dataA.Payload));

  const dataB = await lambdaClient.send(commandB);
  console.log(new TextDecoder("utf-8").decode(dataB.Payload));

SDK version used

3.110.0

Environment details (OS name and version, etc.)

AWS AL2

@kuhe kuhe added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 17, 2022
@kuhe kuhe self-assigned this Jun 17, 2022
@kuhe kuhe removed the needs-triage This issue or PR still needs to be triaged. label Jun 17, 2022
@ronak2121
Copy link
Contributor

@kuhe is there any update on this issue? I'm seeing a similar issue with calls out to S3. I've set the socketTimeout value to 3s but I see my calls taking more than 6s and succeeding. This makes me think the socketTimeout value is not actually working

@kuhe kuhe added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 14, 2022
@kuhe
Copy link
Contributor Author

kuhe commented Nov 14, 2022

No update, there is not enough information or priority available to work on this. It may be due to something specific to Lambda.

@kuhe kuhe added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Nov 14, 2022
@kizaonline
Copy link

@kuhe This is effecting us in production and a ticket raised months ago with AWS Support who pointed us towards this issue, maybe due to the combination of NAT Gateway and Lambda?

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 15, 2022
@bbright024
Copy link

@kizaonline @ronak2121 and all others impacted by this bug -

there's a simple hack you can use to avoid this issue: complete a guaranteed fast request with the client, before attempting the >5 minute request.

e.g. if there's a lambda invoke that may take >5 minutes, first use the client to make an invoke request with the DryRun parameter set to true, then attempt the >5 minute invoke.

DryRun – Validate parameter values and verify that the user or role has permission to invoke the function.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html

this works because the keepalive parameters are not applied until after the first request is completed. you can see it in this code -

state.keepAliveTimeoutSet = true;

I highly recommend adding this workaround to your code - bnoordhuis mentions that this is intended behavior in node, so there's no guarantee it will ever be fixed:

The keepAliveMsecs option isn't applied until the socket is added to the connection pool, i.e., after the first request. That's by design, as far as I'm aware.
(2) is a policy change. Best way forward is to open a pull request and see how it's received.
nodejs/node#47137 (comment)

@kuhe kuhe added the workaround-available This issue has a work around available. label Mar 27, 2023
@RanVaknin RanVaknin added the p2 This is a standard priority issue label Mar 28, 2023
Copy link

Greetings! We’re closing this issue because it has been open a long time and hasn’t been updated in a while and may not be getting the attention it deserves. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to comment or open a new issue.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Mar 28, 2024
@github-actions github-actions bot closed this as completed Apr 1, 2024
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue is a bug. closed-for-staleness p2 This is a standard priority issue workaround-available This issue has a work around available.
Projects
None yet
Development

No branches or pull requests

5 participants