Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): retry after clock skew correction #1170

Merged
merged 7 commits into from
Feb 27, 2024
Merged

Conversation

kuhe
Copy link
Contributor

@kuhe kuhe commented Feb 26, 2024

PR adds metadata if a clock skew correction occurred after a request error. This metadata, when present, is considered a transient and retryable error. maxAttempts must be >=2 to allow clock skew corrections to be retried.

Test code, using DynamoDB, which returns a date header that can be used to correct clock skew for sigv4.

// run using Node.js as a *.js file.
const { DynamoDB } = require("@aws-sdk/client-dynamodb");
const pkgJson = require("@aws-sdk/client-dynamodb/package.json");

const dynamodb = new DynamoDB({
  region: "us-west-2",
  systemClockOffset: -30 * 60 * 1000, // start with a bad offset so your initial request is rejected.
});

(async () => {
  console.log(pkgJson.name, "@", pkgJson.version);
  console.log("system clock offset is", dynamodb.config.systemClockOffset);
  const listTables = await dynamodb.listTables({});
  // When functioning correctly, this operation retries and the system
  // clock offset is printed again with its corrected value.
  // No error is thrown by the SDK.
  console.log("system clock offset is", dynamodb.config.systemClockOffset);
  console.log("listTables", listTables.$metadata.httpStatusCode);
})().catch(async (e) => {
  console.log("error", e.name, e.$metadata.httpStatusCode);
  console.log("error", e.$response.headers);
  console.log("system clock offset is", dynamodb.config.systemClockOffset);
  // when malfunctioning (currently), the system clock offset is corrected in the error handler
  // but no retry is made.
  console.log("retrying manually");
  const listTables = await dynamodb.listTables({});
  console.log("system clock offset is", dynamodb.config.systemClockOffset);
  console.log("listTables after manual retry", listTables.$metadata.httpStatusCode);
});

Context:

As far as I can tell, a clock skew correction does not trigger a retry anymore, since AWS SDK v3.222.0, which corresponds to the retry SRA.

https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.222.0

Versions tested:

  • "@aws-sdk/client-dynamodb": "<=3.521.0" no retry after clock skew correction
  • "@aws-sdk/client-dynamodb": "<=3.450.0" no retry after clock skew correction
  • "@aws-sdk/client-dynamodb": "<=3.350.0" no retry after clock skew correction
  • "@aws-sdk/client-dynamodb": "<=3.222.0" no retry after clock skew correction
  • "@aws-sdk/client-dynamodb": "<=3.221.0" retry after clock skew correction
  • "@aws-sdk/client-dynamodb": "<=3.100.0" retry after clock skew correction

@kuhe kuhe requested review from a team as code owners February 26, 2024 21:41
@kuhe kuhe requested a review from hpmellema February 26, 2024 21:41
@kuhe kuhe force-pushed the fix/retries branch 2 times, most recently from 344d13b to 68174c9 Compare February 26, 2024 21:51
@syall
Copy link
Contributor

syall commented Feb 27, 2024

Aside: @aws-sdk/util-retry was published in v3.222.0 per aws/aws-sdk-js-v3@a2579b7, client codegen with the retry changes were released in v3.229.0 per aws/aws-sdk-js-v3@2fa1dd5.

@syall
Copy link
Contributor

syall commented Feb 27, 2024

It looks like the pre-SRA retry uses the deprecated StandardRetryStrategy in @smithy/middleware-retry, which checks if an error is retryable using the defaultRetryDecider. The defaultRetryDecider would check and retry on clock skew errors using isClockSkewError().

However, the new StandardRetryStrategy in @smithy/util-retry checks if error are retryable only if they are transient or throttling, which does NOT include clock skew errors.

To fix this regression, we can either:

  1. (Recommended) Classify clock skew errors as TRANSIENT (THROTTLING doesn't seem right) by using isClockSkewError() in getErrorRetryType()
    • Although not perfect semantically, I think this is a good compromise for fixing the regression with minimal changes.
  2. (Not recommended) Add isClockSkewError() as a condition in isTransientError()
  3. (Not recommended) Expand RetryErrorInfo to include a clock skew error classification, and expand StandardRetryStrategy::isRetryableError() to include the clock skew error classification (non-compliant with the SRA specification).

/**
* The error thrown during the initial request, if available.
*/
error?: SdkError;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users have requested that the thrown exception is available in the retry strategy implementation. Otherwise, they cannot even check (for example) the http status code of an error when deciding whether to retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, they are forced to write a verbose retry wrapper instead of using a RetryStrategy.

@kuhe
Copy link
Contributor Author

kuhe commented Feb 27, 2024

Updated PR to use option 1. E2E test will be added in AWS SDK.

@kuhe kuhe merged commit dd0d9b4 into smithy-lang:main Feb 27, 2024
7 checks passed
@kuhe kuhe deleted the fix/retries branch February 27, 2024 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants