Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add static stability support to IMDS credentials provider #2258

Merged
merged 6 commits into from
Feb 2, 2023

Conversation

ysaito1001
Copy link
Contributor

@ysaito1001 ysaito1001 commented Jan 27, 2023

Motivation and Context

Implements #2117

Description

This PR adds static stability support to the IMDS credentials provider. The goal is to make ImdsCredentialsProvider more robust in providing credentials even when IMDS is unreachable. We do this by letting it hold the last retrieved credentials, and these fallback credentials can be served as a backup.

In addition to the failure scenario above, we also need to consider the case where IMDS may return expired credentials. Static stability support specifies that ImdsCredentialsProvider should extend the credentials expiry by the calculated amount. The implementation of that is defined in ImdsCredentialsProvider::maybe_extend_expiration and references that in botocore.

By implementing the above, the Rust SDK will not fail fast in sending a request with expired credentials and allows the target service to make the ultimate decision as to whether the request sent is valid.

Testing

For the ease of maintaining tests in the future, we'd like to keep them in aws-config. Some of the tests require "SDK can send a request...", but we can reinterpret the test specification and bring it closer to the shore of ImdsCredentialsProvider. From an ImdsCredentialsProvider's perspective, credentials being returned from provide_credentials is equivalent to "SDK can send a request". Furthermore, the Rust SDK, by design, separates credentials providers from credentials caching at the time of writing (01/27/2023); ImdsCredentialsProvider itself does not cache credentials the way LazyCredentialsCache does. Thus, the test specification like "IMDS must only be called once" can be translated to ImdsCredentialsProvider correctly extending expired credentials.

  • Expired Credential Test Cases
    • SDK can use IMDS credential provider if first IMDS call returns expired credentials (implemented by expired_credentials_should_be_extended).
    • SDK can send request if expired credentials are available (here's no explicit test added for this as it is implicitly handled by two tests for Refresh Failures).
    • SDK can perform 3 successive requests with expired credentials. IMDS must only be called once (implicitly realized by expired_credentials_should_be_extended).
  • Refresh Failures
    • SDK can send a request after a read timeout during a refresh (implemented by read_timeout_during_credentials_refresh_should_yield_last_retrieved_credentials).
    • SDK can send a request after receiving a 500 HTTP response from IMDS during a refresh (implemented by fallback_credentials_should_be_used_when_imds_returns_500_during_credentials_refresh).
  • Logging
    • SDK must log a message when expired credentials are used (implemented by expired_credentials_should_be_extended).
  • Passed tests in CI

Checklist

  • I have updated CHANGELOG.next.toml if I made changes to the AWS SDK, generated SDK code, or SDK runtime crates

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

This commit adds static stability support to `ImdsCredentialsProvider`.
Static stability refers to continued availability of a service in the
face of impaired dependencies. In case IMDS is not available, we still
allow requests to be dispatched with expired credentials. This, in turn,
allows the target service to makes the ultimate decision as to whether
requests sent are valid or not instead of the client SDK determining
their validity.

The way it is implemented is `ImdsCredentialsProvider` now stores a last
retrieved credentials which will later be served when IMDS is unreachable.
This commit adds tests to IMDS credentials providers for static stability
support. These tests are prescribed in #2117.
From an IMDS credentials provider' perspective, however, some of the tests
are considered to fall under the same equivalence class with others.
Therefore, a single test can cover multiple test cases.
@github-actions
Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

@ysaito1001 ysaito1001 marked this pull request as ready for review January 28, 2023 04:18
@ysaito1001 ysaito1001 requested review from a team as code owners January 28, 2023 04:18
Copy link
Collaborator

@jdisanti jdisanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

CHANGELOG.next.toml Outdated Show resolved Hide resolved
@ysaito1001 ysaito1001 enabled auto-merge (squash) February 1, 2023 22:32
@github-actions
Copy link

github-actions bot commented Feb 1, 2023

A new generated diff is ready to view.

A new doc preview is ready to view.

@github-actions
Copy link

github-actions bot commented Feb 2, 2023

A new generated diff is ready to view.

A new doc preview is ready to view.

@ysaito1001 ysaito1001 merged commit 681d3b3 into main Feb 2, 2023
@ysaito1001 ysaito1001 deleted the ysaito/static-stability-support branch February 2, 2023 00:56
crisidev pushed a commit that referenced this pull request Feb 2, 2023
* Add static stability support to ImdsCredentialsProvider

This commit adds static stability support to `ImdsCredentialsProvider`.
Static stability refers to continued availability of a service in the
face of impaired dependencies. In case IMDS is not available, we still
allow requests to be dispatched with expired credentials. This, in turn,
allows the target service to makes the ultimate decision as to whether
requests sent are valid or not instead of the client SDK determining
their validity.

The way it is implemented is `ImdsCredentialsProvider` now stores a last
retrieved credentials which will later be served when IMDS is unreachable.

* Add tests to IMDS credentials provider

This commit adds tests to IMDS credentials providers for static stability
support. These tests are prescribed in #2117.
From an IMDS credentials provider' perspective, however, some of the tests
are considered to fall under the same equivalence class with others.
Therefore, a single test can cover multiple test cases.

* Update CHANGELOG.next.toml

* Update CHANGELOG.next.toml

Co-authored-by: John DiSanti <jdisanti@amazon.com>

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>
thomas-k-cameron added a commit to thomas-k-cameron/smithy-rs that referenced this pull request Feb 4, 2023
* formatting: run pre-commit on all files (smithy-lang#2236)

* formatting: run pre-commit on all files

* fix: test broken by string indent

* Remove old service builder machinery (smithy-lang#2161)

* Remove HandlerGenerator and RegistryGenerator

* Shrink RequestRejection

* Remove inlineable

* Remove Router and RequestParts

* Remove old service builder tests

* Add missing test dependency

* Add missing dev dependency

* Remove unused test

* Move Router types

* Switch AllowUnused to AllowUnusedVariables

* Add changelog entry for "Remove old service builder machinery" (smithy-lang#2242)

* Dense maps cannot deserialize null values (smithy-lang#2239)

* Dense maps cannot deserialize null values

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>

* Dense lists cannot deserialize null (smithy-lang#2240)

* Dense lists cannot deserialize null

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>

* Update changelog

* Add changelog entry about smithy-lang#2200 (smithy-lang#2250)

Co-authored-by: Julian Antonielli <jjantdev@amazon.co.uk>

* Fix `OperationExtensionFuture` poll order (smithy-lang#2247)

* Fix `OperationExtensionFuture` poll order

* Add CHANGELOG.next.toml entry

Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>

* Bump version to 0.54.1 in `gradle.properties` (smithy-lang#2249)

Co-authored-by: Julian Antonielli <jjantdev@amazon.co.uk>
Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>

* Update changelog

* Add CI action to test aws-sdk-services (smithy-lang#2251)

* run cargo test --all-features instead

* Add check-only option, change check to test

* Support nested APIs in operation input tests

* Newtype FromRequest::Future (smithy-lang#2244)

* Add RFC: Providing fallback credentials on timeout (smithy-lang#2218)

* Add RFC: providing fallback credentials on timeout

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: John DiSanti <jdisanti@amazon.com>

* Incorporate review feedback into RFC

This commit addresses the review feedback:
smithy-lang#2218 (comment)
smithy-lang#2218 (comment)
smithy-lang#2218 (comment)

In addition, we have renamed the proposed method `on_timeout` to
`fallback_on_interrupt` to make it more descriptive.

* Update rfc0031_providing_fallback_credentials_on_timeout.md

* Update RFC

This commit updates RFC and leaves to discussion how `fallback_on_interrupt`
should be implemented, i.e., either as a synchronous primitive or an
asynchronous primitive.

* Update rfc0031_providing_fallback_credentials_on_timeout.md

* Update rfc0031_providing_fallback_credentials_on_timeout.md

* Update rfc0031_providing_fallback_credentials_on_timeout.md

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: Zelda Hessler <zhessler@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>
Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>

* Replace PR bot's diff tool with one that supports pagination (smithy-lang#2245)

* Implement paginated diff to html tool
* Wire up the new diff tool to PR bot

* Add fallback_on_interrupt to the ProvideCredentials trait (smithy-lang#2246)

* Implement RFC for providing fallback credentials

This commit implements the changes checklist in the RFC for providing
fallback credentials.

* Remove needless lifetime parameter

* Update CHANGELOG.next.toml

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>

* Copy non-service integration tests into SDK root tests directory (smithy-lang#2255)

* Copy non-service integration tests into SDK root tests directory

For tests in `aws/sdk/integration-tests` that are not named after a
service module, copy them into the SDK's root `tests/` directory so that
they are run as part of CI on the entire SDK.

* Add missing fields to integration test manifests
* Remove tests from root workspace
* Explicitly exclude the root tests from the root workspace

* Update the event stream section in RFC30 (smithy-lang#2243)

* Python: Support `document` type (smithy-lang#2188)

* Add Python wrapper for `aws_smithy_types::Document`

* Remove unused type

* Make sure Python SSDKs uses Python specific `Document` type

* Allow Python specific `Document` type to be used in serialization and deserialization

* Make `pyo3/extension-module` a default feature

* Add `PythonServerTypesTest`

* Fix linting issues

* Add `Data limit exceeded` to build image throttle messages (smithy-lang#2260)

* Update smoketest models (smithy-lang#2252)

* Make minor fixes to the crate claim workflow (smithy-lang#2259)

* Add support for `@uniqueItems` (smithy-lang#2232)

This commit adds support for the `@uniqueItems` trait on `list` shapes
in server SDKs. Requests with duplicate values for `list` shapes
constrained with `@uniqueItems` will be rejected by servers.

* Update mentions to codegen configuration key in changelog (smithy-lang#2166)

As well as in one error message.

The key is `codegen`, not `codegenConfig`.

* Change the release flow to use release branches (smithy-lang#2253)

* What happens if we comment out the runtime crate version from gradle.properties?

* Allow running the release and the CI workflows from an arbitrary commit.

* Does a fake version work?

* Pass `git_ref` from the release workflow.

* It needs to be a valid semver version.

* Sketch new command to upgrade version in gradle.properties

* Command implementation

* Plug the new publisher command into the `release` action.

* Plumb end-to-end

* Fix copyright header.

* Fix lint.

* Temporarily comment out the sanity check.

* Ignore sanity check

* Add a command that prints out the template for CHANGELOG.next.toml

* Add branch check + empty TOML generation.

* Add copyright headers.

* Fix imports.

* Remove sanity check.

* Move script to a file.

* Add a check to validate the tag.

* Remove second build step.

* Move to .github/scripts folder.

* Make the script easier to run locally

* Fail if anything fails.

* Add comment.

* Update .github/scripts/get-or-create-release-branch.sh

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/scripts/get-or-create-release-branch.sh

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/scripts/get-or-create-release-branch.sh

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/workflows/ci.yml

Co-authored-by: david-perez <d@vidp.dev>

* Remove touch.

* Fix indentation and branch name.

* Update .github/workflows/ci.yml

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/workflows/release.yml

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/workflows/release.yml

Co-authored-by: david-perez <d@vidp.dev>

* Explicit flags.

* Use the path that was provided.

* Format

---------

Co-authored-by: david-perez <d@vidp.dev>

* Reduce Docker image rebuilds (smithy-lang#2269)

* Move `acquire-build-image` into `.github`
* Move the `tools-hash` script into `.github`
* Upload to ECR from PRs as well
* Move build tools into `tools/ci-build/`
* Move CI scripts out of `ci-build`
* Split CI for forks/non-forks
* Remove `fetch-depth: 0` from PR workflows

* Remove teams from `publisher` ownership list (smithy-lang#2257)

* Add static stability support to IMDS credentials provider (smithy-lang#2258)

* Add static stability support to ImdsCredentialsProvider

This commit adds static stability support to `ImdsCredentialsProvider`.
Static stability refers to continued availability of a service in the
face of impaired dependencies. In case IMDS is not available, we still
allow requests to be dispatched with expired credentials. This, in turn,
allows the target service to makes the ultimate decision as to whether
requests sent are valid or not instead of the client SDK determining
their validity.

The way it is implemented is `ImdsCredentialsProvider` now stores a last
retrieved credentials which will later be served when IMDS is unreachable.

* Add tests to IMDS credentials provider

This commit adds tests to IMDS credentials providers for static stability
support. These tests are prescribed in smithy-lang#2117.
From an IMDS credentials provider' perspective, however, some of the tests
are considered to fall under the same equivalence class with others.
Therefore, a single test can cover multiple test cases.

* Update CHANGELOG.next.toml

* Update CHANGELOG.next.toml

Co-authored-by: John DiSanti <jdisanti@amazon.com>

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>

* Increase build image backoff time and attempts (smithy-lang#2267)

* Must set a member in unions (smithy-lang#2241)

* Must set a member in unions

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>

* Add `runs-on` (smithy-lang#2273)

* Add `runs-on` (smithy-lang#2275)

* Fix job trigger. Clarify that short SHAs won't work. (smithy-lang#2278)

* Clarify what type of reference we are trying to push. (smithy-lang#2279)

* Update gradle properties for dry runs as well. (smithy-lang#2280)

* Get verbose logging from the branch script. (smithy-lang#2281)

* Fix trigger for following jobs. (smithy-lang#2282)

* Fix action paths. (smithy-lang#2283)

* We need to checkout in the `smithy-rs` folder because that's an assumption baked into the definition of the docker-build action. (smithy-lang#2284)

* Use action-arguments for arguments. (smithy-lang#2285)

* Fix if condition. (smithy-lang#2286)

* Persist the modified gradle.properties outside of the Docker context (smithy-lang#2287)

* Fix if condition.

* Make sure that the changes are visible to the push step after the Docker action has executed by persisting the modified repository as an artifact.

* Give a name to the argument.

* Use larger machines for the slowest CI jobs. (smithy-lang#2263)

* Use larger machines for the slowest CI jobs.

* Fix invocation.

* Fix paths. (smithy-lang#2288)

* Commit if modified (smithy-lang#2289)

* Fix paths.

* Fix commit logic - it should commit when there were changes.

* Set user name and email when committing (smithy-lang#2290)

* `git push origin` fails if there is nothing to push. (smithy-lang#2291)

* Use curly braces to group together the commit and push action (smithy-lang#2292)

* Fix syntax error when grouping commands. (smithy-lang#2293)

* [release-branches] Fix the reference that gets checked out (smithy-lang#2294)

* Fix the reference that gets checked out

* Fix

* Fix broken doc link to `tokio_stream::Stream` (smithy-lang#2271)

* Fix broken doc link to `Stream`

This commit fixes broken doc link to `Stream` in codegen clients. That
target is `tokio_stream::Stream`, which in turn is a re-export of
`futures_core::Stream`. Since we do not have a good way to link to the
re-export, we remove a hyper link and just put `Stream`.

* Update CHANGELOG.next.toml

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>

* Use docker login when possible (smithy-lang#2265)

Login to ECR when credentials are available to improve CI performance

* Simplify `AdHocSection` (smithy-lang#2272)

Co-authored-by: Russell Cohen <rcoh@amazon.com>

* Fix CI on main and don't acquire Docker login for forks (smithy-lang#2295)

* Fix CI on main and don't acquire Docker login for forks

* Convert empty env vars into `None`

* Optimize base image acquisition on main

* Collect more diagnostics. (smithy-lang#2297)

* [release-branches] Fix working directory (smithy-lang#2298)

* Fix working directory.

* Build the Docker image using the latest commit on the invocation branch.

* We need a Docker image with the same SHA later in the process.

* Clone does not preserve uncommitted changes. This was leading to the gradle.properties update being lost. (smithy-lang#2299)

* Make `OperationExtension` store the absolute shape ID (smithy-lang#2276)

* Do not alter Operation shape ID

* Add OperationExtensionExt test

* Add CHANGELOG.next.toml entry

* Apply suggestions from code review

Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>

---------

Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>

* Retrieve the output from outside the Docker context (smithy-lang#2300)

Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>

* Fix image tagging in CI on main (smithy-lang#2301)

* Fix handling of repeated headers in AWS request canonicalization (smithy-lang#2261)

* Remove usage of always empty writable in `JsonParserGenerator` (smithy-lang#2192)

The writable calculates a string that it never writes.

This was added in smithy-lang#2131.

* fix

---------

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>
Co-authored-by: Zelda Hessler <zhessler@amazon.com>
Co-authored-by: Harry Barber <106155934+hlbarber@users.noreply.github.com>
Co-authored-by: 82marbag <69267416+82marbag@users.noreply.github.com>
Co-authored-by: AWS SDK Rust Bot <aws-sdk-rust-primary@amazon.com>
Co-authored-by: Julian Antonielli <julianantonielli@gmail.com>
Co-authored-by: Julian Antonielli <jjantdev@amazon.co.uk>
Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>
Co-authored-by: Russell Cohen <rcoh@amazon.com>
Co-authored-by: ysaito1001 <gperson22@gmail.com>
Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>
Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>
Co-authored-by: Burak <unexge@gmail.com>
Co-authored-by: david-perez <d@vidp.dev>
Co-authored-by: Nipunn Koorapati <nipunn1313@gmail.com>
thomas-k-cameron added a commit to thomas-k-cameron/smithy-rs that referenced this pull request Feb 4, 2023
* formatting: run pre-commit on all files (smithy-lang#2236)

* formatting: run pre-commit on all files

* fix: test broken by string indent

* Remove old service builder machinery (smithy-lang#2161)

* Remove HandlerGenerator and RegistryGenerator

* Shrink RequestRejection

* Remove inlineable

* Remove Router and RequestParts

* Remove old service builder tests

* Add missing test dependency

* Add missing dev dependency

* Remove unused test

* Move Router types

* Switch AllowUnused to AllowUnusedVariables

* Add changelog entry for "Remove old service builder machinery" (smithy-lang#2242)

* Dense maps cannot deserialize null values (smithy-lang#2239)

* Dense maps cannot deserialize null values

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>

* Dense lists cannot deserialize null (smithy-lang#2240)

* Dense lists cannot deserialize null

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>

* Update changelog

* Add changelog entry about smithy-lang#2200 (smithy-lang#2250)

Co-authored-by: Julian Antonielli <jjantdev@amazon.co.uk>

* Fix `OperationExtensionFuture` poll order (smithy-lang#2247)

* Fix `OperationExtensionFuture` poll order

* Add CHANGELOG.next.toml entry

Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>

* Bump version to 0.54.1 in `gradle.properties` (smithy-lang#2249)

Co-authored-by: Julian Antonielli <jjantdev@amazon.co.uk>
Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>

* Update changelog

* Add CI action to test aws-sdk-services (smithy-lang#2251)

* run cargo test --all-features instead

* Add check-only option, change check to test

* Support nested APIs in operation input tests

* Newtype FromRequest::Future (smithy-lang#2244)

* Add RFC: Providing fallback credentials on timeout (smithy-lang#2218)

* Add RFC: providing fallback credentials on timeout

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: Zelda Hessler <zhessler@amazon.com>

* Update design/src/rfcs/rfc0031_providing_fallback_credentials_on_timeout.md

Co-authored-by: John DiSanti <jdisanti@amazon.com>

* Incorporate review feedback into RFC

This commit addresses the review feedback:
smithy-lang#2218 (comment)
smithy-lang#2218 (comment)
smithy-lang#2218 (comment)

In addition, we have renamed the proposed method `on_timeout` to
`fallback_on_interrupt` to make it more descriptive.

* Update rfc0031_providing_fallback_credentials_on_timeout.md

* Update RFC

This commit updates RFC and leaves to discussion how `fallback_on_interrupt`
should be implemented, i.e., either as a synchronous primitive or an
asynchronous primitive.

* Update rfc0031_providing_fallback_credentials_on_timeout.md

* Update rfc0031_providing_fallback_credentials_on_timeout.md

* Update rfc0031_providing_fallback_credentials_on_timeout.md

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: Zelda Hessler <zhessler@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>
Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>

* Replace PR bot's diff tool with one that supports pagination (smithy-lang#2245)

* Implement paginated diff to html tool
* Wire up the new diff tool to PR bot

* Add fallback_on_interrupt to the ProvideCredentials trait (smithy-lang#2246)

* Implement RFC for providing fallback credentials

This commit implements the changes checklist in the RFC for providing
fallback credentials.

* Remove needless lifetime parameter

* Update CHANGELOG.next.toml

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>

* Copy non-service integration tests into SDK root tests directory (smithy-lang#2255)

* Copy non-service integration tests into SDK root tests directory

For tests in `aws/sdk/integration-tests` that are not named after a
service module, copy them into the SDK's root `tests/` directory so that
they are run as part of CI on the entire SDK.

* Add missing fields to integration test manifests
* Remove tests from root workspace
* Explicitly exclude the root tests from the root workspace

* Update the event stream section in RFC30 (smithy-lang#2243)

* Python: Support `document` type (smithy-lang#2188)

* Add Python wrapper for `aws_smithy_types::Document`

* Remove unused type

* Make sure Python SSDKs uses Python specific `Document` type

* Allow Python specific `Document` type to be used in serialization and deserialization

* Make `pyo3/extension-module` a default feature

* Add `PythonServerTypesTest`

* Fix linting issues

* Add `Data limit exceeded` to build image throttle messages (smithy-lang#2260)

* Update smoketest models (smithy-lang#2252)

* Make minor fixes to the crate claim workflow (smithy-lang#2259)

* Add support for `@uniqueItems` (smithy-lang#2232)

This commit adds support for the `@uniqueItems` trait on `list` shapes
in server SDKs. Requests with duplicate values for `list` shapes
constrained with `@uniqueItems` will be rejected by servers.

* Update mentions to codegen configuration key in changelog (smithy-lang#2166)

As well as in one error message.

The key is `codegen`, not `codegenConfig`.

* Change the release flow to use release branches (smithy-lang#2253)

* What happens if we comment out the runtime crate version from gradle.properties?

* Allow running the release and the CI workflows from an arbitrary commit.

* Does a fake version work?

* Pass `git_ref` from the release workflow.

* It needs to be a valid semver version.

* Sketch new command to upgrade version in gradle.properties

* Command implementation

* Plug the new publisher command into the `release` action.

* Plumb end-to-end

* Fix copyright header.

* Fix lint.

* Temporarily comment out the sanity check.

* Ignore sanity check

* Add a command that prints out the template for CHANGELOG.next.toml

* Add branch check + empty TOML generation.

* Add copyright headers.

* Fix imports.

* Remove sanity check.

* Move script to a file.

* Add a check to validate the tag.

* Remove second build step.

* Move to .github/scripts folder.

* Make the script easier to run locally

* Fail if anything fails.

* Add comment.

* Update .github/scripts/get-or-create-release-branch.sh

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/scripts/get-or-create-release-branch.sh

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/scripts/get-or-create-release-branch.sh

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/workflows/ci.yml

Co-authored-by: david-perez <d@vidp.dev>

* Remove touch.

* Fix indentation and branch name.

* Update .github/workflows/ci.yml

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/workflows/release.yml

Co-authored-by: david-perez <d@vidp.dev>

* Update .github/workflows/release.yml

Co-authored-by: david-perez <d@vidp.dev>

* Explicit flags.

* Use the path that was provided.

* Format

---------

Co-authored-by: david-perez <d@vidp.dev>

* Reduce Docker image rebuilds (smithy-lang#2269)

* Move `acquire-build-image` into `.github`
* Move the `tools-hash` script into `.github`
* Upload to ECR from PRs as well
* Move build tools into `tools/ci-build/`
* Move CI scripts out of `ci-build`
* Split CI for forks/non-forks
* Remove `fetch-depth: 0` from PR workflows

* Remove teams from `publisher` ownership list (smithy-lang#2257)

* Add static stability support to IMDS credentials provider (smithy-lang#2258)

* Add static stability support to ImdsCredentialsProvider

This commit adds static stability support to `ImdsCredentialsProvider`.
Static stability refers to continued availability of a service in the
face of impaired dependencies. In case IMDS is not available, we still
allow requests to be dispatched with expired credentials. This, in turn,
allows the target service to makes the ultimate decision as to whether
requests sent are valid or not instead of the client SDK determining
their validity.

The way it is implemented is `ImdsCredentialsProvider` now stores a last
retrieved credentials which will later be served when IMDS is unreachable.

* Add tests to IMDS credentials provider

This commit adds tests to IMDS credentials providers for static stability
support. These tests are prescribed in smithy-lang#2117.
From an IMDS credentials provider' perspective, however, some of the tests
are considered to fall under the same equivalence class with others.
Therefore, a single test can cover multiple test cases.

* Update CHANGELOG.next.toml

* Update CHANGELOG.next.toml

Co-authored-by: John DiSanti <jdisanti@amazon.com>

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>

* Increase build image backoff time and attempts (smithy-lang#2267)

* Must set a member in unions (smithy-lang#2241)

* Must set a member in unions

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>

* Add `runs-on` (smithy-lang#2273)

* Add `runs-on` (smithy-lang#2275)

* Fix job trigger. Clarify that short SHAs won't work. (smithy-lang#2278)

* Clarify what type of reference we are trying to push. (smithy-lang#2279)

* Update gradle properties for dry runs as well. (smithy-lang#2280)

* Get verbose logging from the branch script. (smithy-lang#2281)

* Fix trigger for following jobs. (smithy-lang#2282)

* Fix action paths. (smithy-lang#2283)

* We need to checkout in the `smithy-rs` folder because that's an assumption baked into the definition of the docker-build action. (smithy-lang#2284)

* Use action-arguments for arguments. (smithy-lang#2285)

* Fix if condition. (smithy-lang#2286)

* Persist the modified gradle.properties outside of the Docker context (smithy-lang#2287)

* Fix if condition.

* Make sure that the changes are visible to the push step after the Docker action has executed by persisting the modified repository as an artifact.

* Give a name to the argument.

* Use larger machines for the slowest CI jobs. (smithy-lang#2263)

* Use larger machines for the slowest CI jobs.

* Fix invocation.

* Fix paths. (smithy-lang#2288)

* Commit if modified (smithy-lang#2289)

* Fix paths.

* Fix commit logic - it should commit when there were changes.

* Set user name and email when committing (smithy-lang#2290)

* `git push origin` fails if there is nothing to push. (smithy-lang#2291)

* Use curly braces to group together the commit and push action (smithy-lang#2292)

* Fix syntax error when grouping commands. (smithy-lang#2293)

* [release-branches] Fix the reference that gets checked out (smithy-lang#2294)

* Fix the reference that gets checked out

* Fix

* Fix broken doc link to `tokio_stream::Stream` (smithy-lang#2271)

* Fix broken doc link to `Stream`

This commit fixes broken doc link to `Stream` in codegen clients. That
target is `tokio_stream::Stream`, which in turn is a re-export of
`futures_core::Stream`. Since we do not have a good way to link to the
re-export, we remove a hyper link and just put `Stream`.

* Update CHANGELOG.next.toml

---------

Co-authored-by: Yuki Saito <awsaito@amazon.com>

* Use docker login when possible (smithy-lang#2265)

Login to ECR when credentials are available to improve CI performance

* Simplify `AdHocSection` (smithy-lang#2272)

Co-authored-by: Russell Cohen <rcoh@amazon.com>

* Fix CI on main and don't acquire Docker login for forks (smithy-lang#2295)

* Fix CI on main and don't acquire Docker login for forks

* Convert empty env vars into `None`

* Optimize base image acquisition on main

* Collect more diagnostics. (smithy-lang#2297)

* [release-branches] Fix working directory (smithy-lang#2298)

* Fix working directory.

* Build the Docker image using the latest commit on the invocation branch.

* We need a Docker image with the same SHA later in the process.

* Clone does not preserve uncommitted changes. This was leading to the gradle.properties update being lost. (smithy-lang#2299)

* Make `OperationExtension` store the absolute shape ID (smithy-lang#2276)

* Do not alter Operation shape ID

* Add OperationExtensionExt test

* Add CHANGELOG.next.toml entry

* Apply suggestions from code review

Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>

---------

Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>

* Retrieve the output from outside the Docker context (smithy-lang#2300)

Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>

* Fix image tagging in CI on main (smithy-lang#2301)

* Fix handling of repeated headers in AWS request canonicalization (smithy-lang#2261)

* Remove usage of always empty writable in `JsonParserGenerator` (smithy-lang#2192)

The writable calculates a string that it never writes.

This was added in smithy-lang#2131.

---------

Signed-off-by: Daniele Ahmed <ahmeddan@amazon.de>
Co-authored-by: Zelda Hessler <zhessler@amazon.com>
Co-authored-by: Harry Barber <106155934+hlbarber@users.noreply.github.com>
Co-authored-by: 82marbag <69267416+82marbag@users.noreply.github.com>
Co-authored-by: AWS SDK Rust Bot <aws-sdk-rust-primary@amazon.com>
Co-authored-by: Julian Antonielli <julianantonielli@gmail.com>
Co-authored-by: Julian Antonielli <jjantdev@amazon.co.uk>
Co-authored-by: AWS SDK Rust Bot <97246200+aws-sdk-rust-ci@users.noreply.github.com>
Co-authored-by: Russell Cohen <rcoh@amazon.com>
Co-authored-by: ysaito1001 <gperson22@gmail.com>
Co-authored-by: Yuki Saito <awsaito@amazon.com>
Co-authored-by: John DiSanti <jdisanti@amazon.com>
Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>
Co-authored-by: Burak <unexge@gmail.com>
Co-authored-by: david-perez <d@vidp.dev>
Co-authored-by: Nipunn Koorapati <nipunn1313@gmail.com>
@mustakimali
Copy link

mustakimali commented May 9, 2023

Hi, we are seeing some errors in production following the addition of the logic here

    fn maybe_extend_expiration(&self, expiration: SystemTime) -> SystemTime {
        // [ ... ]
        // calculate credentials' refresh offset with jitter
        let refresh_offset =
            CREDENTIAL_EXPIRATION_INTERVAL + Duration::from_secs(rng.u64(120..=600));
        let new_expiry = self.time_source.now() + refresh_offset;

        if new_expiry < expiration {
            return expiration;
        }

        tracing::warn!(
            "Attempting credential expiration extension due to a credential service availability issue. \
            A refresh of these credentials will be attempted again within the next {:.2} minutes.",
            refresh_offset.as_secs_f64() / 60.0,
        );

        new_expiry
    }

The error we are seeing happening infrequently and when it happens, it lasts for 10/15mins.

Error { code: \"ExpiredTokenException\", message: \"The security token included in the request is expired\", aws_request_id: \"U9LKTDURCE8893LKBN5R1Q4ASFVV4KQNSO5AEMVJF66Q9ASUAAJG\" }

In our setup the credentials are coming from kube2iam and the expiry is set to maximum 30 mins. We are wondering if this is conflicting with any of the assumptions made here? Of course this could entrely be a bug in kube2iam but I wanted to ask here first since this is an ongoing effort to change how the SDK threats expired credentials.

Looking at the notes in #2117

Per internal AWS guidelines, SDKs must not "fail fast" and attempt to refresh expired IMDS credentials without first receiving a service exception. This is because a service may begin accepting expired credentials due to a credential service availability issue.

Is it possible the sdk has been updated to allow the usage of expired tokens without updating the upstream services to relax the validation?

@rcoh
Copy link
Collaborator

rcoh commented May 9, 2023

It seems like our logic might be incorrect—I don't think the intention was to always increase the expiry on credentials coming from IMDS (as seems to be the case in the diff). We'll dig into it and get a fix out.

Do you mind verifying if you see

        tracing::warn!(
            "Attempting credential expiration extension due to a credential service availability issue. \
            A refresh of these credentials will be attempted again within the next {:.2} minutes.",
            refresh_offset.as_secs_f64() / 60.0,
        );

in your logs? just to confirm that this is the culprit?

@mustakimali
Copy link

Yes I see this log line in the affected pods. In fact this is what lead us to this commit.

ysaito1001 added a commit that referenced this pull request May 10, 2023
@ysaito1001
Copy link
Contributor Author

Thank you for reporting the issue. Opened a PR to address it.

@rcoh
Copy link
Collaborator

rcoh commented May 24, 2023

this should be fixed in the latest release to aws-sdk-rust

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants