
fix(amazonq): retry S3 upload #6246

Merged: 7 commits into aws:master on Dec 17, 2024

Conversation

@dhasani23 (Contributor) commented Dec 15, 2024

Problem

S3 upload sometimes fails due to transient issues.

Solution

Add up to 3 retries.


  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.

License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dhasani23 requested review from a team as code owners December 15, 2024 00:26
@dhasani23 marked this pull request as draft December 15, 2024 00:26
const mavenJavaProjects = await getMavenJavaProjects(javaProjects)

// These projects we know must contain a pom.xml and a .java file
const projectsValidToTransform = await getProjectsValidToTransform(mavenJavaProjects)
dhasani23 (Author) commented:

Deleted this function entirely because (1) it was expensive (it ran a shell command, javap, on every open project) and (2) it was only used to format a string (see createLanguageUpgradeConfirmationPrompt above). Every project returned by getMavenJavaProjects was also returned by getProjectsValidToTransform, so the extra check was redundant.

@@ -747,7 +747,7 @@ export const cleanInstallErrorNotification = `Amazon Q could not run the Maven c
 export const enterJavaHomeChatMessage = 'Enter the path to JDK '

 export const projectPromptChatMessage =
-    'I can upgrade your JAVA_VERSION_HERE. To start the transformation, I need some information from you. Choose the project you want to upgrade and the target code version to upgrade to. Then, choose Confirm.'
+    'I can upgrade your Java project. To start the transformation, I need some information from you. Choose the project you want to upgrade and the target code version to upgrade to. Then, choose Confirm.'
dhasani23 (Author) commented:

See the comment below: rather than keeping 75+ lines of code and running a shell command for each open project just to say "Java 8 project", "Java 11 project", "Java 8 and 11 projects", etc., we now just say "Java project".

    getLogger().error(`CodeTransformation: ${uploadFailedError}`)
    throw new Error(uploadFailedError)
}
getLogger().info('CodeTransformation: Upload to S3 succeeded')
dhasani23 (Author) commented:

I'm not a big fan of implementing our own retry logic rather than doing it at the SDK layer, but the existing S3 client's upload functionality requires a bucket name as part of the request, and (1) I don't think the internal S3 bucket names we use should be floating around this repo, (2) we use multiple different buckets, and (3) we don't have access to the bucket name in the response of our CreateUploadUrl API. That API would have to change before we could use the existing S3 client.
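For illustration, a minimal sketch of that approach (not the PR's actual code): a plain HTTP PUT to the presigned URL, retried up to 3 times but only for status codes that look transient. The helper name, the set of retriable codes, and the use of the global fetch API are all assumptions.

// Sketch only: bounded retries against a presigned S3 URL.
const RETRIABLE_STATUS_CODES = new Set([408, 429, 500, 502, 503, 504]) // assumed set of transient codes

async function uploadWithRetries(uploadUrl: string, body: Buffer): Promise<Response> {
    let lastError = new Error('Upload failed')
    for (let attempt = 0; attempt < 3; attempt++) {
        const response = await fetch(uploadUrl, { method: 'PUT', body })
        if (response.ok) {
            return response
        }
        if (!RETRIABLE_STATUS_CODES.has(response.status)) {
            // Non-retriable failure (e.g. 400): fail fast rather than retrying.
            throw new Error(`Upload failed with status code = ${response.status}; did not automatically retry`)
        }
        lastError = new Error(`Upload failed with status code = ${response.status}`)
    }
    throw lastError
}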

justinmk3 (Contributor) commented Dec 16, 2024:

Thanks for that context! That info is probably worth a code comment here or in the docstring of this function.

For reference, we have discussed some sort of util to "poll/retry stuff", see https://taskei.amazon.dev/tasks/IDE-15280 (startDevEnvironmentWithProgress is a very complex case, maybe not exactly analogous to standard "request retry").
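For instance, a docstring along these lines could capture that context (a sketch distilled from the comment above, not taken from the code):

/**
 * Uploads the artifact with a plain HTTP PUT to the presigned URL returned by
 * CreateUploadUrl, retrying manually. The SDK's S3 client is deliberately not
 * used here: its upload API requires a bucket name, CreateUploadUrl does not
 * return one, and internal bucket names should not appear in this repo.
 */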

dhasani23 (Author) commented:
Looks like this tech debt-related test is failing

Comment on lines 547 to 558
let error: Error | undefined
try {
    await uploadArtifactToS3(
        'test.zip',
        { uploadId: '123', uploadUrl: 'http://test.com', kmsKeyArn: 'arn' },
        'sha256',
        Buffer.from('test')
    )
} catch (e) {
    error = e as Error
}
sinon.assert.match(
    error?.message.includes('Upload failed with status code = 400; did not automatically retry'),
    true
)
justinmk3 (Contributor) commented:
Any reason not to use assert.rejects here (and in the other tests)? Search the codebase for usage examples.
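For comparison, a sketch of the same assertion written with Node's built-in assert.rejects (the arguments mirror the test above; the regex is assumed to match the thrown error's message):

// import assert from 'assert'
await assert.rejects(
    uploadArtifactToS3(
        'test.zip',
        { uploadId: '123', uploadUrl: 'http://test.com', kmsKeyArn: 'arn' },
        'sha256',
        Buffer.from('test')
    ),
    /Upload failed with status code = 400; did not automatically retry/
)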

Comment on lines 115 to 121
for (let i = 0; i < 3; i++) {
    try {
        response = await request.fetch('PUT', resp.uploadUrl, {
justinmk3 (Contributor) commented:
(Non-blocker) Looks like waitUntil wouldn't work here because it only has "max time", not "max retries". We should probably add that. Being able to "find all waitUntil references" is useful for seeing all the areas in the codebase that use this pattern.

dhasani23 (Author) commented Dec 17, 2024:

waitUntil using "max time" is actually fine for our purposes. But it looks like it will always retry until the timeout period elapses, whereas this function only retries as long as the status code is retriable. In addition to supporting a maximum number of retries, it's probably worth extending waitUntil to support something like retryOnStatusCodes (similar to how IntelliJ's version of waitUntil lets you specify exceptionsToIgnore, which effectively retries only on those exceptions).

Until then, I think it's still better to use waitUntil, so I will update this PR accordingly.

dhasani23 (Author) commented:

On second thought, I don't see a point in retrying the upload if we know it failed for non-retriable reasons, so I will keep the PR as it is for now. Updated the comment to reflect this.

justinmk3 (Contributor) commented:

> But it looks like it will always retry until the timeout period elapses, whereas this function only retries as long as the status code is retriable

Your callback passed to waitUntil can decide whether to retry by its return value. However, it would be nicer if waitUntil supported a more explicit way to stop retrying.
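A sketch of that callback pattern, assuming waitUntil re-invokes the callback until it returns a defined value or the timeout elapses (the option names, isRetriableStatusCode, and the use of fetch in place of the toolkit's request helper are all illustrative):

const response = await waitUntil(
    async () => {
        const resp = await fetch(uploadUrl, { method: 'PUT', body })
        if (resp.ok || !isRetriableStatusCode(resp.status)) {
            return resp // a defined result stops the retry loop
        }
        return undefined // undefined tells waitUntil to try again
    },
    { timeout: 10000, interval: 1000 } // illustrative option names
)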

@dhasani23 marked this pull request as ready for review December 17, 2024 20:11
justinmk3 (Contributor) left a comment:

Can someone on your team review?

@dhasani23 merged commit 6650cb9 into aws:master on Dec 17, 2024
24 of 26 checks passed