NXDOMAIN errors should not be retried #2083
Thanks for the PR. You've provided motivation for the PR but a code example of what is currently not working would be helpful.
Some minor correctness issues but otherwise looks ok.
aws/retry/retryable_error.go
switch {
case errors.As(err, &dnsError):
	// NXDOMAIN errors should not be retried
	retryable = false
correctness: Not all instances of net.DNSError are related to NXDOMAIN errors. This should be predicated on the fields in net.DNSError, e.g. something like:
case errors.As(err, &dnsError):
// NXDOMAIN and other non temporary DNS errors should not be retried
retryable = !dnsError.IsNotFound && dnsError.IsTemporary
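The suggested predicate can be exercised with the standard library alone. The following is a minimal sketch, not the SDK's implementation: `newDNSError` and `isRetryableDNSError` are hypothetical helpers introduced only for illustration, and the host name is made up. It builds `net.DNSError` values by hand, wraps them the way a transport error would be wrapped, and applies the check from the review comment.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// newDNSError is a hypothetical helper: it builds a *net.DNSError with the
// given flags and wraps it, roughly as a failed request would surface it.
func newDNSError(isNotFound, isTemporary bool) error {
	dnsErr := &net.DNSError{
		Err:         "lookup failed",
		Name:        "bucket.s3.us-west-x.amazonaws.com", // made-up host
		IsNotFound:  isNotFound,
		IsTemporary: isTemporary,
	}
	return fmt.Errorf("request send failed: %w", dnsErr)
}

// isRetryableDNSError is a hypothetical helper, not an SDK function.
// matched reports whether err wraps a *net.DNSError; retryable applies the
// predicate suggested in the review: retry only temporary, non-NXDOMAIN errors.
func isRetryableDNSError(err error) (retryable, matched bool) {
	var dnsError *net.DNSError
	if !errors.As(err, &dnsError) {
		return false, false
	}
	return !dnsError.IsNotFound && dnsError.IsTemporary, true
}

func main() {
	cases := []struct {
		name string
		err  error
	}{
		{"NXDOMAIN (not found)", newDNSError(true, false)},
		{"temporary resolver failure", newDNSError(false, true)},
		{"permanent, not NXDOMAIN", newDNSError(false, false)},
		{"not a DNS error", errors.New("connection reset")},
	}
	for _, c := range cases {
		retryable, matched := isRetryableDNSError(c.err)
		fmt.Printf("%-28s matched=%t retryable=%t\n", c.name, matched, retryable)
	}
}
```

Only the temporary, non-NXDOMAIN case comes back as retryable. An error that does not wrap a `net.DNSError` is reported as unmatched, so a caller can fall through to other checks.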
The standard retryer keeps retrying domains that do not resolve. This hangs the execution until retries are exhausted. NXDOMAIN or 'no such host' errors are not transient and should fail immediately. An example would be when a user enters an invalid region that doesn't resolve.
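To make the cost of retrying an unresolvable host concrete, here is a simplified sketch of a retry loop. It is not the SDK's standard retryer: `attempt`, `lookupNoSuchHost`, `retryEverything`, and `retryUnlessNotFound` are hypothetical names, and backoff delays are left out so the example runs instantly. It counts how many times a failing lookup is invoked under each policy.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// lookupNoSuchHost is a stand-in operation that always fails like an
// unresolvable endpoint and records each call in *calls.
func lookupNoSuchHost(calls *int) error {
	*calls++
	return &net.DNSError{
		Err:        "no such host",
		Name:       "bucket.s3.us-west-x.amazonaws.com", // made-up host
		IsNotFound: true,
	}
}

// retryEverything treats every error as transient.
func retryEverything(error) bool { return true }

// retryUnlessNotFound stops retrying when the error wraps an NXDOMAIN result.
func retryUnlessNotFound(err error) bool {
	var dnsError *net.DNSError
	if errors.As(err, &dnsError) && dnsError.IsNotFound {
		return false
	}
	return true
}

// attempt runs op up to maxAttempts times and stops early on success or when
// isRetryable returns false. It returns the attempts used and the final error.
// A real retryer would also sleep between attempts; that is omitted here.
func attempt(maxAttempts int, op func() error, isRetryable func(error) bool) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for n := 1; n <= maxAttempts; n++ {
		if err = op(); err == nil {
			return n, nil
		}
		if !isRetryable(err) {
			return n, err
		}
	}
	return maxAttempts, fmt.Errorf("exceeded maximum number of attempts, %d: %w", maxAttempts, err)
}

func main() {
	calls := 0
	n, err := attempt(10, func() error { return lookupNoSuchHost(&calls) }, retryEverything)
	fmt.Printf("retry everything:       attempts=%d err=%v\n", n, err)

	calls = 0
	n, err = attempt(10, func() error { return lookupNoSuchHost(&calls) }, retryUnlessNotFound)
	fmt.Printf("retry unless not found: attempts=%d err=%v\n", n, err)
}
```

With the first policy the lookup runs for every allowed attempt before giving up; with the second it runs once. In a real client each extra attempt also incurs a backoff delay, which is where the long wall-clock time comes from.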
Hi @aajtodd, thanks for reviewing my code and providing feedback. I've updated the PR to include your suggestion and tested it. Here is a code example. The use case is a user who is trying to upload an object to Amazon S3 and enters the wrong region. If the user increases the

package main

import (
	"bytes"
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

var (
	region = flag.String("region", "us-west-2", "region")
)

func init() {
	flag.Parse()
}

func main() {
	fmt.Printf("region input: %s\n", *region)

	// Load the default config with the requested region and a standard
	// retryer that allows up to 10 attempts.
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRegion(*region),
		config.WithRetryer(func() aws.Retryer {
			return retry.NewStandard(func(o *retry.StandardOptions) {
				o.MaxAttempts = 10
			})
		}))
	if err != nil {
		log.Fatal(err.Error())
	}

	client := s3.NewFromConfig(cfg)
	bucket := "bolyanko-temp"
	key := "test-object"

	// Upload an empty object; with an invalid region the endpoint host
	// does not resolve and the request fails with "no such host".
	_, err = client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   new(bytes.Buffer),
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("object uploaded successfully\n")
}

# with the code as-is today, it takes 1 min 49 seconds to fail:
time ./pathtest --region us-west-x
region input: us-west-x
panic: operation error S3: PutObject, exceeded maximum number of attempts, 10, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Put "https://bolyanko-temp.s3.us-west-x.amazonaws.com/test-object?x-id=PutObject": dial tcp: lookup bolyanko-temp.s3.us-west-x.amazonaws.com: no such host
goroutine 1 [running]:
main.main()
/Users/bolyanko/src/s3-obj-lambda/pathTests/main.go:45 +0x318
./pathtest --region us-west-x 0.01s user 0.02s system 0% cpu 1:48.99 total

# with this PR it fails immediately
time ./pathtest --region us-west-x
region input: us-west-x
panic: operation error S3: PutObject, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Put "https://bolyanko-temp.s3.us-west-x.amazonaws.com/test-object?x-id=PutObject": dial tcp: lookup bolyanko-temp.s3.us-west-x.amazonaws.com: no such host
goroutine 1 [running]:
main.main()
/Users/bolyanko/src/s3-obj-lambda/pathTests/main.go:45 +0x318
./pathtest --region us-west-x 0.00s user 0.01s system 1% cpu 0.706 total