
Intermittent failure to connect to S3 with credentials issue #174

Closed
MarkRoss-Eviden opened this issue May 11, 2024 · 4 comments

@MarkRoss-Eviden
Contributor

This issue occurs randomly and infrequently, and has done for a long time (to my knowledge it was not introduced by recent changes in 5.1 or 5.2), which makes it difficult to troubleshoot. Training works fine for hours (perhaps days), and then robomaker exits.

For example, this one happened 960 episodes in; you can see it's working and then suddenly it's not. The instance uses an IAM Instance Profile with full access to S3 (if permissions were the issue it would fail immediately):
[image]

This does not seem to be limited to DRfC; other users have seen it while doing unrelated things:
boto/botocore#2117
rom1504/img2dataset#137

There's a suggestion that increasing the environment variable 'AWS_METADATA_SERVICE_NUM_ATTEMPTS' could help, as we might be getting throttled:
[image]

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
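
To illustrate why a higher attempt count might help if throttling is the cause, here is a minimal Python sketch of retrying a flaky credential fetch with backoff. It is not botocore's actual implementation; `ThrottledError`, `fetch_with_retries`, `make_flaky_fetch`, and the delay values are all hypothetical names and numbers chosen for the example.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttled or timed-out metadata request."""


def fetch_with_retries(fetch, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fetch() up to `attempts` times, backing off between failures."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ThrottledError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_fetch(failures):
    """Build a fake fetch that fails `failures` times, then returns credentials."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ThrottledError("metadata service throttled")
        # Placeholder payload, not real credentials.
        return {"AccessKeyId": "EXAMPLE", "calls": state["calls"]}

    return fetch
```

With a single attempt, one throttled request is enough to fail the whole call, which would match training running for hours and then exiting on one bad lookup. With several attempts, the same transient failure is absorbed.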

@larsll
Contributor

larsll commented May 11, 2024

So two possible workarounds:

  • Add normal AWS IAM credentials
  • Add the environment variable into docker/docker-compose-training.yml
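
For code that cannot be reached through the compose file, the same variable could in principle be defaulted from Python before any boto3 client is created. This is only a sketch under that assumption: the helper name `ensure_metadata_attempts` and the value 5 are illustrative and are not taken from this repository or from the eventual fix.

```python
import os

ENV_VAR = "AWS_METADATA_SERVICE_NUM_ATTEMPTS"


def ensure_metadata_attempts(environ=os.environ, attempts=5):
    """Set the retry variable unless the environment already defines it.

    Returns the value that is in effect afterwards, as a string.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    # setdefault keeps a value supplied by the container or compose file.
    environ.setdefault(ENV_VAR, str(attempts))
    return environ[ENV_VAR]
```

Using `setdefault` means a value set in docker/docker-compose-training.yml still wins, so the two approaches would not conflict.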

@MarkRoss-Eviden
Contributor Author

I'll work on 'Add the environment variable into docker/docker-compose-training.yml' and see what happens. As long as it doesn't introduce new issues it should be safe to merge. Adding static creds to instances isn't AWS best practice and is actively discouraged for security reasons.

@MarkRoss-Eviden
Contributor Author

Are there any commands in the containers that would be fetching the creds specifically, or is it just background activity on the instance?

@MarkRoss-Eviden
Contributor Author

MarkRoss-Eviden commented May 13, 2024

Fixed by #178
