EC2 resource detector hangs for a long time outside of an EC2 instance #1088

Closed
jemisonf opened this issue May 13, 2022 · 0 comments · Fixed by #1089
Labels
bug Something isn't working

Comments

@jemisonf
Contributor

Describe your environment

I initially saw this in a container running under Docker Compose on an AWS EC2 instance, but I've been able to reproduce it on my laptop as well. I think it will show up in any environment not running directly on AWS.

Steps to reproduce

The following code reproduced the issue on my laptop:

from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
from opentelemetry.sdk.resources import get_aggregated_resources

resource = get_aggregated_resources(
    detectors=[AwsEc2ResourceDetector()]
)

What is the expected behavior?

It should complete quickly (this is the behavior I see running on an EC2 instance).

What is the actual behavior?


On my laptop, it hangs ~indefinitely.

Note: one solution is just to remove the resource detector, but we'd like to be able to include it and have it fail silently, which is the behavior we've seen from other resource detectors.

Additional context

I think the problem is the timeout passed to the metadata request inside the detector:

It looks like the request uses a 1000-second timeout, which I suspect was intended to be a 1000-millisecond timeout. At least with the server program I've been working on, that blocks startup until the request completes.
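For context, urllib.request.urlopen interprets timeout in seconds, so a value of 1000 lets the connection attempt block for roughly 16 minutes against an endpoint that never answers. The following is a rough sketch of the request pattern in question, assuming an IMDSv2-style token fetch like the curl check below; it is illustrative only, not the detector's exact code:

from urllib.request import Request, urlopen

# Illustrative sketch only -- not the detector's source. urlopen's timeout
# argument is in seconds, so timeout=1000 allows the connection attempt to
# block for up to ~16 minutes when the metadata endpoint is unreachable.
token = urlopen(
    Request(
        "http://169.254.169.254/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        method="PUT",
    ),
    timeout=1000,  # seconds, not milliseconds
).read()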

You can verify by running:

curl http://169.254.169.254/latest/api/token

This is one of the requests the resource detector makes; outside of EC2 it should hang indefinitely as well.

jemisonf added the bug label on May 13, 2022
jemisonf added a commit to jemisonf/opentelemetry-python-contrib that referenced this issue May 13, 2022
Fix open-telemetry#1088 

According to the docs, the value for `timeout` is in seconds: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen. 1000 seconds is a very long timeout and in some cases can block the startup of the program being instrumented (see open-telemetry#1088 as an example), because the request will hang indefinitely in non-AWS environments. Using a much shorter 1-second timeout seems like a reasonable workaround for this.
ocelotl pushed a commit that referenced this issue May 24, 2022
* Use a shorter timeout for AWS EC2 metadata requests

Fix #1088 

According to the docs, the value for `timeout` is in seconds: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen. 1000 seconds is a very long timeout and in some cases can block the startup of the program being instrumented (see #1088 as an example), because the request will hang indefinitely in non-AWS environments. Using a much shorter 1-second timeout seems like a reasonable workaround for this.

* add changelog entry for timeout change

* use 5s timeout for ECS and EKS, update changelog

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
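For readers following along, the shape of the fix is simply the same metadata request with a short timeout (1 s for EC2, 5 s for ECS/EKS), so that outside AWS the detector fails fast instead of blocking startup. Below is a minimal sketch of that pattern; the helper name is hypothetical and this is not the merged patch itself:

from urllib.request import Request, urlopen


def fetch_imds_token(timeout=1):
    # Hypothetical helper, not the patched function name. With a short
    # timeout (in seconds), a non-AWS environment errors out quickly
    # instead of stalling application startup for ~16 minutes.
    try:
        return urlopen(
            Request(
                "http://169.254.169.254/latest/api/token",
                headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
                method="PUT",
            ),
            timeout=timeout,
        ).read().decode("utf-8")
    except OSError:  # URLError and socket timeouts both derive from OSError
        return None  # caller treats None as "not running on EC2"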