EC2 resource detector hangs for a long time outside of an EC2 instance #1088

Closed
jemisonf opened this issue May 13, 2022 · 0 comments · Fixed by #1089
Labels
bug Something isn't working

Comments

@jemisonf
Contributor

Describe your environment

I initially saw this in a container running under Docker Compose on an AWS EC2 instance, but I've been able to reproduce it on my laptop as well. I think it will show up in any environment not running directly on AWS.

Steps to reproduce

The following code reproduced the issue on my laptop:

from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
from opentelemetry.sdk.resources import get_aggregated_resources

resource = get_aggregated_resources(
    detectors=[AwsEc2ResourceDetector()]
)

What is the expected behavior?

It should complete quickly (this is the behavior I see running on an EC2 instance).

What is the actual behavior?


On my laptop, it hangs ~indefinitely.

Note: one solution is just to remove the resource detector, but we'd like to be able to include it and have it fail silently, which is the behavior we've seen from other resource detectors.

Additional context

I think the problem is the timeout passed to the metadata request inside the detector:

It looks like the request uses a 1000-second timeout, which I suspect was intended to be a 1000-millisecond timeout. At least with the server program I've been working on, that blocks startup until the request completes.
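For context, urllib.request.urlopen interprets timeout in seconds, so a value of 1000 lets the connection attempt block for roughly 16 minutes against an endpoint that never answers. The following is a rough sketch of the request pattern in question, assuming an IMDSv2-style token fetch like the curl check below; it is illustrative only, not the detector's exact code:

from urllib.request import Request, urlopen

# Illustrative sketch only -- not the detector's source. urlopen's timeout
# argument is in seconds, so timeout=1000 allows the connection attempt to
# block for up to ~16 minutes when the metadata endpoint is unreachable.
token = urlopen(
    Request(
        "http://169.254.169.254/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        method="PUT",
    ),
    timeout=1000,  # seconds, not milliseconds
).read()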

You can verify by running:

curl http://169.254.169.254/latest/api/token

This is one of the requests the resource detector makes; outside of EC2 it should hang indefinitely as well.

jemisonf added the bug label on May 13, 2022
jemisonf added a commit to jemisonf/opentelemetry-python-contrib that referenced this issue May 13, 2022
Fix open-telemetry#1088 

According to the docs, the value for `timeout` is in seconds: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen. 1000 seconds is a very long timeout and in some cases can block the startup of the program being instrumented (see open-telemetry#1088 as an example), because the request will hang indefinitely in non-AWS environments. Using a much shorter 1-second timeout seems like a reasonable workaround for this.
ocelotl pushed a commit that referenced this issue May 24, 2022
* Use a shorter timeout for AWS EC2 metadata requests

Fix #1088 

According to the docs, the value for `timeout` is in seconds: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen. 1000 seconds is a very long timeout and in some cases can block the startup of the program being instrumented (see #1088 as an example), because the request will hang indefinitely in non-AWS environments. Using a much shorter 1-second timeout seems like a reasonable workaround for this.

* add changelog entry for timeout change

* use 5s timeout for ECS and EKS, update changelog

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
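For readers following along, the shape of the fix is simply the same metadata request with a short timeout (1 s for EC2, 5 s for ECS/EKS), so that outside AWS the detector fails fast instead of blocking startup. Below is a minimal sketch of that pattern; the helper name is hypothetical and this is not the merged patch itself:

from urllib.request import Request, urlopen


def fetch_imds_token(timeout=1):
    # Hypothetical helper, not the patched function name. With a short
    # timeout (in seconds), a non-AWS environment errors out quickly
    # instead of stalling application startup for ~16 minutes.
    try:
        return urlopen(
            Request(
                "http://169.254.169.254/latest/api/token",
                headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
                method="PUT",
            ),
            timeout=timeout,
        ).read().decode("utf-8")
    except OSError:  # URLError and socket timeouts both derive from OSError
        return None  # caller treats None as "not running on EC2"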