# Runtime context fails to detach token #2606

## Comments
I have the same request. Maybe the code could also be changed to:

```python
except Exception as e:  # pylint: disable=broad-except
    logger.exception("Failed to detach context")
```

Is a PR possible?

---

@jha-hitesh
```text
Traceback (most recent call last):
  File "/project/.venv/lib/python3.10/site-packages/opentelemetry/context/__init__.py", line 153, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/project/.venv/lib/python3.10/site-packages/opentelemetry/context/contextvars_context.py", line 56, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x7fa9ac2df380> at 0x7fa9ac341780> was created in a different Context
```

This kind of exception :)

---

@sihrc
@lzchen I'm not able to change this locally. Would you accept a change to how this exception is logged?

---

@kasium
This is fixed in #2774

---
Hi @sihrc, were you able to root-cause this issue and fix it? I am hitting this exception and am not sure how to fix it, or how to troubleshoot further. Any ideas?

---

I'm having a similar issue to @junli-lyft regarding hitting this exception. Is it an actual error?

---
Since this appears to be an issue for multiple users, I am going to re-open this with a different title. The original issue was about showing the exception with a log instead of a simple message.

---

Yes.

---

Do any of you use gevent?

---

Yeah, I am using gevent 21.12.0 (with monkey patching) and flask 1.1.4.

---

Is there any update on this issue? I am having a similar issue. One call would work, but another would fail, because `_RUNTIME_CONTEXT` is None. I suspected that another global `_RUNTIME_CONTEXT` was interfering, but that was not the case. Any guess what is causing the issue?

---

Any updates on this?

---
I'm starting to see this error too, after upgrading to Python 3.11 and updating opentelemetry-api and opentelemetry-sdk to 1.14.0. I don't fully understand why it is thrown only in our deployed GCS Flask application and not in development. We are using gunicorn.

---
What's the worker class you are using with gunicorn?

---

gthread

---
Are there any leads on how to reproduce this consistently? I'm getting it at random times for my deployed app and will basically need to turn off all tracing if I can't identify the issue. I have a pub/sub listener endpoint which triggers work in the background; that seems particularly susceptible to it, but I see it happening to others too. I don't think `_RUNTIME_CONTEXT` is None, or else it would be an AttributeError on `_RUNTIME_CONTEXT.detach()`. I'm not sure how it determines that the token has already been used once. How could this fail more gracefully?

---
Chiming in with yet another observed instance of this behavior. Server: FastAPI (uvicorn with uvloop). We're instrumenting a FastAPI app and first started seeing this recently.

Original method (does NOT cause the exception):

```python
@tracer.start_as_current_span("get_db")
def get_db(db_state: Any = Depends(reset_db_state)) -> Generator[None, None, None]:
    with db.connection_context():
        yield
```

Tweaked method (DOES cause the exception):

```python
def get_db(db_state: Any = Depends(reset_db_state)) -> Generator[None, None, None]:
    with tracer.start_as_current_span("get_db") as span:
        span.add_event(name="entry", attributes=dict(db_is_closed=db.is_closed()))
        with db.connection_context():
            span.add_event(name="in context, pre-yield", attributes=dict(db_is_closed=db.is_closed()))
            yield
            span.add_event(name="in context, post-yield", attributes=dict(db_is_closed=db.is_closed()))
        span.add_event(name="exit", attributes=dict(db_is_closed=db.is_closed()))
```

The original motivation for the instrumentation change was that we noticed we weren't getting full spans using the decorator approach, and we wanted to add some events around the method's execution. Since trying out the tweaked method and observing this exception, we've tried a few different things:

---
I'm getting the same on 1.17.0.

---

Are you using any of gevent/eventlet/tornado, etc.?

---

No.

---

Getting the same here.

---

Is it a FastAPI app?

---
I've been experiencing the same or similar issue, and after digging into the code of opentelemetry and Flask I think I have the beginnings of a solution ("works for me").

My use-case is fanning out remote calls from a Flask request handler:

```python
# Suppose executor is a ThreadPoolExecutor
# Same results from the flask-executor package
f1 = executor.submit(remote_call_1, "foo")
f2 = executor.submit(remote_call_2, "bar")
f3 = executor.submit(remote_call_3, "baz")
r1 = f1.result()
r2 = f2.result()
r3 = f3.result()
do_something(r1, r2, r3)
```

When I cause the logger to log the exception, I see the same "created in a different Context" ValueError as above. Note that I have not been able to reproduce this with programmatic instrumentation; I can only get it to occur in a Kubernetes environment with auto-instrumentation. The solution is threefold.

If this is interesting, ping me and I will see if I am allowed to share the code.

---
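For reference (this is a sketch, not the poster's actual fix): a common way to keep ContextVars, such as OpenTelemetry's current context, visible inside `ThreadPoolExecutor` workers is to capture the submitting thread's context and run each task inside a copy of it. `submit_with_context` and `request_id` are illustrative names, not library APIs:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Illustrative ContextVar standing in for OpenTelemetry's current context.
request_id = contextvars.ContextVar("request_id", default=None)

def submit_with_context(executor, fn, *args, **kwargs):
    # Capture the submitting thread's context; worker threads otherwise
    # start with a fresh context and would only see default values.
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args, **kwargs)

def remote_call(name):
    # Runs in a worker thread, but inside the copied request context.
    return f"{name}:{request_id.get()}"

request_id.set("req-42")
with ThreadPoolExecutor(max_workers=2) as executor:
    future = submit_with_context(executor, remote_call, "foo")
    result = future.result()
print(result)  # foo:req-42
```

Because each task runs in its own *copy* of the context, tokens created inside the worker are also reset inside that same copy, which avoids the "created in a different Context" error.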
We have the same issue using dramatiq. We're using opentelemetry-python with gunicorn and threads without any issues, but with dramatiq and a middleware similar to this one we're hitting this error.

---

@srikanthccv, I'm actively getting this issue with a tornado app: Python 3.11, tornado 6.3.2. Why do you ask?

---
I've run into this. I can reproduce it with the script below. I've tried my absolute best to reduce it to a minimal example, but it's very weird and sensitive; even running this code in a Docker container removes the error. Here's the script:

```python
import openai
from opentelemetry import context

import logfire

logfire.configure(
    token='LCyjlJqNbFXqVmQ6lZVLKwxSrPJmKWgXmZG0rJpkS0c4'
    # Uncommenting this removes the error:
    # send_to_logfire=False
    # This might be because sending to logfire adds a thread for a BatchSpanProcessor.
)

class MyStream:
    def __iter__(self):
        # Replacing self._it with a plain variable removes the error,
        # probably because it affects garbage collection.
        self._it = self.stream()
        for item in self._it:
            yield item
        # Replacing the above loop with this removes the error.
        # yield from self._it

    def stream(self):
        with logfire.span('streaming'):
            yield 1
            yield 2

def patched_post(*_, **__):
    # Removing this line removes the error. It returns None.
    context.get_value('foo')
    return MyStream()

# Uncommenting this removes the error, whether or not you keep the same line above within the function.
# context.get_value('foo')

# You DON'T need to fill in a real API key here.
client = openai.Client(api_key='...')
client.post = patched_post

# Uncommenting this removes the error, as does messing with the create method in various ways.
# client.chat.completions.create = patched_post

# Extracting the result of this method call into a global variable removes the error,
# because it means the generator will only be garbage collected on shutdown.
for _ in client.chat.completions.create(model=1, messages=1, stream=1):
    # This is critical - it ensures that the `stream()` generator is left suspended when garbage collected.
    break
```

And the error output:

To run this code you will need to

The essential part is that a context is attached in a generator which never completes properly, so the generator exits when it is garbage collected, and that is when detaching the context is attempted and fails. I don't know why it fails; I'm guessing garbage collection happens 'in a different Context'? It seems like a CPython bug.

---
Another example: the error occurred when the reset is done in an async/await function.

Result:

---
@sylvainmouquet that example can be reduced to:

```python
import asyncio
import contextvars

var = contextvars.ContextVar('var')
token = var.set(1)

async def main():
    var.reset(token)

asyncio.run(main())
```

Here the error message makes sense: the token was created when calling `var.set(1)` at module level, but `asyncio.run` executes `main()` in a separate copied context.

---
Here's a good repro:

```python
import asyncio

from opentelemetry import context, trace

async def gen():
    token = context.attach(trace.set_span_in_context('span'))
    try:
        yield 1
        yield 2
    finally:
        context.detach(token)

async def main():
    async for i in gen():
        print(i)
        break

asyncio.run(main())
```

The error isn't there if you remove the `break`.
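One way to avoid that failure mode is to close the generator explicitly in the same task, instead of leaving it to the garbage collector. The sketch below mirrors `context.attach`/`context.detach` with a plain `ContextVar` so it runs without OpenTelemetry installed; the names are illustrative:

```python
import asyncio
import contextvars

var = contextvars.ContextVar("current_context", default=None)

async def gen():
    token = var.set("span")  # stands in for context.attach(...)
    try:
        yield 1
        yield 2
    finally:
        var.reset(token)     # stands in for context.detach(token)

async def main():
    agen = gen()
    try:
        async for i in agen:
            break  # leave early; the generator is still suspended
    finally:
        # Closing here runs the generator's finally block in the same
        # task (and therefore the same context) where the token was
        # created, instead of later, during garbage collection.
        await agen.aclose()

asyncio.run(main())
```

Without the `aclose()` call, the event loop's async-generator finalizer would close `gen()` in a different context and the reset could fail.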
When using a generator, the solution is to close it deterministically, e.g. with `contextlib.closing`.

---
Here's a minimal repro without async:

```python
import threading

from opentelemetry import context, trace

class Foo:
    pass

def gen(_foo):
    token = context.attach(trace.set_span_in_context('span'))
    try:
        yield 1
        yield 2
    finally:
        context.detach(token)

def main():
    # Create a circular reference to delay garbage collection
    foo = Foo()
    foo.gen = gen(foo)
    for _ in foo.gen:
        # Leave the generator suspended
        break

threading.Thread(target=main).start()
```

---
There is the function `closing`:

---
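For the synchronous case, a sketch of the `contextlib.closing` approach (again using a plain `ContextVar` in place of the OpenTelemetry context; the names are illustrative):

```python
import contextlib
import contextvars

var = contextvars.ContextVar("current_context", default=None)

def gen():
    token = var.set("span")  # stands in for context.attach(...)
    try:
        yield 1
        yield 2
    finally:
        var.reset(token)     # stands in for context.detach(token)

# closing() guarantees gen.close() runs when the with block exits, so
# the finally clause detaches the token deterministically, in the same
# thread and context where it was attached, even if we break out early.
with contextlib.closing(gen()) as g:
    for item in g:
        break
```

The key point is that the generator's `finally` block no longer depends on when, or in which context, garbage collection happens.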
The problem is that OTel SDKs can typically only add spans to iterators to instrument them; they can't ensure, or require, that the code creating and using those spans closes or finishes the iterators. For example, consider this code:

```python
def _end_span_after_iterating(iterable, span, token):
    try:
        with trace.use_span(span):
            yield from iterable
    finally:
        close = getattr(iterable, "close", None)
        if close:
            close()
        span.end()
        if token is not None:
            context.detach(token)
```

---
Ensure that the body of |
---

**Describe your environment**

- python 3.7
- ujson
- sanic==20.9.1
- opentelemetry-api==1.9.1
- opentelemetry-sdk==1.9.1
- opentelemetry-propagator-jaeger==1.9.1
- opentelemetry-exporter-jaeger-thrift==1.9.1
- opentelemetry-instrumentation==0.28b1
- opentelemetry-exporter-otlp-proto-http==1.10.0

**Steps to reproduce**

I have used opentelemetry-instrumentation to create a middleware for the sanic web framework. This middleware allows tracing a request. All seems good, but sometimes the error `Failed to detach context` comes up randomly for some requests.

**What is the expected behavior?**

Instead of the `Failed to detach context` error, the original exception should be logged as an exception so that the actual issue can be debugged with a proper stacktrace, etc. In opentelemetry-python/opentelemetry-api/src/opentelemetry/context/__init__.py, instead of

it should be

**What is the actual behavior?**

Getting the `Failed to detach context` error message instead of the original message.

**Additional context**
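A sketch of the kind of change being requested (illustrative names, not the actual OpenTelemetry source): swapping `logger.error` for `logger.exception` inside the `except` block preserves the traceback of the underlying error:

```python
import logging

logger = logging.getLogger(__name__)

class _FakeRuntimeContext:
    """Stand-in for the real runtime context; always fails to detach."""
    def detach(self, token):
        raise ValueError(f"{token!r} was created in a different Context")

_RUNTIME_CONTEXT = _FakeRuntimeContext()

def detach(token):
    try:
        _RUNTIME_CONTEXT.detach(token)
    except Exception:  # pylint: disable=broad-except
        # logger.error("Failed to detach context") records only this message;
        # logger.exception also records the full traceback of the ValueError,
        # which is what makes the underlying issue debuggable.
        logger.exception("Failed to detach context")

detach("some-token")  # logs the error with a traceback instead of raising
```

`logger.exception` is equivalent to `logger.error(..., exc_info=True)` and must be called from inside an exception handler, which is exactly the situation here.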