Flush full batches instead of silently dropping #245
Conversation
@samschlegel looks good; should be easy to rebase, or you can just apply your changes here now.
There may be other questions around backpressure here if the intent is for spans to never be dropped for load-shedding purposes. If spans are being produced faster than the exporter can send them, where should they buffer? Should they buffer until OOM, or should there be a cap elsewhere at which point spans start dropping? Currently, for example, you could set your …
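For illustration, here is a minimal sketch of the two buffering strategies in question, using a bounded `std::sync::mpsc` channel; `SpanData` and the counters are placeholders rather than the SDK's actual types:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Placeholder for the SDK's exportable span data.
struct SpanData;

fn main() {
    // A bounded queue caps memory: once the capacity is reached, something has to give.
    let (tx, rx) = sync_channel::<SpanData>(2_048);

    // Option A: shed load. `try_send` fails fast when the queue is full, so the
    // producer never blocks, but the span is dropped (and should be counted).
    match tx.try_send(SpanData) {
        Ok(()) => {}
        Err(TrySendError::Full(_dropped_span)) => { /* bump a "spans dropped" counter */ }
        Err(TrySendError::Disconnected(_)) => {}
    }

    // Option B: apply backpressure. `send` blocks the producer until the exporter
    // drains the queue; nothing is dropped, but the application stalls (or buffers
    // grow elsewhere) whenever the exporter can't keep up.
    tx.send(SpanData).expect("receiver alive");

    // The exporter side would drain `rx` in batches.
    let _drained: Vec<SpanData> = rx.try_iter().collect();
}
```

Either way the choice has to be made somewhere: drop with visibility, or block and push the pressure back onto the application.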
Yeah, I was thinking that as well right after I posted this. Perhaps what would be better instead is to add metrics so that we can track the buffer size and how many spans have been dropped. As is it's pretty silent, and in our services we hit that default queue size pretty quickly (the default 2,048-span queue over the 5-second export interval works out to ~410 spans/sec), and I imagine others probably would as well.
@samschlegel yeah, there are definitely folks interested in having more metrics awareness generally on the trace side; see open-telemetry/opentelemetry-specification#381 for example. There is also now a global error handler. It's currently used for errors in metrics but could (and should) be extended to trace errors; this could be a reasonable way to get a hook to do app-specific behavior.
open-telemetry/opentelemetry-specification#381 seems to be more about deriving metrics from traces, not exposing metrics about the internals of the SDK side of things? What I'd want is for the span_processor to export internal metrics for the queue length that we could hook up through the normal OpenTelemetry metrics means and export to Prometheus. The current metrics I'm thinking about are …
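As a rough illustration of the kind of internal counters meant here (the names are hypothetical, and in practice they would be wired into the OpenTelemetry metrics API or a Prometheus exporter rather than plain atomics):

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

// Hypothetical internal counters for a batch span processor; not part of the SDK.
#[derive(Default)]
struct ProcessorStats {
    queue_len: AtomicUsize,
    spans_dropped: AtomicU64,
    batches_exported: AtomicU64,
}

impl ProcessorStats {
    fn on_span_enqueued(&self) {
        self.queue_len.fetch_add(1, Ordering::Relaxed);
    }
    fn on_span_dropped(&self) {
        self.spans_dropped.fetch_add(1, Ordering::Relaxed);
    }
    fn on_batch_exported(&self, batch_size: usize) {
        self.queue_len.fetch_sub(batch_size, Ordering::Relaxed);
        self.batches_exported.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let stats = ProcessorStats::default();
    stats.on_span_enqueued();
    stats.on_batch_exported(1);
    stats.on_span_dropped();
    println!(
        "queue={} dropped={} batches={}",
        stats.queue_len.load(Ordering::Relaxed),
        stats.spans_dropped.load(Ordering::Relaxed),
        stats.batches_exported.load(Ordering::Relaxed),
    );
}
```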
@samschlegel that seems reasonable, but likely outside the scope of this PR. Should we close this and open an issue for span processor metrics?
Closing this for now; can follow up with a metrics approach in the future.
Currently, if spans arrive faster than the buffer can hold for the given interval, we just lose spans, which doesn't seem great. This changes the worker logic to flush when the buffer is full.
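A rough sketch of the intended worker behavior, with illustrative types and config values standing in for the crate's real ones:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Illustrative stand-ins; not the crate's real types.
struct SpanData;
struct Exporter;

impl Exporter {
    fn export(&self, batch: Vec<SpanData>) {
        // send the batch over the wire
        let _ = batch;
    }
}

// Instead of dropping spans once the batch is full, flush the full batch right
// away and keep the periodic timer as the fallback for partially filled batches.
fn worker(rx: Receiver<SpanData>, exporter: Exporter) {
    let max_batch_size = 512; // assumed config value
    let delay = Duration::from_secs(5); // assumed scheduled delay
    let mut buffer: Vec<SpanData> = Vec::with_capacity(max_batch_size);
    let mut next_flush = Instant::now() + delay;

    loop {
        let timeout = next_flush.saturating_duration_since(Instant::now());
        match rx.recv_timeout(timeout) {
            Ok(span) => {
                buffer.push(span);
                if buffer.len() >= max_batch_size {
                    // Buffer is full: export now rather than discarding new spans.
                    exporter.export(std::mem::take(&mut buffer));
                    next_flush = Instant::now() + delay;
                }
            }
            Err(RecvTimeoutError::Timeout) => {
                // Periodic flush of whatever has accumulated.
                if !buffer.is_empty() {
                    exporter.export(std::mem::take(&mut buffer));
                }
                next_flush = Instant::now() + delay;
            }
            Err(RecvTimeoutError::Disconnected) => {
                // Producers are gone: drain what's left and shut down.
                if !buffer.is_empty() {
                    exporter.export(std::mem::take(&mut buffer));
                }
                return;
            }
        }
    }
}
```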
I believe it will probably have some conflicts with master due to the async changes, so it will likely need some updates, but I wanted to put up a PR since I've had this sitting in a branch for a while.