-
Notifications
You must be signed in to change notification settings - Fork 633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: safe kafka partition extraction #872
fix: safe kafka partition extraction #872
Conversation
return instance._partition( | ||
topic, partition, key, value, key_bytes, value_bytes | ||
) | ||
except Exception as exception: # pylint: disable=W0703 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this affect users? In case this case does trigger, will we end up not recording a span, omitting some info from the span or will this affect the instrumented service or kafka client in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will affect the instrumented service - hence the need to protect it with try/except
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that. I'm asking how does this fix affect users? Will it result in missing spans, missing attributes on spans or something else entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing attribute - the partition
topic, partition, key, value, key_bytes, value_bytes | ||
) | ||
except Exception as exception: # pylint: disable=W0703 | ||
_LOG.debug("Unable to extract partition: %s", exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be info/warn so we can actually find and fix the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the impact of not collecting this data is low, I would not want to "bother" the user with a warning. I can go with info, but debug is also ok - whatever you think is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to know what exactly in this function is brittle so we can fix it later instead of wrapping it in a try/except and forgetting about it forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree - that line that I've addedall_partitions = instance._metadata.partitions_for_topic(topic)
would've solved the crash. But I wanted to be on the safe side
Please update the changelog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should find out the real reason for exceptions and add defensive checks around that but in the spirit of getting this into the next release, we can merge as is.
@owais Do I need to update the changelog, even though this instrumentation wasn't released yet? |
Yes, please update changelog |
@owais Done |
Description
In some cases, which aren't consistently reproducible, the partition extraction of the kafka-python instrumentation fails. Since this isn't a crucial part of the instrumentation, we simply protect it with try/except to avoid a crash.
Fixes # (issue)
Type of change
How Has This Been Tested?
A simple try/catch to prevent an issue that happened in our production environment.
Does This PR Require a Core Repo Change?
Checklist: