KAFKA-16371; fix lingering pending commit when handling OFFSET_METADATA_TOO_LARGE #16072
Can we complete this pending offset commit if one of the offsets in the transaction wasn't written?
Yes, if the client does not retry the failed offset and commits the transaction. We basically commit whatever is pending.
Interesting, so the client doesn't even retry. Does it at least get a clear error saying the commit failed, so that it can choose to retry?
Looking at the TxnOffsetCommitHandler, it seems like this should be a fatal error?
kafka/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java
Line 1336 in d585a49
You bring up a good point. The server returns INVALID_OFFSET to the client and the Java client does treat it as a fatal error. Therefore, the transaction won’t be committed in the end.
Hmm… That does not explain the situation I was investigating, then, unless another client was used. I have another theory that I must validate. I will keep you posted.
That being said, this is still a bug that leaves the state on the server inconsistent. The patch is still valid.
I think the patch is valid, but I do wonder whether the fix was needed for transactional clients and/or whether we should change the logic for transactional clients so that committing only some of the offsets is not allowed.
I guess that is trickier to enforce. I guess for this PR I just wonder if this test suggests a bad behavior. 😅
My understanding is that the client already fails in this case because it transitions to the fatal state. This means that the transaction will be aborted, no? Therefore, the partial offsets are not committed in the end.
Right. I guess it's just a little confusing to have the test model a behavior that shouldn't happen. But I'm not sure if there is a simple fix for this.
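For readers following the thread, the behavior under discussion can be modeled with a small, self-contained sketch. It is not the coordinator code touched by this PR: the class `PendingOffsetsSketch`, its methods, and the `MAX_METADATA_SIZE` limit are all hypothetical names invented for illustration. It only shows the two properties debated above: a rejected offset should leave nothing pending, and ending the transaction commits or discards whatever was staged.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of pending transactional offsets. Not Kafka's actual classes. */
public class PendingOffsetsSketch {
    // Assumed limit standing in for the broker's offset metadata size limit.
    static final int MAX_METADATA_SIZE = 8;

    // producerId -> (partition -> offset) staged inside an open transaction.
    private final Map<Long, Map<String, Long>> pending = new HashMap<>();
    // partition -> committed offset.
    private final Map<String, Long> committed = new HashMap<>();

    /**
     * Stages an offset for the producer's open transaction.
     * Returns false (standing in for OFFSET_METADATA_TOO_LARGE) without staging
     * anything, so a rejected offset never lingers as a pending commit.
     */
    boolean stageOffset(long producerId, String partition, long offset, String metadata) {
        if (metadata.length() > MAX_METADATA_SIZE) {
            return false;
        }
        pending.computeIfAbsent(producerId, id -> new HashMap<>()).put(partition, offset);
        return true;
    }

    /** Ends the transaction: a commit materializes whatever is pending, an abort discards it. */
    void endTransaction(long producerId, boolean commit) {
        Map<String, Long> staged = pending.remove(producerId);
        if (commit && staged != null) {
            committed.putAll(staged);
        }
    }

    Long committedOffset(String partition) {
        return committed.get(partition);
    }

    boolean hasPending(long producerId) {
        return pending.containsKey(producerId);
    }

    public static void main(String[] args) {
        PendingOffsetsSketch state = new PendingOffsetsSketch();
        boolean first = state.stageOffset(1L, "foo-0", 100L, "ok");
        boolean second = state.stageOffset(1L, "foo-1", 200L, "metadata-too-large");
        // A client that treats the failure as fatal aborts the transaction.
        state.endTransaction(1L, !(first && second) ? false : true);
        System.out.println("foo-0 committed: " + state.committedOffset("foo-0"));
        System.out.println("pending left: " + state.hasPending(1L));
    }
}
```

In this model, a client that ignored the failure and committed anyway would end up with only the successfully staged offset committed, which is the partial-commit case the test in this PR exercises.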