
Skip processing buffer after partial message detected #1051

Merged 1 commit into SOHU-Co:master on Aug 30, 2018

Conversation

verakruhliakova (Contributor)

The Kafka server may append a partial message to the end of a message set, for example with the following data:

MessageSet => [Offset MessageSize Message]
  Offset => 0
  MessageSize => 16216
  Message => Crc MagicByte Attributes Key Value
    Crc => 0
    MagicByte => 0
    Attributes => 0
    Key => buffer of 0 bits
    Value => buffer of 0 bits

Instead of a key/value pair, the message is filled with a sequence of zero bits. I couldn't find any special requirements on the format of a partial message, so this may be allowed behavior. But given such a message from the server, the consumer keeps decoding the zero bits, emits empty messages and, worst of all, ends up with an array of 0 offsets, e.g. [96510, 96511, 96512, 0, 0, 0, 0, 0]. It then starts reading the next batch from offset 0 and gets into an infinite cycle.

(Protocol details taken from https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets and https://kafka.apache.org/documentation.html#impl_reads.)
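The fix can be sketched along these lines (a minimal illustration, not kafka-node's actual code; `decodeMessageSet` and the returned message shape are hypothetical): once a message's declared size runs past the end of the fetch buffer, treat it as the trailing partial message and stop decoding, rather than reading the zero-filled tail as empty messages with offset 0.

```javascript
// Hypothetical sketch of partial-message handling for a Kafka v0 MessageSet.
// Each entry is: 8-byte Offset, 4-byte MessageSize, then MessageSize bytes.
function decodeMessageSet(buffer) {
  const messages = [];
  let pos = 0;
  while (pos + 12 <= buffer.length) {
    const offset = Number(buffer.readBigInt64BE(pos));
    const messageSize = buffer.readInt32BE(pos + 8);
    if (messageSize <= 0 || pos + 12 + messageSize > buffer.length) {
      // Trailing partial message: its declared size runs past the buffer
      // (or the tail is zero-filled padding). Skip the rest of the buffer
      // instead of decoding the zeros as empty messages with offset 0.
      break;
    }
    messages.push({
      offset,
      payload: buffer.subarray(pos + 12, pos + 12 + messageSize),
    });
    pos += 12 + messageSize;
  }
  return messages;
}
```

With a guard like this, the zero-filled tail described above is dropped as a single partial message, so no fake empty messages or 0 offsets ever reach the consumer.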

@aikar (Contributor) commented Aug 17, 2018

Could this be what is happening in #998 ?

@verakruhliakova (Contributor, Author)

@aikar I think #998 is different. Here the client emits fake empty messages, like:

{
  "topic": "events-streaming",
  "value": "",
  "offset": 0,
  "partition": 0,
  "highWaterOffset": 167780,
  "key": ""
}

After this it starts reading from offset 0 again, even though the order of messages and the offset sequence were correct up to that point. In #998 the client emits real messages with a 0 offset but with correct values.

@ahmedlafta (Contributor)

FYI, I've tested this code change locally against a 2.0 broker and it resolves some of the issues reported against this library (e.g. #1054).

If we could get a release with this code fix that'd be great.

@thomaslee (Contributor)

> If we could get a release with this code fix that'd be great.

+1 to that -- if this is what I think it is, this is a consumer-breaking bug for folks on 2.x. We were bitten by a similar issue in Sarama: IBM/sarama#1149

@praveenkumaresan

+1 to that. We have tested this code locally against a 2.0 broker and it resolves the issue we faced of consuming empty messages in a loop!

@aikar (Contributor) commented Aug 29, 2018

Why is kafka sending these messages now anyways?

@thomaslee (Contributor)

> Why is kafka sending these messages now anyways?

Possible red herring, but reading between the lines of the Sarama bug and KAFKA-7030, I think it might be KIP-283.

From the KIP (emphasis mine):

> Given this, we have three possible scenarios:
>
> - Spre = Spost: This is the ideal scenario where the size of down-converted messages is exactly equal to the size before down-conversion. This requires no special handling.
> - Spre < Spost: Because we committed to and cannot write more than Spre, we will not be able to send out all the messages to the consumer. We will down-convert and send all message batches that fit within Spre.
> - Spre > Spost: Because we need to write exactly Spre, we append a "fake" message at the end with maximum message size (= Integer.MAX_VALUE). Consumer will treat this as a partial message batch and should thus ignore it.

If this is what's causing the partial messages, it seems like setting message.downconversion.enable=false in the broker config might be an easy workaround for folks who are using an older log.message.format.version and can't upgrade their consumers... can somebody confirm this? @praveenkumaresan @ahmedlafta maybe? If not, I'd love to know if you see a different error.
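For reference, the workaround above would look something like this (a sketch based on KIP-283; as I understand it, the broker-wide key is log.message.downconversion.enable, while message.downconversion.enable is the per-topic override — please check the config docs for your broker version):

```properties
# server.properties — broker-wide default (per KIP-283): refuse to
# down-convert messages for consumers on older fetch versions
log.message.downconversion.enable=false
```

The per-topic override could then be set via kafka-configs.sh using the message.downconversion.enable topic property. Note that with down-conversion disabled, older consumers that actually need it will get errors instead of messages.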

@hyperlink hyperlink merged commit 55f5dc2 into SOHU-Co:master Aug 30, 2018
@verakruhliakova verakruhliakova deleted the skip-partial-message branch September 3, 2018 10:47
@verakruhliakova verakruhliakova restored the skip-partial-message branch September 3, 2018 10:48
@tvvignesh

We are facing this bug too. Kindly release this fix as soon as possible. Thanks.

@hyperlink hyperlink mentioned this pull request Sep 4, 2018
@hyperlink (Collaborator)

Published as 3.0.0

corlobepy pushed a commit to opentable/kafka-node that referenced this pull request May 24, 2019