This repository has been archived by the owner on Mar 17, 2024. It is now read-only.

Another " java.lang.IllegalArgumentException: Invalid negative offset" #120

Closed
andreminelli opened this issue Jan 27, 2020 · 17 comments · Fixed by #134

Comments

@andreminelli

andreminelli commented Jan 27, 2020

I am getting the following error. The message is the same as in issue 41, but I am already running 0.5.5 (and I just tried 0.6.0, with the same problem):

2020-01-27 19:00:08,718 DEBUG o.a.k.clients.consumer.KafkaConsumer  - [Consumer clientId=blip-server-lag-exporter, groupId=kafkalagexporter] Kafka consumer has been closed
2020-01-27 19:00:08,719 ERROR c.l.k.ConsumerGroupCollector$ akka://kafka-lag-exporter/user/consumer-group-collector-blip-server - Supervisor RestartSupervisor saw failure: A failure occurred while retrieving offsets.  Shutting down. java.lang.Exception: A failure occurred while retrieving offsets.  Shutting down.
        at com.lightbend.kafkalagexporter.ConsumerGroupCollector$CollectorBehavior.$anonfun$collector$1(ConsumerGroupCollector.scala:197)
        at akka.actor.typed.internal.BehaviorImpl$ReceiveBehavior.receive(BehaviorImpl.scala:134)
        at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
        at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
        at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:55)
        at akka.actor.typed.internal.RestartSupervisor.aroundReceive(Supervision.scala:262)
        at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:83)
        at akka.actor.typed.Behavior$.interpret(Behavior.scala:274)
        at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230)
        at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:126)
        at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:106)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
        at akka.actor.ActorCell.invoke(ActorCell.scala:543)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
        at akka.dispatch.Mailbox.run(Mailbox.scala:230)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Invalid negative offset
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
        at com.lightbend.kafkalagexporter.KafkaClient$.$anonfun$kafkaFuture$1(KafkaClient.scala:50)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Invalid negative offset
        at org.apache.kafka.clients.consumer.OffsetAndMetadata.<init>(OffsetAndMetadata.java:50)
        at org.apache.kafka.clients.admin.KafkaAdminClient$25.handleResponse(KafkaAdminClient.java:3018)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.handleResponses(KafkaAdminClient.java:1076)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1204)
        ... 1 common frames omitted

This is trying to connect to Azure Event Hub for Apache Kafka. Some instances work, others fail with this problem.

Any thoughts?

Regards.

@dylanmei
Contributor

dylanmei commented Feb 2, 2020

I've also run into this on a Kafka 2.3.1 cluster; still investigating but this is not specific to kafka-lag-exporter.

If you loop over all the consumer groups and pass each one to the kafka-consumer-groups.sh tool that comes with your distribution, you'll end up with the same root error message on at least one of them.

$  bin/kafka-consumer-groups.sh --describe --group <bad group>
Error: Executing consumer group command failed due to java.lang.IllegalArgumentException: Invalid negative offset
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Invalid negative offset
	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
	at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.getCommittedOffsets(ConsumerGroupCommand.scala:595)
	at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$collectGroupsOffsets$2(ConsumerGroupCommand.scala:421)
	at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:827)
	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
	at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:826)
	at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsOffsets(ConsumerGroupCommand.scala:419)
	at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:312)
	at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:63)
	at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: java.lang.IllegalArgumentException: Invalid negative offset
	at org.apache.kafka.clients.consumer.OffsetAndMetadata.<init>(OffsetAndMetadata.java:50)
	at org.apache.kafka.clients.admin.KafkaAdminClient$24$1.handleResponse(KafkaAdminClient.java:2832)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.handleResponses(KafkaAdminClient.java:1032)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1160)
	at java.lang.Thread.run(Thread.java:748)

In my case, when I try to list the members of this group, the tool reports that there are no members.

$ bin/kafka-consumer-groups.sh --describe --group <bad group> --members

Consumer group '<bad group>' has no active members.

After deleting this group, kafka-lag-exporter stops crashing.

$ bin/kafka-consumer-groups.sh --delete --group <bad group>

Deletion of requested consumer groups ('<bad group>') was successful.
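
The check described above can be automated. What follows is only a minimal sketch: it assumes the tool is at bin/kafka-consumer-groups.sh and the broker at localhost:9092, and the names run_tool and find_bad_groups are made up for illustration. It uses only the --list, --describe and --group flags already shown in this thread, and flags a group when the error text appears in the tool's output.

```python
import subprocess
from typing import Callable, List

ERROR_MARKER = "Invalid negative offset"


def run_tool(args: List[str]) -> str:
    """Run kafka-consumer-groups.sh and return stdout and stderr combined."""
    # Assumption: script path and broker address; adjust both for your setup.
    cmd = ["bin/kafka-consumer-groups.sh", "--bootstrap-server", "localhost:9092"] + args
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


def find_bad_groups(run: Callable[[List[str]], str] = run_tool) -> List[str]:
    """Return the groups whose --describe output contains the error text."""
    # Assumption: --list prints one group name per line and nothing else.
    groups = [line.strip() for line in run(["--list"]).splitlines() if line.strip()]
    return [g for g in groups if ERROR_MARKER in run(["--describe", "--group", g])]

# Usage against a real cluster: print("\n".join(find_bad_groups()))
```

The run parameter exists so the logic can be tried without a cluster by passing a stub in place of run_tool.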

@seglo
Owner

seglo commented Feb 2, 2020

@dylanmei Thanks for checking. I can't recreate this issue, so I don't have much to go on. The exception itself is exactly what it sounds like: an offset returned by an OffsetFetchRequest sent by the admin client is < 0. There's no opportunity to log the response, AFAICT.

Do you have any theories here? Next time it happens could you describe the topics involved and paste their configuration here?
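
To illustrate why a single negative offset takes down the whole request, here is a simplified model in Python. It is not the Kafka client code: the class only imitates the constructor check visible in the stack trace (OffsetAndMetadata.<init>), and handle_response, the topic names and the -1 value are assumptions for the sake of the example.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


class OffsetAndMetadata:
    """Simplified stand-in for the Kafka client class of the same name."""

    def __init__(self, offset: int, metadata: str = "") -> None:
        # Imitates the check implied by the stack trace: negative offsets are rejected.
        if offset < 0:
            raise ValueError("Invalid negative offset")
        self.offset = offset
        self.metadata = metadata


def handle_response(raw: Dict[TopicPartition, int]) -> Dict[TopicPartition, OffsetAndMetadata]:
    """Hypothetical response handler: wraps every raw offset in an object."""
    # One negative value raises here, so the caller gets no offsets at all,
    # not even for the partitions that were fine.
    return {tp: OffsetAndMetadata(offset) for tp, offset in raw.items()}
```

With this model, handle_response({("orders", 0): 42, ("empty-topic", 0): -1}) raises for the entire group, which matches the behaviour reported in this thread.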

@andreminelli
Author

I will check the groups as @dylanmei suggested and return.

@andreminelli
Author

I am having a hard time getting those scripts to run against Azure EventHub for Kafka...
Has anyone done this already?

@dylanmei
Contributor

dylanmei commented Feb 3, 2020

I am unfamiliar with Azure EventHub. However, you can pass a --command-config file to this command. Try editing this config, taken from their docs, and then passing it as eventhubs.config:

bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

Then run:

bin/kafka-consumer-groups.sh \
  --bootstrap-server={YOUR.EVENTHUBS.FQDN}:9093 \
  --command-config eventhubs.config \
  --list

@andreminelli
Author

@dylanmei, MS support gave us an important piece of information about Azure EventHub for Kafka: a new consumer group has an offset value of -1. I don't know if Kafka itself works like this too, but that info and your hint about using kafka-consumer-groups.sh with the --command-config argument saved the day.

We did indeed find topics with no messages produced but with consumer groups created for them anyway, which then caused the Invalid negative offset exception.

Our current workaround is to produce a dummy message so that the offset gets set.

Thank you very much

@seglo
Owner

seglo commented Feb 4, 2020

@andreminelli That's an interesting discovery. Thanks for following up.

I suppose you could also use Kafka Lag Exporter whitelisting to ignore groups that have offsets initialized this way, as long as you know what they are ahead of time.
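
As a rough illustration of the whitelisting idea, the sketch below filters group names against a list of regular expressions. It does not show the exporter's actual configuration or matching rules: the function name, the sample group names, and the choice of full-string matching are all assumptions.

```python
import re
from typing import Iterable, List


def allowed_groups(groups: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Keep only the groups matched in full by at least one whitelist regex."""
    # Assumption: a pattern must match the whole group name, not just a prefix.
    compiled = [re.compile(pattern) for pattern in whitelist]
    return [g for g in groups if any(c.fullmatch(g) for c in compiled)]
```

For example, with the whitelist ["^orders-.+"], a group named orders-app is kept, while a group created for an empty topic under any other name is ignored. As noted above, this only helps when the affected group names are known ahead of time.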

@seglo
Owner

seglo commented Feb 11, 2020

I'm going to leave this open for now. Worst case we can catch and handle this exception message and report NaN.
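
The fallback mentioned here could look roughly like the following. This is a sketch of the general idea in Python, not the exporter's Scala code: lag_or_nan and fetch_lag are hypothetical names, and a ValueError stands in for the Java IllegalArgumentException.

```python
import math
from typing import Callable


def lag_or_nan(fetch_lag: Callable[[str], float], group: str) -> float:
    """Return the lag for a group, or NaN if the offset lookup hits the known error."""
    try:
        return float(fetch_lag(group))
    except ValueError as err:
        # Only swallow the specific error from this issue; re-raise anything else.
        if "Invalid negative offset" in str(err):
            return math.nan
        raise
```

Matching on the message text is brittle, but it keeps one idle group from stopping collection for every other group while unrelated failures still surface.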

seglo reopened this Feb 11, 2020
@andreminelli
Author

> I'm going to leave this open for now. Worst case we can catch and handle this exception message and report NaN.

This handling makes sense. Stopping the checks for a whole topic because some consumer groups are 'sitting idle' is terrible...
I am looking forward to this enhancement.
Thanks, @seglo

@dylanmei
Contributor

dylanmei commented Mar 1, 2020

A fix for this is on its way in Kafka 2.4.1: https://issues.apache.org/jira/browse/KAFKA-9507

@pimpelsang

pimpelsang commented Mar 13, 2020

Ran into this as well. Please consider catching that exception and reporting NaN, as it will be at least a year until we can upgrade to that Kafka version in production...

I can't use the group blacklisting workaround, as group naming is under the developers' control.

@dylanmei
Contributor

I expect the fix to appear in the Kafka native client library used by this exporter. There will be no need to upgrade your clusters.

@seglo
Owner

seglo commented Mar 15, 2020

Yes, the fixed admin client should resolve the issue. I'll cut a release next week.

@tarvip

tarvip commented Mar 30, 2020

@seglo any updates on the new release?

@pimpelsang

Pretty please ;)

seglo closed this as completed in #134 May 7, 2020
@seglo
Owner

seglo commented May 7, 2020

0.6.1 is released with a Kafka clients bump to 2.5.0. I've had trouble reproducing this issue so it would be great if somebody in this thread could verify this is resolved. Thanks!

@dprangnell

dprangnell commented Jun 7, 2023

Hi, I was a bit behind on my kafka-lag-exporter maintenance and was running the container lightbend/kafka-lag-exporter:0.6.7. I upgraded to seglo/kafka-lag-exporter:0.8.2 and hit the exact problem described here. I rolled back to 0.6.7 and the issue went away.

To find out where the regression occurred, I upgraded one version at a time and had no issues with 0.6.8, 0.6.9, 0.7.0, or 0.7.1. With 0.7.2, however, the problem appeared. So I assume a regression was introduced in that version?

I did loop through our kafka-consumer-groups:

for group in $(kafka-consumer-groups --bootstrap-server localhost:9092 --list); do kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group "$group" --members; done

I found a bunch that had no active members:

Consumer group '<bad group>' has no active members.

We are running confluent-kafka 6.2.10. I hope this is helpful.

Looking through the diff I see this is the first time the kafka client has been upgraded, from 2.5.0 to 3.2.1. So the regression must be upstream.
