Skip to content
This repository has been archived by the owner on May 13, 2019. It is now read-only.

some partitions have no owner #102

Open
Huang-lin opened this issue Aug 16, 2016 · 4 comments
Open

some partitions have no owner #102

Huang-lin opened this issue Aug 16, 2016 · 4 comments

Comments

@Huang-lin
Copy link

Hi,
I find a problem that when I restart my consumers, a partition has no owner.

Consumer group will trigger rebalance when I stop a consumer. Its timeout in function finalizePartition is cg.config.Offsets.ProcessingTimeout, and retrial times is also cg.config.Offsets.ProcessingTimeout in rebalancing.

It should finish some logical processing which it was doing and running function finalizePartition before a consumer stop. This operating maybe spend time more than cg.config.Offset.ProcessingTimeout, so rebalancing will maybe fail, and some partitions have no owner.

To solve this problem, maybe we can add a goroutine to watch partition's owner, and it can also avoid some problem when partition numbers make changes such as kafka broker capacity expansion.

@jimmy1911
Copy link

@wvanbergen I have the same trouble.

@wvanbergen
Copy link
Owner

In my experience, the most common reason for partitions not being consumed is locks still being present in zookeeper because a previous consumer was not shut down properly. After the zookeeper connection timeout, these locks should be garbage collected and the consumergroup should pickup the partitions. However, it's possible there are bugs in this.

It's unlikely that I'll be working on this myself any time soon. I am happy to accept patches for this though.

@Huang-lin
Copy link
Author

I think watching partition owner is difficult because we don't know when consumer finish registering owner to zookeeper. It looks like a better method to solve this trouble that trigger rebalance after claim partition failed. I have finished this patch and created pull request.

@rengawm
Copy link

rengawm commented Sep 8, 2016

There are now three PRs open which attempt to address roughly the same problem with varying amounts of change:
#88
#101
#103

@wvanbergen At least in my experience with this particular issue, I've seen two race conditions. I've documented them in the comment on #101.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants