Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix producer topic metadata on-demand fetch when topic error happens #1125

Merged
merged 1 commit into from
Jul 12, 2018
Merged

fix producer topic metadata on-demand fetch when topic error happens #1125

merged 1 commit into from
Jul 12, 2018

Conversation

andyxning
Copy link
Contributor

@andyxning andyxning commented Jul 6, 2018

This PR can help fix producer topic metadata on-demand fetch when topic error happens in metadata response.

There is an option Full in Config to config whether producer should fetch all topic metadata in a cluster or do it on-demand.
This feature is added in #937.

But there is an situation where we may make on-demand topic metadata fetching fall back to empty once an topic error happens in metadata update.

For example, if ErrUnknownTopicOrPartition or ErrInvalidTopic or ErrTopicAuthorizationFailed or any other errors that will make the switch case continue happens, the topic name will be deleted from client.metadata. This will result in an empty topic slice when refresh topic metadata next time in https://github.com/Shopify/sarama/blob/master/client.go#L648-L666 from client.Topics() which actually read topics from client.metadata. Thus we lost the topic name which we need to periodically refresh metadata.

We should not update client.metadata or delete any topic key from it.

@andyxning
Copy link
Contributor Author

/cc @eapache @chandradeepak @wvanbergen

@andyxning
Copy link
Contributor Author

/cc @burke

@eapache
Copy link
Contributor

eapache commented Jul 6, 2018

If ErrLeaderNotAvailable happens, the topic will be deleted from the map but will immediately be re-added at line 756. I don't understand how this is a bug.

@andyxning
Copy link
Contributor Author

andyxning commented Jul 7, 2018

If ErrLeaderNotAvailable happens, the topic will be deleted from the map but will immediately be re-added at line 756. I don't understand how this is a bug.

@eapache Sorry for not making the description clear and correct firstly. The error maybe ErrUnknownTopicOrPartition or ErrInvalidTopic or ErrTopicAuthorizationFailed or any other errors that will make the switch case continue.

I have also updated the pr description.

@andyxning
Copy link
Contributor Author

@eapache
Copy link
Contributor

eapache commented Jul 9, 2018

I'm not sure I agree. If the broker returns one of those errors the topic either doesn't exist or is fundamentally inaccessible. Leaving it in the metadata hash at that point would be extremely misleading to callers and would likely lead to other problems when trying to iterate over the set of topics.

But there is an situation where we may make on-demand topic metadata fetching fall back to empty once an topic error happens in metadata update.

I'm also not sure how this can happen. The refreshMetadata method already checks if the list of Topics is empty when trying to do a partial refresh: https://github.com/Shopify/sarama/blob/f7df95cff1bc4d14a4194a5be4c88a86253afb98/client.go#L654-L655

@andyxning
Copy link
Contributor Author

@eapache

I'm also not sure how this can happen. The refreshMetadata method already checks if the list of Topics is empty when trying to do a partial refresh

Yes. That's right. But refreshMetadata will return ErrNoTopicsToUpdateMetadata forever since the topic error happens. This is a problem because the topic error may disappear, but the topic name has been removed from metadata and it will not recover automatically alongside the topic error recover. A restart of producer must be done. This hits me.

If the broker returns one of those errors the topic either doesn't exist or is fundamentally inaccessible.

The error maybe not all related with producer side configuration. Some errors are related with server side operations or status.

@eapache
Copy link
Contributor

eapache commented Jul 10, 2018

OK, I understand. As mentioned, leaving it in the metadata would cause other issues. A better solution would be to keep a separate copy of the topics when not doing full metadata, and use that (in combination with client.Topics() still) in refreshMetadata().

@andyxning
Copy link
Contributor Author

As mentioned, leaving it in the metadata would cause other issues. A better solution would be to keep a separate copy of the topics when not doing full metadata, and use that (in combination with client.Topics() still) in refreshMetadata().

I have thought that before i propose this PR. I think directly update the client.Topics is fine. According to the discuss above, it seems that a new field should be added to store topic names whose metadata are need to be updated periodically. @eapache
Will update this later today.

@andyxning
Copy link
Contributor Author

@eapache PTAL.

client.go Outdated
controllerID int32 // cluster controller broker id
brokers map[int32]*Broker // maps broker ids to brokers
metadata map[string]map[int32]*PartitionMetadata // maps topics to partition ids to metadata
metadataTopics map[string]struct{} // topics that need to collect metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sarama defines a helper type called none to make uses of struct{} like this more explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eapache Done. PTAL.

@eapache eapache merged commit 5cd4d86 into IBM:master Jul 12, 2018
@eapache
Copy link
Contributor

eapache commented Jul 12, 2018

thanks!

@andyxning andyxning deleted the fix_topic_metadata_on-demand_fetch branch July 12, 2018 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants