Proposal: log progress of recovery #182

Closed
TheL1ne opened this issue Apr 18, 2019 · 4 comments
Comments

@TheL1ne
Contributor

TheL1ne commented Apr 18, 2019

When we create new instances that have to sync from scratch with Kafka, we have no way to tell whether an instance is still alive while it recovers its data from Kafka. This can leave us uncertain for multiple hours.
It would be great to log at least how much has already been synced, and even better to log an approximation of how long it will take to recover the remaining data.

TheL1ne changed the title from "Logs are silent when in recovery" to "Proposal: log progress of recovery" Apr 18, 2019
TheL1ne self-assigned this Apr 18, 2019
@db7
Collaborator

db7 commented Apr 24, 2019

Predicting won't work. There are two problems:
1- The only things you know are the current offset and the high-water mark (HWM). You don't know how many of the records have been deleted by compaction, because they may be updates to the same keys. So (HWM-offset) doesn't tell you how many records you really still have to read from a compacted topic.
2- Even if you used (HWM-offset) as an upper bound, you cannot know how long LevelDB will need for its compaction after the state is restored, and in some cases that takes longer than reading the records from Kafka.
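
To make the (HWM-offset) upper bound concrete, here is a minimal sketch. The function name and the sample values are hypothetical and not part of any library; it assumes `offset` is the next offset to read and `hwm` is the partition's high-water mark. As point 1 says, on a compacted topic the real number of remaining records can be much smaller.

```go
package main

import "fmt"

// remainingUpperBound returns the largest number of records that could still
// have to be read for one partition. It is only an upper bound: on a compacted
// topic many of those offsets may no longer hold a record.
func remainingUpperBound(offset, hwm int64) int64 {
	if hwm <= offset {
		return 0 // already caught up (or inconsistent input)
	}
	return hwm - offset
}

func main() {
	// Hypothetical values for a single partition.
	fmt.Println(remainingUpperBound(1200, 5000)) // prints 3800
	fmt.Println(remainingUpperBound(5000, 5000)) // prints 0
}
```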

Stats() already tells you how many bytes are being read from which topic. You can use that as a rule of thumb for the specific application to guess how long recovery will take. Only while the LevelDB compaction is running will you see no Kafka traffic for the topic/partition being recovered. During that period, one should probably look at disk IO and CPU to judge whether there is progress.
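
As a rough illustration of that rule of thumb, the sketch below derives a read rate from two snapshots of a cumulative bytes-read counter. The `sample` type and `bytesPerSecond` are hypothetical helpers; how the counter is obtained from Stats() is deliberately left out, since the exact fields are not shown in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical snapshot: the cumulative number of bytes read from
// one topic, and the time at which that counter was observed.
type sample struct {
	at    time.Time
	bytes int64
}

// bytesPerSecond returns the average read rate between two samples. It returns
// 0 if no time has passed or if the counter went backwards.
func bytesPerSecond(prev, cur sample) float64 {
	elapsed := cur.at.Sub(prev.at).Seconds()
	if elapsed <= 0 || cur.bytes < prev.bytes {
		return 0
	}
	return float64(cur.bytes-prev.bytes) / elapsed
}

func main() {
	start := time.Date(2019, 4, 24, 12, 0, 0, 0, time.UTC)
	prev := sample{at: start, bytes: 10 << 20}                      // 10 MiB read so far
	cur := sample{at: start.Add(10 * time.Second), bytes: 60 << 20} // 60 MiB ten seconds later
	fmt.Printf("%.1f MiB/s\n", bytesPerSecond(prev, cur)/(1<<20))   // prints 5.0 MiB/s
}
```

A rate of zero does not by itself mean the instance is stuck: as noted above, it may simply be in the LevelDB compaction phase.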

@TheL1ne
Contributor Author

TheL1ne commented Apr 25, 2019

If we cannot approximate the remaining time, I would at least add some logging to show that we are still alive.
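
A minimal sketch of such a "still alive" log, assuming the recovery loop can expose its current offset and the high-water mark through a callback. All names here (`logProgress`, `progressLine`, the simulated loop) are hypothetical and this is not the change that was eventually merged; it only shows the general pattern of a ticker-driven heartbeat.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync/atomic"
	"time"
)

// progressLine formats one heartbeat message.
func progressLine(offset, hwm int64) string {
	return fmt.Sprintf("recovering: offset %d of high-water mark %d", offset, hwm)
}

// logProgress emits a heartbeat through logf every interval until ctx is
// cancelled. current is a hypothetical callback supplied by the recovery loop.
func logProgress(ctx context.Context, interval time.Duration,
	current func() (offset, hwm int64),
	logf func(format string, args ...interface{})) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			offset, hwm := current()
			logf("%s", progressLine(offset, hwm))
		}
	}
}

func main() {
	var offset int64 // advanced by the simulated recovery loop below
	const hwm = 1000

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		logProgress(ctx, 50*time.Millisecond, func() (int64, int64) {
			return atomic.LoadInt64(&offset), hwm
		}, log.Printf)
	}()

	// Simulated recovery: pretend to read records in batches of 100.
	for atomic.LoadInt64(&offset) < hwm {
		atomic.AddInt64(&offset, 100)
		time.Sleep(30 * time.Millisecond)
	}
	cancel() // recovery finished, stop the heartbeat
	<-done
}
```

The heartbeat reports only what is known (offset and high-water mark) and makes no time estimate, in line with the objections raised earlier in the thread.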

@db7
Collaborator

db7 commented May 6, 2019

I guess this has been merged now, right? Can I close the ticket?

@frairon
Contributor

frairon commented May 6, 2019

You're right, let's close it.

frairon closed this as completed May 6, 2019