-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add last_fork_start_time to INFO STATS #978
base: unstable
Are you sure you want to change the base?
Conversation
Sometimes we want to know the last fork time, this can help user to analyze the stability since right now fork is still a costly operation and user can monitor forks by drawing graphs based on these fork fields. Signed-off-by: Binbin <binloveplay1314@qq.com>
Currently the time is, a successful fork, when it start, is recorded. We can also record the last failed fork time if needed. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #978 +/- ##
============================================
- Coverage 70.66% 70.65% -0.02%
============================================
Files 114 114
Lines 63150 63152 +2
============================================
- Hits 44624 44619 -5
- Misses 18526 18533 +7
|
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were these config files and text file committed by accident?
Signed-off-by: Binbin <binloveplay1314@qq.com>
yes.. sorry for that, i did not check after push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need also to update https://valkey.io/commands/info/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information looks like helpful.
@valkey-io/core-team Please vote if you are comfortable with this field, Thanks |
@hwware @enjoy-binbin I suppose the counter question I have is what is the specific use case here when we have the logs for when it actually started? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm ok with the new metric, and is it better to add the unit for it?
@@ -5843,6 +5844,7 @@ sds genValkeyInfoString(dict *section_dict, int all_sections, int everything) { | |||
"pubsubshard_channels:%llu\r\n", kvstoreSize(server.pubsubshard_channels), | |||
"latest_fork_usec:%lld\r\n", server.stat_fork_time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not specifically related to this PR, but this info field is confusing. We thought it was how long the fork latest, but apparently it just covers the system call time. This made the info name you proposed a bit odd since it also used fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is indeed confusing at the first time. A field show how long the fork latest
also will be helpful.
We shouldn't recommend relying on logs, should we? Getting from INFO should be easy for clients. |
My concern was that if someone was concerned about a latency event that was 5 minutes ago, and there was a last fork time that was 180 seconds ago, that doesn't really rule out the possibility of a fork that was 5 minutes ago. At some point they will have to rely on logs for anything historical. Adding more info fields just means it's more complex for new operators to figure out what they should be looking at. |
It is true. I don't have a good argument about it, if people care the latency, they can look at the LATENCY LATEST command or the logs to figure it out. From my point, this field is more suitable for making graphs. For example, if we record the time of each fork and draw a graph, combined with other stats, we can better see the impact of fork/COW, etc. |
Sometimes we want to know the last fork start time, this can help
user to analyze the stability since right now fork is still
a costly operation and user can monitor forks by drawing graphs
based on these fork fields.
fork is still a very resource-intensive process, so it makes sense
to record the time each fork is triggered. Adding a last_fork_start_time
field in INFO STATS, to record the last fork start time in us.