Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add last_fork_start_time to INFO STATS #978

Open
wants to merge 4 commits into
base: unstable
Choose a base branch
from

Conversation

enjoy-binbin
Copy link
Member

@enjoy-binbin enjoy-binbin commented Sep 2, 2024

Sometimes we want to know the last fork start time, this can help
user to analyze the stability since right now fork is still
a costly operation and user can monitor forks by drawing graphs
based on these fork fields.

fork is still a very resource-intensive process, so it makes sense
to record the time each fork is triggered. Adding a last_fork_start_time
field in INFO STATS, to record the last fork start time in us.

Sometimes we want to know the last fork time, this can help
user to analyze the stability since right now fork is still
a costly operation and user can monitor forks by drawing graphs
based on these fork fields.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin
Copy link
Member Author

Currently the time is, a successful fork, when it start, is recorded. We can also record the last failed fork time if needed.

Copy link

codecov bot commented Sep 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.65%. Comparing base (789a73b) to head (d914f67).
Report is 112 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #978      +/-   ##
============================================
- Coverage     70.66%   70.65%   -0.02%     
============================================
  Files           114      114              
  Lines         63150    63152       +2     
============================================
- Hits          44624    44619       -5     
- Misses        18526    18533       +7     
Files with missing lines Coverage Δ
src/server.c 87.71% <100.00%> (+<0.01%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 11 files with indirect coverage changes

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin changed the title Add latest_fork_time to INFO STATS Add last_fork_start_time to INFO STATS Oct 30, 2024
Signed-off-by: Binbin <binloveplay1314@qq.com>
Copy link
Member

@xbasel xbasel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these config files and text file committed by accident?

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin
Copy link
Member Author

Were these config files and text file committed by accident?

yes.. sorry for that, i did not check after push

Copy link
Member

@xbasel xbasel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need also to update https://valkey.io/commands/info/

@enjoy-binbin enjoy-binbin added the major-decision-pending Major decision pending by TSC team label Dec 6, 2024
@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. labels Dec 16, 2024
Copy link
Member

@hwware hwware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information looks like helpful.

@hwware
Copy link
Member

hwware commented Jan 29, 2025

@valkey-io/core-team Please vote if you are comfortable with this field, Thanks

@madolson
Copy link
Member

madolson commented Feb 3, 2025

@hwware @enjoy-binbin I suppose the counter question I have is what is the specific use case here when we have the logs for when it actually started?

Copy link
Member

@soloestoy soloestoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm ok with the new metric, and is it better to add the unit for it?

@@ -5843,6 +5844,7 @@ sds genValkeyInfoString(dict *section_dict, int all_sections, int everything) {
"pubsubshard_channels:%llu\r\n", kvstoreSize(server.pubsubshard_channels),
"latest_fork_usec:%lld\r\n", server.stat_fork_time,
Copy link
Member

@madolson madolson Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not specifically related to this PR, but this info field is confusing. We thought it was how long the fork latest, but apparently it just covers the system call time. This made the info name you proposed a bit odd since it also used fork.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is indeed confusing at the first time. A field show how long the fork latest also will be helpful.

@enjoy-binbin
Copy link
Member Author

I suppose the counter question I have is what is the specific use case here when we have the logs for when it actually started?

We shouldn't recommend relying on logs, should we? Getting from INFO should be easy for clients.

@madolson
Copy link
Member

madolson commented Feb 4, 2025

We shouldn't recommend relying on logs, should we? Getting from INFO should be easy for clients.

My concern was that if someone was concerned about a latency event that was 5 minutes ago, and there was a last fork time that was 180 seconds ago, that doesn't really rule out the possibility of a fork that was 5 minutes ago. At some point they will have to rely on logs for anything historical. Adding more info fields just means it's more complex for new operators to figure out what they should be looking at.

@enjoy-binbin
Copy link
Member Author

My concern was that if someone was concerned about a latency event that was 5 minutes ago, and there was a last fork time that was 180 seconds ago, that doesn't really rule out the possibility of a fork that was 5 minutes ago. At some point they will have to rely on logs for anything historical. Adding more info fields just means it's more complex for new operators to figure out what they should be looking at.

It is true. I don't have a good argument about it, if people care the latency, they can look at the LATENCY LATEST command or the logs to figure it out. From my point, this field is more suitable for making graphs. For example, if we record the time of each fork and draw a graph, combined with other stats, we can better see the impact of fork/COW, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes
Projects
Status: After RC1
Development

Successfully merging this pull request may close these issues.

6 participants