-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(vlog): making vlog threshold dynamic 6ce3b7c #1635
Conversation
ac57104
to
d16bb34
Compare
…dger into aman/vlog_threshold_dynamic
85bceb1
to
8d6887f
Compare
@aman-bansal can you please resolve the conflicts? |
cb2e7ad
to
54d5591
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great @aman-bansal . Got few comments. Also, we are considering the vlog rewrites but not the keys which are deleted while compaction. Should be okay I believe.
structs.go
Outdated
if v < e.valThreshold { | ||
return k + v + 2 // Meta, UserMeta | ||
} | ||
return k + 12 + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use vPtrSize
instead of 12. Also, please add back the comment.
value.go
Outdated
case <-v.closer.HasBeenClosed(): | ||
return | ||
case val := <-v.valueCh: | ||
if val == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will this be nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
safe check only. No use case per se.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got some comments
Reviewed 1 of 16 files at r4.
Reviewable status: 1 of 18 files reviewed, 12 unresolved discussions (waiting on @aman-bansal, @jarifibrahim, @manishrjain, and @NamanJain8)
options.go, line 62 at r8 (raw file):
VLogPercentile float64 MinValueThreshold int64
This is a breaking change. This PR doesn't need to be a breaking change.
options.go, line 348 at r8 (raw file):
// [MinValueThreshold, Options.maxValueThreshold] // // Say VLogPercentile with 1.0 means threshold will eventually sets too Options.maxValueThreshold
will be eventually set to ...
structs.go, line 172 at r8 (raw file):
func (e *Entry) skipVlog(threshold int64) bool { if e.valThreshold == 0 { e.valThreshold = threshold
When would valThreshold be zero? This function doesn't need to change the value of the valThreshold
.
structs.go, line 174 at r8 (raw file):
e.valThreshold = threshold } if int64(len(e.Value)) < e.valThreshold {
return int64(len(e.Value)) < e.valThreshold
value.go, line 1164 at r7 (raw file):
Previously, aman-bansal (aman bansal) wrote…
safe check only. No use case per se.
If we don't expect this to be nil, I'd at least log it.
value.go, line 892 at r8 (raw file):
vlog.numEntriesWritten += uint32(written) vlog.db.threshold.update(valueSizes)
@aman-bansal do you need to update the sizes one txn at a time? Can't we do this outside both the for loops? One single call to threshold.update(...)
value.go, line 1175 at r8 (raw file):
p := int64(v.vlMetrics.Percentile(v.percentile)) if atomic.LoadInt64(&v.valueThreshold) != p { // v.opt.Infof("updating value threshold from: %d to: %d", vt, p)
stale comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reset the histogram on drop all otherwise LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @aman-bansal , what's the status here?
Reviewable status: 1 of 18 files reviewed, 13 unresolved discussions (waiting on @aman-bansal, @jarifibrahim, and @NamanJain8)
db.go, line 119 at r4 (raw file):
opt.maxBatchSize = (15 * opt.MemTableSize) / 100 opt.maxBatchCount = opt.maxBatchSize / int64(skl.MaxNodeSize) //opt.maxValueThreshold = 1024.0
Rename opt.ValueThreshold to opt.MinValueThreshold. And then this one is max. And do add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 15 files reviewed, 13 unresolved discussions (waiting on @jarifibrahim, @manishrjain, and @NamanJain8)
db.go, line 119 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Rename opt.ValueThreshold to opt.MinValueThreshold. And then this one is max. And do add a comment.
Done. We have kept the name same because otherwise it would become breaking change. we believe this change doesnt need to be breaking change.
options.go, line 62 at r8 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
This is a breaking change. This PR doesn't need to be a breaking change.
Done.
options.go, line 348 at r8 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
will be eventually set to ...
Done.
structs.go, line 167 at r7 (raw file):
Previously, NamanJain8 (Naman Jain) wrote…
Use
vPtrSize
instead of 12. Also, please add back the comment.
Done.
structs.go, line 172 at r8 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
When would valThreshold be zero? This function doesn't need to change the value of the
valThreshold
.
Done.
structs.go, line 174 at r8 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
return int64(len(e.Value)) < e.valThreshold
Done.
txn.go, line 381 at r7 (raw file):
Previously, aman-bansal (aman bansal) wrote…
This check is for inmemory. In this mode, the value threshold will work in a default behaviour. It adds basic validation for that mode i believe.
Done.
value.go, line 72 at r7 (raw file):
Previously, aman-bansal (aman bansal) wrote…
It was looking bit noisy to add int64 everywhere i.e. channels, req tracker etc. I did this so as to add context to the variable. Any thoughts about this?
Done.
value.go, line 845 at r7 (raw file):
Previously, NamanJain8 (Naman Jain) wrote…
This needs to be inside the for loop. Also, we can specify the capacity upfront.
make(ReqValSiz, 0, len(req.Entries))
Done.
value.go, line 1164 at r7 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
If we don't expect this to be nil, I'd at least log it.
Done.
value.go, line 1171 at r7 (raw file):
Previously, aman-bansal (aman bansal) wrote…
yes thanks. It was changed recently. comment was outdated thanks :).
Done.
value.go, line 892 at r8 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
@aman-bansal do you need to update the sizes one txn at a time? Can't we do this outside both the for loops? One single call to
threshold.update(...)
Done.
value.go, line 1175 at r8 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
stale comment.
Done.
I ran this for 200M entries. Value threshold is set to 128 initially with max of 2048 and threshold range of 128-1024.
With Dynamic Threshold set to true it takes 52m. Vlog file size were around 4Gi. Threshold sets to around 770 after certain iterations.
With Dynamic Threshold set to false, it takes around 35 min for the write bench to write the data. vlog file size in this case is around 70Gi
Percentage distribution of value sizes were 80% less than 512B, 19% 512B-768B, 0.99% 768-1024B, and 0.01% 1024-2048 B.
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)