Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(vlog): making vlog threshold dynamic 6ce3b7c #1635

Merged
merged 41 commits into from
Feb 4, 2021

Conversation

aman-bansal
Copy link
Contributor

@aman-bansal aman-bansal commented Jan 5, 2021

I ran this for 200M entries. Value threshold is set to 128 initially with max of 2048 and threshold range of 128-1024.
With Dynamic Threshold set to true it takes 52m. Vlog file size were around 4Gi. Threshold sets to around 770 after certain iterations.

With Dynamic Threshold set to false, it takes around 35 min for the write bench to write the data. vlog file size in this case is around 70Gi

Percentage distribution of value sizes were 80% less than 512B, 19% 512B-768B, 0.99% 768-1024B, and 0.01% 1024-2048 B.

This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Jan 5, 2021

CLA assistant check
All committers have signed the CLA.

@aman-bansal aman-bansal marked this pull request as ready for review January 8, 2021 06:01
@aman-bansal aman-bansal force-pushed the aman/vlog_threshold_dynamic branch from ac57104 to d16bb34 Compare January 8, 2021 06:23
@aman-bansal aman-bansal force-pushed the aman/vlog_threshold_dynamic branch from 85bceb1 to 8d6887f Compare January 21, 2021 06:56
@jarifibrahim
Copy link
Contributor

@aman-bansal can you please resolve the conflicts?

@aman-bansal aman-bansal force-pushed the aman/vlog_threshold_dynamic branch from cb2e7ad to 54d5591 Compare January 27, 2021 13:54
Copy link
Contributor

@NamanJain8 NamanJain8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @aman-bansal . Got few comments. Also, we are considering the vlog rewrites but not the keys which are deleted while compaction. Should be okay I believe.

structs.go Outdated
if v < e.valThreshold {
return k + v + 2 // Meta, UserMeta
}
return k + 12 + 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use vPtrSize instead of 12. Also, please add back the comment.

txn.go Show resolved Hide resolved
value.go Outdated Show resolved Hide resolved
value.go Outdated Show resolved Hide resolved
value.go Outdated
case <-v.closer.HasBeenClosed():
return
case val := <-v.valueCh:
if val == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will this be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe check only. No use case per se.

value.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got some comments

Reviewed 1 of 16 files at r4.
Reviewable status: 1 of 18 files reviewed, 12 unresolved discussions (waiting on @aman-bansal, @jarifibrahim, @manishrjain, and @NamanJain8)


options.go, line 62 at r8 (raw file):

	VLogPercentile    float64
	MinValueThreshold int64

This is a breaking change. This PR doesn't need to be a breaking change.


options.go, line 348 at r8 (raw file):

// [MinValueThreshold, Options.maxValueThreshold]
//
// Say VLogPercentile with 1.0 means threshold will eventually sets too Options.maxValueThreshold

will be eventually set to ...


structs.go, line 172 at r8 (raw file):

func (e *Entry) skipVlog(threshold int64) bool {
	if e.valThreshold == 0 {
		e.valThreshold = threshold

When would valThreshold be zero? This function doesn't need to change the value of the valThreshold.


structs.go, line 174 at r8 (raw file):

		e.valThreshold = threshold
	}
	if int64(len(e.Value)) < e.valThreshold {

return int64(len(e.Value)) < e.valThreshold


value.go, line 1164 at r7 (raw file):

Previously, aman-bansal (aman bansal) wrote…

safe check only. No use case per se.

If we don't expect this to be nil, I'd at least log it.


value.go, line 892 at r8 (raw file):

		vlog.numEntriesWritten += uint32(written)
		vlog.db.threshold.update(valueSizes)

@aman-bansal do you need to update the sizes one txn at a time? Can't we do this outside both the for loops? One single call to threshold.update(...)


value.go, line 1175 at r8 (raw file):

			p := int64(v.vlMetrics.Percentile(v.percentile))
			if atomic.LoadInt64(&v.valueThreshold) != p {
				// v.opt.Infof("updating value threshold from: %d to: %d", vt, p)

stale comment.

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset the histogram on drop all otherwise LGTM.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @aman-bansal , what's the status here?

Reviewable status: 1 of 18 files reviewed, 13 unresolved discussions (waiting on @aman-bansal, @jarifibrahim, and @NamanJain8)


db.go, line 119 at r4 (raw file):

	opt.maxBatchSize = (15 * opt.MemTableSize) / 100
	opt.maxBatchCount = opt.maxBatchSize / int64(skl.MaxNodeSize)
	//opt.maxValueThreshold = 1024.0

Rename opt.ValueThreshold to opt.MinValueThreshold. And then this one is max. And do add a comment.

Copy link
Contributor Author

@aman-bansal aman-bansal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 15 files reviewed, 13 unresolved discussions (waiting on @jarifibrahim, @manishrjain, and @NamanJain8)


db.go, line 119 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Rename opt.ValueThreshold to opt.MinValueThreshold. And then this one is max. And do add a comment.

Done. We have kept the name same because otherwise it would become breaking change. we believe this change doesnt need to be breaking change.


options.go, line 62 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

This is a breaking change. This PR doesn't need to be a breaking change.

Done.


options.go, line 348 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

will be eventually set to ...

Done.


structs.go, line 167 at r7 (raw file):

Previously, NamanJain8 (Naman Jain) wrote…

Use vPtrSize instead of 12. Also, please add back the comment.

Done.


structs.go, line 172 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

When would valThreshold be zero? This function doesn't need to change the value of the valThreshold.

Done.


structs.go, line 174 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

return int64(len(e.Value)) < e.valThreshold

Done.


txn.go, line 381 at r7 (raw file):

Previously, aman-bansal (aman bansal) wrote…

This check is for inmemory. In this mode, the value threshold will work in a default behaviour. It adds basic validation for that mode i believe.

Done.


value.go, line 72 at r7 (raw file):

Previously, aman-bansal (aman bansal) wrote…

It was looking bit noisy to add int64 everywhere i.e. channels, req tracker etc. I did this so as to add context to the variable. Any thoughts about this?

Done.


value.go, line 845 at r7 (raw file):

Previously, NamanJain8 (Naman Jain) wrote…

This needs to be inside the for loop. Also, we can specify the capacity upfront.
make(ReqValSiz, 0, len(req.Entries))

Done.


value.go, line 1164 at r7 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

If we don't expect this to be nil, I'd at least log it.

Done.


value.go, line 1171 at r7 (raw file):

Previously, aman-bansal (aman bansal) wrote…

yes thanks. It was changed recently. comment was outdated thanks :).

Done.


value.go, line 892 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

@aman-bansal do you need to update the sizes one txn at a time? Can't we do this outside both the for loops? One single call to threshold.update(...)

Done.


value.go, line 1175 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

stale comment.

Done.

@aman-bansal aman-bansal merged commit 5dbae10 into master Feb 4, 2021
@aman-bansal aman-bansal deleted the aman/vlog_threshold_dynamic branch February 4, 2021 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants