Badger not cleaning data in expected manner #1916
@emailtovamos thanks for opening a separate issue; I will move it to the jaeger repo.
cc @burmanm
Hi, I don't know what the "expected manner" is in this case, but perhaps something more aggressive than what badger currently does? To verify the behavior, I wrote a small test:

```go
// Note: the imports below are reconstructed for readability; the test is assumed
// to live in Jaeger's badger storage plugin package next to NewFactory, and the
// exact import paths may differ between Jaeger versions.
import (
	"math/rand"
	"testing"
	"time"

	"github.com/jaegertracing/jaeger/model"
	"github.com/jaegertracing/jaeger/pkg/config"
	"github.com/uber/jaeger-lib/metrics/metricstest"
	"go.uber.org/zap"
)

func TestFreeDiskspace(t *testing.T) {
	// For Codecov - this does not test anything
	f := NewFactory()
	v, command := config.Viperize(f.AddFlags)
	// Let's speed up the maintenance ticker..
	command.ParseFlags([]string{
		"--badger.maintenance-interval=1s",
		"--badger.span-store-ttl=1s",
	})
	f.InitFromViper(v)
	mFactory := metricstest.NewFactory(0)
	f.Initialize(mFactory, zap.NewNop())

	insertTicker := time.Tick(1 * time.Millisecond)
	stopTicker := time.Tick(20 * time.Minute)
	defer f.store.Close()
	sw, _ := f.CreateSpanWriter()
	for {
		select {
		case <-insertTicker:
			// insert something
			s1 := model.Span{
				TraceID: model.TraceID{
					Low:  rand.Uint64(),
					High: 0,
				},
				SpanID:        model.SpanID(rand.Uint64()),
				OperationName: "/",
				Process: &model.Process{
					ServiceName: "nginx",
				},
				Tags: model.KeyValues{
					model.KeyValue{
						Key:   "http.request_id",
						VStr:  "first",
						VType: model.StringType,
					},
				},
				StartTime: time.Now(),
				Duration:  1 * time.Second,
			}
			sw.WriteSpan(&s1)
		case <-stopTicker:
			return
		}
	}
}
```

I then started monitoring the disk usage and the reclaim process; here are some parts of the log:
From that it looks like the disk space is correctly reclaimed; however, I wonder if your expectation was something quicker? While compaction can't be triggered more often, it would be possible to adjust certain LSM tree settings to ensure that, in certain cases, there aren't so many tables waiting for compaction.
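For context, the kind of LSM-tree tuning mentioned above could look roughly like the sketch below, written against badger's builder-style options API (v1.6+/v2). The specific options and values are illustrative assumptions on my part; whether Jaeger exposes them as flags is not covered in this thread.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

// Sketch only: options that influence how aggressively badger compacts away
// deleted/expired data, at the cost of extra I/O. Values are illustrative.
func openTuned(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithNumCompactors(4).          // run more compactions concurrently
		WithNumLevelZeroTables(2).     // start L0 -> L1 compaction sooner
		WithNumVersionsToKeep(1).      // keep only the latest version of each key
		WithValueLogFileSize(64 << 20) // smaller .vlog files become reclaimable sooner
	return badger.Open(opts)
}

func main() {
	db, err := openTuned("/tmp/badger-tuning-demo") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```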
Thanks @burmanm for the detailed test. Please let me know if I am not clear in explaining my issue.
It should reach equilibrium at some point, where deletes happen at the same rate as inserts. There is no set upper limit on when this would occur; it depends on the compaction rate and how the levels are built in the LSM tree. Are the .SST files or the .vlog files taking most of the space in your case? I could add some more configuration parameters to let you get the deletes happening faster (at the cost of additional I/O for data management).
Since the pod is already running with the assigned PVC, I am not sure how to get data from the underlying PV while the pod is still running.
I'm not entirely sure if this is going to answer your question, but if you fetch the badger metrics, then there's
But you can of course subtract the other if you have the total.
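As a rough sketch of what reading those badger metrics can look like from inside a process that embeds badger: badger publishes its size counters through Go's standard expvar package, so they can be read directly. The metric names used below (badger_lsm_size_bytes and badger_vlog_size_bytes) are an assumption based on badger v1.x and may differ between versions.

```go
package main

import (
	"expvar"
	"fmt"
)

// Sketch: read badger's expvar-published size maps (keyed by data directory).
// Metric names are assumed from badger v1.x; adjust if your version differs.
func printBadgerSizes() {
	for _, name := range []string{"badger_lsm_size_bytes", "badger_vlog_size_bytes"} {
		if v := expvar.Get(name); v != nil {
			fmt.Printf("%s = %s\n", name, v.String())
		} else {
			fmt.Printf("%s not published (is badger open in this process?)\n", name)
		}
	}
}

func main() {
	printBadgerSizes()
}
```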
I SSHed in and got info about the SST and VLOG files as shown above. Below is the disk usage.
As you can see, both of them have significant usage: the SST files account for 27.8 GB and the VLOG files for 19 GB. The combined volume has been increasing for the past 3 weeks with no sign of plateauing.
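For reference, a breakdown like the one above can also be produced from inside the running pod with a few lines of Go that walk the badger data directories. The directory paths below are placeholders; Jaeger's badger storage keeps keys and values in separate directories, so point it at both.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Sketch: sum badger file sizes by extension to see whether the LSM tree
// (.sst) or the value log (.vlog) dominates. Paths are hypothetical.
func main() {
	sizes := map[string]int64{}
	for _, dir := range []string{"/badger/key", "/badger/data"} { // hypothetical paths
		if err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
			if err != nil || info.IsDir() {
				return nil // skip unreadable entries and directories
			}
			sizes[filepath.Ext(path)] += info.Size()
			return nil
		}); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
	}
	for ext, size := range sizes {
		fmt.Printf("%-8s %.1f GiB\n", ext, float64(size)/(1<<30))
	}
}
```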
I think this is connected to dgraph-io/badger#1228 |
@emailtovamos can you try this fix?
I have created a Docker all-in-one image with the patch; for me it works and the files are cleaned up, if someone wants to test: Please note that old badger files need to be cleaned; it only works for new key/values (see the related thread in the badger issue).
Hey guys, we've closed dgraph-io/badger#1228 with fixes in master. We'll do a new badger release soon. It would be very useful if you could test the master branch.
It would be very interesting to have this work, to allow deploying a small Jaeger setup without more complex solutions like Elasticsearch.
@jarifibrahim Have the fixes for dgraph-io/badger#1228 been backported to a 1.x release? We are currently using 1.5.3, but if they were backported to the 1.6 branch we could try moving to that?
@objectiser The fix isn't backported yet. We ran into some crashes, so we had to delay the release. Apologies for the delay. I don't have an ETA on the next release, but we are working on it. :)
Closing as this should be resolved by #2613. If the problem still occurs, please reopen.
I'm using all-in-one with badger storage in Google Kubernetes. My file is:
This means I expect the old traces to be deleted within 24 hours. I searched for the old traces and couldn't find them, which is the expected behaviour. So far so good.
But the thing I am worried about is the constant increase in disk space. I was expecting it to stay around the level it reached at the end of the first 24 hours, but it has almost always been increasing, apart from a few drops. So no matter how much space I assign, there is always a chance of hitting the limit!
What am I doing wrong?