Badger not cleaning data in expected manner #1916

Closed
emailtovamos opened this issue Nov 11, 2019 · 18 comments
Labels
storage/badger Issues related to badger storage

Comments

@emailtovamos

I'm using all-in-one with Badger storage on Google Kubernetes Engine.
My file is:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
spec: 
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest 
    # options:
    #   log-level: debug
  storage:
    type: badger
    options:
      badger:
        ephemeral: false
        directory-key: "/badger/key"
        directory-value: "/badger/data"
        span-store-ttl: 24h0m0s 
        truncate: true
  volumeMounts:
  - name: data
    mountPath: /badger
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: jaegerpvc4

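For reference, these options should map roughly onto the all-in-one CLI flags below — a sketch, assuming the --badger.* flag names also used in the test later in this thread and that SPAN_STORAGE_TYPE selects the backend:

# Hypothetical stand-alone equivalent of the Jaeger CR options above.
docker run --rm \
  -e SPAN_STORAGE_TYPE=badger \
  -v /path/on/host:/badger \
  jaegertracing/all-in-one:latest \
  --badger.ephemeral=false \
  --badger.directory-key=/badger/key \
  --badger.directory-value=/badger/data \
  --badger.span-store-ttl=24h0m0s \
  --badger.truncate=true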
This means I expect old traces to be deleted after 24 hours. I searched for the old traces and couldn't find them, which is the expected behaviour. So far, all good.

But the thing I am worried about is the constant increase in disk usage. I was expecting it to stay around the level it reached at the end of the first 24 hours, but it has been increasing almost continuously, apart from a few drops. So no matter how much space I assign, there is always a chance of hitting the limit!

What am I doing wrong?

[attached image: disk usage graph]

@pavolloffay
Member

@emailtovamos thanks for opening a separate issue; I will move it to the jaeger repo.

@pavolloffay pavolloffay transferred this issue from jaegertracing/jaeger-operator Nov 11, 2019
@pavolloffay pavolloffay added the storage/badger Issues related to badger storage label Nov 11, 2019
@pavolloffay
Member

cc @burmanm

@burmanm
Contributor

burmanm commented Nov 13, 2019

Hi,

I don't know what the "expected manner" is in this case, but perhaps something more aggressive than what Badger currently does? To verify the behavior, I wrote a small test:

func TestFreeDiskspace(t *testing.T) {
	// For Codecov - this does not test anything
	f := NewFactory()
	v, command := config.Viperize(f.AddFlags)
	// Let's speed up the maintenance ticker..
	command.ParseFlags([]string{
		"--badger.maintenance-interval=1s",
		"--badger.span-store-ttl=1s",
	})
	f.InitFromViper(v)
	mFactory := metricstest.NewFactory(0)
	f.Initialize(mFactory, zap.NewNop())

	insertTicker := time.Tick(1 * time.Millisecond)
	stopTicker := time.Tick(20 * time.Minute)

	defer f.store.Close()

	sw, _ := f.CreateSpanWriter()

	for {
		select {
		case <-insertTicker:
			// insert something
			s1 := model.Span{
				TraceID: model.TraceID{
					Low:  rand.Uint64(),
					High: 0,
				},
				SpanID:        model.SpanID(rand.Uint64()),
				OperationName: "/",
				Process: &model.Process{
					ServiceName: "nginx",
				},
				Tags: model.KeyValues{
					model.KeyValue{
						Key:   "http.request_id",
						VStr:  "first",
						VType: model.StringType,
					},
				},
				StartTime: time.Now(),
				Duration:  1 * time.Second,
			}
			sw.WriteSpan(&s1)
		case <-stopTicker:
			return
		}
	}
}
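To reproduce something like this locally, the test can be dropped next to the Badger factory and run with a longer timeout, since the insert loop above runs for 20 minutes — a rough sketch, with the package path assumed from the diff further down this thread:

# Run from the jaeger repository root; the default 10-minute go test timeout is too short here.
go test -v -run TestFreeDiskspace -timeout 30m ./plugin/storage/badger/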

I then started to monitor the disk usage and the reclaim process; here are some parts of the log:

[michael@nina badger118483394]$ ls -ltrh
total 117M
-rw-r--r-- 1 michael michael    6 13.11. 10:08 LOCK
-rw-r--r-- 1 michael michael  39M 13.11. 10:10 000001.sst
-rw-r--r-- 1 michael michael   28 13.11. 10:10 MANIFEST
-rw-r--r-- 1 michael michael  73M 13.11. 10:10 000000.vlog
-rw-r--r-- 1 michael michael 5,2M 13.11. 10:11 000001.vlog
[michael@nina badger118483394]$ date
ke 13.11.2019 10.11.13 +0200
[michael@nina badger118483394]$

000000.vlog reclaimed:

[michael@nina badger118483394]$ ls -ltrh
total 141M
-rw-r--r-- 1 michael michael   6 13.11. 10:08 LOCK
-rw-r--r-- 1 michael michael 39M 13.11. 10:10 000001.sst
-rw-r--r-- 1 michael michael  28 13.11. 10:10 MANIFEST
-rw-r--r-- 1 michael michael 73M 13.11. 10:10 000000.vlog
-rw-r--r-- 1 michael michael 30M 13.11. 10:12 000001.vlog
[michael@nina badger118483394]$ ls -ltrh
total 169M
-rw-r--r-- 1 michael michael   6 13.11. 10:08 LOCK
-rw-r--r-- 1 michael michael 39M 13.11. 10:10 000001.sst
-rw-r--r-- 1 michael michael 39M 13.11. 10:13 000002.sst
-rw-r--r-- 1 michael michael  40 13.11. 10:13 MANIFEST
-rw-r--r-- 1 michael michael 73M 13.11. 10:13 000001.vlog
-rw-r--r-- 1 michael michael 20M 13.11. 10:14 000002.vlog
[michael@nina badger118483394]$ date
ke 13.11.2019 10.14.35 +0200
[michael@nina badger118483394]$

More vlogs have been reclaimed:

[michael@nina badger118483394]$ ls -ltrh
total 233M
-rw-r--r-- 1 michael michael    6 13.11. 10:08 LOCK
-rw-r--r-- 1 michael michael  39M 13.11. 10:10 000001.sst
-rw-r--r-- 1 michael michael  39M 13.11. 10:13 000002.sst
-rw-r--r-- 1 michael michael  39M 13.11. 10:16 000003.sst
-rw-r--r-- 1 michael michael  39M 13.11. 10:19 000004.sst
-rw-r--r-- 1 michael michael   64 13.11. 10:19 MANIFEST
-rw-r--r-- 1 michael michael  73M 13.11. 10:19 000003.vlog
-rw-r--r-- 1 michael michael 6,6M 13.11. 10:19 000004.vlog
[michael@nina badger118483394]$ date
ke 13.11.2019 10.19.42 +0200
[michael@nina badger118483394]$

SST files reclaimed:

[michael@nina badger118483394]$ ls -ltrh
total 132M
-rw-r--r-- 1 michael michael   6 13.11. 10:08 LOCK
-rw-r--r-- 1 michael michael 186 13.11. 10:22 000006.sst
-rw-r--r-- 1 michael michael 39M 13.11. 10:24 000007.sst
-rw-r--r-- 1 michael michael 132 13.11. 10:24 MANIFEST
-rw-r--r-- 1 michael michael 73M 13.11. 10:24 000005.vlog
-rw-r--r-- 1 michael michael 21M 13.11. 10:25 000006.vlog
[michael@nina badger118483394]$ date
ke 13.11.2019 10.25.48 +0200
[michael@nina badger118483394]$

From that it looks like the disk space is correctly reclaimed; however, I wonder if your expectation was something quicker? While compaction can't be triggered more often, it would be possible to adjust certain LSM tree settings so that in certain cases there aren't so many tables waiting for compaction.

@emailtovamos
Author

Thanks @burmanm for the detailed test.
Does it mean the data should never grow beyond a certain limit?
In my case, even though deletions happen, the overall disk usage of the Jaeger pod keeps increasing. By "expected manner" I meant that the disk usage should have an upper limit. This doesn't happen for me currently, which means at this rate the disk will fill up in a few days and I will have to replace the PVC I give to Badger every now and then.

[attached image: disk usage graph]

Please let me know if I am not clear in explaining my issue.

@burmanm
Contributor

burmanm commented Nov 14, 2019

It should reach an equilibrium at some point, where deletes happen at the same rate as inserts. There is no fixed upper limit at which this occurs; it depends on the compaction rate and on how the levels are built in the LSM tree.

Are the .sst files or the .vlog files taking most of the space in your case? I could add some more configuration parameters to let you get the deletes applied faster (at the cost of additional I/O for data management).

@emailtovamos
Author

Since the pod is already running with the assigned PVC, I am not sure how to get data from the underlying PV while the pod is still running.
And Kubernetes/GKE doesn't have an easy way to look into a PV to see whether the .sst files or the .vlog files are taking more space.
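One possible workaround is to exec into the running all-in-one pod and inspect the mounted volume directly — a sketch, with a hypothetical label selector, and assuming the image ships a shell and du (it may not, in which case inspecting the volume from the node, as in a later comment, is the fallback):

# Hypothetical label selector; adjust to however your operator labels the pod.
POD=$(kubectl get pods -l app.kubernetes.io/instance=simplest -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it "$POD" -- du -sh /badger/key /badger/data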

@burmanm
Contributor

burmanm commented Nov 20, 2019

I'm not entirely sure if this is going to answer your question, but if you fetch the Badger metrics, there are badger_lsm_size_bytes and badger_vlog_size_bytes - provided the two sets of data aren't stored in the same directory, that is. Although I think Badger 1.5.3 has a bug where it always reports the same number for both metrics (it's been fixed upstream, but I don't remember in which version).
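A quick way to check them — a sketch, assuming the metrics are exposed on the all-in-one admin port (14269 by default; older versions may serve them elsewhere) and that the operator named the deployment after the Jaeger instance:

# Forward the admin port and grep the two Badger size metrics.
kubectl port-forward deployment/simplest 14269:14269 &
curl -s localhost:14269/metrics | grep -E 'badger_(lsm|vlog)_size_bytes'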

@burmanm
Contributor

burmanm commented Nov 20, 2019

But of course, if you have the total, you can subtract one metric to get the other.

@emailtovamos
Author

root@ubuntu:/data# ls
data  key  lost+found
root@ubuntu:/data# cd data/
root@ubuntu:/data/data# ls
000017.vlog  000356.vlog  000506.vlog  000680.vlog  000738.vlog  000782.vlog  000804.vlog  000819.vlog  000828.vlog  000837.vlog  000844.vlog  000850.vlog  000856.vlog  000862.vlog
000041.vlog  000369.vlog  000509.vlog  000682.vlog  000741.vlog  000788.vlog  000805.vlog  000823.vlog  000831.vlog  000838.vlog  000845.vlog  000851.vlog  000857.vlog  LOCK
000049.vlog  000395.vlog  000511.vlog  000708.vlog  000762.vlog  000791.vlog  000808.vlog  000824.vlog  000832.vlog  000839.vlog  000846.vlog  000852.vlog  000858.vlog
000171.vlog  000502.vlog  000540.vlog  000723.vlog  000765.vlog  000796.vlog  000809.vlog  000825.vlog  000833.vlog  000840.vlog  000847.vlog  000853.vlog  000859.vlog
000333.vlog  000503.vlog  000658.vlog  000729.vlog  000766.vlog  000800.vlog  000815.vlog  000826.vlog  000835.vlog  000841.vlog  000848.vlog  000854.vlog  000860.vlog
000344.vlog  000505.vlog  000675.vlog  000734.vlog  000775.vlog  000803.vlog  000816.vlog  000827.vlog  000836.vlog  000843.vlog  000849.vlog  000855.vlog  000861.vlog
root@ubuntu:/data/data# cd ..
root@ubuntu:/data# ls
data  key  lost+found
root@ubuntu:/data# cd key/
root@ubuntu:/data/key# ls
005805.sst  007624.sst  008094.sst  008433.sst  008610.sst  008969.sst  009084.sst  009236.sst  009358.sst  009518.sst  009673.sst  009744.sst  009806.sst  009838.sst  009867.sst
005807.sst  007626.sst  008096.sst  008434.sst  008611.sst  008970.sst  009086.sst  009238.sst  009359.sst  009554.sst  009675.sst  009745.sst  009807.sst  009839.sst  009868.sst
005808.sst  007628.sst  008098.sst  008435.sst  008612.sst  008971.sst  009088.sst  009239.sst  009406.sst  009563.sst  009676.sst  009746.sst  009809.sst  009840.sst  009869.sst
005809.sst  007630.sst  008099.sst  008436.sst  008614.sst  008972.sst  009090.sst  009240.sst  009407.sst  009565.sst  009713.sst  009747.sst  009811.sst  009841.sst  009870.sst
005810.sst  007632.sst  008103.sst  008437.sst  008617.sst  008973.sst  009091.sst  009241.sst  009408.sst  009567.sst  009716.sst  009748.sst  009812.sst  009842.sst  009871.sst
005811.sst  007633.sst  008208.sst  008438.sst  008619.sst  009013.sst  009092.sst  009242.sst  009409.sst  009568.sst  009718.sst  009749.sst  009813.sst  009843.sst  009872.sst
006152.sst  007636.sst  008210.sst  008439.sst  008760.sst  009014.sst  009093.sst  009243.sst  009410.sst  009569.sst  009720.sst  009762.sst  009814.sst  009844.sst  009873.sst
006346.sst  007637.sst  008213.sst  008440.sst  008762.sst  009015.sst  009094.sst  009244.sst  009411.sst  009576.sst  009721.sst  009765.sst  009815.sst  009845.sst  009874.sst
006350.sst  007638.sst  008214.sst  008441.sst  008764.sst  009016.sst  009095.sst  009245.sst  009480.sst  009577.sst  009722.sst  009766.sst  009816.sst  009846.sst  009875.sst
006353.sst  007639.sst  008371.sst  008442.sst  008773.sst  009017.sst  009096.sst  009246.sst  009484.sst  009578.sst  009723.sst  009770.sst  009817.sst  009847.sst  009876.sst
006358.sst  007640.sst  008373.sst  008443.sst  008774.sst  009018.sst  009097.sst  009247.sst  009488.sst  009579.sst  009724.sst  009771.sst  009818.sst  009848.sst  009877.sst
006361.sst  007641.sst  008406.sst  008444.sst  008776.sst  009019.sst  009098.sst  009248.sst  009489.sst  009614.sst  009725.sst  009772.sst  009819.sst  009849.sst  009878.sst
006364.sst  007642.sst  008408.sst  008445.sst  008778.sst  009020.sst  009099.sst  009249.sst  009491.sst  009616.sst  009726.sst  009775.sst  009820.sst  009850.sst  009879.sst
006365.sst  007643.sst  008410.sst  008446.sst  008779.sst  009021.sst  009100.sst  009250.sst  009492.sst  009618.sst  009727.sst  009777.sst  009821.sst  009851.sst  009880.sst
006366.sst  007644.sst  008411.sst  008448.sst  008780.sst  009022.sst  009101.sst  009251.sst  009493.sst  009620.sst  009728.sst  009779.sst  009822.sst  009852.sst  009881.sst
006367.sst  007694.sst  008413.sst  008449.sst  008817.sst  009023.sst  009102.sst  009252.sst  009495.sst  009621.sst  009729.sst  009780.sst  009823.sst  009853.sst  009882.sst
006368.sst  007696.sst  008415.sst  008450.sst  008820.sst  009051.sst  009103.sst  009253.sst  009496.sst  009622.sst  009731.sst  009781.sst  009824.sst  009854.sst  009883.sst
006369.sst  007698.sst  008417.sst  008451.sst  008821.sst  009055.sst  009104.sst  009254.sst  009499.sst  009623.sst  009732.sst  009782.sst  009825.sst  009855.sst  LOCK
006370.sst  007700.sst  008420.sst  008534.sst  008914.sst  009059.sst  009105.sst  009255.sst  009501.sst  009624.sst  009733.sst  009784.sst  009826.sst  009856.sst  MANIFEST
006371.sst  007724.sst  008422.sst  008536.sst  008917.sst  009061.sst  009106.sst  009256.sst  009503.sst  009625.sst  009734.sst  009785.sst  009827.sst  009857.sst
006677.sst  007731.sst  008424.sst  008573.sst  008919.sst  009063.sst  009107.sst  009257.sst  009505.sst  009644.sst  009735.sst  009787.sst  009828.sst  009858.sst
006680.sst  007846.sst  008425.sst  008574.sst  008921.sst  009065.sst  009108.sst  009258.sst  009507.sst  009647.sst  009736.sst  009788.sst  009829.sst  009859.sst
007198.sst  007847.sst  008426.sst  008603.sst  008923.sst  009068.sst  009151.sst  009259.sst  009509.sst  009650.sst  009737.sst  009789.sst  009830.sst  009860.sst
007199.sst  007848.sst  008427.sst  008604.sst  008925.sst  009070.sst  009153.sst  009260.sst  009511.sst  009653.sst  009738.sst  009790.sst  009831.sst  009861.sst
007200.sst  007849.sst  008428.sst  008605.sst  008959.sst  009072.sst  009155.sst  009288.sst  009512.sst  009657.sst  009739.sst  009795.sst  009832.sst  009862.sst
007201.sst  007850.sst  008429.sst  008606.sst  008962.sst  009075.sst  009200.sst  009289.sst  009513.sst  009660.sst  009740.sst  009796.sst  009833.sst  009863.sst
007402.sst  007851.sst  008430.sst  008607.sst  008964.sst  009077.sst  009204.sst  009313.sst  009514.sst  009663.sst  009741.sst  009797.sst  009834.sst  009864.sst
007403.sst  008062.sst  008431.sst  008608.sst  008966.sst  009079.sst  009232.sst  009315.sst  009516.sst  009664.sst  009742.sst  009798.sst  009835.sst  009865.sst
007404.sst  008063.sst  008432.sst  008609.sst  008968.sst  009081.sst  009234.sst  009357.sst  009517.sst  009666.sst  009743.sst  009804.sst  009836.sst  009866.sst
root@ubuntu:/data/key# cd ..

@emailtovamos
Author

I SSHed in and got info about the .sst and .vlog files as shown above. Below is the disk usage.

root@ubuntu:/data# du data
19422964        data
root@ubuntu:/data# du key/
27826628        key/

As you can see, both of them have significant usage: the .sst files (key) account for roughly 27.8 GB and the .vlog files (data) for roughly 19.4 GB, going by du's 1K blocks. The combined volume has been increasing for the past 3 weeks with no sign of plateauing.
So maybe some of the older files are not getting deleted?

@oncilla

oncilla commented Mar 28, 2020

I think this is connected to dgraph-io/badger#1228

@mschneider82

@emailtovamos can you try this fix?

diff --git a/plugin/storage/badger/factory.go b/plugin/storage/badger/factory.go
index 5572ffe..d456474 100644
--- a/plugin/storage/badger/factory.go
+++ b/plugin/storage/badger/factory.go
@@ -91,6 +91,8 @@ func (f *Factory) Initialize(metricsFactory metrics.Factory, logger *zap.Logger)
 
        opts := badger.DefaultOptions
        opts.TableLoadingMode = options.MemoryMap
+       // This is important. Without this, Badger would keep at least one version of every key.
+       opts.NumVersionsToKeep = 0
 
        if f.Options.primary.Ephemeral {
                opts.SyncWrites = false

mschneider82 pushed a commit to mschneider82/jaeger that referenced this issue Apr 3, 2020
@mschneider82

I have created a Docker all-in-one image with the patch; for me it works and the files are cleaned up. If someone wants to test it: docker pull mschneider82/all-in-one:latest
/cc @emailtovamos

Please note that old Badger files need to be cleaned up; the fix only applies to new keys/values (see the related thread in the Badger issue).

@jarifibrahim

Hey guys, we've closed dgraph-io/badger#1228 with fixes in master. We'll do a new Badger release soon. It would be very useful if you could test the master branch.
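One way to try that from a Jaeger checkout is to pin the Badger dependency to master and rebuild — a minimal sketch, assuming Go modules and the module path Jaeger currently imports (if the fix only exists on the /v2 line, the import path and code would need to change as well):

# Pin badger to its current master, tidy the module graph, and rebuild all-in-one.
go get github.com/dgraph-io/badger@master
go mod tidy
go build ./cmd/all-in-one/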

@jeff1985

It would be very interesting to get this working, to allow deploying a small Jaeger setup without more complex solutions like Elasticsearch.

@objectiser
Contributor

@jarifibrahim Have the fixes for dgraph-io/badger#1228 been backported to a 1.x release? We are currently using 1.5.3, but if they were backported to the 1.6 branch we could try moving to that.

@jarifibrahim

@objectiser The fix isn't backported yet. We ran into some crashes, so we had to delay the release. Apologies for the delay. I don't have an ETA for the next release, but we are working on it. :)

@objectiser
Contributor

Closing, as this should be resolved by #2613.

If the problem still occurs please reopen.
