Expired spans in badger still exist on disk even after a long time #2266
Comments
@objectiser, could you make almost all badger configuration options available as environment variables/CLI flags? Right now, jaeger supports only a subset of configuration options as env variables, so I had to hardcode the remaining options to pass them to badger. To avoid the problem described in #2266, the following options worked for my requirements.
Now memory usage is good, because fewer tables are kept in memory, and compaction happens faster because the table size and level-one size are low. Also, a new vlog file is created for every 32 MiB chunk of value data, so value-log GC happens sooner. Previously a new vlog file was created only for every 1 GiB, so GC was delayed until a new vlog file was created. Also, to trigger GC (reclaim space), compaction has to happen frequently, as compaction determines which keys (and values) are to be removed. NumVersionsToKeep is set to 0. If it is set to 1 (the default), expired keys will not be marked for deletion; there is an inherent problem in v1.5.3 which can be solved by upgrading [dgraph-io/badger/pull/1006]. I gathered this information by going through badger GitHub issues, especially by following the comments of @jarifibrahim. Thanks to him. Note: the above options worked well for my use case, but they have downsides, like slower data access, so users should evaluate them carefully before modifying anything.
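For reference, hardcoded changes of the kind described above could look roughly like the sketch below. This assumes the badger v1.6-style options API; the function name and the specific sizes are illustrative assumptions, not values taken from this thread, so verify them against the badger version you actually build against (and note the caveats about NumVersionsToKeep later in this thread).

```go
package storage

import (
	badger "github.com/dgraph-io/badger"
)

// openTunedBadger is a hypothetical helper showing where the
// options discussed above would be set. Sizes are illustrative.
func openTunedBadger(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir)

	// Smaller tables and a smaller level one: fewer tables held in
	// memory and more frequent compaction.
	opts.MaxTableSize = 16 << 20 // 16 MiB per SST table (assumed value)
	opts.LevelOneSize = 32 << 20 // 32 MiB level-one size (assumed value)

	// Roll the value log every 32 MiB so value-log GC can reclaim
	// space sooner than with the ~1 GiB default.
	opts.ValueLogFileSize = 32 << 20

	// As discussed above; see the maintainer's warning below before
	// copying this setting.
	opts.NumVersionsToKeep = 0

	return badger.Open(opts)
}
```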
@objectiser We've tried to backport all the fixes. Is there a specific fix that's in a v2.x release and not in a v1.x release? We recently made some bug fixes which are not yet backported, but I will do that before the next badger 1.x release.
@vasanth29797 Please do not set NumVersionsToKeep to 0. We have a fix for the number-of-versions bug in badger master. We'll do a release soon with the fix.
@jarifibrahim, I have seen many people using NumVersionsToKeep=0 to solve their issues, so I gave it a try and it worked well. It seems the issue dgraph-io/badger#1228 has been closed, so I will use the latest version and leave NumVersionsToKeep at 1.
@vasanth29797 Yeah. There's no severe side effect of setting the number of versions to zero, except that only the deleted keys will be removed and all the other keys will stay in the system forever.
@jarifibrahim great - if you could let us know when a candidate release is available, we can get someone to test it. In the meantime, if @vasanth29797 is able to verify the fix, that would be awesome.
@objectiser Sure! I'll ping on this thread once a release candidate is ready.
@objectiser I performed an analysis on both the older (1.5.3) and newer (master) versions of badger with the same set of badger configurations.
Prometheus metrics - badger v1.5.3
badger latest - master
So, memory-wise there is a big improvement. The vlog data files on disk are being garbage-collected well, but the key files on disk are nearly intact and thus occupy a significant portion of the disk. If this pattern continues, disk usage will grow over time as data comes in until it reaches the limit.
@vasanth29797 Do you have a test that I can look at? Badger had a bug that prevented keys from getting cleaned up, but it should be fixed in the master version of badger. If you have a test that I can run, please share it and I'll take a look. The badger logs will also have some useful information.
@jarifibrahim, The way I did this analysis is:
Then, after the expiry of all spans, I noted the data from Prometheus. I am not sure whether you will be able to reproduce this. Also, the jaeger logs are noisy, so it's difficult to scrape the badger logs. Would it be helpful if I send the output of the following command on the volume dir: badger info --dir . --show-keys --show-internal --show-tables
What kind of business use case are you trying to solve?
Using the all-in-one:1.18 Docker image with badger as storage in Azure Kubernetes Service.
Problem - what in Jaeger blocks you from solving the requirement?
Expired data (spans) and their metadata are not purged from disk and memory even hours after their expiry time.
Kubernetes deployment file
Details:
I set the span TTL to 10 minutes and did not set resource constraints. Spans, once expired, cannot be queried from the jaeger UI; this is working as expected. But the data associated with those spans (keys and values) is still on disk (PVC), even 2 hours after the expiry time. I think the keys associated with those spans, plus some metadata, are still in memory as well. The effect of GC and compaction isn't much.
The jaeger instance received only 756K spans from its start, and after that it didn't receive new spans.
Prometheus metrics 2 hours after expiry time:
jaeger_badger_lsm_size_bytes - 330 MiB
jaeger_badger_vlog_size_bytes - 211 MiB
memory usage - 640 MiB
pvc usage - 508 MiB
directory-wise:
data - 177.5 MiB (1 vlog file)
key - 330 MiB (6 SST files)
I am planning to use this jaeger instance in production, although I am aware of other storage options. So this will become a serious issue, at least memory-wise, when jaeger receives a very high number of spans.
Proposal - what do you suggest to solve the problem or improve the existing situation?
Clean up expired data in a regular, consistent fashion, so that both PVC and memory usage return close to the state jaeger was in when it spawned.
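One pattern for this kind of regular cleanup, suggested by badger's own documentation, is to run value-log GC on a ticker. A minimal sketch, assuming a badger v1.x `*badger.DB` handle (the interval and the 0.5 discard ratio are illustrative, not values from this thread):

```go
package storage

import (
	"time"

	badger "github.com/dgraph-io/badger"
)

// runValueLogGC is a hypothetical background goroutine. Every tick
// it asks badger to rewrite value-log files in which at least half
// the data is stale. RunValueLogGC returns nil when it rewrote a
// file, so the inner loop repeats until nothing is left to reclaim
// (badger then returns ErrNoRewrite).
func runValueLogGC(db *badger.DB, stop <-chan struct{}) {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			for db.RunValueLogGC(0.5) == nil {
			}
		}
	}
}
```

Note that value-log GC only reclaims space in the vlog files; the SST (key) files reported above shrink only through compaction, which this loop does not drive.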
Any open questions to address
In issue dgraph-io/badger#1124, they mentioned that the newer badger v2.0.0 will solve this. I think jaeger v1.18 uses badger v1.5.3. When jaeger upgrades to a newer version of badger, will this issue be resolved?