[NEW] More refined memory management system #1792

Open

soloestoy opened this issue Feb 27, 2025 · 7 comments

@soloestoy
Member

The discussion at #831 regarding maxmemory flexibility raised several considerations. In particular, the current maxmemory configuration and its implications deserve scrutiny: does the overlimit handling correctly distinguish between data and non-data memory usage? The current mechanism of deleting user data when non-data memory consumption grows (e.g., system operational memory) appears unreasonable.

I discussed this with several members of @valkey-io/core-team, and we think it is necessary to implement more refined memory management.

Unlike traditional databases that store data on disk and use memory mainly for caching and essential operational buffers (where memory shortage typically triggers swapping rather than data deletion), Valkey stores all data in memory. However, Valkey's memory holds not only user data but also various operational components such as client I/O buffers. The current implementation forces data eviction when total memory exceeds maxmemory, even when the excess stems from system operations rather than actual data growth - an unfair approach. Moreover, when using the noeviction policy, uncontrolled growth of system memory can cause total usage to far exceed the maxmemory limit.

Below is sample output from the MEMORY STATS command showing the current component breakdown (note that some memory usage remains unaccounted for):

$./valkey-cli
127.0.0.1:6379> memory stats
 1) "peak.allocated"
 2) (integer) 52217520
 3) "total.allocated"
 4) (integer) 52203176
 5) "startup.allocated"
 6) (integer) 879216
 7) "replication.backlog"
 8) (integer) 10485896
 9) "clients.slaves"
10) (integer) 53296
11) "clients.normal"
12) (integer) 78336
13) "cluster.links"
14) (integer) 0
15) "aof.buffer"
16) (integer) 3584
17) "lua.caches"
18) (integer) 0
19) "functions.caches"
20) (integer) 224
21) "db.0"
22) 1) "overhead.hashtable.main"
    2) (integer) 19826904
    3) "overhead.hashtable.expires"
    4) (integer) 96
23) "overhead.db.hashtable.lut"
24) (integer) 9468504
25) "overhead.db.hashtable.rehashing"
26) (integer) 0
27) "overhead.total"
28) (integer) 31327552
29) "db.dict.rehashing.count"
30) (integer) 0
31) "keys.count"
32) (integer) 647394
33) "keys.bytes-per-key"
34) (integer) 32
35) "dataset.bytes"
36) (integer) 20875624
37) "dataset.percentage"
38) "40.674224853515625"
39) "peak.percentage"
40) "99.9725341796875"
41) "allocator.allocated"
42) (integer) 52301872
43) "allocator.active"
44) (integer) 53227520
45) "allocator.resident"
46) (integer) 63315968
47) "allocator.muzzy"
48) (integer) 0
49) "allocator-fragmentation.ratio"
50) "1.0176981687545776"
51) "allocator-fragmentation.bytes"
52) (integer) 925648
53) "allocator-rss.ratio"
54) "1.1895344257354736"
55) "allocator-rss.bytes"
56) (integer) 10088448
57) "rss-overhead.ratio"
58) "1.0172078609466553"
59) "rss-overhead.bytes"
60) (integer) 1089536
61) "fragmentation"
62) "1.2338569164276123"
63) "fragmentation.bytes"
64) (integer) 12206984

Currently, only clients.slaves and aof.buffer are excluded from eviction calculations. Other system memory growth still triggers data eviction, which is problematic.
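
To make this concrete, here is a minimal, self-contained sketch of the overlimit check as described above. It is illustrative only, not the actual Valkey code; the struct and helper names are invented for the example.

```c
/* Illustrative sketch only -- not the actual Valkey implementation.
 * It mirrors the behavior described above: replica output buffers and
 * the AOF buffer are subtracted before comparing against maxmemory,
 * so growth in any other non-data category still triggers eviction. */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t total_allocated;  /* everything the allocator reports  */
    size_t clients_slaves;   /* replica output buffers (excluded) */
    size_t aof_buffer;       /* AOF buffer (excluded)             */
    size_t maxmemory;        /* configured limit                  */
} mem_state;

/* Returns how many bytes must be freed by eviction, or 0 if under the limit. */
static size_t bytes_to_evict(const mem_state *m) {
    size_t counted = m->total_allocated;
    if (counted > m->clients_slaves) counted -= m->clients_slaves;
    if (counted > m->aof_buffer)     counted -= m->aof_buffer;
    return (counted > m->maxmemory) ? counted - m->maxmemory : 0;
}

int main(void) {
    /* A burst in normal-client output buffers (not excluded) pushes the
     * counted total over maxmemory even though the dataset did not grow,
     * so keys get evicted anyway. */
    mem_state m = {
        .total_allocated = 110 * 1024 * 1024,
        .clients_slaves  = 4 * 1024 * 1024,
        .aof_buffer      = 1 * 1024 * 1024,
        .maxmemory       = 100 * 1024 * 1024,
    };
    printf("need to evict %zu bytes\n", bytes_to_evict(&m));
    return 0;
}
```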

Additional unaccounted system memory includes (see the sketch after this list):

  • Database structures: blocking_keys, ready_keys, watched_keys dicts
  • Pub/sub components: pubsub_channels, pubsub_patterns, pubsubshard_channels
  • Tracking structures: tracking_pending_keys
  • Command hashtables
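
As a rough illustration of what accounting for these structures could look like, here is a toy sketch; the dict type and the per-entry cost are assumptions made purely for the example, not real Valkey internals.

```c
/* Toy sketch (not Valkey code): fold the structures listed above into a
 * single "currently unaccounted overhead" figure. */
#include <stddef.h>
#include <stdio.h>

typedef struct { size_t entries; } toy_dict;   /* stand-in for a real dict */

/* Assumed fixed table cost plus a per-entry bookkeeping cost. */
static size_t dict_overhead_bytes(const toy_dict *d) {
    const size_t fixed = 64, per_entry = 48;
    return d ? fixed + d->entries * per_entry : 0;
}

int main(void) {
    toy_dict blocking_keys = {10}, ready_keys = {2}, watched_keys = {5};
    toy_dict pubsub_channels = {100}, pubsub_patterns = {3};
    toy_dict pubsubshard_channels = {0}, tracking_pending_keys = {7};
    toy_dict commands = {240};

    const toy_dict *all[] = {
        &blocking_keys, &ready_keys, &watched_keys, &pubsub_channels,
        &pubsub_patterns, &pubsubshard_channels, &tracking_pending_keys,
        &commands,
    };
    size_t total = 0;
    for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++)
        total += dict_overhead_bytes(all[i]);
    printf("currently unaccounted overhead: %zu bytes\n", total);
    return 0;
}
```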

Memory composition diagram:

Key-value data memory
├─ User data: robj memory
├─ Data expiration structures
└─ Hash metadata

User metadata
├─ DB/kvstore structures
├─ Miscellaneous: pubsub, tracking, command hashtables
└─ Client I/O buffers

System operational memory
├─ Command log
├─ Latency tracking
├─ AOF buffer
├─ Global replication buffer
└─ Replication backlog

To establish fair and granular memory management, I propose implementing categorized memory limits with corresponding handling mechanisms:

  1. maxmemory-dataset: Data eviction when exceeded
  2. maxmemory-clients: Connection termination when client buffers overflow
  3. maxmemory-replication-buffer: Write throttling or sync disconnection
  4. maxmemory-aof-buffer: Write throttling or prioritized disk flushing
  5. maxmemory-lua-caches: Script cache eviction
  6. and so on

This hierarchical approach would enable differentiated control over various memory components while maintaining system stability and fairness during memory pressure scenarios.
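
To illustrate the shape of this, here is a rough sketch. The category names map to the configs listed above; the enforcement actions and every type and function name are illustrative, not a concrete design.

```c
/* Illustrative sketch: per-category limits with per-category reactions,
 * instead of a single maxmemory that always reacts by evicting data. */
#include <stddef.h>
#include <stdio.h>

typedef enum {
    MEM_DATASET,            /* maxmemory-dataset            */
    MEM_CLIENTS,            /* maxmemory-clients            */
    MEM_REPLICATION_BUFFER, /* maxmemory-replication-buffer */
    MEM_AOF_BUFFER,         /* maxmemory-aof-buffer         */
    MEM_LUA_CACHES,         /* maxmemory-lua-caches         */
    MEM_CATEGORY_COUNT,
} mem_category;

typedef struct {
    size_t limit[MEM_CATEGORY_COUNT];  /* 0 means "no limit" */
    size_t used[MEM_CATEGORY_COUNT];
} mem_limits;

/* Each category gets its own overlimit reaction. */
static void handle_overlimit(mem_category c) {
    switch (c) {
    case MEM_DATASET:            puts("evict keys");                        break;
    case MEM_CLIENTS:            puts("close the worst-offending clients"); break;
    case MEM_REPLICATION_BUFFER: puts("throttle writes or drop syncs");     break;
    case MEM_AOF_BUFFER:         puts("throttle writes or flush to disk");  break;
    case MEM_LUA_CACHES:         puts("evict cached scripts");              break;
    default: break;
    }
}

static void enforce(const mem_limits *m) {
    for (int c = 0; c < MEM_CATEGORY_COUNT; c++)
        if (m->limit[c] && m->used[c] > m->limit[c])
            handle_overlimit((mem_category)c);
}

int main(void) {
    mem_limits m = {{0}, {0}};
    m.limit[MEM_DATASET] = 100 << 20;  m.used[MEM_DATASET] = 90 << 20;
    m.limit[MEM_CLIENTS] = 16 << 20;   m.used[MEM_CLIENTS] = 20 << 20;
    enforce(&m);  /* only the clients category is over its limit */
    return 0;
}
```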

@zuiderkwast
Contributor

I like this idea.

Should we then track dataset memory accurately? Currently we don't; we just infer it as total memory minus overhead.

In #852 there is a suggestion to track memory per slot. The total dataset memory is the sum for all slots.
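
A minimal sketch of the per-slot idea, assuming each slot maintains its own byte counter; the names here are illustrative, not existing Valkey APIs.

```c
/* Illustrative sketch: if every slot tracks the bytes of the keys it owns,
 * total dataset memory is simply the sum over all slots. */
#include <stddef.h>

#define CLUSTER_SLOTS 16384

typedef struct {
    size_t dataset_bytes[CLUSTER_SLOTS];  /* updated on every write/delete in that slot */
} slot_accounting;

static size_t total_dataset_bytes(const slot_accounting *a) {
    size_t total = 0;
    for (int slot = 0; slot < CLUSTER_SLOTS; slot++)
        total += a->dataset_bytes[slot];
    return total;
}

int main(void) {
    static slot_accounting acc;          /* static, so zero-initialized */
    acc.dataset_bytes[0]    = 1024;      /* bytes attributed to keys in slot 0 */
    acc.dataset_bytes[5474] = 2048;
    return total_dataset_bytes(&acc) == 3072 ? 0 : 1;
}
```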

@hpatro
Collaborator

hpatro commented Feb 27, 2025

Nice idea @soloestoy.

I have a bit of apprehension though. In terms of usability, won't it become a nightmare to determine the correct value for each of these? And depending on workload behavior and the different scenarios within the lifecycle of a given application, the values would need to change over time.

For example, users are sometimes unaware that their values keep growing, and after a certain period the client output buffers start overflowing under a read-heavy workload. Likewise, a sudden burst of write traffic could overflow the replication buffer.

Even though this would give us fine-grained control, understanding each of these knobs and managing them well seems quite a difficult task for administrators.

@madolson
Member

> In #852 there is a suggestion to track memory per slot. The total dataset memory is the sum for all slots.

@kyle-yh-kim Just to bring you into the conversation explicitly: we talked about reviving this discussion for Valkey 9.0.

@PingXie
Member

PingXie commented Feb 28, 2025

Thanks for the detailed write-up @soloestoy!

It is a great observation that memory in our case serves two roles. It is both storage for the user data (what disk is to other databases) and the resource that supports user requests. Being able to express and manage the two roles explicitly IMO brings more clarity to memory management. I am directionally aligned.

@hwware what do you think about pausing #831 and tidying up our memory management first?

@hwware
Member

hwware commented Feb 28, 2025

> Thanks for the detailed write-up @soloestoy!
>
> It is a great observation that memory in our case serves two roles. It is both storage for the user data (what disk is to other databases) and the resource that supports user requests. Being able to express and manage the two roles explicitly IMO brings more clarity to memory management. I am directionally aligned.
>
> @hwware what do you think about pausing #831 and tidying up our memory management first?

Agree with you. Let's shift our focus to this issue.

@soloestoy
Member Author

soloestoy commented Feb 28, 2025

> Nice idea @soloestoy.
>
> I have a bit of apprehension though. In terms of usability, won't it become a nightmare to determine the correct value for each of these? And depending on workload behavior and the different scenarios within the lifecycle of a given application, the values would need to change over time.
>
> For example, users are sometimes unaware that their values keep growing, and after a certain period the client output buffers start overflowing under a read-heavy workload. Likewise, a sudden burst of write traffic could overflow the replication buffer.
>
> Even though this would give us fine-grained control, understanding each of these knobs and managing them well seems quite a difficult task for administrators.

@hpatro thank you for raising these points. Yes, there is indeed a trade-off here: finer-grained memory management requires more configuration, which introduces a learning cost for users. We need to strike a balance and implement the changes in phases.

  1. The first step should be to precisely track dataset memory usage, which aligns with user needs and improves clarity. Users can plan their data capacity via maxmemory-dataset. Importantly, this isolates data memory from non-data components, ensuring data eviction only occurs when the dataset limit is exceeded, which is the logical and intuitive behavior (see the sketch below).
  2. Non-dataset memory is more complex, but partial progress has already been made.

In fact, many aspects of this granular memory management are already underway. I believe this systematic approach is essential to evolve Valkey into a more robust and production-ready database.
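
For the first step, the intended behavior could be as simple as the following sketch; maxmemory-dataset and the tracked counters are hypothetical names used only for illustration.

```c
/* Illustrative sketch: eviction is decided by dataset memory against a
 * dataset-specific limit, so growth in non-data memory (client buffers,
 * replication buffers, ...) can no longer evict keys. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t dataset_bytes;      /* precisely tracked user-data memory       */
    size_t maxmemory_dataset;  /* hypothetical maxmemory-dataset (0 = off) */
    size_t other_bytes;        /* client buffers, replication buffers, ... */
} dataset_state;

static bool should_evict(const dataset_state *s) {
    return s->maxmemory_dataset && s->dataset_bytes > s->maxmemory_dataset;
}

int main(void) {
    dataset_state s = { .dataset_bytes = 90u << 20,
                        .maxmemory_dataset = 100u << 20,
                        .other_bytes = 50u << 20 };  /* ignored by the check */
    printf("evict? %s\n", should_evict(&s) ? "yes" : "no");  /* prints "no" */
    return 0;
}
```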

@madolson
Member

Read through everything now, and I agree about having optional dedicated memory buffers for clients, replication, AOF, and the Lua cache. I think we should probably leave maxmemory for the dataset as is, and just let the remaining categories be additional buffers reserved out of it. Basically, everything that is not set aside for replication, clients, etc. is given to the dataset.
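
A small sketch of how I read that, with illustrative names only: the optional reservations are carved out of the existing maxmemory, and whatever is not reserved remains the dataset budget.

```c
/* Illustrative sketch: per-category reservations are subtracted from the
 * single existing maxmemory; the remainder is what the dataset may use. */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t maxmemory;            /* existing single limit         */
    size_t reserved_clients;     /* optional, 0 if not configured */
    size_t reserved_replication;
    size_t reserved_aof;
    size_t reserved_lua;
} budget;

static size_t dataset_budget(const budget *b) {
    size_t reserved = b->reserved_clients + b->reserved_replication +
                      b->reserved_aof + b->reserved_lua;
    return (b->maxmemory > reserved) ? b->maxmemory - reserved : 0;
}

int main(void) {
    budget b = { .maxmemory = 1024u << 20,
                 .reserved_clients = 64u << 20,
                 .reserved_replication = 128u << 20 };
    printf("dataset budget: %zu MiB\n", dataset_budget(&b) >> 20);
    return 0;
}
```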
