[NEW] More refined memory management system #1792

Open

soloestoy opened this issue Feb 27, 2025 · 7 comments

@soloestoy
Member

The discussion at #831 regarding maxmemory flexibility raised several considerations. In particular, the current maxmemory configuration and its implications deserve scrutiny: does the overlimit handling correctly distinguish between data and non-data memory usage? The current mechanism of deleting user data when non-data memory consumption grows (e.g., system operational memory) appears unreasonable.

I discussed this with several members of @valkey-io/core-team, and we think it is necessary to implement more refined memory management.

Unlike traditional databases that store data on disk and use memory mainly for caching and essential operational buffers (where memory shortage typically triggers swapping rather than data deletion), Valkey stores all data in memory. However, Valkey's memory holds not only user data but also various operational components such as client I/O buffers. The current implementation forces data eviction when total memory exceeds maxmemory, even when the excess stems from system operations rather than actual data growth - an unfair approach. Moreover, when using the noeviction policy, uncontrolled growth of system memory can cause total usage to far exceed the maxmemory limit.

Below is sample output from the MEMORY STATS command showing the current component breakdown (note that some memory usage remains unaccounted for):

$./valkey-cli
127.0.0.1:6379> memory stats
 1) "peak.allocated"
 2) (integer) 52217520
 3) "total.allocated"
 4) (integer) 52203176
 5) "startup.allocated"
 6) (integer) 879216
 7) "replication.backlog"
 8) (integer) 10485896
 9) "clients.slaves"
10) (integer) 53296
11) "clients.normal"
12) (integer) 78336
13) "cluster.links"
14) (integer) 0
15) "aof.buffer"
16) (integer) 3584
17) "lua.caches"
18) (integer) 0
19) "functions.caches"
20) (integer) 224
21) "db.0"
22) 1) "overhead.hashtable.main"
    2) (integer) 19826904
    3) "overhead.hashtable.expires"
    4) (integer) 96
23) "overhead.db.hashtable.lut"
24) (integer) 9468504
25) "overhead.db.hashtable.rehashing"
26) (integer) 0
27) "overhead.total"
28) (integer) 31327552
29) "db.dict.rehashing.count"
30) (integer) 0
31) "keys.count"
32) (integer) 647394
33) "keys.bytes-per-key"
34) (integer) 32
35) "dataset.bytes"
36) (integer) 20875624
37) "dataset.percentage"
38) "40.674224853515625"
39) "peak.percentage"
40) "99.9725341796875"
41) "allocator.allocated"
42) (integer) 52301872
43) "allocator.active"
44) (integer) 53227520
45) "allocator.resident"
46) (integer) 63315968
47) "allocator.muzzy"
48) (integer) 0
49) "allocator-fragmentation.ratio"
50) "1.0176981687545776"
51) "allocator-fragmentation.bytes"
52) (integer) 925648
53) "allocator-rss.ratio"
54) "1.1895344257354736"
55) "allocator-rss.bytes"
56) (integer) 10088448
57) "rss-overhead.ratio"
58) "1.0172078609466553"
59) "rss-overhead.bytes"
60) (integer) 1089536
61) "fragmentation"
62) "1.2338569164276123"
63) "fragmentation.bytes"
64) (integer) 12206984

Currently, only clients.slaves and aof.buffer are excluded from eviction calculations. Other system memory growth still triggers data eviction, which is problematic.
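
To make this concrete, here is a minimal, self-contained sketch of the overlimit check as described above. It is illustrative only, not the actual Valkey code; the struct and helper names are invented for the example.

```c
/* Illustrative sketch only -- not the actual Valkey implementation.
 * It mirrors the behavior described above: replica output buffers and
 * the AOF buffer are subtracted before comparing against maxmemory,
 * so growth in any other non-data category still triggers eviction. */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t total_allocated;  /* everything the allocator reports  */
    size_t clients_slaves;   /* replica output buffers (excluded) */
    size_t aof_buffer;       /* AOF buffer (excluded)             */
    size_t maxmemory;        /* configured limit                  */
} mem_state;

/* Returns how many bytes must be freed by eviction, or 0 if under the limit. */
static size_t bytes_to_evict(const mem_state *m) {
    size_t counted = m->total_allocated;
    if (counted > m->clients_slaves) counted -= m->clients_slaves;
    if (counted > m->aof_buffer)     counted -= m->aof_buffer;
    return (counted > m->maxmemory) ? counted - m->maxmemory : 0;
}

int main(void) {
    /* A burst in normal-client output buffers (not excluded) pushes the
     * counted total over maxmemory even though the dataset did not grow,
     * so keys get evicted anyway. */
    mem_state m = {
        .total_allocated = 110 * 1024 * 1024,
        .clients_slaves  = 4 * 1024 * 1024,
        .aof_buffer      = 1 * 1024 * 1024,
        .maxmemory       = 100 * 1024 * 1024,
    };
    printf("need to evict %zu bytes\n", bytes_to_evict(&m));
    return 0;
}
```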

Additional unaccounted system memory includes (see the sketch after this list):

  • Database structures: blocking_keys, ready_keys, watched_keys dicts
  • Pub/sub components: pubsub_channels, pubsub_patterns, pubsubshard_channels
  • Tracking structures: tracking_pending_keys
  • Command hashtables
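
As a rough illustration of what accounting for these structures could look like, here is a toy sketch; the dict type and the per-entry cost are assumptions made purely for the example, not real Valkey internals.

```c
/* Toy sketch (not Valkey code): fold the structures listed above into a
 * single "currently unaccounted overhead" figure. */
#include <stddef.h>
#include <stdio.h>

typedef struct { size_t entries; } toy_dict;   /* stand-in for a real dict */

/* Assumed fixed table cost plus a per-entry bookkeeping cost. */
static size_t dict_overhead_bytes(const toy_dict *d) {
    const size_t fixed = 64, per_entry = 48;
    return d ? fixed + d->entries * per_entry : 0;
}

int main(void) {
    toy_dict blocking_keys = {10}, ready_keys = {2}, watched_keys = {5};
    toy_dict pubsub_channels = {100}, pubsub_patterns = {3};
    toy_dict pubsubshard_channels = {0}, tracking_pending_keys = {7};
    toy_dict commands = {240};

    const toy_dict *all[] = {
        &blocking_keys, &ready_keys, &watched_keys, &pubsub_channels,
        &pubsub_patterns, &pubsubshard_channels, &tracking_pending_keys,
        &commands,
    };
    size_t total = 0;
    for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++)
        total += dict_overhead_bytes(all[i]);
    printf("currently unaccounted overhead: %zu bytes\n", total);
    return 0;
}
```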

Memory composition diagram:

Key-value data memory
├─ User data: robj memory
├─ Data expiration structures
└─ Hash metadata

User metadata
├─ DB/kvstore structures
├─ Miscellaneous: pubsub, tracking, command hashtables
└─ Client I/O buffers

System operational memory
├─ Command log
├─ Latency tracking
├─ AOF buffer
├─ Global replication buffer
└─ Replication backlog

To establish fair and granular memory management, I propose implementing categorized memory limits with corresponding handling mechanisms:

  1. maxmemory-dataset: Data eviction when exceeded
  2. maxmemory-clients: Connection termination when client buffers overflow
  3. maxmemory-replication-buffer: Write throttling or sync disconnection
  4. maxmemory-aof-buffer: Write throttling or prioritized disk flushing
  5. maxmemory-lua-caches: Script cache eviction
  6. and so on

This hierarchical approach would enable differentiated control over various memory components while maintaining system stability and fairness during memory pressure scenarios.
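
To illustrate the shape of this, here is a rough sketch. The category names map to the configs listed above; the enforcement actions and every type and function name are illustrative, not a concrete design.

```c
/* Illustrative sketch: per-category limits with per-category reactions,
 * instead of a single maxmemory that always reacts by evicting data. */
#include <stddef.h>
#include <stdio.h>

typedef enum {
    MEM_DATASET,            /* maxmemory-dataset            */
    MEM_CLIENTS,            /* maxmemory-clients            */
    MEM_REPLICATION_BUFFER, /* maxmemory-replication-buffer */
    MEM_AOF_BUFFER,         /* maxmemory-aof-buffer         */
    MEM_LUA_CACHES,         /* maxmemory-lua-caches         */
    MEM_CATEGORY_COUNT,
} mem_category;

typedef struct {
    size_t limit[MEM_CATEGORY_COUNT];  /* 0 means "no limit" */
    size_t used[MEM_CATEGORY_COUNT];
} mem_limits;

/* Each category gets its own overlimit reaction. */
static void handle_overlimit(mem_category c) {
    switch (c) {
    case MEM_DATASET:            puts("evict keys");                        break;
    case MEM_CLIENTS:            puts("close the worst-offending clients"); break;
    case MEM_REPLICATION_BUFFER: puts("throttle writes or drop syncs");     break;
    case MEM_AOF_BUFFER:         puts("throttle writes or flush to disk");  break;
    case MEM_LUA_CACHES:         puts("evict cached scripts");              break;
    default: break;
    }
}

static void enforce(const mem_limits *m) {
    for (int c = 0; c < MEM_CATEGORY_COUNT; c++)
        if (m->limit[c] && m->used[c] > m->limit[c])
            handle_overlimit((mem_category)c);
}

int main(void) {
    mem_limits m = {{0}, {0}};
    m.limit[MEM_DATASET] = 100 << 20;  m.used[MEM_DATASET] = 90 << 20;
    m.limit[MEM_CLIENTS] = 16 << 20;   m.used[MEM_CLIENTS] = 20 << 20;
    enforce(&m);  /* only the clients category is over its limit */
    return 0;
}
```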

@zuiderkwast
Contributor

I like this idea.

Should we then track dataset memory accurately? Currently we don't; we just infer it as total memory minus overhead.

In #852 there is a suggestion to track memory per slot. The total dataset memory is the sum for all slots.
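
A minimal sketch of the per-slot idea, assuming each slot maintains its own byte counter; the names here are illustrative, not existing Valkey APIs.

```c
/* Illustrative sketch: if every slot tracks the bytes of the keys it owns,
 * total dataset memory is simply the sum over all slots. */
#include <stddef.h>

#define CLUSTER_SLOTS 16384

typedef struct {
    size_t dataset_bytes[CLUSTER_SLOTS];  /* updated on every write/delete in that slot */
} slot_accounting;

static size_t total_dataset_bytes(const slot_accounting *a) {
    size_t total = 0;
    for (int slot = 0; slot < CLUSTER_SLOTS; slot++)
        total += a->dataset_bytes[slot];
    return total;
}

int main(void) {
    static slot_accounting acc;          /* static, so zero-initialized */
    acc.dataset_bytes[0]    = 1024;      /* bytes attributed to keys in slot 0 */
    acc.dataset_bytes[5474] = 2048;
    return total_dataset_bytes(&acc) == 3072 ? 0 : 1;
}
```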

@hpatro
Collaborator

hpatro commented Feb 27, 2025

Nice idea @soloestoy.

I have a bit of apprehension though. In terms of usability, won't it become a nightmare to determine the correct value for each of these? And depending on workload behavior and the different scenarios within the lifecycle of a given application, the values would need to change over time.

For example, users are sometimes unaware that their values keep growing, and after a certain period the client output buffers start overflowing under a read-heavy workload. Likewise, a sudden burst of write traffic could overflow the replication buffer.

Even though this would give us fine-grained control, understanding each of these knobs and managing them well seems quite a difficult task for administrators.

@madolson
Member

> In #852 there is a suggestion to track memory per slot. The total dataset memory is the sum for all slots.

@kyle-yh-kim Just to bring you into the conversation explicitly: we talked about reviving this discussion for Valkey 9.0.

@PingXie
Member

PingXie commented Feb 28, 2025

Thanks for the detailed write-up @soloestoy!

It is a great observation that memory in our case serves two roles. It is both storage for the user data (what disk is to other databases) and the resource that supports user requests. Being able to express and manage the two roles explicitly IMO brings more clarity to memory management. I am directionally aligned.

@hwware what do you think about pausing #831 and tidying up our memory management first?

@hwware
Member

hwware commented Feb 28, 2025

> Thanks for the detailed write-up @soloestoy!
>
> It is a great observation that memory in our case serves two roles. It is both storage for the user data (what disk is to other databases) and the resource that supports user requests. Being able to express and manage the two roles explicitly IMO brings more clarity to memory management. I am directionally aligned.
>
> @hwware what do you think about pausing #831 and tidying up our memory management first?

Agree with you. Let's shift our focus to this issue.

@soloestoy
Member Author

soloestoy commented Feb 28, 2025

> Nice idea @soloestoy.
>
> I have a bit of apprehension though. In terms of usability, won't it become a nightmare to determine the correct value for each of these? And depending on workload behavior and the different scenarios within the lifecycle of a given application, the values would need to change over time.
>
> For example, users are sometimes unaware that their values keep growing, and after a certain period the client output buffers start overflowing under a read-heavy workload. Likewise, a sudden burst of write traffic could overflow the replication buffer.
>
> Even though this would give us fine-grained control, understanding each of these knobs and managing them well seems quite a difficult task for administrators.

@hpatro thank you for raising these points. Yes, there is indeed a trade-off here: finer-grained memory management requires more configuration, which introduces a learning cost for users. We need to strike a balance and implement the changes in phases.

  1. The first step should be to precisely track dataset memory usage, which aligns with user needs and improves clarity. Users can plan their data capacity via maxmemory-dataset. Importantly, this isolates data memory from non-data components, ensuring data eviction only occurs when the dataset limit is exceeded, which is the logical and intuitive behavior (see the sketch below).
  2. Non-dataset memory is more complex, but partial progress has already been made.

In fact, many aspects of this granular memory management are already underway. I believe this systematic approach is essential to evolve Valkey into a more robust and production-ready database.
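
For the first step, the intended behavior could be as simple as the following sketch; maxmemory-dataset and the tracked counters are hypothetical names used only for illustration.

```c
/* Illustrative sketch: eviction is decided by dataset memory against a
 * dataset-specific limit, so growth in non-data memory (client buffers,
 * replication buffers, ...) can no longer evict keys. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t dataset_bytes;      /* precisely tracked user-data memory       */
    size_t maxmemory_dataset;  /* hypothetical maxmemory-dataset (0 = off) */
    size_t other_bytes;        /* client buffers, replication buffers, ... */
} dataset_state;

static bool should_evict(const dataset_state *s) {
    return s->maxmemory_dataset && s->dataset_bytes > s->maxmemory_dataset;
}

int main(void) {
    dataset_state s = { .dataset_bytes = 90u << 20,
                        .maxmemory_dataset = 100u << 20,
                        .other_bytes = 50u << 20 };  /* ignored by the check */
    printf("evict? %s\n", should_evict(&s) ? "yes" : "no");  /* prints "no" */
    return 0;
}
```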

@madolson
Member

Read through everything now, and I agree about having optional dedicated memory buffers for clients, replication, AOF, and the Lua cache. I think we should probably leave maxmemory for the dataset as is, and just let the remaining categories be additional buffers reserved out of it. Basically, everything that is not set aside for replication, clients, etc. is given to the dataset.
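
A small sketch of how I read that, with illustrative names only: the optional reservations are carved out of the existing maxmemory, and whatever is not reserved remains the dataset budget.

```c
/* Illustrative sketch: per-category reservations are subtracted from the
 * single existing maxmemory; the remainder is what the dataset may use. */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t maxmemory;            /* existing single limit         */
    size_t reserved_clients;     /* optional, 0 if not configured */
    size_t reserved_replication;
    size_t reserved_aof;
    size_t reserved_lua;
} budget;

static size_t dataset_budget(const budget *b) {
    size_t reserved = b->reserved_clients + b->reserved_replication +
                      b->reserved_aof + b->reserved_lua;
    return (b->maxmemory > reserved) ? b->maxmemory - reserved : 0;
}

int main(void) {
    budget b = { .maxmemory = 1024u << 20,
                 .reserved_clients = 64u << 20,
                 .reserved_replication = 128u << 20 };
    printf("dataset budget: %zu MiB\n", dataset_budget(&b) >> 20);
    return 0;
}
```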
