
Add DENYOOM flag to SCRIPT LOAD and make it fail on OOM #866

Open
wants to merge 1 commit into base: unstable

Conversation

enjoy-binbin
Member

Currently we can load a lot of Lua scripts to bypass maxmemory; this adds a DENYOOM flag to SCRIPT LOAD so it fails on OOM.


Signed-off-by: Binbin <binloveplay1314@qq.com>
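
As an aside for illustration (not code from this PR), here is a minimal self-contained sketch of the kind of check a DENYOOM flag implies: commands carrying the flag get rejected with an -OOM error once used memory exceeds maxmemory. All names here (CMD_DENYOOM, command_t, the fake memory counters) are simplified stand-ins rather than the actual valkey internals.

```c
/* Toy model of an OOM gate for commands flagged DENYOOM; a simplified
 * stand-in, not the actual valkey implementation.
 * Build: cc -o oom_gate oom_gate.c && ./oom_gate */
#include <stdio.h>
#include <stddef.h>

#define CMD_DENYOOM (1 << 0) /* hypothetical flag bit for this sketch */

typedef struct {
    const char *name;
    int flags;
} command_t;

/* Pretend server state: already above the configured limit. */
static size_t used_memory = 120u * 1024 * 1024; /* 120 MB */
static size_t maxmemory   = 100u * 1024 * 1024; /* 100 MB */

/* Return 0 if the command may run, -1 if it must be rejected with -OOM. */
static int oom_gate(const command_t *cmd) {
    int over_limit = maxmemory != 0 && used_memory > maxmemory;
    return (over_limit && (cmd->flags & CMD_DENYOOM)) ? -1 : 0;
}

int main(void) {
    /* What this PR proposes: SCRIPT LOAD carries the flag, so it is rejected. */
    command_t script_load   = {"SCRIPT LOAD",   CMD_DENYOOM};
    command_t script_exists = {"SCRIPT EXISTS", 0};

    printf("%s -> %s\n", script_load.name,
           oom_gate(&script_load) ? "rejected with -OOM" : "allowed");
    printf("%s -> %s\n", script_exists.name,
           oom_gate(&script_exists) ? "rejected with -OOM" : "allowed");
    return 0;
}
```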

codecov bot commented Aug 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.24%. Comparing base (b728e41) to head (f6049db).
Report is 5 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #866      +/-   ##
============================================
- Coverage     70.47%   70.24%   -0.24%     
============================================
  Files           112      112              
  Lines         61467    61467              
============================================
- Hits          43320    43177     -143     
- Misses        18147    18290     +143     
Files Coverage Δ
src/commands.def 100.00% <ø> (ø)

... and 13 files with indirect coverage changes

madolson previously approved these changes Aug 7, 2024
@madolson
Member

madolson commented Aug 7, 2024

This makes sense to me. @soloestoy, do you know of any use case where this would cause an issue? Multi-exec with script load should fail. I guess it's a breaking change, so we probably need a major version change.

madolson added the breaking-change (Indicates a possible backwards incompatible change) and release-notes (This issue should get a line item in the release notes) labels Aug 7, 2024
madolson dismissed their stale review August 7, 2024 22:56

Realized something

@madolson
Member

madolson commented Aug 7, 2024

I suppose I realized one case where this might fail. If someone is trying to load a script that cleans up data, it might fail unnecessarily. There are also previous cases where multi-execs would have succeeded but would now fail.

madolson added the major-decision-pending (Major decision pending by TSC team) label Aug 7, 2024
@soloestoy
Member

I think this would be hard for users to understand. Currently, when maxmemory is exceeded, data eviction is triggered, so DENYOOM is mainly aimed at data write operations. However, the scripts loaded by SCRIPT LOAD are not data and are not evicted, which makes this difficult to explain.

This also reminds me of a feature I worked on before, redis/redis#5454, where, in order to prevent clients from queuing too many commands in a MULTI context and causing uncontrolled memory growth, we marked transactions as dirty when maxmemory was exceeded. However, it was eventually reverted in redis/redis#12961 because it was hard for users to understand.

Maybe we can take a different approach by adding maxmemory-scripts to limit the size of cached scripts. It would only be allowed to use a certain percentage of maxmemory, and once exceeded, the scripts would be evicted (even those loaded by script load).
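
For illustration only (not part of this PR or of valkey), a rough sketch of the eviction idea described above: keep cached scripts in insertion order and drop the oldest ones once their total size exceeds a hypothetical maxmemory-scripts budget. The data structure and names are invented for the sketch.

```c
/* Toy sketch of a "maxmemory-scripts" style budget: evict the oldest
 * cached scripts once their total size exceeds the limit.
 * Hypothetical; not how valkey stores its script cache. */
#include <stdio.h>
#include <stdlib.h>

typedef struct script {
    char sha[9];          /* shortened id for the sketch */
    size_t body_len;      /* bytes used by the cached body */
    struct script *next;
} script;

static script *cache_head = NULL;   /* oldest first */
static script *cache_tail = NULL;
static size_t cache_bytes = 0;

static void evict_until_under(size_t maxmemory_scripts) {
    while (cache_bytes > maxmemory_scripts && cache_head) {
        script *victim = cache_head;
        cache_head = victim->next;
        if (!cache_head) cache_tail = NULL;
        cache_bytes -= victim->body_len;
        printf("evicting script %s (%zu bytes)\n", victim->sha, victim->body_len);
        free(victim);
    }
}

static void script_load(const char *sha, size_t body_len, size_t maxmemory_scripts) {
    script *s = malloc(sizeof(*s));
    snprintf(s->sha, sizeof(s->sha), "%s", sha);
    s->body_len = body_len;
    s->next = NULL;
    if (cache_tail) cache_tail->next = s; else cache_head = s;
    cache_tail = s;
    cache_bytes += body_len;
    evict_until_under(maxmemory_scripts);
}

int main(void) {
    size_t budget = 1000;                  /* pretend maxmemory-scripts = 1000 bytes */
    script_load("aaaaaaaa", 600, budget);
    script_load("bbbbbbbb", 300, budget);
    script_load("cccccccc", 400, budget);  /* total hits 1300, so the oldest is evicted */
    printf("cached bytes now: %zu\n", cache_bytes);
    return 0;
}
```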

@madolson
Member

> Maybe we can take a different approach by adding maxmemory-scripts to limit the size of cached scripts. It would only be allowed to use a certain percentage of maxmemory, and once exceeded, the scripts would be evicted (even those loaded by script load).

I guess I want to ask how much of a real problem this is. AFAIK we at AWS have never seen this manifest as a problem, but maybe others have?

@soloestoy
Member

I have seen many instances where number_of_cached_scripts reaches into the millions, and used_memory_scripts_eval consumes tens of gigabytes of memory, lol. When a feature is unrestricted, it is prone to abuse.

@enjoy-binbin
Member Author

enjoy-binbin commented Aug 15, 2024

> I have seen many instances where number_of_cached_scripts reaches into the millions, and used_memory_scripts_eval consumes tens of gigabytes of memory, lol. When a feature is unrestricted, it is prone to abuse.

yeah, I have seen this too (both from eval scripts; I haven't seen it with script load yet), which is why we now have eval script eviction.

btw, we now have the following issues:

  1. Lua does not use jemalloc, and we do not count Lua memory in used_memory (see the sketch after this list)
  2. abuse of script loading (so maxmemory-scripts seems like a good idea, but I do think we need a DENYOOM flag when maxmemory-scripts is 0, i.e. unlimited)
  3. to ensure script atomicity, scripts can bypass maxmemory (data memory or Lua heap memory):
     EVAL "for i=1,10000000 do redis.call('SET', 'key-kkkk:' .. i, 'value-tttt:' .. i) end" 0
     EVAL "for i=1,10000000 do redis.pcall('SET', 'key-kkkk:' .. i, 'value-tttt:' .. i) end" 0
     EVAL "local a={}; local i; for i=1, 30000000 do a[i]=i; end" 0

@madolson
Member

madolson commented Nov 4, 2024

> I have seen many instances where number_of_cached_scripts reaches into the millions, and used_memory_scripts_eval consumes tens of gigabytes of memory, lol. When a feature is unrestricted, it is prone to abuse.

We have also seen this, but this is almost always from EVAL abuse, not SCRIPT LOAD + EVALSHA abuse. It looks like binbin also mentioned the same thing.

> abuse of script loading (so maxmemory-scripts seems like a good idea, but I do think we need a DENYOOM flag when maxmemory-scripts is 0, i.e. unlimited)

I think the concern in the past was that many clients, on new connection establishment, will send SCRIPT LOAD with scripts that are meant to reduce memory usage, but those would error out because the SCRIPT LOAD fails, so they can never reduce memory. I would be more inclined to add a config here so that managed valkey providers have an operational config they can change, but the default doesn't change.
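
One way to read that suggestion, purely as an illustration (the config name and defaults below are invented): gate the new rejection behind a setting that defaults to off, so clients that send SCRIPT LOAD on connect keep today's behavior unless an operator opts in.

```c
/* Toy sketch of an opt-in config gating the OOM rejection for SCRIPT LOAD.
 * "script-load-deny-oom" is an invented name; the default keeps today's behavior. */
#include <stdio.h>
#include <stddef.h>

static int script_load_deny_oom = 0;               /* hypothetical config, default off */
static size_t used_memory = 120, maxmemory = 100;  /* pretend values, in MB */

static const char *handle_script_load(void) {
    if (script_load_deny_oom && maxmemory != 0 && used_memory > maxmemory)
        return "(error) OOM command not allowed when used memory > 'maxmemory'";
    return "\"<sha1 of the cached script>\"";
}

int main(void) {
    printf("default behavior: %s\n", handle_script_load());
    script_load_deny_oom = 1;                      /* operator opts in */
    printf("after opt-in:     %s\n", handle_script_load());
    return 0;
}
```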

@enjoy-binbin
Member Author

> I would be more inclined to add a config here so that managed valkey providers have an operational config they can change, but the default doesn't change.

Sounds good to me. What kind of config are we talking about here? maxmemory-scripts to limit script memory (including the server part and the Lua VM part)?
