-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NEW] Kill unkillable scripts #1783
Comments
I would vote for adding the |
Great, a FORCE LGTM, i think we can do it. btw, lets also talk about #866 (comment)? Currently to ensure script atomicity, we can bypass maxmemory (data memory or lua heap memory). If we allow kill unkillable scripts, i think we can fix it. (we allow killing them in internal fork, ie. return OOM early)
|
To return OOM early is an automatic kill, not SCRIPT KILL, right? If we let the script be killed automatically by the memory limit, it's a behavior change that maybe should have a config, or at least be done in a major release. |
yes, it is an automatic kill. There are always people messing around with scripts... We will give an extra 30MB of memory in case it is really lua tmp memory.
yes, at least we're trying to take this step. This is usually the result of malicious script execution. |
Have we seen that much actual asks for this? I'm really concerned about comments like this, because the problem is people don't really understand the consequences and leave the datastore in a corrupted state. I like the alternatives that Viktor mentioned such as building rollbacks through a write ahead or an undo log. They're a lot more engineering effort, but they prevent inconsistent data. Inconsistent data is an extremely dangerous place to start wandering in to. A second major concern I have here is that users basically never actually want to call script kill. Valkey is entirely single threaded and extremely latency sensitive. If you have a script running for 5 seconds, your application has already been down for 5 seconds. Paging an operator is at least 15 minutes away, and then they have to make a time pressure decision about what to do. If you're looking to automate, the safest thing is to build tooling around safely failing over to a replica or by reloading from the AOF.
By malicious, do we mean like some type of hacker? If that is the case, I think the user has much larger problems than DOS. They could just intentionally corrupt all of the data. I think a long running lua script is actually fairly detectable. |
@madolson those are very good arguments against allowing to leave the database in an inconsistent state. Last week I started to play a bit with the rollback stuff by implementing transactions that only guarantee atomicity and consistency (no isolation, or durability) on top of our hashtable implementation, but it requires changes to all data types code to support the rollback of transactions, and I'm not sure to be a good approach to follow. Probably, solving the problem only for Lua scripts might be easier than making a generic infrastructure to support multi-command atomicity. |
The problem/use-case that the feature addresses
If a Lua script has done any writes, it's not killable using SCRIPT KILL, because it would violate the atomicity guarantee of Lua script. Only shutting down the server can kill the script.
The atomicity of a script isn't always that important though. In some workloads, it would be totally fine to violate it. At least, it's not as bad as having to restart the server.
Often when you have a hanging script, it's because of some infinite loop or other bug in the script. If you have a buggy script, it may not write correct stuff to the database anyway.
Description of the feature
Allow SCRIPT KILL (and FUNCTION KILL) to kill a script even if it has done some writes.
We could add some extra FORCE argument, or a config, if we're worried about changing the default behavior.
Alternatives you've considered
Some way to undo the writes done by a script, such as a redo-list, a frozen copy of the key space similar, a copy-on-write implemented in user space.
Additional information
The text was updated successfully, but these errors were encountered: