
server: memory pressure notification hooks #64965

Open
jordanlewis opened this issue May 10, 2021 · 7 comments
Labels
- A-cc-enablement: Pertains to current CC production issues or short-term projects
- A-observability-inf
- C-investigation: Further steps needed to qualify. C-label will change.
- T-observability

Comments

@jordanlewis
Member

jordanlewis commented May 10, 2021

When CockroachDB runs out of memory, the failure usually manifests as a SIGKILL from the kernel's OOM killer, which gives the program no time to write crash dumps or take any other emergency action.

It appears that cgroups can be configured to send notifications before this happens, at a configurable percentage of memory used. See: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

It would be useful to let CockroachDB detect a close-to-OOM situation and run custom logic, such as writing a crash dump, dumping active goroutines and a memory profile, or even dropping unnecessary caches.

Even without cgroups, perhaps we could poll Go's memstats, compare against a configured maximum, and use that to trigger any hooks.
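A minimal sketch of what that polling approach might look like; the limit, interval, and hook here are illustrative names, not crdb APIs:

```go
package main

import (
	"runtime"
	"time"
)

// pollMemStats periodically reads Go's memstats and fires onPressure
// once the live heap crosses maxBytes. Note runtime.ReadMemStats briefly
// stops the world, so the interval shouldn't be too aggressive.
func pollMemStats(maxBytes uint64, interval time.Duration, onPressure func(runtime.MemStats)) {
	var m runtime.MemStats
	for range time.Tick(interval) {
		runtime.ReadMemStats(&m)
		if m.HeapAlloc > maxBytes {
			onPressure(m) // hook point: crash dump, goroutine dump, etc.
		}
	}
}
```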

Jira issue: CRDB-7366

jordanlewis added the C-bug label (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) on May 10, 2021
@ajwerner
Contributor

The Go folks have long wanted people to experiment with the prototype linked in golang/go#29696 (https://go-review.googlesource.com/c/go/+/46751). This exploration might be a good occasion to give it a shot.

knz added the C-investigation label (Further steps needed to qualify. C-label will change.) and removed the C-bug label on May 11, 2021
@abarganier
Contributor

@ajwerner thanks for linking this; it definitely seems worth experimenting with, especially since runtime experiments are already happening elsewhere, such as @knz's task group resource accounting.

In a hypothetical where we actually used the SetMaxHeap API in crdb, I imagine a goroutine responsible for crash dumps could wait on the SetMaxHeap notify channel, with bytes set to a very high value (one we'd expect to be a precursor to an OOM). On a send to the channel, it would immediately begin writing crash dump information to a file in anticipation of an OOM. Since you know crdb's internals much better than I do, what's your take on this idea, @jordanlewis?
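A rough sketch of that idea, assuming the prototype's signature from the CL (roughly debug.SetMaxHeap(bytes int, notify chan<- struct{}) int); this only builds against the patched toolchain from CL 46751, since SetMaxHeap never merged into a released Go, and writeCrashDump is a hypothetical placeholder:

```go
package main

import "runtime/debug"

// writeCrashDump is a placeholder for dumping active goroutines, a heap
// profile, etc. to disk before a potential OOM kill.
func writeCrashDump() {}

// watchHeap installs a soft heap limit via the prototype SetMaxHeap API
// (CL 46751; not part of any released Go version) and reacts to heap
// pressure notifications on a dedicated goroutine.
func watchHeap(limitBytes int) {
	notify := make(chan struct{}, 1)
	debug.SetMaxHeap(limitBytes, notify)
	go func() {
		for range notify {
			// The runtime signals we're near the configured limit:
			// write crash dump artifacts while we still can.
			writeCrashDump()
		}
	}()
}
```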

@knz
Contributor

knz commented May 25, 2021

The idea is sound.

@knz
Contributor

knz commented May 25, 2021

It might even be worth adding an explicit call to runtime.GC() in that handler; who knows, it may delay the OOM crash a bit further.

jlinder added the T-server-and-security label (DB Server & Security) on Jun 16, 2021
knz added the A-cc-enablement label (Pertains to current CC production issues or short-term projects) on Jul 29, 2021
@knz
Contributor

knz commented Jul 29, 2021

This is obs infra, not server.

@abarganier
Contributor

abarganier commented Jul 29, 2021

Just a quick update on this: through experimentation, I was unable to subscribe to the memory.pressure_level notification from nodes running in containers, because the container process's access to the cgroupfs is (understandably) read-only. Subscribing to these notifications requires write access to the cgroup.event_control file in the memory subsystem, and granting that access comes with a big security sacrifice (running the container privileged).

I haven't given up on this entirely, as it would be an exceptional heuristic for us to have access to from inside each crdb node. Perhaps we could explore subscribing to these notifications in the orchestration layer and delivering them to the relevant crdb node over the network; the sketch below shows the in-container registration that fails today. More experimentation is needed to determine whether that is a valid approach.
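For reference, a minimal sketch of the cgroup-v1 eventfd registration that was attempted, assuming the memory controller is mounted at /sys/fs/cgroup/memory; in a container with a read-only cgroupfs, the write to cgroup.event_control is exactly the step that fails:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	const cg = "/sys/fs/cgroup/memory" // assumed cgroup-v1 mount point

	// An eventfd the kernel will signal on memory pressure.
	efd, err := unix.Eventfd(0, 0)
	if err != nil {
		panic(err)
	}
	pressure, err := os.Open(cg + "/memory.pressure_level")
	if err != nil {
		panic(err)
	}

	// Register "<event_fd> <pressure_level_fd> <level>" with the memory
	// controller. This needs write access to cgroupfs, which containers
	// typically don't have.
	ctl := fmt.Sprintf("%d %d critical", efd, pressure.Fd())
	if err := os.WriteFile(cg+"/cgroup.event_control", []byte(ctl), 0o644); err != nil {
		panic(err) // fails with a read-only cgroupfs
	}

	buf := make([]byte, 8)
	for {
		// Blocks until the kernel signals the eventfd.
		if _, err := unix.Read(efd, buf); err != nil {
			panic(err)
		}
		fmt.Printf("memory pressure (count=%d): hook point for crash dumps\n",
			binary.LittleEndian.Uint64(buf))
	}
}
```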

@github-actions

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
