Memory competition between notebook processes - proactively run GC? #1850

Open
BioTurboNick opened this issue Jan 24, 2022 · 13 comments
Labels
backend: Concerning the julia server and runtime
online deployment: About deploying to binder, heroku, self-hosted

Comments

@BioTurboNick
Contributor

BioTurboNick commented Jan 24, 2022

I have an issue where there's a shared Pluto instance and multiple people are using notebooks. They leave them open, and in total the notebooks are absorbing a huge chunk of system memory.

I've noticed that Pluto memory usage can grow really large to fill available memory, without running the GC. We're talking 4-5 GB when after GC it's 2.5 GB, in one instance.

I think it would help to have Pluto run the GC on a notebook process automatically to reduce its footprint on the system. Either when no cells are pending, or some time after the notebook has been idle?

@dralletje
Collaborator

dralletje commented Jan 24, 2022

Can you create an MWE notebook?
For example, one where you don't run the GC at the end of cells, so we can see the memory go up, and one where you force a GC at the end of cells, so we can see the memory stay lower?

@BioTurboNick
Contributor Author

On my system with 64 GB of RAM, this code accumulates ~1 GB of RAM:

for i = 1:100
	x = zeros(100000000)
end

If you open another notebook with the same code but append GC.gc(), RAM usage climbs to 1 GB and then drops to ~270 MB. Running the two cells alternately toggles the memory load between those levels. The original notebook stays at 1 GB of RAM.

In this test case, I'm noticing it won't go much above 1 GB before the GC runs.
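
A scaled-down sketch of the comparison above, for anyone who wants to reproduce it outside a notebook. It reports what the Julia GC counts as allocated before and after a forced collection. `allocate_garbage` and `live_mib` are hypothetical helper names, and `Base.gc_live_bytes` is an unexported function that tracks the Julia heap, not the process RSS that the OS reports, so the numbers will not match a system monitor exactly.

```julia
# Allocate `n` temporary arrays of `len` Float64 values; each one becomes
# garbage as soon as the next loop iteration starts.
function allocate_garbage(n, len)
    for _ in 1:n
        x = zeros(len)
        x[1] = 1.0  # touch the array so the allocation is actually used
    end
    return nothing
end

# Size of live objects after the last collection plus bytes allocated since, in MiB.
live_mib() = Base.gc_live_bytes() / 2^20

allocate_garbage(100, 1_000_000)   # ~8 MB per array, scaled down from the post
before = live_mib()
GC.gc()                            # force a full collection
after = live_mib()

println("before GC: ", round(before; digits = 1), " MiB")
println("after GC:  ", round(after; digits = 1), " MiB")
```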

@fonsp fonsp added backend Concerning the julia server and runtime online deployment About deploying to binder, heroku, self-hosted labels Jan 30, 2022
@BioTurboNick
Contributor Author

BioTurboNick commented Dec 9, 2022

@fonsp - Just wanted to bump this.

It seems like it would be easy enough to implement? After all cells have run, execute GC.gc(), and possibly ccall(:malloc_trim, Int32, (Int32,), 0)

I'd do a PR but not sure where to put it.

The issue is that the GC only runs when memory is requested by a process. So if no new memory is requested by a notebook, it'll just sit there holding on to unneeded memory forever.

@fonsp
Owner

fonsp commented Dec 13, 2022

Hi @BioTurboNick , thanks!

My only concern is making interactive notebooks slower, e.g. @bind x Slider(1:100) and plot(rand(x)). It looks like the overhead of calling GC.gc() is at least 50ms, which is too much to allow for this case.
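
The overhead figure is easy to check on a given machine. The following is a minimal sketch, not a proper benchmark: `gc_overhead_ms` is a hypothetical helper, and the result depends heavily on how much is live on the heap when it runs.

```julia
# Time `n` forced full collections and return the durations in milliseconds.
function gc_overhead_ms(n = 5)
    return [1000 * @elapsed(GC.gc()) for _ in 1:n]
end

times = gc_overhead_ms()
println("GC.gc() took between ", round(minimum(times); digits = 1),
        " and ", round(maximum(times); digits = 1), " ms")
```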

Perhaps we could run it only after the initial run, or only after a manual run (Shift+Enter), not a bond change.

What is ccall(:malloc_trim, Int32, (Int32,), 0)? Do you have any links to learn more? It's best if we don't use internal Julia API unless absolutely necessary.

@BioTurboNick
Contributor Author

BioTurboNick commented Dec 13, 2022

Fair point about adjusting sliders... could there be a cancelable task spawned that waits some duration and is reset each time cells are run? EDIT: or just on those runs you suggested.

Re: malloc_trim: JuliaLang/julia#42566 (comment) (the process in question was a Pluto notebook)

Basically, Julia frees memory with the GC but doesn't always release it back to the OS. This is good if Julia needs to allocate that memory again soon because it doesn't have to make a syscall. This may be improved in a later release. Trim makes it give up some of that memory.
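
A guarded sketch of the collect-then-trim idea. `gc_and_trim` is a hypothetical name. The call uses the glibc prototype `int malloc_trim(size_t pad)`, so the argument type is `Csize_t` rather than the `Int32` written above. The `try`/`catch` assumes that a missing symbol (for example on a non-glibc Linux) surfaces as an ordinary Julia error; that is an assumption, not something this thread confirms.

```julia
# Run a full collection, then ask glibc to return free heap pages to the OS.
# Returns `true` only if `malloc_trim` was called and reported releasing memory.
function gc_and_trim()
    GC.gc()
    Sys.islinux() || return false      # malloc_trim is a glibc extension
    try
        # int malloc_trim(size_t pad); pad = 0 asks to keep no slack at the top of the heap
        return ccall(:malloc_trim, Cint, (Csize_t,), 0) == 1
    catch
        return false                   # assumption: a missing symbol throws instead of crashing
    end
end

println("memory released to the OS: ", gc_and_trim())
```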

@BioTurboNick
Contributor Author

Just to add that a common pattern, I think, is to put long or intensive executions behind a checkbox or button. It would be nice if this feature could run then too, somehow. For example, if the execution took longer than 1 s, the overhead of the GC may be acceptable.
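
The duration-based rule could look roughly like this. It is a standalone sketch that does not touch Pluto's internals; `run_then_maybe_gc` and `threshold_seconds` are hypothetical names, and the 1 s default is just the figure suggested above.

```julia
# Run `f`, and collect afterwards only if the run was slow enough that the
# extra GC pause is unlikely to be noticed.
function run_then_maybe_gc(f; threshold_seconds = 1.0)
    t0 = time()
    result = f()
    elapsed = time() - t0
    did_gc = elapsed >= threshold_seconds
    did_gc && GC.gc()
    return (; result, elapsed, did_gc)
end

fast = run_then_maybe_gc(() -> 1 + 1)                       # stays under the threshold
slow = run_then_maybe_gc(() -> (sleep(0.05); sum(rand(10^6))); threshold_seconds = 0.01)
println("fast run collected: ", fast.did_gc, ", slow run collected: ", slow.did_gc)
```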

@fonsp
Owner

fonsp commented Dec 13, 2022

Can GC run safely in a separate thread?

@fonsp
Owner

fonsp commented Dec 13, 2022

RE: malloc_trim: I think it's best to wait a bit for JuliaLang/julia#42566 to progress. Adding ccall(:malloc_trim, Int32, (Int32,), 0) could potentially create segfaults if the current Pluto version is used in a future Julia version where the API is removed (right?). That would mean that we also can't wrap it in try/catch for future compatibility.

But adding a GC.gc() call would already be a nice improvement!

@fonsp
Owner

fonsp commented Dec 13, 2022

It seems like we could run GC in a debounced, trailing way after completing cell execution. i.e. after the last cell finished, start a 3 second (example) timer. If, during that timeout, no cells started running, then run GC after the timeout. This will guarantee that GC runs after cell execution, but not during fast interactivity.
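
A minimal sketch of the trailing debounce described above, using only Base. None of these names exist in Pluto: `DebouncedGC`, `run_started!` and `run_finished!` are hypothetical, and where they would be called from is left open. It relies on the fact that closing a `Timer` before it fires prevents its callback from running.

```julia
# Trailing debounce for GC: collect only after `delay` seconds without a new run.
mutable struct DebouncedGC
    delay::Float64
    timer::Union{Timer,Nothing}
    lk::ReentrantLock
    collections::Threads.Atomic{Int}   # how many times the timer actually fired
end

DebouncedGC(delay::Real) =
    DebouncedGC(Float64(delay), nothing, ReentrantLock(), Threads.Atomic{Int}(0))

# Call when any cell starts running: cancel a pending collection.
function run_started!(d::DebouncedGC)
    lock(d.lk) do
        d.timer === nothing || close(d.timer)
        d.timer = nothing
    end
    return nothing
end

# Call when the last cell finishes: (re)start the countdown.
function run_finished!(d::DebouncedGC)
    lock(d.lk) do
        d.timer === nothing || close(d.timer)   # drop any earlier countdown
        d.timer = Timer(d.delay) do _
            GC.gc()
            Threads.atomic_add!(d.collections, 1)
        end
    end
    return nothing
end
```

With a 3 second delay, a burst of slider updates would keep resetting the countdown, and a single collection would run once the notebook has been quiet for 3 seconds.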

Then also, we could run GC after each import/using statement?

@dralletje
Collaborator

Could be generalized into some kind of "idle" state in which we can execute background tasks

@pankgeorg
Collaborator

> Could be generalized into some kind of "idle" state in which we can execute background tasks

Yes; you also don't want the GC to starve, as it may be stressing the execution.

@BioTurboNick
Contributor Author

@fonsp - I'm interested in trying a PR on this. Could you please point me to where a hook exists, or could be placed, to trigger such a mechanism?

And if you also know of similar places to hook into user actions on a notebook (loading an open notebook into a tab, performing some other interaction), that would be useful for #2236

@pankgeorg
Collaborator

pankgeorg commented May 25, 2023

An educated guess would be to run the GC step before (or using) NotebookExecutionDoneEvent, i.e. here:

try_event_call(session, NotebookExecutionDoneEvent(notebook, user_requested_run))

and here:

try_event_call(
    session,
    NotebookExecutionDoneEvent(notebook, get(kwargs, :user_requested_run, true))
)

You don't need a PR to test this; you can try:

import Pluto

# Fallback method: ignore every other event type.
function onevent(ev::Pluto.PlutoEvent) end

# Run the GC whenever a notebook finishes executing.
function onevent(ev::Pluto.NotebookExecutionDoneEvent)
    condition = true  # replace with your own criterion
    condition && GC.gc()
end

otherOptions = Dict{Symbol, Any}(#= add more options here =#)
Pluto.run(; on_event=onevent, otherOptions...)

Edit: this is actually a bit oversimplified. You would want a Pluto.NotebookExecutionStartEvent, and a synchronization mechanism to make sure that you can reset the timer. I may come back with an implementation sometime, or you could! But my initial statement that you don't need a PR is quite naive.
