-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ram usage #243
Improve ram usage #243
Conversation
when defined(chronosStrictException): | ||
closure*: iterator(f: Future[T]): FutureBase {.raises: [Defect, CatchableError], gcsafe.} | ||
else: | ||
closure*: iterator(f: Future[T]): FutureBase {.raises: [Defect, CatchableError, Exception], gcsafe.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have put it in FutureBase, but:
We need to give the future as argument of iterator, so we need to know it's type.
We could give a FutureBase instead, and cast inside the iterator, but since we need to be very careful not to have a reference to the future inside the iterator, that seem risky to me
chronos/asyncfutures2.nim
Outdated
except Exception as exc: | ||
if exc of Defect: | ||
raise (ref Defect)(exc) | ||
fut.fail((ref ValueError)(msg: exc.msg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will never be reached with strictExceptions, but no need to remove it, afaik
Very impressive! I guess we're stuck with closures, if we want to do async, right? We can't fake them by manually building our own struct with copies of the local vars we actually need and pass it as a param to a regular proc?
You're probably seeing the high-water mark from a big initial allocation while loading the DAG from the db. Nim's GC will not return memory to the OS, after it's done with it. |
There are two things which are very distinct:
proc a(i: int) =
return proc {.closure.} = i + 5 This will put
iterator a(i: int): int {.closure.} =
let b = i + 5
yield b
dec b
yield b Now, the iterator variables (not parameters) will be put into one of theses hidden object, but as a bonus, the compiler will butcher the proc into a state machine, to allow it to be resumable at each yield (which in our case, is each await) type a_env = object
b: int
state: int
proc a(env: a_env, i: int): int =
switch env.state:
case 0:
env.b = i + 5
env.state = 1
return env.b
case 1:
dec env.b
env.state = finished
return env.b This transformation is fairly complex, because you basically need to rewrite every flow control construction into a state machine (conditions, loops, exceptions, etc etc) To sum up, this PR is still based on a closure iterator for the async transformation, and the "continue callback" uses a closure |
Just noticed that while doing that I squeezed out some code for FutureVars in the continue, not sure what it was (every test are still passing), if someone has some info on that: nim-chronos/chronos/asyncmacro2.nim Line 91 in 37c62af
|
|
Thanks to @cheatfate input, we managed to remove the closure from the "continue" proc with FutureVars don't work with this PR, but they also don't work with master, so I would move their fix to another PR, instead of blocking this one Backtraces look almost the same as before, so I think this is ready for review/merge :) |
Here is a PR for FutureVar: #245 |
proc internalContinue[T](fut: pointer) {.gcsafe, raises: [Defect].} = | ||
let asFut = cast[Future[T]](fut) | ||
futureContinue(asFut) | ||
GC_unref(asFut) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes more sense to place the GC_unref
call immediately after we restore the asFut
variable.
This eliminates the risk of not reaching this line due to an exception from futureContinue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can see, the variable is never assigned (no assgnRef
) in the generated C, so if we unref, it may be freed during the futureContinue
because the refcount = 0
N_LIB_PRIVATE N_NIMCALL(void, internalContinue__tOi1GGiApxKdBYtRcYwB7A)(void* fut) {
tyObject_FuturecolonObjectType___GXFSekg1U8JRoedGa2vBSA* asFut;
asFut = ((tyObject_FuturecolonObjectType___GXFSekg1U8JRoedGa2vBSA*) (fut));
futureContinue__XuNTB7fHwBI8KII0qEQaCw_2(asFut);
if (asFut) { nimGCunref(asFut); }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't be freed because it exists on the stack. The conservative stack scanning will discover it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure the stack scanning only happens during M&S. If you have a refcount = 0, it will be freed no matter what.
https://github.com/nim-lang/Nim/blob/0d0c249074d6a1041de16108dc247396efef5513/lib/system/gc.nim#L249-L250
Adds it to the ZCT if the refcount = 0, and the ZCT will be collected a bit later
I can test it, but it may be hard to catch that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZTC means "zero count table". This is where the objects start their life before they are assigned to a field of a heap allocated object and also where they end up once their ref count drops to zero. Most stack variables remain with ref count = 0 throughout their lifetime. Once an object is in the ZTC, it's a candidate for collection, but collection will only happen if a stack scan confirms that there are no remaining references to the object on the stack.
This approach is known in the literature as "deferred reference counting" and it's what Nim's default GC employs:
https://en.wikipedia.org/wiki/Reference_counting#Dealing_with_inefficiency_of_updates
The goal is to avoid all the intermediate addRef
and decRef
calls while the stack variables are being passed around and copied. You count only the assignments to heap locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, sorry, I saw all the stack scanning happening after the
# ---------------- cycle collector
line, so I imagined that stack scanning only happened during M&S
But indeed, the ZCT collection happens with the stack marked, so this should work, I'll try it out today :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indeed stable with GC_unref before continue, thanks for the explanation!
baseType | ||
# Do not change this code to `quote do` version because `instantiationInfo` | ||
# will be broken for `newFuture()` call. | ||
outerProcBody.add( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why you can't use macros.quote
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, I just moved that from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very glad this mystery has been solved :)
Fixes #23212 Inspired by [this chronos PR](status-im/nim-chronos#243)
Fixes #23212 Inspired by [this chronos PR](status-im/nim-chronos#243) (cherry picked from commit f29234b)
This PR modifies the async flow to improve RAM usage of async transformation.
Results, about half of ram consumption in nimbus:
(obviously, the RSS being sentient and having is own mind, it isn't divided by two, but actual consumption is)
How
The main goal is to rely as little as possible on mark & sweep, and rely instead on reference counting.
Reference counting fails on cyclic references, and also in other cases: nim-lang/Nim#19193
So first step, add a workaround in
await
to avoid 19193.Second step, more tricky, avoid cyclic reference of the closure/iterator.
Before this PR:
A closure iterator and a _continue proc is generated inside the main proc.
Everything has a reference to everything: the closure, the continue, and the Future, which is bad.
After this PR:
The future is generated at the last minute to be sure that no one has a reference to it.
The "continue" is extracted somewhere else, and doesn't rely on closure environement (@stefantalpalaru I guess it broke stacktraces, could use a hand here)
The future is in charge of holding a reference to the closure, no one else. It means the closure can be freed easily when no longer needed, and no weird cycle here.
So, the iterator needs to take the future as argument. And we need to be very careful not to store it in the closure environment, or we're back to a cyclic ref. Maybe we can block it in some way? Not sure
For reference, here is the
futureContinue
:The addCallback relies on a new closure, which is unfortunate, but passing the
fut
asaddCallback
argument doesn't work:addCallback
except a pointer, so it's not counted for GC, and everything breaks.My test for this is:
With master it crashes, with this branch, it takes a bit of memory but eventually works
It's is better than the test in the original issue, because it allocates memory both in the return value, and in the iterator.
But it's not very CI-able, so not sure how to test this automatically