perf: reduce amount of stack memory usage #4103

aljazerzen · 2024-01-18T00:13:38Z

Refactor a few functions to use heap memory instead of stack memory, to reduce stack memory usage in the case where we recurse into function calls (which happens in long pipelines).

Ideally, stack size of the compile would not grow linearly with PRQL source call nesting depth, but that would be much harder to pull-off than just optimizing our code a bit.

For future reference: my debugging process

put the offending query into _a.prql,
run LLDB, it should stop when the panic is detected,
run thread backtrace to get the list of all frames, enumerated. The name of the game is to refactor code such that number of frames in memory goes up, which means that the average size of a frame goes down.
to see frame sizes:
- settings set frame-format '${frame.index} ${frame.sp} ${line.file.basename}\n'
- thread backtrace
- copy-paste output to lo calc
- to convert hex SP into memory offset: =HEX2DEC(RIGHT(C12,9)) (RIGHT is needed because hex2dec cannot parse more than 10 digits)
- =E11-E12 to get the frame size
- frame sizes are offset by 1, because that's how stack pointers and frame layouts work
- when I started, frame sizes were: 1520, 4320, 14864, 19744, 26688
- somewhere in-between, frame sizes were: 1520, 4320, 13744, 13712, 22144

aljazerzen · 2024-01-18T00:18:46Z

Also, we might want to test with --release on Windows, or set higher stack size.

max-sixty · 2024-01-18T00:24:14Z

Very impressive debugging!!

FWIW I managed to sample the backtrace with the #![feature(backtrace_frames)] feature, described in #2857 (comment) :

+        let bt = backtrace::Backtrace::capture();
+        let len = bt.frames().len();
+        log::trace!("{}", len);
+        if len >= 120 {
+            log::trace!("{:?}", bt);
+        }

This has the advantage of no LLDB etc, but the disadvantage that you only get the backtraces before the stack overflow.

Also, we might want to test with --release on Windows, or set higher stack size.

Though this test is only there to test whether huge queries cause a stack overflow. I'm not sure we get much from it if we change the configs for the platform so it passes!

max-sixty · 2024-01-18T00:35:05Z

Windows now passes!! 👏 👏

max-sixty · 2024-01-18T00:39:44Z

OoI, how did you identify what should be boxed? Did you just have a sense of what the large things on the stack were?

max-sixty · 2024-01-18T01:33:48Z

Re the error, we could possibly set the too-large-for-stack (or just ignore them at the specific sites). From https://rust-lang.github.io/rust-clippy/master/index.html#/boxed_local

aljazerzen · 2024-01-18T15:50:13Z

OoI, how did you identify what should be boxed? Did you just have a sense of what the large things on the stack were?

You can look at individual frames on the stack with lldb and find stuff that is large (i.e. Func or Expr structs) and make them Box<Func> and Box<Expr>. You can see the size of our structs with prqlc debug ast or by using lldb commands.

aljazerzen · 2024-01-18T15:56:01Z

Also, we might want to test with --release on Windows, or set higher stack size.

Though this test is only there to test whether huge queries cause a stack overflow. I'm not sure we get much from it if we change the configs for the platform so it passes!

Yeah, but prqlc is meant to be used in --release mode. That raises size of query that can fit into 1MB stack memory by a lot. Also, 1MB stack memory is not a lot.

An alternative would be to use stacker to resize the stack size dynamically.

max-sixty · 2024-01-18T18:10:24Z

Yeah, but prqlc is meant to be used in --release mode. That raises size of query that can fit into 1MB stack memory by a lot. Also, 1MB stack memory is not a lot.

Right right, though I'm saying something a bit different:

All tests pass which test the functionality of PRQL on Windows
The test "do extremely long queries work" fails on Windows
So changing our testing environment to make that test pass doesn't show anything!
Maybe it shows that an extremely long query passes in release mode? But we know that already, and we could find an even longer query that would fail in release mode...

(not sure if I'm being clear, happy to speak about it on the call)

max-sixty · 2024-01-18T18:11:38Z

An alternative would be to use stacker to resize the stack size dynamically.

Ah interesting, do you think that's better than doing something like a trampoline? (I understand these things conceptually but have never faced this problem technically, so am learning as much as helping...)

aljazerzen · 2024-01-18T18:41:43Z

Maybe it shows that an extremely long query passes in release mode?

Exactly.

The test is basically saying that "under 1MB stack conditions, we support only queries as long as our long_query case". By not using --release mode in tests restricts these conditions much more that the actual "production" conditions.

But now that I've improved the resolver, we can leave the test as is and use these "too" restricted conditions as the benchmark for when we increase memory usage by mistake.

So I've never dealt with such problem neither, but I think that:

stacker would be an easy solution to get rid of this problem with minimal performance and implementation cost,
using a trampoline would not only get rid of this problem, but also improve performance (because it does not need to recurse anymore, but it can use iteration), but would be very hard to implement (since resolver recurses on many different locations).

max-sixty · 2024-01-18T20:13:00Z

OK great on both counts. Thank you!

aljazerzen added 3 commits January 18, 2024 01:11

checkpoint 1

5b0c543

checkpoint 2

9e6d6e9

checkpoint 3

ad34f4e

aljazerzen changed the title ~~perf stack usage~~ perf: reduce amount of stack memory usage Jan 18, 2024

max-sixty added the pr-nightly label Jan 18, 2024

clippy

cb17ac6

aljazerzen force-pushed the perf-stack-usage branch from dfcd6e5 to cb17ac6 Compare January 18, 2024 00:23

clippy

883d002

aljazerzen merged commit 2d9e8fc into main Jan 18, 2024
81 checks passed

aljazerzen deleted the perf-stack-usage branch January 18, 2024 18:41

This was referenced Jan 19, 2024

test: Demo of printing the stack #4104

Closed

test: Reenable Windows stack overflow test #4073

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reduce amount of stack memory usage #4103

perf: reduce amount of stack memory usage #4103

aljazerzen commented Jan 18, 2024 •

edited

Loading

aljazerzen commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

aljazerzen commented Jan 18, 2024

aljazerzen commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

aljazerzen commented Jan 18, 2024

max-sixty commented Jan 18, 2024

perf: reduce amount of stack memory usage #4103

perf: reduce amount of stack memory usage #4103

Conversation

aljazerzen commented Jan 18, 2024 • edited Loading

aljazerzen commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

aljazerzen commented Jan 18, 2024

aljazerzen commented Jan 18, 2024

max-sixty commented Jan 18, 2024

max-sixty commented Jan 18, 2024

aljazerzen commented Jan 18, 2024

max-sixty commented Jan 18, 2024

aljazerzen commented Jan 18, 2024 •

edited

Loading