
Track first free atbs for multiple block sizes instead of just 1 #2614

Merged
merged 2 commits on Feb 12, 2020

Conversation

tannewt
Member

@tannewt tannewt commented Feb 12, 2020

This speeds up code paths that allocate only blocks larger than 1.

I believe imports will be faster because long lived allocations now always update the last free atb by assuming allocations are contiguous on the long lived end of the heap.
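
For readers not familiar with py/gc.c: a minimal, hypothetical sketch of the idea, with illustrative names and a plain boolean bitmap standing in for the real allocation table bytes (ATBs). Instead of one shared "first free" cursor, the allocator keeps one cursor per block-size class, so allocations of one size don't rewind or stall the search position for other sizes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_BLOCKS 64
#define NUM_CURSORS 4 /* size classes for 1-, 2-, 3- and 4-block requests */

static bool used[HEAP_BLOCKS];
static size_t first_free[NUM_CURSORS]; /* search start per size class */

/* Allocate n contiguous blocks (1 <= n <= NUM_CURSORS); returns the
 * start index, or -1 when no run is found. */
static int alloc_blocks(size_t n) {
    assert(n >= 1 && n <= NUM_CURSORS);
    size_t c = n - 1, run = 0;
    for (size_t i = first_free[c]; i < HEAP_BLOCKS; i++) {
        if (used[i]) { run = 0; continue; }
        if (++run == n) {
            size_t s = i + 1 - n;
            for (size_t j = s; j <= i; j++) used[j] = true;
            /* No free run of this size exists before i + 1 any more, so
             * this class's cursor can skip ahead permanently. */
            first_free[c] = i + 1;
            /* Other cursors that pointed into the consumed run hop past it. */
            for (size_t k = 0; k < NUM_CURSORS; k++) {
                if (first_free[k] >= s && first_free[k] <= i) {
                    first_free[k] = i + 1;
                }
            }
            return (int)s;
        }
    }
    return -1;
}

static void free_blocks(size_t start, size_t n) {
    assert(n >= 1 && n <= NUM_CURSORS);
    for (size_t j = start; j < start + n; j++) used[j] = false;
    /* Rewind only this size class's cursor: freeing a single block does
     * not disturb the multi-block cursors. Free space this strands for
     * other classes would be found again when a real collector sweeps
     * and rebuilds the cursors. */
    size_t c = n - 1;
    if (start < first_free[c]) first_free[c] = start;
}
```

In this toy model, freeing a single block rewinds only the 1-block cursor, so a following 2-block allocation resumes where the last multi-block search left off instead of rescanning from the hole.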

@jimmo @dpgeorge I heard you tried this as well but found cases where it was slower. What were they? I've split our py/ patch out into a commit but think it's still too different to apply smoothly in MP since we also have the long-lived stuff.

This speeds up cases where no single block allocations happen while
Python code is allocating other block sizes.
@tannewt tannewt added this to the 5.0.0 milestone Feb 12, 2020
@tannewt tannewt requested a review from dhalbert February 12, 2020 01:11
Collaborator

@dhalbert dhalbert left a comment


This looks good! I tested a CPB fetching acceleration in a tight loop for 15 minutes with no issues. I'll hold off on merging in case others have further comments.

@jimmo

jimmo commented Feb 12, 2020

hi @tannewt -- I looked into this on MicroPython and tried something very similar. There are two main things to solve:

  • Long runs of >1-block allocations mean the pointer never advances (e.g. we see this when allocating the complex number type on boards that support double-precision hardware floating point, where the complex type ends up taking two blocks).
  • Patterns of allocs paired with frees/reallocs cause the pointer to reset backwards, but the next allocation doesn't fit in the freed space, so it has to search all the way forward again. As long as you have enough buckets to cover the different sizes, this PR will fix that.
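
The second pattern is essentially the linked issue about a with statement in a loop taking cyclically increasing time. A hypothetical single-cursor model (illustrative names only, not the real py/gc.c code) makes the growing search cost visible:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HEAP 4096
static bool used[HEAP];
static size_t cursor;     /* the single shared "first free" hint */
static size_t scan_steps; /* blocks examined, to expose search cost */

static int alloc1(size_t n) {
    size_t run = 0;
    for (size_t i = cursor; i < HEAP; i++) {
        scan_steps++;
        if (used[i]) { run = 0; continue; }
        if (++run == n) {
            size_t s = i + 1 - n;
            for (size_t j = s; j <= i; j++) used[j] = true;
            if (cursor == s) cursor = i + 1; /* only a front allocation advances it */
            return (int)s;
        }
    }
    return -1;
}

static void free1(size_t s, size_t n) {
    for (size_t j = s; j < s + n; j++) used[j] = false;
    if (s < cursor) cursor = s; /* the free pulls the shared cursor back */
}

static void reset_heap(void) {
    memset(used, 0, sizeof(used));
    cursor = 0;
    scan_steps = 0;
}

/* Each iteration: a short-lived 2-block temp plus a 1-block survivor.
 * Freeing the temp rewinds the cursor, the next temp refills the hole,
 * and the 1-block search walks one more used block every iteration.
 * Returns the number of blocks scanned by the 1-block allocation in
 * the k-th iteration. */
static size_t one_block_cost_at(size_t k) {
    size_t cost = 0;
    for (size_t i = 1; i <= k; i++) {
        int tmp = alloc1(2);
        size_t before = scan_steps;
        alloc1(1);
        cost = scan_steps - before;
        free1((size_t)tmp, 2);
    }
    return cost;
}
```

In this toy model one_block_cost_at(k) comes out to k, so total work over a loop is quadratic. Tracking a separate cursor per block size, as this PR does, means freeing the 2-block temp rewinds only the 2-block cursor, and the 1-block search no longer restarts behind the pile of survivors.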

I implemented something very similar for MicroPython, but the fix caused a performance regression, and I wasn't sure the underlying issue was common enough to warrant the hit. So I went looking for alternative fixes instead :)

I don't currently have access to the computer where I was doing these measurements, but I patched this PR into micropython/master (see https://github.com/jimmo/micropython/tree/gc-multi-atb ) and ran the performance tests on it, with similar results to what I saw with my changes:

$ ./run-perfbench.py -t ~/mpy/perf/pyb.master ~/mpy/perf/pyb.multi 
diff of microsecond times (lower is better)
N=100 M=100              /home/jimmo/mpy/perf/pyb.master -> /home/jimmo/mpy/perf/pyb.multi         diff      diff% (error%)
bm_chaos.py               166264.50 ->  170207.62 :   +3943.12 =  +2.372% (+/-0.01%)
bm_fannkuch.py             79847.12 ->   82201.12 :   +2354.00 =  +2.948% (+/-0.00%)
bm_fft.py                 311653.12 ->  318420.50 :   +6767.38 =  +2.171% (+/-0.00%)
bm_float.py                51526.62 ->   53355.75 :   +1829.13 =  +3.550% (+/-0.00%)
bm_hexiom.py              287545.25 ->  286712.62 :    -832.63 =  -0.290% (+/-0.00%)
bm_nqueens.py             237672.38 ->  247577.00 :   +9904.62 =  +4.167% (+/-0.00%)
bm_pidigits.py            101768.50 ->  100396.88 :   -1371.62 =  -1.348% (+/-0.22%)
misc_aes.py                88932.25 ->   88190.75 :    -741.50 =  -0.834% (+/-0.01%)
misc_mandel.py            132150.50 ->  139368.12 :   +7217.62 =  +5.462% (+/-0.00%)
misc_pystone.py           154903.00 ->  154500.00 :    -403.00 =  -0.260% (+/-0.00%)
misc_raytrace.py          163106.62 ->  169317.62 :   +6211.00 =  +3.808% (+/-0.01%)

(This is on pybv11).

Note that these performance tests may or may not be representative of real programs, and it's worth noting that some do in fact get faster. Especially if you're seeing this issue in code using your standard libraries (e.g. sensor reading in a loop), then merging this sounds like easily the right thing to do.

For completeness, some other things I investigated:

  • Some heuristics that trade a little bit of fragmentation for speed (e.g. keep track of how many allocations have happened since the last pointer move, and advance the pointer every N).
  • Similar to the above: keep track of the total number of free blocks found during the search, and advance the pointer if that number is less than N (i.e. if you were searching for a two-block alloc but only saw some small number of free single blocks along the way, then you should advance the pointer).
  • Reducing the number of buckets, including just having a "single" and a "multi" pointer. Doesn't improve much compared to more buckets.
  • Keeping a stack of movements of the pointer. This helps a lot with the second issue, but it's hard to size the stack correctly, and still has the perf hit.
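
The second heuristic can be sketched like this (hypothetical, illustrative names, not MicroPython code). The search counts the free blocks it skips; if fewer than a threshold's worth of usable blocks lie behind the chosen run, advancing the single shared pointer past it wastes at most that many blocks until the next sweep:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP 64
#define FREE_SEEN_THRESHOLD 4

static bool used[HEAP];
static size_t cursor; /* the single shared "first free" hint */

static int alloc_with_heuristic(size_t n) {
    size_t run = 0, free_seen = 0;
    for (size_t i = cursor; i < HEAP; i++) {
        if (used[i]) { run = 0; continue; }
        free_seen++;
        if (++run == n) {
            size_t s = i + 1 - n;
            for (size_t j = s; j <= i; j++) used[j] = true;
            /* free_seen - n = free blocks skipped before the chosen run;
             * if few enough, give up on them and advance the pointer. */
            if (free_seen - n < FREE_SEEN_THRESHOLD) cursor = i + 1;
            return (int)s;
        }
    }
    return -1;
}
```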

@tannewt
Member Author

tannewt commented Feb 12, 2020

Thanks @jimmo! I bet the perf tests that improved made fewer 1-block allocations than the others. I may play around with them later to see.

@dhalbert Are you ok merging this? I think it'll generally be an improvement, and a drastic one in the pathological case.

@tannewt tannewt linked an issue Feb 12, 2020 that may be closed by this pull request
@dhalbert
Collaborator

I will merge! I think it's a real-life improvement for a number of cases.


Successfully merging this pull request may close these issues.

with statement in loop takes cyclically increasing time