Track first free atbs for multiple block sizes instead of just 1 #2614
Conversation
This speeds up cases where no single-block allocations happen while Python code is allocating other block sizes.
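For context, here is a minimal sketch of the idea as I understand it (illustrative only, not the actual py/gc.c patch; `first_free_hint`, `MAX_SIZE_HINTS`, and the toy block map are made-up names): keep one scan-start hint per requested block count, so multi-block allocations can also skip the fully-used start of the heap, instead of relying on a single hint that only single-block allocations keep current.

```c
// Toy model, not CircuitPython's gc.c: one "first possibly-free block" hint
// per requested size, instead of a single hint advanced only by 1-block
// allocations. Invariant: no free run of >= (k + 1) blocks starts before
// first_free_hint[k].
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_BLOCKS     64  // toy allocation table size
#define MAX_SIZE_HINTS 8   // dedicated hints for 1..8-block requests

static bool block_used[NUM_BLOCKS];
static size_t first_free_hint[MAX_SIZE_HINTS];

// Find and mark a run of n_blocks free blocks, scanning from the size's hint.
static size_t toy_alloc(size_t n_blocks) {
    size_t idx = (n_blocks < MAX_SIZE_HINTS ? n_blocks : MAX_SIZE_HINTS) - 1;
    size_t run = 0;
    for (size_t i = first_free_hint[idx]; i < NUM_BLOCKS; ++i) {
        if (block_used[i]) {
            run = 0;
            continue;
        }
        if (++run == n_blocks) {
            size_t start = i + 1 - n_blocks;
            for (size_t j = start; j <= i; ++j) {
                block_used[j] = true;
            }
            // Everything before i + 1 now holds only free runs shorter than
            // n_blocks, so hints for sizes >= n_blocks may jump forward.
            if (n_blocks <= MAX_SIZE_HINTS) {
                for (size_t k = n_blocks - 1; k < MAX_SIZE_HINTS; ++k) {
                    if (first_free_hint[k] < i + 1) {
                        first_free_hint[k] = i + 1;
                    }
                }
            }
            return start;
        }
    }
    return (size_t)-1;  // out of memory
}

static void toy_free(size_t start, size_t n_blocks) {
    for (size_t j = start; j < start + n_blocks; ++j) {
        block_used[j] = false;
    }
    // The freed blocks may merge with free blocks just below them; pull every
    // hint back to the start of that merged run so no fit is ever missed.
    size_t run_start = start;
    while (run_start > 0 && !block_used[run_start - 1]) {
        --run_start;
    }
    for (size_t k = 0; k < MAX_SIZE_HINTS; ++k) {
        if (first_free_hint[k] > run_start) {
            first_free_hint[k] = run_start;
        }
    }
}

int main(void) {
    size_t a = toy_alloc(4);  // multi-block request advances its own hints
    size_t b = toy_alloc(1);  // 1-block request has an independent hint
    printf("4-block run at %zu, 1-block at %zu\n", a, b);
    toy_free(a, 4);
    printf("4-block run again at %zu\n", toy_alloc(4));
    return 0;
}
```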
This looks good! I tested on a CPB, fetching acceleration values in a tight loop for 15 minutes with no issues. I'll hold off merging pending any further comments from others.
Hi @tannewt -- I looked into this on MicroPython and implemented something very similar. There were two main things to weigh: the fix caused a performance regression in our measurements, and I wasn't sure the issue is common enough to warrant that hit, so I went looking for other alternative fixes instead :) I don't currently have access to the computer where I was doing those measurements, but I patched this PR into micropython/master (see https://github.com/jimmo/micropython/tree/gc-multi-atb ) and ran the performance tests on it; the results were similar to what I saw with my own changes:
(This is on a pybv11.) Note that these performance tests may or may not be representative of real programs, and it's worth noting that some do in fact get faster. Especially if you're seeing this issue in code using your standard libraries (e.g. sensor reading in a loop), then merging this easily sounds like the right thing to do. For completeness, some other things I investigated:
I will merge! I think it's a real-life improvement for a number of cases.
This speeds up code paths that allocate only chunks larger than one block.
I believe imports will be faster because long-lived allocations now always update the last free ATB, on the assumption that allocations are contiguous at the long-lived end of the heap.
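To make that reasoning concrete, here is a rough sketch under the stated assumption (illustrative names only, not CircuitPython's actual gc.c): if long-lived allocations are packed contiguously from the top of the heap downward, the boundary below them can be updated in constant time on every allocation rather than rediscovered by rescanning the allocation table.

```c
// Rough sketch of the reasoning above, not the real implementation: because
// long-lived allocations grow contiguously down from the top of the toy heap,
// the boundary between the two areas is known after every allocation without
// any scan of the block map.
#include <stddef.h>
#include <stdio.h>

#define NUM_BLOCKS 64

static size_t long_lived_start = NUM_BLOCKS;  // first block owned by the long-lived area

// Allocate n_blocks at the long-lived (top) end of the heap.
static size_t long_lived_alloc(size_t n_blocks) {
    if (n_blocks > long_lived_start) {
        return (size_t)-1;  // would collide with the short-lived area
    }
    long_lived_start -= n_blocks;  // contiguous growth downward...
    return long_lived_start;       // ...so the boundary update is O(1)
}

int main(void) {
    printf("module A at block %zu\n", long_lived_alloc(10));
    printf("module B at block %zu\n", long_lived_alloc(6));
    printf("short-lived allocations may use blocks [0, %zu)\n", long_lived_start);
    return 0;
}
```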
@jimmo @dpgeorge I heard you tried this as well but found cases where it was slower. What were they? I've split our py/ patch out into a commit, but I think it's still too different to apply smoothly in MP since we also have the long-lived stuff.