-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-evaluation of pystack #7396
Re-evaluation of pystack #7396
Conversation
Are you testing different values on different ports / boards? I suspect this should be by port, if not by board (existence or lack of PSRAM could be a complicating factor). 512 is a lot out of a SAMD21, but likely to be inconsequential on an |
Initially I added 512 globally to test which boards will outright fail or have critically low ram. |
Sounds good. It is a balancing act even with PSRAM, since it trades off with espidf memory (sockets and wifi, AP connections, wifi monitor, etc). tannewt or others could speak better about leaving some headroom for future BLE features. There is the option in |
For whatever it's worth.... My usage is probably a bit atypical and quickly stresses the stack storage on all the board builds. I've found it easiest to simply disable stack usage in circuitpython by setting MICROPY_ENABLE_PYSTACK to (0) and MICROPY_STACKLESS to (1). There's probably a downside to running this way but at least for my usage I haven't found any issues yet. |
This is quite significant news.. |
Stackless issue: |
For anyone that wants to optimize memory/stack size, I stole this bit of code from a posted Micropython (I think) issue a year or so ago: def calcRecursionDepth(wldCLen,recursiveFail):
wldCLen += 1
if not recursiveFail and wldCLen < 90:
try:
(wldCLen,recursiveFail) = calcRecursionDepth(wldCLen,recursiveFail)
except:
recursiveFail = True
return (wldCLen,recursiveFail)
w = 0
rf = False
print(calcRecursionDepth(w,rf)) Interestingly, if you define this function within another function the amount of recursion you can achieve is reduces by much more than I would have expected. I suppose that indicates that function definitions consume a fair amount of stack space. |
Tomorrow along with the resizing tests, I will create stackless builds and evaluate their performance. |
This reverts commit daaa4b8.
Now all the boards with MB's of ram, are set to have 5k of pystack. There is no reason not to. The I will now build a stackless build for picow and test my ljinux + webserver + ssh to see how it performs. |
After further testing, I have determined stackless is the way to go. |
Could you please elaborate... what are the pros and cons of pystack vs. stackless for different application types? Defining stackless would help too, and how it's affected by other defines in the build. |
I just commented all my findings in #3362. |
The tested* stackless ports are now in this pr. *For esp I only tested S2, I will also test C3, but I cannot test the rest. |
I have also been running stackless on the same chips bill88t listed for more than a year. In addition, I've been using ESP32, ESP32-Pico, ESP32-C3, ESP32-S3, MIMXRT10XX, SAMD51, BROADCOM and STM32L4 boards without any issues, although I have not been running any tests designed to look for memory/stack type errors specifically. My use typically stresses RAM usage and FLASH I/O. The peripherals I tend to use are I2c, SPI, simple PWM GPIO, the RTC library and WiFi if available. |
I too have now tested C3 with stackless, all good. |
I have taken a quick read through some source files, and |
I think that moving from "pystack" to "stackless" would be a step backwards. "Stackless" allocates a stack frame on the heap at every function call, which is not the most efficient way to implement a call stack. Besides the overhead introduced on every function call, another cost is the emergent impact it has on the heap: it will cause the GC to run more often and produce a more fragmented heap. I think a better solution would be to make the size of the "pystack" configurable at boot time. This way the few people who need a larger stack can get it without impacting the performance of most people who don't. |
Configuring pystack size with something like However I have to say, I have not noticed any slowdowns due to stackless.. (and my code is absolute bloat) It would be more of an issue on boards that are barely-able-to-walk-let-alone-run, but this change is currently proposed for boards that really run. I would like to hear a decision from adadevs on the matter, cause once I finish my exams I want to implement EDIT: Thought about it some more. |
GC may be less of an issue than fragmentation, consider nested functions that do a of of heavy lifting so the small stack frames get dispersed through the heap. I'm a big fan of runtime configuration of things like memory, not sure if stackless could be a runtime thing like PYSTACK = 0 or STACKLESS = True overriding the PYSTACK value. |
The standard heap block size is 16 bytes. Some discussion of performance hit of STACKLESS and PYSTACK here (and above and below in that issue): micropython#6087 (comment) |
Warning, my knowledge on C and board internals is sevely limited, take the following data with a large chonk of salt. It seems pystack cannot be configured in runtime. In it's current form at least. Starting from
This is the first reference of Next we go to
So it seems to init the stack here. So this means the allocation is here, right? No. Using ctags we see that
This is just defining boundaries, as if this is already allocated. So this isn't our func for allocation. Let's keep looking.
So we have to have passed it.
Is this defining an attribute of sorts? Wait isn't there a My ctags were produced by the following command:
|
This is what we'd have to change. We'd need to make it use a supervisor allocation instead because that'd be dynamic. The heap is already done this way (except it takes all remaining space.) |
So I have to try to deduct CIRCUITPY_PYSTACK_SIZE from the allocation of the heap, make the alloc in this func and remove all the bss stuff right? |
@bill88t Currently You need to change
|
You will use You need to do this before calling |
I have started working onto it on another branch https://github.com/bill88t/circuitpython/tree/settings-toml-pystack I will switch the PR branch to that once it's somewhat working.. Progress:
|
It seems the code still expects the alloc to be in bss.. The current crash is in: |
Yea this stuff is pure assembly. I am bailing out. Branch is staying to stackless since this thing actually works.. |
Made a script to check how well stackless performs with random allocs and massive recursion combined.
Feel free to try it out. Pico with this script can climb up about 1.2k before failing with Example output:
|
|
Well renamed the variable onto that branch, it's not like a name change is going to affect the fact it won't bo- HOLDUP. Why did it work now????? The raw error text is here in case anyone wants to take a look at it:
|
I'd probably make it work more like the heap and have it passed around instead of a global. |
I got it.
|
Closing in favor of #7585. |
Currently all boards, excluding
espressif/TG-Watch
, have the exact same pystack size, defined by:py/circuitpy_mpconfig.h:453
:#define CIRCUITPY_PYSTACK_SIZE 1536
However, at least for me, this is a bottleneck.
My project has at this point, been optimised as much as it goes,
but I am still hitting the pystack limit with some of the latest changes.
(
RuntimeError: pystack exhausted
)This pr intends to find a better value than the global 1536 that is currently applied.
As an initial test, the value is increased by 512 bytes.