Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

precache() - preload code into the flash cache. #6628

Merged
merged 16 commits into from
Nov 3, 2019

Conversation

ChocolateFrogsNuts
Copy link
Contributor

By preloading code into the flash cache we can take control over when
SPI Flash reads will occur when code is executing.
This can be useful where the timing of a section of code is extremely
critical and we don't want random pauses to pull code in from the SPI
flash chip.

It can also be useful for code that accesses/uses SPI0 which is connected
to the flash chip.

Non interrupt handler code that is infrequently called but might otherwise
require being in valuable IRAM - such as bit-banging I/O code or some code
run at bootup can avoid being permanently in IRAM.

Macros are provided to make precaching one or more blocks of code in any
function easy.

By preloading code into the flash cache we can take control over when
SPI Flash reads will occur when code is executing.
This can be useful where the timing of a section of code is extremely
critical and we don't want random pauses to pull code in from the SPI
flash chip.

It can also be useful for code that accesses/uses SPI0 which is connected
to the flash chip.

Non interrupt handler code that is infrequently called but might otherwise
require being in valuable IRAM - such as bit-banging I/O code or some code
run at bootup can avoid being permanently in IRAM.

Macros are provided to make precaching one or more blocks of code in any
function easy.
@ChocolateFrogsNuts
Copy link
Contributor Author

forms part of the solution for #6559

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 10, 2019

(I cancelled your CI, which is broken anyway, to get CI resource for my CI-fixing PR)

@ChocolateFrogsNuts
Copy link
Contributor Author

Guess that's what I get for being slack and pushing something without a test compile... it should be ok now that I added the missing include though.

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 10, 2019

CI itself was broken, I wasn't commenting on your PR 😆
You should press the update button now, CI is fixed

edit you just did it :)

@ChocolateFrogsNuts
Copy link
Contributor Author

ahh, yes I thought the logs looked a bit odd :)

@devyte devyte self-requested a review October 10, 2019 21:32
@ChocolateFrogsNuts
Copy link
Contributor Author

This might need a little more work - I think it's not getting the length calculation right under some circumstances, so it probably needs another attribute

With certain alignments/lengths of code it was possible to not read enough
into the flash cache.

This commit makes the length calculation clearer and adds an extra cache
line to ensure we precache enough code.
Precached code needs to be noinline to ensure the no-reorder-blocks is applied.
@earlephilhower
Copy link
Collaborator

There's one issue I see here, and I'm not sure how to handle it.

This CPU doesn't have 32b load insns, so it uses a .literal table and pc-relative load32 to read them.

GCC doesn't include the .literal section in the function proper. I've seen it place it both before and after the fcn (I think the linker might be involved with this, but haven't dug into it).

So I'm not sure in the general case this is doable. You need 32b access to do any constant address access (i.e. GPIO) or access any global variables/static class members, or do any operation (compare, etc.) with a 32b value...

@ChocolateFrogsNuts
Copy link
Contributor Author

From what I've seen in the disassembled code, those literals can come from anywhere in the code although it does prefer to place them beside the function it seems to re-use nearby values. I don't think there's any way we can compensate for that in a transparent way. What we can do is handle it with documentation "access your constants before the precached code".
So far I haven't seen any problem with loading the addresses of the registers being accessed in the code I'm using, but perhaps that is due to the first accesses caching enough of the constants early enough.
The issue could be easily avoided in the end code by loading a base address into a variable, and accessing the GPIO/Registers/whatever as base+offset (or just loading all constants into variables).
Again documentation and the general rule of keeping your critical sections of code as short and minimal as possible.

@ChocolateFrogsNuts
Copy link
Contributor Author

Further to the above...
I modified my code that uses this to access the SPI registers as base+offset - wasn't too hard, just a volatile variable for the base, and a macro to convert the register names to an offset.
That removed all the l32 instructions referencing flash and replaced them with l32 referencing a register+offset.
What I also noticed is that the function calls are a potential problem.
They are often (usually) done by loading the address of the function as a constant from flash, then a call. We may need to document "no function calls" although I haven't noticed a problem with memcpy, but that could be because memcpy is called frequently enough for it's address to be in cache.
Now that size is less important in my code, I might replace the memcpy calls with for loops in the SPI0 code just to be sure :-)

@ChocolateFrogsNuts
Copy link
Contributor Author

Ha! It took some ninja level gymnastics with 'volatile' and function pointers but I got SPI_command() to build without any load32 instructions referring to flash in the precached code.
No changes required to this PR though.

One point for documentation though - only call functions in iram or rom. Basically same rules as an interrupt handler.

@devyte devyte mentioned this pull request Oct 28, 2019
@devyte
Copy link
Collaborator

devyte commented Oct 29, 2019

Closing in favor of #6674.

@devyte devyte closed this Oct 29, 2019
@devyte devyte reopened this Nov 3, 2019
@devyte
Copy link
Collaborator

devyte commented Nov 3, 2019

Reopening this for merge given that the rest of #6674 needs further discussion.

@devyte devyte merged commit 692e542 into esp8266:master Nov 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants