Multiple SoCs have flash configurations unsupported by MCUboot #713
Comments
STM32H7 (43) 32 bytes write, 128K erase |
A further complication with STM32H7 is that the internal flash has an integrated ECC. Writing to the same flash word twice has a high probability of causing an ECC error, and if it's a double ECC error then a read results in a bus error. Does MCUboot rely on being able to write to the same flash word more than once? |
No, it only writes once to any "word", which is the alignment size for the flash supplied by the OS. |
I have MCUboot working on a PoC level on STM32H7. This required changing both BOOT_MAX_ALIGN and BOOT_MAGIC_SZ to 32. The boot_img_magic array had to be changed as well (one copy in bootutil_misc.c, one in image.py, one in Zephyr's mcuboot.c, maybe more) because it's only 16 bytes which is less than the alignment. All in all it's simple changes, but just changing BOOT_MAX_ALIGN for everyone isn't backwards compatible. Is there any good reason to not make BOOT_MAX_ALIGN configurable? A user could set it to e.g. 32, which would then cause the magic to be padded to 32 bytes. (Or 16, or 512?). Everyone else would keep it at 8 and keep compatibility. |
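A minimal sketch of the padding idea described above, written in Python because one copy of the magic lives in imgtool's image.py. The 16-byte placeholder magic, the `pad_magic` helper, and the choice of trailing 0xFF fill are assumptions for illustration only; they are not MCUboot's actual magic value or on-flash layout.

```python
# Hypothetical placeholder; NOT the real boot_img_magic bytes.
PLACEHOLDER_MAGIC = bytes(range(16))


def pad_magic(magic: bytes, max_align: int, fill: int = 0xFF) -> bytes:
    """Pad `magic` up to the next multiple of `max_align` bytes.

    With max_align of 8 or 16 a 16-byte magic is returned unchanged,
    which is the backwards-compatible case. With max_align of 32 the
    result is 32 bytes, so it fills exactly one STM32H7 flash word.
    """
    if max_align <= 0:
        raise ValueError("max_align must be positive")
    # Round the length up to a multiple of max_align.
    padded_len = -(-len(magic) // max_align) * max_align
    return magic + bytes([fill]) * (padded_len - len(magic))


if __name__ == "__main__":
    for align in (8, 16, 32, 512):
        print(align, len(pad_magic(PLACEHOLDER_MAGIC, align)))
```

Whether the fill bytes go before or after the magic, and what value they take, would have to be agreed across bootutil, imgtool, and the OS integrations so that all copies stay in sync.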
I'd like to confirm also that I have been running MCUBoot on STM32H7 for
12 months now.
We made the same changes as you described above.
We did find it's possible to get an ECC error if we lost power during an
image swap, and so MCUBoot would cause a hard fault during swap-resume.
We solved this by:
1. Using a watchdog if an ECC fault is triggered.
2. If boot-reason is due to a watchdog in the bootloader then we
erase the image, and scratch areas.
3. We provide a recovery mode, where we wait for a repair image over
DFU.
4. Just before we boot the application we set a flag to say the
firmware was booted. If the firmware triggers WD then we don't
cause recovery.
5. If firmware crashes multiple times without a power-cycle, recovery
is triggered. (We also count the number of watchdog resets).
Due to the possibility of getting an ECC error during resume, we have
disabled resuming of partial image swaps.
Interruption of image swaps will cause a recovery.
Probably sounds a bit complex, but this has worked really well for our
application. We went for catching ECC faults with a watchdog and
recovery mode to ensure that all eventualities are covered, and no
matter what happens we can recover the device.
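
The decision logic in the steps above could be modelled roughly as follows. This is only a sketch in Python to make the ordering of the checks explicit; the names (`BootState`, `decide`, `MAX_APP_WATCHDOG_RESETS`) and the threshold of 3 are hypothetical, and the real implementation lives in bootloader C code that is not shown in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BOOT_APP = "boot_app"
    ERASE_AND_RECOVER = "erase_and_recover"  # erase image + scratch, wait for DFU
    RECOVER = "recover"                      # wait for a repair image over DFU


# Hypothetical threshold; the thread only says "multiple times".
MAX_APP_WATCHDOG_RESETS = 3


@dataclass
class BootState:
    watchdog_reset: bool      # boot reason was a watchdog reset
    app_was_started: bool     # flag set just before jumping to the application
    app_watchdog_resets: int  # watchdog resets counted since the last power cycle
    swap_interrupted: bool    # a partial image swap was found at boot


def decide(state: BootState) -> Action:
    # Watchdog fired while still in the bootloader (e.g. ECC fault during swap).
    if state.watchdog_reset and not state.app_was_started:
        return Action.ERASE_AND_RECOVER
    # Swap resume is disabled, so an interrupted swap goes to recovery.
    if state.swap_interrupted:
        return Action.RECOVER
    # The application itself keeps crashing without a power cycle.
    if state.watchdog_reset and state.app_watchdog_resets >= MAX_APP_WATCHDOG_RESETS:
        return Action.RECOVER
    # A single application watchdog reset does not cause recovery.
    return Action.BOOT_APP
```

For example, `decide(BootState(True, False, 0, False))` selects `Action.ERASE_AND_RECOVER`, while a single application watchdog reset still boots the application.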
Best
J
|
If someone is gonna tackle it, the person has to fix bootutil, the simulator, imgtool, mcumgr and newt, maybe the integrations in the supported OSes, and maybe other stuff which I fail to remember. Probably a bit more work than it might seem at first, but I don't think there are any big technical impediments. |
The other thing that is going to come up is that adding simulator support for this type of configuration is going to point out the "rare" or "occasional" failures, and we'll need to actually figure out a way to fix them. Having some percentage of upgraded devices need recovery really isn't something I'd consider acceptable for a regular option. We do have a completely different swap strategy under development that is intended for devices where the writes are larger (and typically use ECC); however, it requires the erase size to be fairly small. Having 8k erases would waste quite a bit of flash for these sectors. However, the existing swap code should be assuming that each write block can only be written once, so this is probably a corner case bug, perhaps because of the larger write size. |
@jameswalmsley Your description of how you handled ECC errors on STM32H7 by involving the watchdog did not fill me with joy. So I came up with a different way to solve the problem, by trapping ECC errors and returning them as -EIO in the flash API. Normally a bus fault can't be trapped, but there is a way around that. I've opened zephyrproject-rtos/zephyr#33140 with a description of the problem, my proposed solution, and I link to some code that shows that it can work. The code fiddles with some architectural registers, so it would be good to get some feedback from someone who knows more about how those registers interact with the rest of the system. |
@weinholtendian Nice, I have to check out your solution. There are some other systems that we have that won't tolerate that though, PR on this is great timing :) I will pull in your PR and check it on our systems and try to review it soon. We've also created a new swap method for the h7 that makes use of the stm32h7 bank-swapping. |
Hello J |
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time. |
Re-opening to track. |
Is there any code that I could help test or implement for ECC flash? I have both LPC55S69 and CC3220 and would be interested to see if it'd be possible to get them both working with OTA, even if it's on a fork (for now).
Alternatively, it sounds like there is a bug, and fixing it would make it easier to use ECC flashes with the current scheme by setting a larger block size? Any pointers on where to dive in would be great. |
This seems related to 841: Boot: Introduce new swap method using status partition, especially for chips with ECC based flash? |
@d3zd3z can we consider reopening this ticket? We have a hyperflash platform that has this problem of not being able to be supported by MCUBoot due to the size of the write (it has ECC) |
I'd like to see a fix for this, and I'd be willing to contribute to that. We use the LPC55xx processors, and being able to use MCUboot with them would be nice, especially since we are already using it with the Nordic nRF52 and i.MX RT series of micros. It seems as though the issue is isolated to memories that implement ECC, is that correct? |
I'll go ahead and re-open this, as I am actually working on what I hope is a solution to this. I want to basically add support for flash devices with large write sizes. This will likely also require relatively small erase sizes. |
Hi @d3zd3z, I was facing a problem with firmware upgrade on the STM32H743 controller because of the 32-byte alignment issue. Is this problem fixed in any of the new release versions? I am looking for a standard solution that is compatible with other STM32 controllers as well. Can you please provide an update on this issue? |
Any news regarding LPC55xx series support? It would be awesome. |
Same question over here. Is there an ongoing effort? Anyone from NXP that can assist? Maybe @DerekSnell ? (Referring to zephyrproject-rtos/zephyr#49246) |
#1609 but that only allows setting a custom value. Didn't do anything regarding:
|
Need to unstale this |
It seems MCUboot is now supported on the LPC5500 devices, although only in Linking @DerekSnell from NXP's reply here, for those interested in LPC55xx support: |
One fix for this would be to change the way the upgrade process is "logged". Instead of writing to flash each step of the process, as it's done now, which is limited by the "write size", one could build a table of the pre-calculated CRC-32 of every sector which will be swapped, and save it all in a single write. If the swap is interrupted the CRC-32 data can be used to find where it stopped. At least I think it makes sense in theory! Not a walk in the park, but probably not too hard and time consuming to create a PoC. |
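A rough model of the CRC-32 table idea, in Python with `zlib.crc32`. It assumes a simplified upgrade where the primary slot is brought to the new image sector by sector, and it ignores scratch handling and sectors interrupted mid-write; the 16-byte sector size and all function names are hypothetical and chosen only to keep the example small.

```python
import zlib

SECTOR_SIZE = 16  # tiny on purpose; real erase sizes are far larger


def split_sectors(image: bytes, sector_size: int = SECTOR_SIZE) -> list:
    """Cut an image into consecutive sector-sized chunks."""
    return [image[i:i + sector_size] for i in range(0, len(image), sector_size)]


def build_crc_table(new_image: bytes) -> list:
    """CRC-32 of every sector of the new image, computed before the upgrade.

    In the proposal this table would be stored with a single flash write.
    """
    return [zlib.crc32(sector) for sector in split_sectors(new_image)]


def find_resume_index(primary: bytes, crc_table: list) -> int:
    """Index of the first primary-slot sector that does not yet match.

    Returns len(crc_table) when every sector already holds the new image.
    """
    sectors = split_sectors(primary)
    for i, expected in enumerate(crc_table):
        if i >= len(sectors) or zlib.crc32(sectors[i]) != expected:
            return i
    return len(crc_table)


if __name__ == "__main__":
    old = b"A" * 64
    new = b"B" * 16 + b"C" * 16 + b"D" * 16 + b"E" * 16
    table = build_crc_table(new)
    interrupted = new[:32] + old[32:]  # power lost after two sectors
    print(find_resume_index(interrupted, table))
```

A sector that happens to be identical in the old and new images matches immediately, which is harmless here because it needs no copy. A half-written sector would simply fail the check and be redone, though on ECC flash reading such a sector may itself fault, so this does not remove the need for the ECC handling discussed earlier in the thread.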
We are also using STM32H7 and facing the same issue. I am not able to decipher your comments regarding BOOT_MAX_ALIGN and BOOT_MAGIC_SZ; how can we adapt the changes? |
Some newer devices have flash configurations that are not supported currently by MCUboot. This issue attempts to collect these in one place to help with the design of any solutions to handle this situation.
Known devices: