Random firmware reset #206
Comments
There were some commits for the upstart config which could possibly fix this. You might want to try those |
How does the upstart fix Firmware resets? |
Do you mean 6fd1be6? |
There is some kind of error counter somewhere in the roborock software which reverts to the previous firmware if it reaches a certain value. I have no idea where it is and what causes it to increment, but I assume that going OOM is one thing that might do it. 9108819 contains some mitigations against memory leakage which causes the player process to be killed. |
Thanks for the clarification :) I'll give it a try. First I have to re-flash the firmware to regain control over the robot... |
Robot is back to life, let's see what happens. |
This morning the robot was not reachable and the WLAN LED was off, so I decided to reboot it. After that the robot reconnected to wifi and was NOT reset. So there seems to be another problem. valetudo log contains:
and boot.log:
|
That sounds very broken |
... Flashed with a fresh firmware yesterday. Works fine for a whole day, and works also after a reboot. No idea what's wrong |
@Hypfer does the reboot command over ssh work for you? Running |
Yup works here |
Ok so what I've done: For now, no more errors in the valetudo log and reboot also works. Let's see what happens tomorrow :D |
Robot is now up and running since 3 days, but: Lost the map today..... |
Because my robot also reset on me 2 days ago, I was searching for related issues. For me, it took about a week before the robot reset itself (I think, I didn't pay too much attention). Regarding the error counter, could it be that they keep it in memory? |
There is already a memory limit with the upstart config on master which should prevent the resets I think |
Currently it's very hard for me to get a stable solution. For me the memory limit does not solve the reset issue for good. |
Do you use the memory limit already? Well then I am afraid I cannot help :( I do think however that restarting player is not the right direction to go towards .. maybe just downgrade valetudo to a more stable version |
Yep, using the limit :) Don't worry, I know it's hard to develop for a "closed source" device. Random resets and my current map issue are hard to debug. |
…same here; memory limit set, reboot & firmware reset this night at 3:52 am. |
For me the robot is stable since more than a week with my changes. The map was back after a reboot, no idea why the map wasn't displayed |
@xoxys would you mind telling me where exactly to remove dnsmasq from? Also, what might be interesting is how many times the vacuum has been active. |
@dugite-code good catch :) I'm not sure, @tadly, whether you should follow @dugite-code's instructions first, but if you are looking for the file, I think it is under /etc/upstart |
Hi everyone, last night my robot also reset its firmware and woke us up in the middle of the night. |
I also have the same problem; after some days the rockrobo loses its config. After rebooting the rockrobo I can connect to the internal wifi, but there is no ssh or GUI access. So it seems it is completely reset. Very annoying. Hope this is solved soon.. |
@posixx It's not that simple. If you have any suggestions please share with us :) |
@rassaei As far as I know, the token is randomly created at first boot. So after a full reset the token will be regenerated. |
@posixx @xoxys Found something new: On the German roboter-forum.com there was a thread suggesting they stopped seeing this issue once they added the missing |
After every reset I'm flashing the same firmware image (based on 1792) on my gen2. That image contains
But I'm still seeing reboots once in a while. I did not modify that image otherwise, so maybe multiple conditions have to be met. |
The script should not use |
So I implemented a flag-cleaner in my dustbuilder and tested it against the v1-fw (4004 and 4007). It works so far for me. I created it with init scripts for runlevel 0 (halt) and runlevel 6 (reboot) for the Ubuntu based firmware. The script checks for the flag "04" and resets it to "01" if necessary. That is checked at boot-time and at restart. I think that's the cleanest solution. Here is the code I use: https://dustbuilder.xvm.mit.edu/resetfix_maybe/ |
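The check described above (look for flag "04", write "01" if found) can be sketched roughly as follows. This is not the script behind the link: the thread does not say where the flag is stored, so the function simply takes a file path as its argument, and both the name `fix_flag` and the assumption that the flag is the first byte of that file are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a flag cleaner.
# ASSUMPTION: the flag is the first byte of the file passed as $1.
# The real storage location and offset are not given in this thread.
fix_flag() {
    flag_file="$1"
    # Read the first byte and render it as two hex digits.
    current=$(dd if="$flag_file" bs=1 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$current" = "04" ]; then
        # Overwrite only that byte; conv=notrunc leaves the rest of the file intact.
        # The octal escape is used because POSIX printf does not define \x escapes.
        printf '\001' | dd of="$flag_file" bs=1 count=1 conv=notrunc 2>/dev/null
        echo "reset"
    else
        echo "unchanged"
    fi
}
```

Calling `fix_flag /path/to/flagfile` prints `reset` when it rewrote the byte and `unchanged` otherwise, so it is safe to run both at boot and at shutdown.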
@dgiese Thanks for the script. You can replace
with something like this:
By the way |
Thanks for the hint. I mean, for now it works. @sareyko's original script was cleaner (please don't sue me), but I wanted to make sure that it runs in every environment (bash, ash). Somehow I cannot get it to print the hex stuff in ash. Also, I now write to /mnt/reserve as that partition is persistent over factory resets. |
If you want to be most compatible, don't try to use hex characters but use octal instead. From the POSIX manual of printf: "Hexadecimal character constants as defined in the ISO C standard are not recognized in the format". The following worked fine for me under ash version 0.5.10.2:
This also worked under dash, bash, zsh and ksh. |
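The snippet referred to above is not preserved here, so the following is only a minimal illustration of the octal approach, not the original code. The function name `emit_flag_byte` is made up for this example.

```shell
#!/bin/sh
# Emit a single byte with value 1 using an octal escape, which POSIX printf
# is required to understand. '\x01' is an extension that some shells
# (dash, for example) do not implement in their printf builtin.
emit_flag_byte() {
    printf '\001'
}

# Show the byte as two hex digits to check what was actually produced.
emit_flag_byte | od -An -tx1 | tr -d ' \n'
# prints: 01
```

The same pattern works for any byte value: convert it to three octal digits and put it after the backslash.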
Will using this script cause damage? I mean, if we prevent the reset, could whatever is causing the device to factory reset lead to some sort of damage that resetting would have prevented? |
I doubt it but of course there's no warranty here. Everything you're doing is at your own risk. Personally I'd say it's plausible that roborock simply added some kind of hidden root detection which messes with the user just enough that they get nudged back to use the cloud. Especially since that started happening only after they became aware of Valetudo which is a viable alternative. I can't think of any permanent damage that could be caused by this. It's much safer than disabling reboots imo |
thanks, that seems fair... i guess it would still be nice to eventually find the actual cause of the reset flags being set, but a step in the right direction at least! |
As I understood it, the resets only occur when you use a custom firmware made with Dustbuilder and add Valetudo. I've never heard that the resets occur with custom firmware from Vacuumz, or did I miss anything? If this just happens with Dustbuilder images, there might be other possibilities than a root detection. Maybe it's just a bug in Dustbuilder which is triggered by Valetudo? Nevertheless, thanks for all your effort to dig deeper into this topic :D |
Oh shoot! Forgot about the
Why the boot-time check? I don't think that's necessary and might actually be a bad idea in certain circumstances.
I totally agree here. As long as the source of the resets is unknown, preventing them may actually brick the devices. For all we know the resets might actually be needed to recover the device in some kind of filesystem failure.
The flags get set by |
What is triggering the factory reset when you're doing it via the hardware buttons? https://github.com/dgiese/dustcloud/wiki/Xiaomi-Vacuum-Robots-Factory-Reset If that is a hardware feature, it should be possible to always recover 🤔 |
Someone else can chime in if they have experienced differently, but in my experience:
edited to also indicate differences in valetudo versions as pointed out by @dgiese |
So from my experience the flags are safe, as in the worst case the vacuum does a factory reset via u-boot. As long as you don't mess up the recovery copy of the OS you should be fine. From what I saw, the flag "0x4" should never occur under normal circumstances. The other flags (1-3) are normal or are set during an update. About the differences: vacuumz has prebuilt images. If no resets have occurred there, then there might be two theories: it could be the valetudo version (RE vs. vanilla) or their images have set something special. Technically the images out of dustbuilder should not really differ in configuration, but maybe there is something weird. However, resets existed before dustbuilder, so it must be something with valetudo or some configuration... |
I did not get it: can I just build the new 0.5.1 firmware for my gen1 and this fix will be included, or do I need to use DustBuilder with the "experimental feature" somehow? |
In the release notes of 0.5.1 @Hypfer states that you can either apply the fix yourself if you build firmware locally, or just use dustbuilder (https://builder.dontvacuum.me/). When using dustbuilder, just don't forget to check the box in 'experimental features': |
When building locally with vacuum the flag |
Do I still need to do something manually with 0.5.2? Or is this fix now canon? |
@2relativ you will need to build a new firmware image with the mitigation enabled. Just replacing the valetudo binary is not enough |
@Hypfer thanks! So I don't need to set a flag or something else. Just build a new Image with 0.5.2? |
If you follow the updated guide in the docs everything should be fine |
Our vacuum just reset itself again, this time after about 4 months. I'll use the new solution, and hopefully it'll stay valetudoed 😁 |
@exetico Should be solved now. Just make sure to enable the mitigation when building a new firmware |
Hi @Hypfer Thanks for the reply. It was not my intention to disturb you :-) I just wanted to report my latest issue, to have the timestamp recorded somewhere, and then find a bit of time to reflash it. I've now grabbed the latest version from DustBuilder, including the reset-fix, and everything is "back to normal". Fingers crossed :-) |
I updated a few weeks ago as well using the fix and until now it didn't reset. I don't really think that has anything to do with this issue in particular but as I have no way to reproduce and it's some kind of reset I thought I'd mention it here. |
Same for me. I updated in March with the fix-Script:
Up to now no reset and the filesystem looks good as well: Note: I don't have any zones defined |
Since the mitigation does seem to work fine, this issue will now be closed and hopefully never reopened again. |
I've just now discovered this thread. I'm going to give the fix a try. This is mostly just some thoughts I came up with when reading the thread history.
From what I gather, the most likely source of problems stems from If They also could do something like reset voltage regulators during boot, which could cause a momentary brownout on eMMC. If it was in the process of writing, that could corrupt data. Or maybe they didn't wire up the eMMC reset line. Or maybe this eMMC doesn't work so well under reset.
Given some of the early reports, it certainly sounds like a WDT, especially since there aren't any logs. Maybe this SoC's WDT reports the current count, and we can read that value back. If this happens again, we should look closer at the hardware watchdog timer as a source of these reboots and filesystem corruption. Thanks to all who investigated it.
Update: I did some poking and it looks like
Read the current status:
[root@rockrobo ~]# ./devmem2 0x01C20Cb8
/dev/mem opened.
Memory mapped at address 0xb6f5f000.
Value at address 0x1c20cb8 (0xb6f5fcb8): 0x000000b1
[root@rockrobo ~]#
Status
The watchdog is set to restart the chip immediately if it doesn't get fed every 16 seconds:
[root@rockrobo ~]# ./devmem2 0x01C20Cb4
/dev/mem opened.
Memory mapped at address 0xb6fc8000.
Value at address 0x1c20cb4 (0xb6fc8cb4): 0x00000001
[root@rockrobo ~]#
If this happens again, one thing we can try is setting it to issue an interrupt rather than restarting (this can be done by running But again, that's only if this problem hasn't already been solved. |
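To make the register value 0x000000b1 from the devmem2 output above easier to read, here is a small sketch that splits it into fields. The bit layout is an assumption (bit 0 as the enable bit, bits 7..4 as the interval code, as in Allwinner-style watchdog mode registers) and is not confirmed anywhere in this thread; the function name `decode_wdog_mode` is made up for this example.

```shell
#!/bin/sh
# Split a watchdog mode register value into its fields.
# ASSUMPTION: bit 0 = watchdog enable, bits 7..4 = interval code.
# This layout is not confirmed in the thread; check the SoC manual.
decode_wdog_mode() {
    value=$(( $1 ))
    enabled=$(( value & 1 ))
    interval_code=$(( (value >> 4) & 15 ))
    echo "enabled=$enabled interval_code=$interval_code"
}

# Value read from 0x01C20CB8 in the devmem2 output above.
decode_wdog_mode 0x000000b1
# prints: enabled=1 interval_code=11
```

Under the assumed layout, an interval code of 11 (0xB) would correspond to the 16-second timeout mentioned in the comment, with the watchdog enabled.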
As pointed out in the dustcloud group by @bsdice, these resets seem to be related to memory usage. For Firmware 1720 these default in the WatchDoge binary should be and can also be overridden by setting larger values as env variables in the |
Hi, I'm running valetudo 0.3.1 and today the robot was not accessible, so I restarted it and after that it was back with the default AP. No ssh connection possible and no valetudo available at port 80.
So maybe this is not fixed?