Health Check triggers node re-interview, which reports wrong security value afterwards #4184
Comments
@darkbasic I don't think this is related to the new firmware, but unfortunately your driver log is not on loglevel "debug", so it doesn't tell much. I've tried my best with what I got here: The healing logs look unsuspicious. Not sure why the re-interview was triggered during the health check. At least the node doesn't list S0 as supported and S0 is also not used during the communication:
The automatic re-interview doesn't feel good here, so I've raised #4185 for that. It's also a bit unfortunate that the re-interview starts in the middle of the health check while node neighbors are queried; I've added this as a TODO to #3707.
This is in the works for @robertsLando why does the UI refresh neighbors while the health check is going on? This could influence the results because it turns off the radio for a short time. |
The refresh is called when mesh is loaded or when there are changes in nodes array. It could be a change is detected in nodes object and that triggers a refresh. I need to investigate |
It was supposed to be on
Good catch, I noticed it as well but I forgot to report it. It's not the first time it happened.
Awesome, in the meantime is there anything we can do to find out why the jitter is so high? |
You could try looking at the debug level logs of the health check. The information about routes etc. is already there, it just isn't used yet. @robertsLando sometimes the refresh even causes an error if it happens more frequently than every 60s. I'd probably disable any automatic refresh during the health check or delay it until the window is closed. |
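The suggestion above (hold back automatic refreshes until the health check is done) could be sketched as follows. This is only an illustration: `RefreshGate` and its methods are hypothetical names, not part of zwavejs2mqtt or zwave-js, and the real UI wiring will differ.

```typescript
// Hypothetical helper: queues refresh requests while a health check runs
// and replays at most one of them once the check has finished.
class RefreshGate {
  private healthCheckRunning = false;
  private refreshPending = false;
  private readonly refresh: () => void;

  constructor(refresh: () => void) {
    this.refresh = refresh;
  }

  // Called whenever the UI detects a change in the nodes array.
  requestRefresh(): void {
    if (this.healthCheckRunning) {
      // Remember the request instead of touching the radio now.
      this.refreshPending = true;
      return;
    }
    this.refresh();
  }

  beginHealthCheck(): void {
    this.healthCheckRunning = true;
  }

  endHealthCheck(): void {
    this.healthCheckRunning = false;
    if (this.refreshPending) {
      this.refreshPending = false;
      this.refresh();
    }
  }
}
```

Several requests arriving during one health check collapse into a single refresh afterwards, which would also help with the "more frequently than every 60s" error mentioned above.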
Here are the logs with debug level: Initially I just did a network heal. Apart from a node being marked as dead and then immediately coming back, I didn't notice anything unusual. Then at 2022-02-04 12:48:47.391 GMT+1 I started a health check on node 83: Here I see a lot of jitter once again. Then at 2022-02-04 12:51:28.064 GMT+1 I started a health check on node 28. |
If you click on the (?) button, it gives you some explanation for each parameter and how the rating is done. SNR is basically the difference between the measured RSSI of the node's ACK frames and the background RSSI that's measured between checks. According to the built-in IMA tool in PC Controller, it should be >= 17 dB, so your values are very good.
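As a rough illustration of that definition, the margin is a plain subtraction of two RSSI readings. The function names and the sample values below are made up for the example and are not taken from zwave-js or from the logs in this issue; only the 17 dB threshold comes from the comment above.

```typescript
// Threshold quoted above from the IMA tool in PC Controller.
const MIN_SNR_MARGIN_DB = 17;

// Hypothetical helper: both inputs are RSSI readings in dBm,
// the result is a difference in dB.
function snrMargin(ackRssiDbm: number, backgroundRssiDbm: number): number {
  return ackRssiDbm - backgroundRssiDbm;
}

function isSnrMarginOk(ackRssiDbm: number, backgroundRssiDbm: number): boolean {
  return snrMargin(ackRssiDbm, backgroundRssiDbm) >= MIN_SNR_MARGIN_DB;
}

// Illustrative values only: ACK at -60 dBm over a -95 dBm noise floor.
console.log(snrMargin(-60, -95)); // 35
console.log(isSnrMarginOk(-60, -95)); // true
```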
That must be this. IMO, this looks terribly like the 700 series issue isn't entirely solved for all commands, since the controller immediately says "I can't send this", although the next command works just fine. This only happened the one time though.
Regarding the jitter:
and this is the next one with higher latency:
So from the controller's POV, there isn't any difference, except the 2nd one takes longer. Because ACKs aren't retried AFAIK, the repeater node 2 probably needed 2 tries to reach node 83 on the longer attempts. Unfortunately, this can only be seen with a Zniffer.
See #4075 (comment), 2nd bullet point. This is a bug. |
I guess so, I'm not the first one who experienced something similar if I recall correctly. At least it's somehow usable now, but I still wouldn't suggest anyone buying a 700 series stick yet.
But between which nodes? I guess it must be between the controller and node 2 (the one which routed the message), but that doesn't give the full picture if the signal between node 2 and node 83 is weak. I would be much more interested in the lowest SNR in the whole path (which is probably between node 2 and 83).
The zniffer is no problem, I've already decided to sacrifice one of my sticks (I still didn't do so because of lack of time). P.S. |
Yeah, but that isn't available:
I'm not sure what needs to be done to get this information recorded. Maybe the repeater needs to be 700-series?
No. You might get a somewhat reasonable estimation if the RSSI of the repeaters were known.
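If per-hop RSSI were available, the "lowest SNR in the whole path" asked for above would just be the minimum margin over all hops. A minimal sketch under that assumption: the `Hop` shape and `weakestHop` are hypothetical, the RSSI numbers are invented, and the controller is assumed to be node 1.

```typescript
// Hypothetical per-hop measurement; this data is NOT currently recorded.
interface Hop {
  from: number;
  to: number;
  ackRssiDbm: number;
  backgroundRssiDbm: number;
}

interface WeakestHop {
  hop: Hop;
  marginDb: number;
}

// Returns the hop with the smallest SNR margin, or undefined for an empty path.
function weakestHop(path: Hop[]): WeakestHop | undefined {
  let worst: WeakestHop | undefined;
  for (const hop of path) {
    const marginDb = hop.ackRssiDbm - hop.backgroundRssiDbm;
    if (worst === undefined || marginDb < worst.marginDb) {
      worst = { hop, marginDb };
    }
  }
  return worst;
}

// Invented numbers for the route controller (assumed node 1) -> 2 -> 83.
const examplePath: Hop[] = [
  { from: 1, to: 2, ackRssiDbm: -45, backgroundRssiDbm: -95 },
  { from: 2, to: 83, ackRssiDbm: -85, backgroundRssiDbm: -95 },
];
console.log(weakestHop(examplePath)?.marginDb); // 10
```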
These? |
I've updated Z-Pi 7 to fw v7.17.2 and zwavejs2mqtt to 6.7.1.2a08abd (zwave-js 9.0.2). I've done a full heal, but unfortunately it still looks like nodes are dying/reviving mid-heal:
Doing a health check on node 83 still leads to bad routes w/ randomly lost packets: Doing a health check on node 28 still fails, but at least now it apparently doesn't trigger re-interviews: Debug logs: |
This is because the return routes can't be updated for some reason:
and this is incorrectly interpreted as a dead node #4191.
In the logs there is no difference, except that the attempts with more delay retried routing once - without changing the route though. Node 83 is routing through node 2 each time:
Question is if the route between 2 and 83 is causing these or the one between controller and 2.
It looks like it is stuck in a reverse ping check from a lower powerlevel and that is influencing the results for the higher powerlevels. I'll probably have to revise the binary search for the limit powerlevel and start from the highest, working down. |
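The revision mentioned above (start at the highest powerlevel and work down) could look roughly like this. It is a sketch, not the zwave-js implementation: `findLimitReduction` and the `works` callback are hypothetical, the callback is synchronous for brevity, and the 0 to 9 dB range assumes the Powerlevel CC steps from normal power down to -9 dBm.

```typescript
// Walks from normal power (0 dB reduction) towards the lowest powerlevel
// and returns the largest reduction that still works, or undefined if
// even normal power fails. It stops at the first failure, so a failed
// test at a low powerlevel cannot affect the result for higher ones.
function findLimitReduction(
  works: (reductionDb: number) => boolean,
  maxReductionDb: number = 9,
): number | undefined {
  let limit: number | undefined;
  for (let reduction = 0; reduction <= maxReductionDb; reduction++) {
    if (!works(reduction)) break;
    limit = reduction;
  }
  return limit;
}

// A link that works down to -6 dBm.
console.log(findLimitReduction((r) => r <= 6)); // 6

// A flaky link that happens to pass at -5 dBm but fails at -3 dBm:
// the scan reports 2 and never probes the misleading lower level.
const flaky = new Set([0, 1, 2, 5]);
console.log(findLimitReduction((r) => flaky.has(r))); // 2
```

The trade-off is up to ten probes instead of about four for a binary search, in exchange for never starting from a level where the node may not answer at all.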
Definitely between node 2 and 83, because the controller is a few centimeters away from node 2. Honestly I don't understand why it would pick such a useless hop when there are lots of other nodes in between. If I recall correctly, it's the controller firmware that computes these routes, right? |
Me neither, but maybe there was a situation where the controller didn't "see" 83 and node 2 did, so it took that route and stuck with it.
Correct
Sounds feasible IMO. We could even pin that route down as the application priority route if the user wants. I'll track that in a separate issue. |
Health check on node 28 seems to be fixed in git master thanks to #4494: Unfortunately healing still picks terrible routes and we absolutely need a way to pick better ones otherwise the whole network is basically unusable :( |
Is your problem within Home Assistant (Core or Z-Wave JS Integration)?
NO, my problem is NOT within Home Assistant or the ZWave JS integration
Is your problem within ZWaveJS2MQTT?
NO, my problem is NOT within ZWaveJS2MQTT
Checklist
I have checked the troubleshooting section and my problem is not described there.
I have read the changelog and my problem was not mentioned there.
Describe the bug
I've just updated my Z-Pi 7 firmware to 7.17.1. I didn't need to restore the NVM because another RaZberry firmware issue prevented me from migrating my network to a 500 series controller.
The first thing I did was a full network heal. I'm not sure it ever completed, because I have 7 battery-powered nodes which were turned off. The day after, these nodes were still spinning in zwavejs2mqtt, but it might be a UI bug and the backend might have timed them out and completed the heal (or maybe not?). Anyway, I've attached both zwave-js and zwavejs2mqtt logs of the heal.
At this point I restarted zwavejs2mqtt and tried a health check on node 83:
I'm unsure why the latency is sometimes so high. Unfortunately it doesn't show which route was taken for each ping, so this isn't very useful. Would it be possible to implement that?
Anyway, later I tried to do the same for node 28 and, to my disbelief, it triggered a node re-interview. What's even worse, after the re-interview the node is shown as having S0 security (previously it had none, which should be the correct value).
What's going on?
I attached both zwave-js and zwavejs2mqtt logs.
Node 28 health check starts at 2022-02-03 15:08:09 on the zwavejs2mqtt log and 2022-02-03T14:09 on the zwavejs log (I'm not sure why the former uses local time while the latter is GMT).
Device information
No response
How are you using node-zwave-js?
zwavejs2mqtt Docker image (latest)
zwavejs2mqtt Docker image (dev)
zwavejs2mqtt Docker manually built (please specify branches)
ioBroker.zwave2 adapter (please specify version)
HomeAssistant zwave_js integration (please specify version)
pkg
node-red-contrib-zwave-js (please specify version, double click node to find out)
Which branches or versions?
version:
node-zwave-js: 8.11.2
zwavejs2mqtt: 6.5.0
Did you change anything?
no
If yes, what did you change?
No response
Did this work before?
Don't know, this is a new device
If yes, where did it work?
No response
Attach Driver Logfile
Health Check logs:
zwavejs_2022-02-03.log
zwavejs2mqtt_2022-02-03.log
Network Heal logs:
zwavejs_2022-02-02.log
zwavejs2mqtt_2022-02-02.log