
danfoss: Update time regularly to account for loss and drift #8479

Merged
merged 1 commit into Koenkk:master on Dec 15, 2024

Conversation

kennylevinsen
Contributor

Danfoss eTRVs will regularly lose their time, such as on battery change, after updates, after hardware resets, etc., which leads to wildly inaccurate date and time of day. Even without these events, clock drift may lead to slowly growing clock errors.

This not only breaks scheduling, but also causes the automatic adaptation runs, which cut off heating, to run at inappropriate times, such as in the middle of the day when heat is required.

Follow the Danfoss recommendations and set the time when the device connects, and again at a one-week interval.

/cc @filips
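
For context, here is a minimal sketch of the scheduling described above. This is not the merged code: the setTime helper, the endpoint number, and the genTime attribute payload are assumptions based on this thread and the standard ZCL time cluster.

```ts
// Sketch only: re-sync the TRV clock on (re)connect and then once a week.
// The interfaces are minimal stand-ins for the zigbee-herdsman Device/Endpoint types.
interface Endpoint {
    write(cluster: string, attributes: Record<string, number>): Promise<void>;
}
interface Device {
    ieeeAddr: string;
    getEndpoint(id: number): Endpoint | undefined;
}

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const ZIGBEE_EPOCH_MS = Date.UTC(2000, 0, 1); // ZCL time is seconds since 2000-01-01 UTC

async function setTime(device: Device): Promise<void> {
    const endpoint = device.getEndpoint(1); // assumed endpoint
    if (!endpoint) return;
    const time = Math.round((Date.now() - ZIGBEE_EPOCH_MS) / 1000);
    // timeStatus 0b11 = time master + synchronized (assumption; verify against the device docs)
    await endpoint.write('genTime', {time, timeStatus: 3});
}

const weeklyTimers = new Map<string, ReturnType<typeof setInterval>>();

export async function onEvent(type: string, device: Device): Promise<void> {
    if (type === 'stop') {
        // Clean up the timer when the device is removed or z2m stops.
        const timer = weeklyTimers.get(device.ieeeAddr);
        if (timer !== undefined) clearInterval(timer);
        weeklyTimers.delete(device.ieeeAddr);
        return;
    }
    if (type === 'start' || type === 'deviceAnnounce') {
        // Sync immediately when the device (re)connects, e.g. after a battery change.
        await setTime(device);
    }
    if (!weeklyTimers.has(device.ieeeAddr)) {
        // And again every week to counter slow clock drift.
        weeklyTimers.set(
            device.ieeeAddr,
            setInterval(() => void setTime(device).catch(() => {}), ONE_WEEK_MS),
        );
    }
}
```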

Koenkk merged commit fdde721 into Koenkk:master on Dec 15, 2024
2 checks passed
@Koenkk
Owner

Koenkk commented Dec 15, 2024

Thanks!

@Nerivec
Contributor

Nerivec commented Jan 9, 2025

From what I've been told about this device, I wonder if it might be worth putting the start setTime in a setTimeout since they apparently are slow to reconnect on start and this results in lots of timeouts.
Something like this:

else if (type === 'deviceAnnounce' || type === 'start') {
    if (type === 'start') {
        // On z2m start, defer the time sync by 5 minutes (300000 ms) to give slow TRVs time to rejoin.
        setTimeout(async () => await setTime(device), 300000);
    } else {
        // On deviceAnnounce the device is awake, so sync right away.
        await setTime(device);
    }

    // rest...
}

Note: the first line is just because we shouldn't create an array to check against two values 😉

Though, if the device always sends deviceAnnounce on reconnect, it might not be needed on start at all.

CC: @antst

@kennylevinsen
Contributor Author

I have 6 Ally TRVs on my network, and I don't have any logged timeouts from them across several z2m restarts. Maybe open an issue with logs and description?

> Note: the first line is just because we shouldn't create an array to check against two values 😉

That's just following the existing style, which I don't have an opinion on.

@Nerivec
Contributor

Nerivec commented Jan 9, 2025

I'm only relaying some observations from a previous debugging session with antst.
Since yours aren't affected, I guess there is a difference with @antst's setup.
I'll let you guys check; it might be a different firmware, or another device that's causing trouble for these in the network?

@antst

antst commented Jan 9, 2025

> I have 6 Ally TRVs on my network, and I don't have any logged timeouts from them across several z2m restarts. Maybe open an issue with logs and description?
>
>> Note: the first line is just because we shouldn't create an array to check against two values 😉
>
> That's just following the existing style, which I don't have an opinion on.

I have run 16 of them for 2.5 years in a busy network. They are, in my opinion, the best Zigbee TRVs (and I have tried most of what the market can offer), and part of that is that they are very good on battery life. And part of being good on battery life is not rushing communication.
You can try taking your network down for a couple of hours, starting it again, and seeing for yourself. It is especially telling if half of them are not in direct range of the coordinator, but rather go through the mesh :)

From my experience, after anything goes wrong with my mesh, for whatever reason, the last devices to recover are the Danfoss valves, and it can take up to an hour for them. Especially if the coordinator was offline for quite some time. Everything else tends to come back up almost instantly, but not these guys. And I did quite a lot of testing back when I first spotted this; it is not due to some "unfriendly" devices in my network or anything like that. To me it feels like once they sense a lot of errors in communication with the coordinator, they just go into a power-saving communication mode. They do something differently, and only go back to their normal way of communicating some time after the network has recovered into a healthy state.

@antst

antst commented Jan 9, 2025

> I'm only relaying some observations from a previous debugging session with antst. Since yours aren't affected, I guess there is a difference with @antst's setup. I'll let you guys check; it might be a different firmware, or another device that's causing trouble for these in the network?

In my case it also flies now, even with the unchanged code. We noticed this issue after my network had been down for some time (due to flashing and replacing adapters, etc.), so the TRVs went into their relaxed mode :)
Once things revitalized, a restart now takes no time even with the old code.
But the very first time things go wrong, the old code will manifest the same issue :)

PS: by "old code" here I mean the old onEvent code :)

@kennylevinsen
Contributor Author

Ah well, if you restart z2m while the TRVs are unavailable in an error state where they won't reconnect for an hour, then getting a timeout is expected. You should only get one for the TRV, as the next attempt will be once the device announces or a week later, whichever comes first. You could also have e.g. setpoint automations, load balance, external temp sync attempt to write to the TRVs during this time, which would also time out.

> And part of being good on battery life is not rushing communication.

The device is only expected to sleep for a handful of seconds (or as configured in the poll control cluster, which can allow more extreme battery optimizations than what we're doing) in order to stay in spec and have reliable packet delivery. Being down for up to an hour sounds more like weird error handling (exponential backoff on reconnect?).
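
As a side note on the poll control remark above, here is a hedged sketch of how one might inspect those settings. The cluster and attribute names follow the standard ZCL Poll Control cluster (genPollCtrl in zigbee-herdsman); the endpoint number is an assumption, and the intervals are in quarter-seconds per the ZCL spec.

```ts
// Sketch only: log the poll control settings that govern how long a sleepy
// end device such as the eTRV may stay quiet between check-ins.
interface Endpoint {
    read(cluster: string, attributes: string[]): Promise<Record<string, number>>;
}
interface Device {
    getEndpoint(id: number): Endpoint | undefined;
}

async function logPollControl(device: Device): Promise<void> {
    const endpoint = device.getEndpoint(1); // assumed endpoint
    if (!endpoint) return;
    const attrs = await endpoint.read('genPollCtrl', [
        'checkinInterval',
        'longPollInterval',
        'shortPollInterval',
    ]);
    // Poll Control intervals are expressed in quarter-seconds.
    console.log(
        `check-in: ${attrs.checkinInterval / 4}s, ` +
            `long poll: ${attrs.longPollInterval / 4}s, ` +
            `short poll: ${attrs.shortPollInterval / 4}s`,
    );
}
```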

@antst

antst commented Jan 10, 2025

> Ah well, if you restart z2m while the TRVs are unavailable in an error state where they won't reconnect for an hour, then getting a timeout is expected. You should only get one for the TRV, as the next attempt will be once the device announces or a week later, whichever comes first. You could also have e.g. setpoint automations, load balance, external temp sync attempt to write to the TRVs during this time, which would also time out.

Yes, that is the case. I actually run load balancing and external temperature sensors :)

>> And part of being good on battery life is not rushing communication.

> The device is only expected to sleep for a handful of seconds (or as configured in the poll control cluster, which can allow more extreme battery optimizations than what we're doing) in order to stay in spec and have reliable packet delivery. Being down for up to an hour sounds more like weird error handling (exponential backoff on reconnect?).

At least this is what I see if I leave the mesh without a coordinator for quite some time and then bring the coordinator back up.
And yes, I think they internally do something like exponential backoff, or something else along those lines; I'm not sure, though, that the current behavior is what they really wanted :)
But in any case, there is definitely something awkward going on with these devices after they have had too many failed attempts to send something to the coordinator. They recover from this behavior, but not instantly.
