
danfoss: Update time regularly to account for loss and drift #8479

Merged
merged 1 commit into Koenkk:master on Dec 15, 2024

Conversation

kennylevinsen
Contributor

Danfoss eTRVs will regularly lose their time, such as on battery change, after updates, after hardware resets, etc., which leads to wildly inaccurate date and time of day. Even without these events, clock drift may lead to slowly growing clock errors.

This not only breaks scheduling, but also causes the automatic adaptation runs, which cut off heating, to run at inappropriate times, such as in the middle of the day when heat is required.

Follow the Danfoss recommendations and set the time when the device connects, and again at a one-week interval.

/cc @filips
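
For context, here is a minimal sketch of the scheduling described above. This is not the merged code: the setTime helper, the endpoint number, and the genTime attribute payload are assumptions based on this thread and the standard ZCL time cluster.

```ts
// Sketch only: re-sync the TRV clock on (re)connect and then once a week.
// The interfaces are minimal stand-ins for the zigbee-herdsman Device/Endpoint types.
interface Endpoint {
    write(cluster: string, attributes: Record<string, number>): Promise<void>;
}
interface Device {
    ieeeAddr: string;
    getEndpoint(id: number): Endpoint | undefined;
}

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const ZIGBEE_EPOCH_MS = Date.UTC(2000, 0, 1); // ZCL time is seconds since 2000-01-01 UTC

async function setTime(device: Device): Promise<void> {
    const endpoint = device.getEndpoint(1); // assumed endpoint
    if (!endpoint) return;
    const time = Math.round((Date.now() - ZIGBEE_EPOCH_MS) / 1000);
    // timeStatus 0b11 = time master + synchronized (assumption; verify against the device docs)
    await endpoint.write('genTime', {time, timeStatus: 3});
}

const weeklyTimers = new Map<string, ReturnType<typeof setInterval>>();

export async function onEvent(type: string, device: Device): Promise<void> {
    if (type === 'stop') {
        // Clean up the timer when the device is removed or z2m stops.
        const timer = weeklyTimers.get(device.ieeeAddr);
        if (timer !== undefined) clearInterval(timer);
        weeklyTimers.delete(device.ieeeAddr);
        return;
    }
    if (type === 'start' || type === 'deviceAnnounce') {
        // Sync immediately when the device (re)connects, e.g. after a battery change.
        await setTime(device);
    }
    if (!weeklyTimers.has(device.ieeeAddr)) {
        // And again every week to counter slow clock drift.
        weeklyTimers.set(
            device.ieeeAddr,
            setInterval(() => void setTime(device).catch(() => {}), ONE_WEEK_MS),
        );
    }
}
```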

Koenkk merged commit fdde721 into Koenkk:master on Dec 15, 2024
2 checks passed
@Koenkk
Owner

Koenkk commented Dec 15, 2024

Thanks!

@Nerivec
Contributor

Nerivec commented Jan 9, 2025

From what I've been told about this device, I wonder if it might be worth putting the start setTime in a setTimeout since they apparently are slow to reconnect on start and this results in lots of timeouts.
Something like this:

else if (type === 'deviceAnnounce' || type === 'start') {
    if (type === 'start') {
        // On z2m start, defer the time sync by 5 minutes (300000 ms) to give slow TRVs time to rejoin.
        setTimeout(async () => await setTime(device), 300000);
    } else {
        // On deviceAnnounce the device is awake, so sync right away.
        await setTime(device);
    }

    // rest...
}

Note: the first line is just because we shouldn't create an array to check against two values 😉

Though, if the device always sends deviceAnnounce on reconnect, it might not be needed on start at all.

CC: @antst

@kennylevinsen
Contributor Author

I have 6 Ally TRVs on my network, and I don't have any logged timeouts from them across several z2m restarts. Maybe open an issue with logs and description?

> Note: the first line is just because we shouldn't create an array to check against two values 😉

That's just following the existing style, which I don't have an opinion on.

@Nerivec
Contributor

Nerivec commented Jan 9, 2025

I'm only relaying some observations from a previous debugging session with antst.
Since yours aren't affected, I guess there is a difference with @antst's setup.
I'll let you guys check; it might be a different firmware, or another device that's causing trouble for these in the network?

@antst

antst commented Jan 9, 2025

> I have 6 Ally TRVs on my network, and I don't have any logged timeouts from them across several z2m restarts. Maybe open an issue with logs and description?
>
>> Note: the first line is just because we shouldn't create an array to check against two values 😉
>
> That's just following the existing style, which I don't have an opinion on.

I have run 16 of them for 2.5 years in a busy network. They are, in my opinion, the best Zigbee TRVs (and I have tried most of what the market can offer), and part of that is that they are very good on battery life. And part of being good on battery life is not rushing communication.
You can try taking your network down for a couple of hours, starting it again, and seeing for yourself. It is especially telling if half of them are not in direct range of the coordinator, but rather go through the mesh :)

From my experience, after anything goes wrong with my mesh, for whatever reason, the last devices to recover are the Danfoss valves, and it can take up to an hour for them. Especially if the coordinator was offline for quite some time. Everything else tends to come back up almost instantly, but not these guys. And I did quite a lot of testing back when I first spotted this; it is not due to some "unfriendly" devices in my network or anything like that. To me it feels like once they sense a lot of errors in communication with the coordinator, they just go into a power-saving communication mode. They do something differently, and only go back to their normal way of communicating some time after the network has recovered into a healthy state.

@antst

antst commented Jan 9, 2025

> I'm only relaying some observations from a previous debugging session with antst. Since yours aren't affected, I guess there is a difference with @antst's setup. I'll let you guys check; it might be a different firmware, or another device that's causing trouble for these in the network?

In my case it also flies now, even with the unchanged code. We noticed this issue after my network had been down for some time (due to flashing and replacing adapters, etc.), so the TRVs went into their relaxed mode :)
Once things revitalized, a restart now takes no time even with the old code.
But the very first time things go wrong, the old code will manifest the same issue :)

PS: by "old code" here I mean the old onEvent code :)

@kennylevinsen
Contributor Author

Ah well, if you restart z2m while the TRVs are unavailable in an error state where they won't reconnect for an hour, then getting a timeout is expected. You should only get one for the TRV, as the next attempt will be once the device announces or a week later, whichever comes first. You could also have e.g. setpoint automations, load balance, external temp sync attempt to write to the TRVs during this time, which would also time out.

> And part of being good on battery life is not rushing communication.

The device is only expected to sleep for a handful of seconds (or as configured in the poll control cluster, which can allow more extreme battery optimizations than what we're doing) in order to stay in spec and have reliable packet delivery. Being down for up to an hour sounds more like weird error handling (exponential backoff on reconnect?).
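
As a side note on the poll control remark above, here is a hedged sketch of how one might inspect those settings. The cluster and attribute names follow the standard ZCL Poll Control cluster (genPollCtrl in zigbee-herdsman); the endpoint number is an assumption, and the intervals are in quarter-seconds per the ZCL spec.

```ts
// Sketch only: log the poll control settings that govern how long a sleepy
// end device such as the eTRV may stay quiet between check-ins.
interface Endpoint {
    read(cluster: string, attributes: string[]): Promise<Record<string, number>>;
}
interface Device {
    getEndpoint(id: number): Endpoint | undefined;
}

async function logPollControl(device: Device): Promise<void> {
    const endpoint = device.getEndpoint(1); // assumed endpoint
    if (!endpoint) return;
    const attrs = await endpoint.read('genPollCtrl', [
        'checkinInterval',
        'longPollInterval',
        'shortPollInterval',
    ]);
    // Poll Control intervals are expressed in quarter-seconds.
    console.log(
        `check-in: ${attrs.checkinInterval / 4}s, ` +
            `long poll: ${attrs.longPollInterval / 4}s, ` +
            `short poll: ${attrs.shortPollInterval / 4}s`,
    );
}
```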

@antst

antst commented Jan 10, 2025

> Ah well, if you restart z2m while the TRVs are unavailable in an error state where they won't reconnect for an hour, then getting a timeout is expected. You should only get one for the TRV, as the next attempt will be once the device announces or a week later, whichever comes first. You could also have e.g. setpoint automations, load balance, external temp sync attempt to write to the TRVs during this time, which would also time out.

Yes, that is the case. I actually run load balancing and external temperature sensors :)

>> And part of being good on battery life is not rushing communication.

> The device is only expected to sleep for a handful of seconds (or as configured in the poll control cluster, which can allow more extreme battery optimizations than what we're doing) in order to stay in spec and have reliable packet delivery. Being down for up to an hour sounds more like weird error handling (exponential backoff on reconnect?).

At least this is what I see if I leave the mesh without a coordinator for quite some time and then bring the coordinator back up.
And yes, I think they internally do something like exponential backoff, or something else along those lines; I'm not sure, though, that the current behavior is what they really wanted :)
But in any case, there is definitely something awkward going on with these devices after they have had too many failed attempts to send something to the coordinator. They recover from this behavior, but not instantly.
