Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with multiple commands sent in quick succession? #20

Open
lockyt opened this issue Jul 24, 2019 · 26 comments
Open

Issues with multiple commands sent in quick succession? #20

lockyt opened this issue Jul 24, 2019 · 26 comments

Comments

@lockyt
Copy link

lockyt commented Jul 24, 2019

Hi @the1laz - great work with cgateweb. It's a great little script.

I am however having an issue with CBUS which may or may not be the cgateweb script causing the problem which I am hoping you can help with.

To explain the situation. We have a Qnap NAS running docker containers in the following configuration. CBUS(CNI)<->Cgate<->Cgateweb<->mosquito eclipse<->Home Assistant<->Homekit. All works just fine for turning single lights on/off and works super fast. However, when we send more complex commands such as turning off the kitchen lights or downstairs lights, it has problems where it seems some lights fail to report back to HA that they turned on/off. So for example, you tell Siri to turn off all kitchen lights and any lights that are on turn off. However, in HA it shows that some kitchen lights are still on but they have actually turned off. Looking at the cgateweb logs, it shows the command being sent to turn the lights off and they work (lights go out) but the responses don't come back for the lights (which are showing as still on).

Looking at cgate logs, I can see the commands come in to turn on the lights and see the responses go back for all the lights. So it seems that somewhere between cgate and cgateweb the responses to update what happened are getting lost.

I have isolated it down to one of these two (cgate/cgateweb) as everything else seems to be working as expected.

Here are the screenshots of both cgate and cgateweb during the kitchen lights on/off scenario as above, as well as what is happening in Home Assistant:

Turn off all kitchen lights either through HA or SIRI:
Screen Shot 2019-07-24 at 7 51 30 pm
all physically turn off as requested but HA doesn't get a response for 28 and 30 so it thinks they are still on
Screen Shot 2019-07-24 at 7 51 24 pm

Turn on all kitchen lights through either HA or SIRI:
Screen Shot 2019-07-24 at 7 51 54 pm
only turns on those in HA/CBUS that it thinks are off (ie, everything but 28/30)
Screen Shot 2019-07-24 at 7 51 48 pm

If you go from the bottom up on the logs you can find where all lights were turned on or off. You can see the device ids being turned on/off and the responses from cgate confirming them. But cgateweb only shows some of the responses coming in, not all and the ones that don't are the lights that stay on/off. Now it could be cgate logging something it isn't sending or perhaps cgateweb not processing and accepting the responses. We have a very large cbus installation and it could be related to that.. a total of ~190 lights and blinds in the place. However this issue happens even on the smallest group - in this instance only 6 lights.

Any help would be greatly appreciated. I will need to confirm what version of cgate we are using and whether there is anything special with the cbus installation (need to speak with the CBUS guy) but thought I'd check with you first.

Thanks for your help in advance.

Locky.

@lockyt
Copy link
Author

lockyt commented Jul 26, 2019

Hi @the1laz - I spoke with the CBus guy today and he suggested that it could just be the CBus network not being able to process a bunch of commands being sent at once. Although I think we are seeing a response from cgate in the console, it doesn't mean it is actually being sent so I guess it is possible. He is suggesting we send commands with a microsecond or two delay to ensure it doesn't get hit with too many commands all at once. Not sure if that is possible with the script.

He also suggested we use levels instead of on/off commands which I will test. Some of the lights aren't dimmable so it's not a perfect solution and not sure how it would work.

He is also going to setup a scene or two for me to use instead but it kinda defeats the purpose of using Home Assistant/HomeKit. I want to be able to move lights in or out of a room on HomeKit and ask for all lights in that group to be turned on/off. Keep it dynamic. The CBus guy here in Canberra is impossible to get hold of at the best of times, so would prefer not to rely on groups being defined in CBus.

Thanks.

Locky.

@lockyt
Copy link
Author

lockyt commented Jul 26, 2019

Sorry one other question - does cgateweb use a quality of service parameter with MQTT? If so, is it possible to change it to guaranteed delivery?

Thanks!

@the1laz
Copy link
Owner

the1laz commented Jul 31, 2019

Sorry Locky, I haven't had a chance to have a good look at this but I'm sure it won't be too hard to solve. I didn't really consider sending multiple commands at once when I put this together, so I'm not surprised that it's not working. I'll have a look this weekend.

@lockyt
Copy link
Author

lockyt commented Jul 31, 2019

Awesome - that'd be great thanks mate. You may want to check my logic is right first as I'd hate you to do a bunch of work to find it is something I am doing stupid.

@lockyt
Copy link
Author

lockyt commented Aug 6, 2019

Hey Stephen - any luck over the weekend? Get a chance to have a look? Sorry no pressure - just glad you are happy to help.

@the1laz
Copy link
Owner

the1laz commented Aug 11, 2019

No luck last weekend due to my pi crashing at some point then giving me grief when starting up. Got some time today though and stuck a couple of message queues in that'll hopefully help. Seems good at my end, let me know if it fixes it for you.

@the1laz
Copy link
Owner

the1laz commented Aug 11, 2019

Don't forget to grab a copy of your settings before downloading the latest commit. Sorry, the settings file (and all the rest of the code) could be better, but I've not really put much effort into this since I first got it working.

@lockyt
Copy link
Author

lockyt commented Aug 11, 2019

Ok great thanks Stephen. Unfortunately I won't be able to test it until later in the week but I will let you know how I go!

@lockyt
Copy link
Author

lockyt commented Aug 16, 2019

Hi Stephen - just tested the new script and definitely a major improvement. I noticed the events have slowed down (your flow control I presume) and whereas it worked properly 10% of the time before, it now works properly 90% of the time. Out of the 10 or so tests I did, we did have a command or two not respond on it's state (which in turn doesn't update HA on the status of the light).

I am hoping it is just a tuning thing now to get the sweet spot of speed vs commands not being responded to? Is that something I can change on the fly or am I over simplifying it? Please let me know.

But definitely a lot better so I think we are going down the right track.

Thanks again mate for all your help.

@lockyt
Copy link
Author

lockyt commented Aug 16, 2019

Screen Shot 2019-08-17 at 9 19 12 am

This is a screen shot on one of the times it didn't work. The lights actually turned off (so the commands got to CBUS) but the responses from the CBUS through cgateweb don't all get through. This was only one test of 2 out of 10 that didn't work.

Thanks mate.

@the1laz
Copy link
Owner

the1laz commented Aug 18, 2019

Hi Locky, I'm hoping it's just a tuning thing. I've added an option to change the message interval in settings.js and bumped the default up to 200ms between messages. Hopefully that'll work for you, seems to be working for me.

@lockyt
Copy link
Author

lockyt commented Aug 18, 2019

Great. I'll try again and let you know.

@lockyt
Copy link
Author

lockyt commented Aug 21, 2019

Hi Stephen - just to let you know that it all now seems to be working a charm. Thanks so much for all your help! Really appreciated.

@the1laz
Copy link
Owner

the1laz commented Aug 21, 2019

No problem Locky, good to hear it's working.

@GledholtHall
Copy link

GledholtHall commented Jul 25, 2020

I'm kinda surprised at this - but obviously it works.

There is a TCP socket connection to C-Gate (no dropped info there) and C-Gate handles all the C-Bus communication and includes it's own buffering and flow control. It is true that C-Bus itself has very limited capacity for rapid commands but C-Gate manages this (or should).

The connection to and from MQTT is TCP based so again no dropped messages there so it's a fairly solid path apart from 'within C-Gate' over which you have no control. The fact the buffer solution works implies issues there.

... anyway as it works, this is not too important

@lockyt
Copy link
Author

lockyt commented Jul 25, 2020

Thanks @GledholtHall - not wasted effort at all as it is good to discuss these things for better understanding. The changes that @the1laz made certainly worked as I think the issue was not at either the MQTT or CBUS end but within the NodeJS script he developed. He hadn't needed queuing within his script previously as his use case differed from mine but very helpfully added some queuing for me (which I still really appreciate thank you @the1laz!) which worked a charm. It did introduce a more serial on/off of bulk commands from what it was prior to controlling queued events (which was more synchronous), but that is as expected. If @the1laz had the desire to improve his script further, a more synchronous queueing would be happily received, but the way it works now is all a-ok. I will add that the client has been going solidly for nearly 12 months now with zero issues so fixed it well and good. Thanks.

@haqeem18
Copy link

Hi @lockyt , I've just got cgateweb running on a pi4 and facing the same issues aswell. I was wondering what interval did you set for your use case? I've tried fine tuning with the timing which does improve, but there is still a good chance of not all the state update correctly.

Thanks.

@lockyt
Copy link
Author

lockyt commented Apr 15, 2021

Hi @haqeem18. It was actually for a client so I don't fully remember what interval I ended up using and don't have access to his network atm. I think I went slightly over the default 200ms but you are best to go with trial and error. Set it and try a large on/off command - we used to test turning on/off his kitchen area which was about 20 CBUS devices over and over until we got a number that didn't have any failures.

Sorry I can't be more specific but just up the number slightly until it is no longer causing problems.

Thanks.

@rbswift
Copy link

rbswift commented Jul 7, 2022

I'm seeing the same behaviour. It seems to be caused by receiving message at high rate from cbus. I'm testing using 12 group scenes to test so not writing anything via cgateweb, just reading received messages. With telnet all 12 events are received fine but with mosquitto_sub via cgateweb several are missed

telnet pi4 20025
lighting on //HOME/254/56/1  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fba-c36d60777dc1
lighting on //HOME/254/56/2  #sourceunit=102 OID=2c4b1270-db1d-103a-9fbe-c36d60777dc1
lighting on //HOME/254/56/3  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fbb-c36d60777dc1
lighting on //HOME/254/56/4  #sourceunit=102 OID=2c4c23e0-db1d-103a-9fbc-c36d60777dc1
lighting on //HOME/254/56/5  #sourceunit=102 OID=2c4c4af0-db1d-103a-9fbd-c36d60777dc1
lighting on //HOME/254/56/6  #sourceunit=102 OID=2c4b3980-db1d-103a-9fbf-c36d60777dc1
lighting on //HOME/254/56/7  #sourceunit=102 OID=2c4b6090-db1d-103a-9fc0-c36d60777dc1
lighting on //HOME/254/56/8  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fb9-c36d60777dc1
lighting on //HOME/254/56/9  #sourceunit=102 OID=2c4b87a0-db1d-103a-9fc1-c36d60777dc1
lighting on //HOME/254/56/10  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fc4-c36d60777dc1
lighting on //HOME/254/56/11  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fc2-c36d60777dc1
lighting on //HOME/254/56/12  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fc3-c36d60777dc1

lighting off //HOME/254/56/1  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fba-c36d60777dc1
lighting off //HOME/254/56/2  #sourceunit=102 OID=2c4b1270-db1d-103a-9fbe-c36d60777dc1
lighting off //HOME/254/56/3  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fbb-c36d60777dc1
lighting off //HOME/254/56/4  #sourceunit=102 OID=2c4c23e0-db1d-103a-9fbc-c36d60777dc1
lighting off //HOME/254/56/5  #sourceunit=102 OID=2c4c4af0-db1d-103a-9fbd-c36d60777dc1
lighting off //HOME/254/56/6  #sourceunit=102 OID=2c4b3980-db1d-103a-9fbf-c36d60777dc1
lighting off //HOME/254/56/7  #sourceunit=102 OID=2c4b6090-db1d-103a-9fc0-c36d60777dc1
lighting off //HOME/254/56/8  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fb9-c36d60777dc1
lighting off //HOME/254/56/9  #sourceunit=102 OID=2c4b87a0-db1d-103a-9fc1-c36d60777dc1
lighting off //HOME/254/56/10  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fc4-c36d60777dc1
lighting off //HOME/254/56/11  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fc2-c36d60777dc1
lighting off //HOME/254/56/12  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fc3-c36d60777dc1
mosquitto_sub -t '#' -v | grep cbus

cbus/read/254/56/1/state ON
cbus/read/254/56/1/level 100
cbus/read/254/56/3/state ON
cbus/read/254/56/3/level 100
cbus/read/254/56/7/state ON
cbus/read/254/56/7/level 100
cbus/read/254/56/8/state ON
cbus/read/254/56/8/level 100
cbus/read/254/56/9/state ON
cbus/read/254/56/9/level 100
cbus/read/254/56/10/state ON
cbus/read/254/56/10/level 100

cbus/read/254/56/1/state OFF
cbus/read/254/56/1/level 0
cbus/read/254/56/2/state OFF
cbus/read/254/56/2/level 0
cbus/read/254/56/4/state OFF
cbus/read/254/56/4/level 0
cbus/read/254/56/5/state OFF
cbus/read/254/56/5/level 0
cbus/read/254/56/7/state OFF
cbus/read/254/56/7/level 0
cbus/read/254/56/8/state OFF
cbus/read/254/56/8/level 0
cbus/read/254/56/9/state OFF
cbus/read/254/56/9/level 0
cbus/read/254/56/10/state OFF
cbus/read/254/56/10/level 0
cbus/read/254/56/11/state OFF
cbus/read/254/56/11/level 0
cbus/read/254/56/12/state OFF
cbus/read/254/56/12/level 0


@rbswift
Copy link

rbswift commented Jul 8, 2022

It seems like this line:
event.on('data', function(data)
will sometimes receive multiple lines representing multiple c-bus events from cgate in a single net.socket event.

The code as it stands will only process the first line and drop the rest

The additional events can be seen if we uncomment line 320 in index.js

Hopefully this is the cause for everyone. i.e. there's no need to slow down the rate of messages, we just need to change the code to process all the c-bus event lines instead of just the first one then we can return to using cgate at full speed.

I'm no expert but it looks like we should use a carrier rather than just expect one-to-one correlation between socket-on-data and c-bus events as has been used in homebridge-cbus

@GledholtHall
Copy link

GledholtHall commented Jul 8, 2022

Useful detective work - watching the progress with interest.

I haven't noticed this issue but I don't think I have many times when such a rapid number of changes happen coincidentally. But I would like my implementation to be 100% and operate at max speed so if there's a fix that would be reassuring.

@rbswift
Copy link

rbswift commented Jul 8, 2022

Thanks @GledholtHall you can easily reproduce the problem if you use toolkit to set up a cbus scene with 12 relay groups. You'd probably get dropped events most times with as few as 6 groups. Setting multiple groups in rapid succession from a cbus key unit scene key the existing message interval setting isn't relevant as that only applies to sending rather than receiving.

@rbswift
Copy link

rbswift commented Jul 10, 2022

This is fixed in my fork now and high rate events pretty close to instantaneous.

@GledholtHall
Copy link

That's great - can I merge your changes into the branch I use (which supports more c-bus applications) ?

@rbswift
Copy link

rbswift commented Jul 10, 2022

That's great - can I merge your changes into the branch I use (which supports more c-bus applications) ?

Absolutely please try it

@bdnstn
Copy link

bdnstn commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants