Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/fix invalid command edge cases? #656

Closed
wants to merge 2 commits into from

Conversation

zeyus
Copy link

@zeyus zeyus commented Dec 5, 2024

My apologies if this PR is trash, but I noticed suddenly on the latest HA / zigpy / bellows update that my hacked Lidl Gateway stopped connecting. I checked the socket IP / Port and it worked fine so during the log checking, I noticed that there was an error.

I don't really know anything about this protocol, so I'm not sure if this will cause an issue for other people (presumably not), but anyone else with the same issue might be able to use this as a reference.

The error that was occurring was the following:

2024-12-05 14:15:52.757 DEBUG (MainThread) [zigpy.ota] Registering new OTA provider: Ledvance(url='https://api.update.ledvance.com/v1/zigbee/firmwares', manufacturer_ids=(4489, 4364))
2024-12-05 14:15:52.757 DEBUG (MainThread) [zigpy.ota] Registering new OTA provider: Sonoff(url=None, manufacturer_ids=(4742,))
2024-12-05 14:15:52.757 DEBUG (MainThread) [zigpy.ota] Registering new OTA provider: Inovelli(url=None, manufacturer_ids=(4655,))
2024-12-05 14:15:52.757 DEBUG (MainThread) [zigpy.ota] Registering new OTA provider: ThirdReality(url=None, manufacturer_ids=(4659, 4877, 5127))
2024-12-05 14:15:52.759 DEBUG (bellows.thread_0) [zigpy.serial] Opening a serial connection to 'socket://192.168.2.207:8888' (baudrate=57600, xonxoff=True, rtscts=False)
2024-12-05 14:15:52.761 DEBUG (bellows.thread_0) [zigpy.serial] Connection made: <bellows.ash.AshProtocol object at 0x7f3037fc1f50>
2024-12-05 14:15:53.762 DEBUG (MainThread) [bellows.ezsp] Resetting EZSP
2024-12-05 14:15:53.763 DEBUG (bellows.thread_0) [bellows.uart] Resetting ASH
2024-12-05 14:15:53.763 DEBUG (bellows.thread_0) [bellows.ash] Sending frame CANCEL + RstFrame() + FLAG
2024-12-05 14:15:53.763 DEBUG (bellows.thread_0) [bellows.ash] Sending data  1ac038bc7e
2024-12-05 14:15:54.846 DEBUG (bellows.thread_0) [bellows.ash] Received data 1ac1020b0a527e
2024-12-05 14:15:54.846 DEBUG (bellows.thread_0) [bellows.ash] Received cancel byte, clearing buffer
2024-12-05 14:15:54.846 DEBUG (bellows.thread_0) [bellows.ash] Received frame RStackFrame(version=2, reset_code=<NcpResetCode.RESET_SOFTWARE: 11>)
2024-12-05 14:15:54.846 DEBUG (MainThread) [bellows.ezsp] Switching to EZSP protocol version 4
2024-12-05 14:15:54.847 DEBUG (MainThread) [bellows.ezsp.protocol] Sending command  version: () {'desiredProtocolVersion': 4}
2024-12-05 14:15:54.847 DEBUG (bellows.thread_0) [bellows.ash] Sending frame DataFrame(frm_num=0, re_tx=False, ack_num=0, ezsp_frame=b'\x00\x00\x00\x04') + FLAG
2024-12-05 14:15:54.847 DEBUG (bellows.thread_0) [bellows.ash] Sending data  004221a850ed2c7e
2024-12-05 14:15:54.852 DEBUG (bellows.thread_0) [bellows.ash] Received data 0142a1a8532815d7
2024-12-05 14:15:54.853 DEBUG (bellows.thread_0) [bellows.ash] Received data c1bf7e
2024-12-05 14:15:54.853 DEBUG (bellows.thread_0) [bellows.ash] Received frame DataFrame(frm_num=0, re_tx=0, ack_num=1, ezsp_frame=b'\x00\x80\x00\x07\x02\x00e')
2024-12-05 14:15:54.853 DEBUG (bellows.thread_0) [bellows.ash] Sending frame AckFrame(res=0, ncp_ready=0, ack_num=1) + FLAG
2024-12-05 14:15:54.853 DEBUG (bellows.thread_0) [bellows.ash] Sending data  8160597e
2024-12-05 14:15:54.853 DEBUG (bellows.thread_0) [bellows.ash] Changing ACK timeout from 1.60 to 1.40
2024-12-05 14:15:54.854 DEBUG (MainThread) [bellows.ezsp.protocol] Received command version: {'protocolVersion': 7, 'stackType': 2, 'stackVersion': 25856}
2024-12-05 14:15:54.854 DEBUG (MainThread) [bellows.ezsp] Switching to EZSP protocol version 7
2024-12-05 14:15:54.854 DEBUG (MainThread) [bellows.ezsp.protocol] Sending command  version: () {'desiredProtocolVersion': 7}
2024-12-05 14:15:54.854 DEBUG (bellows.thread_0) [bellows.ash] Sending frame DataFrame(frm_num=1, re_tx=False, ack_num=1, ezsp_frame=b'\x00\x00\xff\x00\x00\x07') + FLAG
2024-12-05 14:15:54.854 DEBUG (bellows.thread_0) [bellows.ash] Sending data  7d31422157542a1240277e
2024-12-05 14:15:54.859 DEBUG (bellows.thread_0) [bellows.ash] Received data 1242a157542a12b0
2024-12-05 14:15:54.860 DEBUG (bellows.thread_0) [bellows.ash] Received data 59f17b747e
2024-12-05 14:15:54.860 DEBUG (bellows.thread_0) [bellows.ash] Received frame DataFrame(frm_num=1, re_tx=0, ack_num=2, ezsp_frame=b'\x00\x80\xff\x00\x00\x07\x02\x00e')
2024-12-05 14:15:54.861 DEBUG (bellows.thread_0) [bellows.ash] Sending frame AckFrame(res=0, ncp_ready=0, ack_num=2) + FLAG
2024-12-05 14:15:54.861 DEBUG (bellows.thread_0) [bellows.ash] Sending data  82503a7e
2024-12-05 14:15:54.861 DEBUG (bellows.thread_0) [bellows.ash] Changing ACK timeout from 1.40 to 1.23
2024-12-05 14:15:54.861 DEBUG (MainThread) [bellows.ezsp.protocol] Received command version: {'protocolVersion': 7, 'stackType': 2, 'stackVersion': 25856}
2024-12-05 14:15:54.862 DEBUG (MainThread) [bellows.ezsp] EZSP Stack Type: 2, Stack Version: 6500, Protocol version: 7
2024-12-05 14:15:54.862 DEBUG (MainThread) [bellows.ezsp] Sending XNCP frame: XncpCommand(command_id=<XncpCommandId.GET_SUPPORTED_FEATURES_REQ: 0>, status=<EmberStatus.SUCCESS: 0>, payload=GetSupportedFeaturesReq())
2024-12-05 14:15:54.862 DEBUG (MainThread) [bellows.ezsp.protocol] Sending command  customFrame: (b'\x00\x00\x00',) {}
2024-12-05 14:15:54.862 DEBUG (bellows.thread_0) [bellows.ash] Sending frame DataFrame(frm_num=2, re_tx=False, ack_num=2, ezsp_frame=b'\x01\x00\xff\x00G\x03\x00\x00\x00') + FLAG
2024-12-05 14:15:54.862 DEBUG (bellows.thread_0) [bellows.ash] Sending data  22432157546d16b259942e637e
2024-12-05 14:15:54.868 DEBUG (bellows.thread_0) [bellows.ash] Received data 2343a157546d15b1
2024-12-05 14:15:54.909 DEBUG (bellows.thread_0) [bellows.ash] Received data 5995495e857e
2024-12-05 14:15:54.909 DEBUG (bellows.thread_0) [bellows.ash] Received frame DataFrame(frm_num=2, re_tx=0, ack_num=3, ezsp_frame=b'\x01\x80\xff\x00G\x00\x03\x00\x01\x03')
2024-12-05 14:15:54.910 DEBUG (bellows.thread_0) [bellows.ash] Sending frame AckFrame(res=0, ncp_ready=0, ack_num=3) + FLAG
2024-12-05 14:15:54.910 DEBUG (bellows.thread_0) [bellows.ash] Sending data  83401b7e
2024-12-05 14:15:54.910 DEBUG (bellows.thread_0) [bellows.ash] Changing ACK timeout from 1.23 to 1.10
2024-12-05 14:15:54.910 DEBUG (MainThread) [bellows.ezsp.protocol] Received command customFrame: {'status': <EmberStatus.SUCCESS: 0>, 'reply': b'\x00\x01\x03'}
2024-12-05 14:15:54.911 DEBUG (MainThread) [bellows.ezsp] b'\x00\x01\x03'
2024-12-05 14:15:54.911 DEBUG (bellows.thread_0) [zigpy.serial] Waiting for serial port to close
2024-12-05 14:15:54.911 DEBUG (bellows.thread_0) [zigpy.serial] Connection lost: None
2024-12-05 14:15:54.911 DEBUG (bellows.thread_0) [bellows.uart] Connection lost: None
2024-12-05 14:15:54.912 DEBUG (MainThread) [zigpy.application] Connection to the radio has been lost: None
2024-12-05 14:15:54.912 DEBUG (MainThread) [zigpy.application] Failed to probe with config {'path': 'socket://192.168.2.207:8888', 'baudrate': 57600, 'flow_control': None}
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy/application.py", line 627, in probe
    await app.connect()
  File "/usr/local/lib/python3.13/site-packages/bellows/zigbee/application.py", line 155, in connect
    await self._ezsp.connect(use_thread=self.config[CONF_USE_THREAD])
  File "/usr/local/lib/python3.13/site-packages/bellows/ezsp/__init__.py", line 138, in connect
    await self.startup_reset()
  File "/usr/local/lib/python3.13/site-packages/bellows/ezsp/__init__.py", line 130, in startup_reset
    await self.get_xncp_features()
  File "/usr/local/lib/python3.13/site-packages/bellows/ezsp/__init__.py", line 184, in get_xncp_features
    self._xncp_features = await self.xncp_get_supported_firmware_features()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/bellows/ezsp/__init__.py", line 680, in xncp_get_supported_firmware_features
    rsp = await self.send_xncp_frame(xncp.GetSupportedFeaturesReq())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/bellows/ezsp/__init__.py", line 664, in send_xncp_frame
    rsp_frame = xncp.XncpCommand.from_bytes(data)
  File "/usr/local/lib/python3.13/site-packages/bellows/ezsp/xncp.py", line 71, in from_bytes
    payload, rest = COMMANDS[command_id].deserialize(data)
                    ~~~~~~~~^^^^^^^^^^^^
KeyError: <XncpCommandId.undefined_0x0100: 256>

After adding the alt command to accept 0x0100, the same error happemed for each of the subsequent device info responses (e.g. build string, flow_control_type, etc).

The changes in this PR resolved the issue for me, obviously if I update, I'll lose them again (I can re-patch of course).

I would love to learn more about the protocol / transaction, so if there's anything I can do to improve this please let me know.

@puddly
Copy link
Contributor

puddly commented Dec 5, 2024

Interesting. These are protocol extensions added to our firmwares: https://github.com/NabuCasa/silabs-firmware-builder/blob/main/src/zigbee_ncp/app.c

Your hacked Lidl gateway doesn't support them but it seems like it's still responding with an unknown protocol. I think the error checking for detecting XNCP protocol features needs to include KeyError as well.

@puddly
Copy link
Contributor

puddly commented Dec 5, 2024

I've made an alternative PR that I think should fix the problem. Could you try it out? #657

@zeyus
Copy link
Author

zeyus commented Dec 5, 2024

I've made an alternative PR that I think should fix the problem. Could you try it out? #657

I will give it a go now and report back, thanks

@zeyus
Copy link
Author

zeyus commented Dec 5, 2024

Interesting. These are protocol extensions added to our firmwares: https://github.com/NabuCasa/silabs-firmware-builder/blob/main/src/zigbee_ncp/app.c

Your hacked Lidl gateway doesn't support them but it seems like it's still responding with an unknown protocol. I think the error checking for detecting XNCP protocol features needs to include KeyError as well.

Ah, so if I understand correctly then, the LIDL gateway is just basically sending the same last bit back, but without meaning...and the second bit being a 1 could just be an error code or status?

@zeyus
Copy link
Author

zeyus commented Dec 5, 2024

I've made an alternative PR that I think should fix the problem. Could you try it out? #657

I will give it a go now and report back, thanks

I will have to try a little later, there's something that will time out in 30 minutes after I can restart HA with the patched code :)

@zeyus
Copy link
Author

zeyus commented Dec 5, 2024

I've made an alternative PR that I think should fix the problem. Could you try it out? #657

Yep that works perfectly, thanks! :)

@zeyus zeyus closed this Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants