Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ieee802154/at86rf2xx: port to Radio HAL #19015

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

jia200x
Copy link
Member

@jia200x jia200x commented Dec 5, 2022

Contribution description

This PR is the third (and hopefully last) attempt to port at86rf2xx series to the Radio HAL.
This one should work with Basic Mode and all the memory-mapped variants (although I don't have one to test... @maribu?)

Since discussions during the breakout sessions in the RIOT summit lead to the deprecation of the generic Bottom Half Processor modules in favour of Event Thread like BHP, I just implemented it like that in this PR. I provide two init functions now:

  1. a "bare" init (at86rf2xx_init) that receives a callback, which indicates IRQ offload through a at86rf2xx_irq_handler function.
  2. A event based init (at86rf2xx_init_event) that offloads IRQs to a given event thread queue.

So far we have always assumed the IRQ offloading runs in the same thread context as the driver, which leads to undesired effects. Unless the driver is explicitly designed to run in "polling mode", IMO this pattern only causes headaches and shows little benefits compared to just processing IRQ as they are supposed to be. In fact, this facilitates upper layer code with snippets like:

ieee802154_request_cca(dev);
mutex_lock(&dev);
/* Wait for the unlock */
ieee802154_confirm_cca(dev);

Which are of course not possible if the IRQ offload runs in the same context. And note that many radios are memory mapped or have a dedicated SPI for the radio, which means this does not necessarily translates to an extra thread. Even if we need an extra thread, it can be tiny, since it only executes a couple of functions to retrieve the IRQ.
This is also beneficial for dual-radios (e.g at86rf215) which currently uses 2 threads for offloading. With this approach, it would need only one.
Note that given the proposed intiialization architecture, it's still possible to run everything on a single thread, at the cost of complex logic (e.g re-schedule packet transmissions if there are pending IRQs).

Testing procedure

The following test/suites should work:

  • GNRC
  • LWIP
  • tests/driver_at86rf2xx*
  • tests/pkg_openwsn`

I tested GNRC, LWIP, driver_at86rf2xx*. The radio seems to be running stable:

ping fe80::7894:5ff:258d:e7c3 -s 1024 -i 200 -c 10000
--- fe80::7894:5ff:258d:e7c3 PING statistics ---
10000 packets transmitted, 10000 packets received, 0% packet loss
round-trip min/avg/max = 128.933/144.578/179.526 ms

Issues/PRs references

Depends on #18988

@github-actions github-actions bot added Area: doc Area: Documentation Area: drivers Area: Device drivers Area: network Area: Networking Area: pkg Area: External package ports Area: sys Area: System Area: tests Area: tests and testing framework Area: tools Area: Supplementary tools labels Dec 5, 2022
@jia200x
Copy link
Member Author

jia200x commented Dec 5, 2022

Some input would be appreciated, specially from @miri64, @maribu, @benpicco, @fabian18, @PeterKietzmann and @MichelRottleuthner

@jia200x jia200x added the State: waiting for other PR State: The PR requires another PR to be merged first label Dec 5, 2022
@jia200x
Copy link
Member Author

jia200x commented Dec 5, 2022

After some online discussions, note that most integrated transceivers have a dedicated SPI bus, which means spi_acquire would never block if implemented properly. Therefore, SPI instructions could be triggered from ISR, without needing an extra thread. Of course it's still not a good idea with the current SPI implementations that polls, but I guess that could be addressed as well...

@github-actions github-actions bot removed the Area: tools Area: Supplementary tools label Dec 7, 2022
@benpicco benpicco removed the State: waiting for other PR State: The PR requires another PR to be merged first label Dec 7, 2022
@jia200x
Copy link
Member Author

jia200x commented Dec 8, 2022

there was a rebase error I just fixed. I hope it passes all static tests now

@jia200x
Copy link
Member Author

jia200x commented Feb 23, 2023

I squashed and rebased. I merged many of the adaptation commits (GNRC, openwsn) to satisfy git bisect users. For the ieee802154_hal and ieeee802154_submac tests I just added new commits because they are not affected by the removal of netdev.

@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Feb 23, 2023
@jia200x
Copy link
Member Author

jia200x commented Feb 24, 2023

Everything seems to be fine now.

@benpicco
Copy link
Contributor

I did some tests and managed to produce a case where the device gets stuck:

2023-02-24 14:34:41,863 - INFO # > ping ff02::1
2023-02-24 14:34:41,863 - INFO # 
2023-02-24 14:34:41,880 - INFO # 12 bytes from fe80::ac8d:fee1:60ed:99c5%8: icmp_seq=0 ttl=64 rssi=36 dBm time=9.409 ms
2023-02-24 14:34:42,880 - INFO # 12 bytes from fe80::ac8d:fee1:60ed:99c5%8: icmp_seq=1 ttl=64 rssi=36 dBm time=7.518 ms
2023-02-24 14:34:43,885 - INFO # 12 bytes from fe80::ac8d:fee1:60ed:99c5%8: icmp_seq=2 ttl=64 rssi=36 dBm time=10.516 ms
2023-02-24 14:34:43,886 - INFO # 
2023-02-24 14:34:43,888 - INFO # --- ff02::1 PING statistics ---
2023-02-24 14:34:43,894 - INFO # 3 packets transmitted, 3 packets received, 0% packet loss
2023-02-24 14:34:43,898 - INFO # round-trip min/avg/max = 7.518/9.147/10.516 ms

2023-02-24 14:35:09,115 - INFO # > ping -s 100 -i 10 -c 15 fe80::ac8d:fee1:60ed:99c5%8
2023-02-24 14:35:09,116 - INFO # 
2023-02-24 14:35:09,201 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,205 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,211 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,215 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,219 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,223 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,227 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,231 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,235 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,239 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,244 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,248 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,253 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,257 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,261 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,265 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,270 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,273 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,277 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,281 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,286 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,289 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,294 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:09,298 - INFO # gnrc_netif: can't queue packet for sending
2023-02-24 14:35:10,284 - INFO # 
2023-02-24 14:35:10,288 - INFO # --- fe80::ac8d:fee1:60ed:99c5%8 PING statistics ---
2023-02-24 14:35:10,293 - INFO # 15 packets transmitted, 0 packets received, 100% packet loss

2023-02-24 14:35:21,863 - INFO # > ping ff02::1
2023-02-24 14:35:21,863 - INFO # 
2023-02-24 14:35:24,867 - INFO # 
2023-02-24 14:35:24,870 - INFO # --- ff02::1 PING statistics ---
2023-02-24 14:35:24,875 - INFO # 3 packets transmitted, 0 packets received, 100% packet loss

If I ping the device from another node, it gets 'unstuck' again:

# other node
2023-02-24 14:36:01,923 # ping ff02::1
2023-02-24 14:36:01,941 # 12 bytes from fe80::204:2519:1801:c905%7: icmp_seq=0 ttl=64 rssi=-47 dBm time=9.397 ms
2023-02-24 14:36:02,934 # 12 bytes from fe80::204:2519:1801:c905%7: icmp_seq=1 ttl=64 rssi=-48 dBm time=7.796 ms
2023-02-24 14:36:03,941 # 12 bytes from fe80::204:2519:1801:c905%7: icmp_seq=2 ttl=64 rssi=-49 dBm time=7.795 ms
2023-02-24 14:36:03,941 # 
2023-02-24 14:36:03,942 # --- ff02::1 PING statistics ---
2023-02-24 14:36:03,944 # 3 packets transmitted, 3 packets received, 0% packet loss
2023-02-24 14:36:03,955 # round-trip min/avg/max = 7.795/8.329/9.397 ms

# samr21-xpro

2023-02-24 14:36:23,135 - INFO # > ping ff02::1
2023-02-24 14:36:23,135 - INFO # 
2023-02-24 14:36:23,151 - INFO # 12 bytes from fe80::ac8d:fee1:60ed:99c5%8: icmp_seq=0 ttl=64 rssi=36 dBm time=8.129 ms
2023-02-24 14:36:24,152 - INFO # 12 bytes from fe80::ac8d:fee1:60ed:99c5%8: icmp_seq=1 ttl=64 rssi=37 dBm time=7.810 ms
2023-02-24 14:36:25,153 - INFO # 12 bytes from fe80::ac8d:fee1:60ed:99c5%8: icmp_seq=2 ttl=64 rssi=39 dBm time=7.503 ms
2023-02-24 14:36:25,154 - INFO # 
2023-02-24 14:36:25,156 - INFO # --- ff02::1 PING statistics ---
2023-02-24 14:36:25,162 - INFO # 3 packets transmitted, 3 packets received, 0% packet loss
2023-02-24 14:36:25,166 - INFO # round-trip min/avg/max = 7.503/7.814/8.129 ms

Now this might just be another instance of #17924

@benpicco
Copy link
Contributor

benpicco commented Feb 24, 2023

Oops, never mind - this is actually the other device (at86rf215) getting stuck 😳
A reboot or send a packet there gets it unstuck.

ifneq (,$(filter netdev,$(USEMODULE)))
USEMODULE += netdev_ieee802154_submac
DEFAULT_MODULE += netdev_ieee802154_oqpsk
endif

# only needed for SPI based variants
ifeq (,$(filter at86rfa1 at86rfr2,$(USEMODULE)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ifeq (,$(filter at86rfa1 at86rfr2,$(USEMODULE)))
ifeq (,$(filter at86rfa1 at86rfr2,$(USEMODULE)))
ifneq (,$(filter ieee802154_security,$(USEMODULE)))
USEMODULE += at86rf2xx_aes_spi
endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it optional whether to use hardware offloading. IIRC pings were actually faster without.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably due to all the spi_aquire() - we should move those to the entry points of rf_ops instead of doing it for every register read, but that's for a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and at86rf230 does not have this feature

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I then just leave it as it is? or catch the at86rf230 case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we would not need cipher_modes if we know there is only at86rf2xx in use and for example not an additional at86rf215 or other 802.15.4 transceiver (I know its probably an uncommon case). To gain more from it we should test if we need cipher_ops or at86rf2xx_aes_spi.

USEMODULE += ieee802154_security

   text    data     bss     dec     hex filename
 106496     272   22328  129096   1f848 /home/fabian/forks/RIOT/examples/gnrc_networking/bin/miot-nucleo-f767zi/gnrc_networking.elf

ping -s 120 -c 50 fe80::4c49:6f54:4f76:102%7

2023-02-24 19:41:44,944 # --- fe80::4c49:6f54:4f76:102%7 PING statistics ---
2023-02-24 19:41:44,950 # 50 packets transmitted, 50 packets received, 0% packet loss
2023-02-24 19:41:44,953 # round-trip min/avg/max = 22.881/24.924/28.171 ms
USEMODULE += "ieee802154_security at86rf2xx_aes_spi"

   text    data     bss     dec     hex filename
 107328     272   22328  129928   1fb88 /home/fabian/forks/RIOT/examples/gnrc_networking/bin/miot-nucleo-f767zi/gnrc_networking.elf

ping -s 120 -c 50 fe80::4c49:6f54:4f76:102%7

2023-02-24 19:45:36,758 # --- fe80::4c49:6f54:4f76:102%7 PING statistics ---
2023-02-24 19:45:36,764 # 50 packets transmitted, 50 packets received, 0% packet loss
2023-02-24 19:45:36,768 # round-trip min/avg/max = 26.763/29.079/31.253 ms

Copy link
Contributor

@benpicco benpicco Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably introduce a netdev_ieee802154_submac_soft_cipher (netdev_ieee802154_submac_sw_cipher?) (analogous to the proposed netdev_ieee802154_submac_soft_retrans) that drivers without HW crypto can then select.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. If we have that the right modules are selected dynamically base on the transceivers.
But changing the dependency of at86rf2xx_aes_spi in this PR is not really advisable.
Or would you still do it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong feelings about this, I was just surprised that it was not build in.
That's probably also why @jia200x didn't notice that the current version crashes when at86rf2xx_aes_spi is enabled, because when you want to test ieee802154_security it just works - because the HW acceleration is not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then should I just leave it as it is for now? We can address that in a follow up.

drivers/at86rf2xx/at86rf2xx_rf_ops.c Outdated Show resolved Hide resolved
drivers/at86rf2xx/at86rf2xx_rf_ops.c Outdated Show resolved Hide resolved
drivers/at86rf2xx/at86rf2xx_rf_ops.c Outdated Show resolved Hide resolved
tests/driver_at86rf2xx_aes/main.c Show resolved Hide resolved
tests/ieee802154_hal/init_devs.c Outdated Show resolved Hide resolved
drivers/at86rf2xx/at86rf2xx_rf_ops.c Outdated Show resolved Hide resolved
const ieee802154_radio_cipher_ops_t *cipher_ops;
if ((cipher_ops = ieee802154_radio_get_cipher_ops(&submac->dev))) {
netdev_ieee802154->sec_ctx.dev.cipher_ops = cipher_ops;
netdev_ieee802154->sec_ctx.dev.ctx = &submac->dev;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it was basically submac->dev.priv before, benpiccos recent suggestions would be necessary.

@jia200x
Copy link
Member Author

jia200x commented Feb 24, 2023

Or do you want to do this with deprecation of at86rf2xx_cca() at the same time?

It's actually not too hard to enable the CCA Done interrupt. I can give it a try.

drivers/include/at86rf2xx.h Outdated Show resolved Hide resolved
sys/include/net/ieee802154/radio.h Show resolved Hide resolved
@@ -195,6 +195,9 @@ static ieee802154_dev_t *_reg_callback(ieee802154_dev_type_t type, void *opaque)
case IEEE802154_DEV_TYPE_MRF24J40:
printf("mrf24j40");
break;
case IEEE802154_DEV_TYPE_AT86RF2XX:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run the test with CONTINUE_ON_EXPECTED_ERRORS=1 USEMODULE+=at86rf233 BOARD=nucleo-f767zi.

I get linking errors:

/usr/lib/gcc/arm-none-eabi/12.2.0/../../../../arm-none-eabi/bin/ld: /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/netdev_ieee802154_submac/netdev_ieee802154_submac.o: in function `ieee802154_submac_bh_request':
/home/fabian/forks/RIOT/drivers/netdev_ieee802154_submac/netdev_ieee802154_submac.c:133: multiple definition of `ieee802154_submac_bh_request'; /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/application_tests_ieee802154_submac/main.o:/home/fabian/forks/RIOT/tests/ieee802154_submac/main.c:174: first defined here
/usr/lib/gcc/arm-none-eabi/12.2.0/../../../../arm-none-eabi/bin/ld: /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/netdev_ieee802154_submac/netdev_ieee802154_submac.o: in function `ieee802154_submac_ack_timer_set':
/home/fabian/forks/RIOT/drivers/netdev_ieee802154_submac/netdev_ieee802154_submac.c:144: multiple definition of `ieee802154_submac_ack_timer_set'; /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/application_tests_ieee802154_submac/main.o:/home/fabian/forks/RIOT/tests/ieee802154_submac/main.c:134: first defined here
/usr/lib/gcc/arm-none-eabi/12.2.0/../../../../arm-none-eabi/bin/ld: /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/netdev_ieee802154_submac/netdev_ieee802154_submac.o: in function `ieee802154_submac_ack_timer_cancel':
/home/fabian/forks/RIOT/drivers/netdev_ieee802154_submac/netdev_ieee802154_submac.c:153: multiple definition of `ieee802154_submac_ack_timer_cancel'; /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/application_tests_ieee802154_submac/main.o:/home/fabian/forks/RIOT/tests/ieee802154_submac/main.c:138: first defined here
collect2: error: ld returned 1 exit status
make: *** [/home/fabian/forks/RIOT/tests/ieee802154_submac/../../Makefile.include:740: /home/fabian/forks/RIOT/tests/ieee802154_submac/bin/nucleo-f767zi/tests_ieee802154_submac.elf] Fehler 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that board pulls an netdev_eth, which selects netdev. As a result, the build system pulls netdev_ieee802154_submac, which shouldn't be there...

The proper fix for this is #19052. I will try to come up with a workaround

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to fix as it is now :/. How should we proceed? Should we try to get #19052?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any opinions? @benpicco @fabian18

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a blocker for this PR?
It only happens when manually adding the module to the test, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, just because we cannot provide netdev and the radio HAL at the same time for IEEE 802.15.4 devices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the border router going to work then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm wait, it's definitely possible to run the nrf52840 with a border router... I think it hast to do with the way this driver is modeled.
It's still not possible to run the HAL and netdev_ieee802154 variants at the same time, but it shouldn't affect the BR. I will give it a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, the reason is because the submac callbacks are just C functions that should be defined somewhere. This would only happen with this test application (as netdev_ieee802154_submac already implements these callbacks and therefore there won't be duplicated symbols).

So yes, this only occurs in tests/ieee802154_submac

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is a real solution but at least it links fine with DISABLE_MODULE += netdev_ieee802154_submac in tests/ieee802154_submac/Makefile

@benpicco
Copy link
Contributor

This needs a rebase - please squash while you’re at it

@benpicco
Copy link
Contributor

@jia200x ping ;)

@jia200x
Copy link
Member Author

jia200x commented May 30, 2023

It's actually not too hard to enable the CCA Done interrupt. I can give it a try.

Could we maybe do this in a follow up? The existing driver also didn't implement the CCA_DONE interrupt and just did polling.
I think it's definitely nice to have, but since I don't have too much time in the moment I would like to get this one merged as soon as possible.

@@ -52,22 +56,24 @@ void auto_init_at86rf2xx(void)
for (unsigned i = 0; i < AT86RF2XX_NUM; i++) {
LOG_DEBUG("[auto_init_netif] initializing at86rf2xx #%u\n", i);

at86rf2xx_setup(&at86rf2xx_devs[i], &at86rf2xx_params[i], i);
at86rf2xx_init_event(&at86rf2xx_bhp[i], &at86rf2xx_params[i], &at86rf2xx_netdev[i].submac.dev, EVENT_PRIO_HIGHEST);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check for the return value here, to avoid a crash when the driver init fails, in case of a hardware fault or misconfiguration. For example:

        int init;
        if ((init = at86rf2xx_init_event(&at86rf2xx_bhp[i], &at86rf2xx_params[i],
                                         &at86rf2xx_netdev[i].submac.dev, EVENT_PRIO_HIGHEST))) {
            LOG_ERROR("at86rf2xx #%u init failed: %i\n", i, init);
            continue;
        }

Similar for lwip, openthread, openwsn, I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Area: doc Area: Documentation Area: drivers Area: Device drivers Area: Kconfig Area: Kconfig integration Area: network Area: Networking Area: pkg Area: External package ports Area: sys Area: System Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants