Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lwip: provide sock_async support #13427

Merged
merged 8 commits into from
Mar 6, 2020
Merged

Conversation

miri64
Copy link
Member

@miri64 miri64 commented Feb 20, 2020

Contribution description

This provides initial sock_async support for lwIP. Currently only sock_udp and sock_ip are supported, sock_tcp needs some extra work, I don't have the time for right now. (edit: was a bit tricky but not that hard after all ;-)) The change requires an update to upstream, which I provided in this PR as a patch.

Testing procedure

I updated tests/lwip to use sock_async_event for the udp server and ip server [edit: and the tcp server] command, so running make -C tests/lwip all test on native should still work.

Issues/PRs references

None.

@miri64 miri64 added Area: network Area: Networking Type: new feature The issue requests / The PR implemements a new feature for RIOT Area: pkg Area: External package ports labels Feb 20, 2020
@miri64 miri64 added this to the Release 2020.04 milestone Feb 20, 2020
@miri64 miri64 changed the title Lwip/enh/sock async lwip: provide sock_async support for IP and UDP Feb 20, 2020
@@ -98,7 +98,7 @@ void sock_tcp_queue_event_init(sock_tcp_queue_t *queue,
event_queue_t *ev_queue,
sock_tcp_queue_cb_t handler)
{
sock_async_ctx_t *ctx = sock_tcp_queue_get_async_ctx(sock);
sock_async_ctx_t *ctx = sock_tcp_queue_get_async_ctx(queue);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this bug while trying to integrate sock_async with sock_tcp. Hope nobody minds I piggy-back this bug fix here. It is in a separate commit, so I can always factor it out if not.

@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Feb 20, 2020
@miri64
Copy link
Member Author

miri64 commented Feb 20, 2020

I changed the patch to be a cherry-pick of a feature that was introduced in lwIP's current master. See https://savannah.nongnu.org/patch/?9891#comment1

@gschorcht
Copy link
Contributor

I have tested it with esp-wifi and UDP. Still works.

@miri64 miri64 changed the title lwip: provide sock_async support for IP and UDP lwip: provide sock_async support Feb 21, 2020
@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

I came around implementing it for TCP after all. Tests should now also work for TCP. In a later step I might even rework the tests a bit to reflect their new asynchronous nature better (e.g. by providing a way to stop the running server, provide support for more than one connection with TCP... etc)

@gschorcht
Copy link
Contributor

I came around implementing it for TCP after all. Tests should now also work for TCP. In a later step I might even rework the tests a bit to reflect their new asynchronous nature better (e.g. by providing a way to stop the running server, provide support for more than one connection with TCP... etc)

Behavior for TCP has changed with this PR. After sending something with nc to the TCP server, the TCP server resets the connection after receiving the first data packet and nc can't send anymore.

> TCP client [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:37586 connected
Received TCP data from client [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:37586:
00000000  48  61  6C  6C  6F  0A
TCP connection to [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:37586 reset

In master, I had to leave nc before the TCP connection is released:

> TCP client [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:36896 connected
Received TCP data from client [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:36896:
00000000  48  61  6C  6C  6F  0A
Received TCP data from client [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:36896:
00000000  48  61  6C  6C  6F  0A
TCP connection to [2003:c1:e703:555:a89d:ce2:fdef:2bc4]:36896 reset, starting to accept again

@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

Mhhh curious. For me the behavior (with native) was as you described for the master case was the same when testing. Which netcat version are you using (I use the openbsd one, version 1.206_1-1)? Also which command did you use exactly?

@gschorcht
Copy link
Contributor

gschorcht commented Feb 21, 2020

I'm using the official Ubuntu 18.04 netcat version 1.187. The command I used:

nc -6 <prefix>:32ae:a4ff:fe41:60f8 5555

With wireshark I can observe the following:

PC            ESP8266
 ------ SYN ----->
 <--- SYN/ACK ----
 ------ ACK ----->
 ---- PSH/ACK --->  (contains 'Hallo' as payload)
 <--- FIN/ACK ----  (reaction from TCP server, an ACK but also a FIN)
 ------ ACK ----->

In master, I can observe:

PC            ESP8266
 ------ SYN ----->
 <--- SYN/ACK ----
 ------ ACK ----->
 ---- PSH/ACK --->  (contains 'Hallo' as payload)
 <----- ACK ------  (reaction from TCP server, just an ACK)

@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

Can you please revert the test change (53e6a81 currently) and try again? I'm trying to figure out if this happens due to my changes to the test or lwip_sock* (I guess the first, as none of the changes should effect the implementation of lwIP, but who knows).

@gschorcht
Copy link
Contributor

Can you please revert the test change (53e6a81 currently) and try again? I'm trying to figure out if this happens due to my changes to the test or lwip_sock* (I guess the first, as none of the changes should effect the implementation of lwIP, but who knows).

Works without commit 53e6a81.

@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

Ok, I was able to reproduce it now. Trying to find a fix.

@miri64 miri64 removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Feb 21, 2020
@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

Rebased to current master to incorporate #13438. Trying to fix the faulty behavior mentioned in #13438 (comment) now.

@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

Trying to fix the faulty behavior mentioned in #13438 (comment) now.

It turned out to be an error in the sock_tcp implementation itself: when the queue of available sock objects runs full, the netconn facing lwIP is never closed (keeping the TCP connection open in the background). The state machine then gets confused, having conns with socks and conns without (all however use the _netconn_cb as it is inherited from the listening conn). I piggy-backed two changes to prevent that:

  • close all unused netconn objects when the queue is full on sock_tcp_accept()
  • activate the TCP_LISTEN_BACKLOG module per default. It doesn't prevent netconns being accepted when the queue is full (as the netconn's internal accept queue could just have been emptied by sock_tcp_accept() e.g., but it at least prevents lwip to waste its resources a little bit.

@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Feb 21, 2020
@miri64
Copy link
Member Author

miri64 commented Feb 21, 2020

I also added a regression test for this bug now... If you want I can also move the bug fix to a separate PR. Not sure how easy it is to reproduce without sock_async though :-/

Comment on lines +293 to +298
try:
connect_addr = socket.getaddrinfo(
"%s%%tapbr0" % server_ip, port)[0][4]
except socket.gaierror as e:
print("SKIP_TEST INFO", e)
return
Copy link
Member Author

@miri64 miri64 Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests need to be adapted for riotnode (#10949) once that is merged anyway, so I hope you don't mind tapbr0 being statically used here (and expect the boards to be native for the test not to be skipped above) and the test being skipped instead as a short workaround. Otherwise I would need a third "board" just for this test.

@kb2ma
Copy link
Member

kb2ma commented Feb 26, 2020

Preliminary testing with gcoap works on an ESP8266. See my gcoap/lwip_async_support branch, which includes IPv4 support.

2020-02-25 22:45:45,431 #  coap get -c 192.168.1.186 5683 /time
2020-02-25 22:45:45,436 # gcoap_cli: sending msg ID 34830, 11 bytes
> 2020-02-25 22:45:45,443 #  gcoap: response Success, code 2.05, 15 bytes
2020-02-25 22:45:45,443 # Feb 26 03:45:45

The down side is that the gcoap thread uses 1596 bytes for IPv4 and 1652 for IPv6 on ESP8266, according to ps.

@gschorcht
Copy link
Contributor

The down side is that the gcoap thread uses 1596 bytes for IPv4 and 1652 for IPv6 on ESP8266, according to ps.

Unless I missed something, the stack size is derived from the default stack size as defined by the CPU, which is already 1024 bytes:

#define GCOAP_STACK_SIZE (THREAD_STACKSIZE_DEFAULT + DEBUG_EXTRA_STACKSIZE \
                          + sizeof(coap_pkt_t)

The question is how much stack is really needed. Maybe the stack size could be fine tuned a little bit for gcoap. Have you checked with DEVELHELP=1 what the stack usage for the gcoap thread is?

@kb2ma
Copy link
Member

kb2ma commented Feb 26, 2020

Have you checked with DEVELHELP=1 what the stack usage for the gcoap thread is?

Thanks for the feedback. The numbers I quoted above are actual stack usage with DEVELHELP enabled. By default the gcoap thread is allocated 1108 bytes (1024 + 84) on ESP8266. I will pull together a more detailed report to review the default value.

@gschorcht
Copy link
Contributor

If you want I can also move the bug fix to a separate PR.

No, that's fine with me.

@gschorcht
Copy link
Contributor

It turned out to be an error in the sock_tcp implementation itself: when the queue of available sock objects runs full, the netconn facing lwIP is never closed (keeping the TCP connection open in the background). The state machine then gets confused, having conns with socks and conns without (all however use the _netconn_cb as it is inherited from the listening conn). I piggy-backed two changes to prevent that:

  • close all unused netconn objects when the queue is full on sock_tcp_accept()
  • activate the TCP_LISTEN_BACKLOG module per default. It doesn't prevent netconns being accepted when the queue is full (as the netconn's internal accept queue could just have been emptied by sock_tcp_accept() e.g., but it at least prevents lwip to waste its resources a little bit.

Works now as expected.

@miri64
Copy link
Member Author

miri64 commented Mar 6, 2020

Ping @gschorcht @kb2ma?

Copy link
Contributor

@gschorcht gschorcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I lost track of that PR. Looks good and works as expected. ACK

@gschorcht
Copy link
Contributor

@miri64 Please squash.

@miri64
Copy link
Member Author

miri64 commented Mar 6, 2020

Squashed

@gschorcht gschorcht merged commit fc37d7b into RIOT-OS:master Mar 6, 2020
@gschorcht
Copy link
Contributor

@miri64 Many thanks for this contribution.

@miri64 miri64 deleted the lwip/enh/sock_async branch March 6, 2020 14:02
@miri64
Copy link
Member Author

miri64 commented Mar 6, 2020

Thanks for the review! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Area: pkg Area: External package ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants