
xrdp has very low bandwidth on high latency connections that can be mitigated via SSH tunnel #2905

Closed
vincent-163 opened this issue Jan 5, 2024 · 9 comments · Fixed by #2910

@vincent-163

xrdp version

0.9.23.1

Detailed xrdp version, build options

xrdp 0.9.23.1
  A Remote Desktop Protocol Server.
  Copyright (C) 2004-2020 Jay Sorg, Neutrino Labs, and all contributors.
  See https://github.com/neutrinolabs/xrdp for more information.

  Configure options:
      --prefix=/usr
      --sysconfdir=/etc
      --localstatedir=/var
      --sbindir=/usr/bin
      --with-systemdsystemunitdir=/usr/lib/systemd/system
      --enable-jpeg
      --enable-tjpeg
      --enable-fuse
      --enable-fdkaac
      --enable-opus
      --enable-rfxcodec
      --enable-mp3lame
      --enable-pixman
      --enable-painter
      --enable-vsock
      --enable-ipv6
      --enable-pam-config=arch
      --enable-rdpsndaudin
      --with-imlib2
      CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection
      LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now

  Compiled with OpenSSL 3.2.0 23 Nov 2023

Operating system & version

Arch Linux 20240101.0.204074

Installation method

git clone & make install

Which backend do you use?

xorgxrdp-0.9.19

What desktop environment do you use?

any

Environment xrdp running on

A systemd-nspawn container

What's your client?

Windows 11 mstsc

Area(s) with issue?

Network

Steps to reproduce

  1. On a server with high latency to the client (~250ms) but enough bandwidth (at least 100Mbps), set up an xrdp server. Here is my Dockerfile for setting up the xrdp server. The built Docker image is then exported to /var/lib/machines/xrdp and booted using systemd-nspawn -M xrdp -b:
FROM archlinux:latest
RUN pacman -Syu --needed --noconfirm --noprogressbar git base base-devel sudo archlinux-keyring
RUN useradd -ms /bin/bash user && groupadd sudo && usermod -aG sudo user && \
    echo '%sudo ALL=(ALL:ALL) NOPASSWD: ALL' > /etc/sudoers.d/50-sudo-nopasswd
RUN printf '[archlinuxcn]\nServer = https://repo.archlinuxcn.org/$arch\n' >> /etc/pacman.conf && \
    pacman-key --init && \
    pacman-key --lsign-key "farseerfc@archlinux.org" && \
    pacman -Syu --noconfirm --noprogressbar archlinuxcn-keyring && \
    pacman -Syu --noconfirm --noprogressbar yay
RUN pacman -Syu --noconfirm --noprogressbar plasma-desktop xorg-server
RUN sudo -u user yay -S aur/xrdp aur/xorgxrdp --noconfirm --noprogressbar
RUN sed -i s%param=Xorg%param=/usr/lib/Xorg% /etc/xrdp/sesman.ini
RUN echo 'startplasma-x11' > /home/user/.xinitrc

Use machinectl shell xrdp to get into the container and run systemctl start xrdp to start it. I'm running it in a systemd-nspawn container, but the exact way of running xrdp probably doesn't matter.

  2. Use mstsc to connect to the server. Install Google Chrome in the container and play a video. It's very laggy, and the bandwidth is around 1.3Mbps.

  3. Use SSH to connect to the server and set up a tunnel with -L 33890:127.0.0.1:3389 (example command below). Then connect to 127.0.0.1:33890. You get the same desktop, but the bandwidth goes up to 80Mbps and the video plays smoothly.
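
For clarity, the tunnel in step 3 is an ordinary OpenSSH local port forward, something like the following (the username and server address are placeholders):

# forward local port 33890 to the RDP port on the server's loopback interface
ssh -L 33890:127.0.0.1:3389 user@<server-address>

mstsc then connects to 127.0.0.1:33890 instead of the server itself.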

✔️ Expected Behavior

Direct mstsc connection should work at least as smoothly as the connection via the SSH tunnel, and use the ~80Mbps of bandwidth available to xrdp.

❌ Actual Behavior

Direct mstsc connection uses only 1.3Mbps, a tiny fraction of what is possible.

Anything else?

I switched net.ipv4.tcp_congestion_control between cubic and bbr and the behavior doesn't change, suggesting it's not a problem with the congestion control algorithm.
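
For reference, switching algorithms is just a sysctl change (a sketch; assumes the bbr module is available on the system):

# show the current and available algorithms, then switch (example value)
sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_available_congestion_control
sudo sysctl net.ipv4.tcp_congestion_control=bbr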

@vincent-163 vincent-163 added the bug label Jan 5, 2024
@matt335672
Member

Hi @vincent-163

An interesting problem, and an interesting network connection. BTW, thanks for taking the time to fill in the report form properly. It saves a lot of time for both of us.

A slow connection like this would be down to one of two things I can think of:

  1. client-server communication being dependent on a lot of round-trip messages. If this was the case, using ssh over the same route wouldn't improve the situation anyway.
  2. Inadequate buffer sizes, resulting in a closed TCP window while waiting for ACKs from the other end.

Since you've been playing with congestion algorithms, I'm guessing you're aware of these possibilities.

Here's a couple of questions for you:-

  1. Is there any chance the ssh connection is taking a different route which would mean that it isn't being subjected to the same latency? Things that could cause that might be policy-based routing, or even the ssh connection using IPv6.
  2. Is your 250ms a round-trip time or a one-way latency? Once we have this information we can calculate some buffer sizes based on your figures above.
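
For reference, the relevant quantity here is the bandwidth-delay product; assuming the 250ms turns out to be a round-trip time and ~100Mbps of available bandwidth, the buffer needed to keep the link busy would be roughly:

$$
\text{BDP} = \text{bandwidth} \times \text{RTT} = 100\ \text{Mbit/s} \times 0.25\ \text{s} = 25\ \text{Mbit} \approx 3.1\ \text{MB}
$$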

Thanks.

@vincent-163
Author

vincent-163 commented Jan 5, 2024

Hello! Thanks for the reply.

  1. The connection is indeed a bit special in that it's from China to Europe, so it potentially passes through the GFW in China, which may alter the connection depending on the protocol. Also, both IPv4 and IPv6 are available for the server. Here is some more detail on the scenarios I've tested:
  • Direct connect via IPv4 port 3389: low bandwidth
  • SSH tunnel via IPv4 port 22: high bandwidth
  • Direct connect via IPv6 port 3389: low bandwidth
  • SSH tunnel via IPv6 port 22: high bandwidth
  • Changing the RDP port from 3389 to 23 and connecting directly via IPv4 port 23: low bandwidth
  • SSH tunnel to another server close to xrdp (250ms, high latency) and then to xrdp server (1ms latency): high bandwidth
  • SSH tunnel to another server (250ms, high latency) and then to xrdp server (reliable network, 120ms latency): low bandwidth
  • SSH tunnel to another server close to client (low latency, passes GFW) and then to xrdp server via tcp (high latency, no GFW): low bandwidth
  • Wireguard tunnel to another server close to client (low latency, passes GFW) and then forwarded to xrdp server packet by packet (high latency, no GFW): low bandwidth
  • Wireguard tunnel to another server close to client (low latency, passes GFW) and then ssh tunnel to xrdp server (high latency): high bandwidth

With Wireguard it's not easy for GFW to distinguish directly between SSH and RDP. I think the trend is very clear: the bandwidth depends solely on the latency between the xrdp server and the direct TCP client it's connected to, and not on the nature of the network.

  2. All latency figures are round-trip times, i.e. the time shown by ping.

I'm creating a Windows server so I can test mstsc over a reliable connection. Do you need an xrdp server? I can create a new xrdp server in either US or Europe for you to test.

@vincent-163
Author

vincent-163 commented Jan 5, 2024

Okay, I think I've reproduced the issue with a pair of distant servers on Hetzner Cloud. I installed a Windows server "A" in Europe and an xrdp server "B" in the U.S., then connected to the Windows server "A" via mstsc and from there to the xrdp server "B" via nested mstsc. It was slow. I installed ssh on "A" and connected to the xrdp server "B" via an ssh tunnel, and it was fast. The network connection between "A" and "B" is known to be reliable (stable 120ms latency and over 1Gbps bandwidth), so I'm confident in its reproducibility.

@vincent-163
Author

vincent-163 commented Jan 5, 2024

So I found a pair of configuration options in xrdp.ini called tcp_send_buffer_bytes and tcp_recv_buffer_bytes:

; set tcp send/recv buffer (for experts)
#tcp_send_buffer_bytes=32768
#tcp_recv_buffer_bytes=32768

I set both of them to 10485760. And in journalctl I get:

Jan 05 18:05:02 rdp xrdp[205]: [INFO ] setting send buffer to 10485760 bytes
Jan 05 18:05:02 rdp xrdp[205]: [INFO ] send buffer set to 425984 bytes
Jan 05 18:05:02 rdp xrdp[205]: [INFO ] setting recv buffer to 10485760 bytes
Jan 05 18:05:02 rdp xrdp[205]: [INFO ] recv buffer set to 425984 bytes

Then when I connect to this server I get 6.6Mbps. The theoretical bandwidth with a 425984-byte window and 250ms round-trip latency is 425984 bytes / 250ms = 13.0Mbps. When the window size is 32768 bytes, the theoretical bandwidth is 1Mbps. This seems to explain the bandwidth issue. The reason the buffer is set to 425984 rather than 10485760 bytes appears to be net.core.rmem_max and net.core.wmem_max, which default to 212992 on my system. The docs say the actual value is double the requested value.
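
Spelling out that calculation: throughput is bounded by the window (data in flight) divided by the round-trip time, so with a 0.25s RTT:

$$
\frac{425984 \times 8\ \text{bit}}{0.25\ \text{s}} \approx 13\ \text{Mbit/s}
\qquad\text{and}\qquad
\frac{32768 \times 8\ \text{bit}}{0.25\ \text{s}} \approx 1\ \text{Mbit/s}
$$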

A few things I don't understand:

  1. It seems the buffer is only set when the corresponding options are configured, so it does not really have a default in xrdp. The default receive buffer is already 425984 bytes (my default value of net.core.{r,w}mem_default), so why would setting the buffer size matter?
  2. A mitigation that I've tried is to set sudo sysctl net.core.wmem_max=10485760. Together with the above configuration change, this one really helped! Now it's running at 100Mbps with no problems (the combination is summarised below). But why would such a change be necessary for xrdp but not for SSH?
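
To summarise the mitigation (values as used above; xrdp needs restarting afterwards so the new settings take effect):

# in /etc/xrdp/xrdp.ini, request 10MB socket buffers
tcp_send_buffer_bytes=10485760
tcp_recv_buffer_bytes=10485760

# raise the kernel cap so the requested send buffer isn't clamped, then restart xrdp
sudo sysctl net.core.wmem_max=10485760
sudo systemctl restart xrdp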

After thinking about point 2 more carefully I've come up with an explanation. The reason I can think of is pacing. Say you want to send 10MB but you have only 100KB of buffer and the window size is 5MB. Why can the window size be bigger than the buffer size? Because the window size represents the amount of data in flight rather than the data waiting to be sent, and it has to be sent smoothly rather than in bursts. So rather than sending 5MB at once, you send 100KB, wait 1/50 of a round trip, and then send another 100KB. It's probably for the same reason that the receive buffer is limited to a very small size.

What an ordinary program would do in this case is try to send 5MB, find that only 100KB was accepted, wait until the socket becomes writable again, and then send the next 100KB. So why would a program behave as if the window were only 100KB? Presumably it tried to send 5MB, found that only 100KB was sent, and then waited for an acknowledgement before sending the next 100KB.

I'm not sure how xrdp is designed, but maybe you'll need a buffered implementation of sockets rather than relying on the kernel, especially when the socket buffer size is smaller than the window. While I have been able to mitigate the problem in this case, you probably don't want to expect every user to tweak the kernel params and buffer sizes manually.

@matt335672
Member

Yup - the buffer size was likely the culprit. You probably only want to increase the transmit buffer size as RDP traffic is normally pretty one-directional.

I'll come to the rest of the post later, but first I'll just point out that you can look at the xrdp socket memory with a command like:-

ss -m dport :3389

Or use :22 to look at ssh connections. That might make it clearer what's going on.
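
For example, to compare the RDP connection with an ssh connection side by side (the tb value inside the skmem:(...) output is the transmit buffer limit):

ss -m dport = :3389
ss -m dport = :22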

Starting with your last point, I can't see how buffering stuff in xrdp would help at all. We've still got to get through the kernel buffers - there's no way round them. The TCP sending algorithm will block until the data it's just sent has been ACK'd.

I could of course be misunderstanding something here, as you raise the very valid point that ssh doesn't seem to need the net.core.wmem_max sysctl setting. I can't understand that either. I've had a look at the ssh sources, and there's nothing clever I can see going on related to socket options.

Are you able to look at the socket memory for ssh? That may give us some further clues.

@matt335672
Member

Had a think about this last night and this morning I've done a bit of poking around. I've found something interesting (i.e. possibly wrong) here:-

xrdp/common/os_calls.c

Lines 439 to 454 in ccee5af

option_len = sizeof(option_value);
if (getsockopt(rv, SOL_SOCKET, SO_SNDBUF, (char *)&option_value,
               &option_len) == 0)
{
    if (option_value < (1024 * 32))
    {
        option_value = 1024 * 32;
        option_len = sizeof(option_value);
        if (setsockopt(rv, SOL_SOCKET, SO_SNDBUF, (char *)&option_value,
                       option_len) < 0)
        {
            LOG(LOG_LEVEL_ERROR, "g_tcp_socket: setsockopt() failed");
        }
    }
}

That bit of code is called when the listening socket is set up for xrdp, and in this form dates from 2007 when the world was a very different place.

On my system, the default value for SO_SNDBUF is 16384, so xrdp sets the send buffer to 32768. However, it seems this can stop the system from increasing the send buffer size automatically - see this stack overflow link:-

https://stackoverflow.com/questions/67113706/why-does-setting-so-sndbuf-and-so-rcvbuf-destroy-performance

My default size of 16KB for a socket buffer comes from this kernel setting:-

$ sysctl net.ipv4.tcp_wmem
net.ipv4.tcp_wmem = 4096	16384	4194304

This gives us an easy way to disable the xrdp code I've linked above, and get xrdp to behave more like sshd.

@vincent-163 - when you get a moment, can you take a look at your setting for net.ipv4.tcp_wmem? If the middle value is less than 32768, can you try changing it with this command (making the relevant substitutions)?

sudo sysctl net.ipv4.tcp_wmem='<1st current value> 32768 <3rd current value>'

If that solves your problem, the simplest thing to do will be to remove this code. Our other major platform is FreeBSD, and for FreeBSD >= 7.0 send buffer auto-sizing is also supported. Search for FreeBSD net.inet.tcp.sendbuf_auto and net.inet.tcp.sendbuf_inc for more details.
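
As a quick check on the FreeBSD side, those auto-sizing knobs can be inspected directly (a sketch; defaults vary by release):

# FreeBSD: confirm send buffer auto-sizing is enabled and see its growth increment
sysctl net.inet.tcp.sendbuf_auto net.inet.tcp.sendbuf_inc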

@matt335672
Member

I've done some experimenting of my own, using a virtual router running OPNSense and a traffic shaper to emulate @vincent-163's network above:-

$ ping latency.test.lan
PING latency.test.lan (172.19.64.8) 56(84) bytes of data.
64 bytes from latency.test.lan (<snipped>): icmp_seq=1 ttl=62 time=235 ms
64 bytes from latency.test.lan (<snipped>): icmp_seq=2 ttl=62 time=234 ms
64 bytes from latency.test.lan (<snipped>): icmp_seq=3 ttl=62 time=241 ms
64 bytes from latency.test.lan (<snipped>): icmp_seq=4 ttl=62 time=242 ms
64 bytes from latency.test.lan (<snipped>): icmp_seq=5 ttl=62 time=239 ms
64 bytes from latency.test.lan (<snipped>): icmp_seq=6 ttl=62 time=236 ms
^C
--- latency.test.lan ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5007ms
rtt min/avg/max/mdev = 233.848/237.752/241.511/2.986 ms

According to iperf3, my bandwidth is only about 40-60 Mbps, owing to the hardware I'm using.

If I just connect to xrdp on a vanilla Ubuntu 22.04 I get around 1Mbps, broadly in line with @vincent-163's findings above. The command ss -m sport = :3389 reports a tb of 65536, which I believe corresponds to an SO_SNDBUF of 32768 (see tcp(7)).

I can then increase the default buffer size to 32768 using the script set_latency.sh below. As previously mentioned, this stops xrdp from setting SO_SNDBUF explicitly, which appears to be what was disabling buffer size auto-tuning.

#!/bin/sh

TARGET=32768

set -- $(sysctl net.ipv4.tcp_wmem)

if [ $# != 5 ]; then
    echo "** Unable to read current SNDBUF settings" >&2
    exit 1
fi
min=$3
def=$4
max=$5

if [ $def -lt $TARGET ]; then
    def=$TARGET
    if [ $def -lt $min ]; then
        echo "** Can't set default buffer size below $min">&2
        false
    elif [ $def -gt $max ]; then
        echo "** Can't set default buffer size over $max">&2
        false
    elif [ $(id -u) -ne 0 ]; then
        echo "** Must be root to set default buffer size">&2
        false
    else
        sysctl net.ipv4.tcp_wmem="$min $def $max"
    fi
else
    echo "** Default send buffer size is already $def" >&2
fi

exit $?

After running the script, you MUST restart xrdp, i.e.:-

sudo ./set_latency.sh
sudo systemctl restart xrdp

On my test system, the difference is frankly amazing. According to iftop I'm now using the full bandwidth of the connection. ss -m sport = :3389 reports a tb of 4194304, which corresponds to an SO_SNDBUF of 2MB.

I suspect this has been a problem with xrdp over LFNs (long fat networks) for some time. Thanks to @vincent-163 for helping me find this.

The next step is to reproduce this on a devel build and check that removing the erroneous code shows the same improvement in performance. In the meantime, the script above should help xrdp performance on WANs for any existing versions.

@vincent-163
Author

Hello! Thanks for pointing out the relevant source code lines and carrying out the experiment. I tried setting the second value of net.ipv4.tcp_wmem to 32768, and it did work as expected: xrdp was using my full bandwidth (30Mbps, as I'm on a slower connection this time). I also tried removing the code you mentioned and recompiling xrdp, and it seems to have the same effect of using all available bandwidth, even after I revert the kernel setting changes.

I think the cause of the bug is pretty clear now, but maybe you'll want to verify the fix on your side as well. Thanks for your timely response and the hint at the location of the problematic code! It was very rewarding to have a bug report taken seriously and see it eventually lead to a solution.

@matt335672
Member

Thanks for taking the time to write a decent fault report in the first place - without the ssh observation (which initially seemed inexplicable) I don't think we'd have found this one. I'll do some more testing then put a PR together to close this. I suspect this will fix a few historic issues, so I'll try to hunt those down too.
