implemented option for direct connection via socat and busybox nc #513

Merged: 10 commits, Apr 27, 2023

phreaker0
Collaborator

This implements the requested feature of bypassing ssh for the replication data and using a plain TCP connection instead. Of course, I added warnings that this option should not be used lightly; the option name alone should be a big hint to the user :-)

An example use case: two servers connected both via a common network and via a dedicated link.
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444 local_pool root@backup:remote_pool

192.168.32.2 is the target host's IP address on the network of the direct link, so all the unencrypted data is transferred over the dedicated link, which is trusted.

The option can also be used with NATed network topologies by specifying a different listen address:

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,10.0.2.4:3333 local_pool root@backup:remote_pool

Why did I use socat and busybox nc? Because they made the implementation really easy and clean.

socat supports connection retrying, which is needed because the listening socket isn't available immediately.
And the busybox netcat implementation is the only one I found that can time out on a listening socket, which is needed to abort if the connection doesn't work (firewall, argument error, ...).
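To make the moving parts concrete, here is a sketch of how the two halves could fit together for the local-to-remote example above. `direct_pipeline` is a hypothetical string builder, not a syncoid function; it assumes the local-to-remote pipeline is composed analogously to the remote-to-local one visible in the debug output later in this thread. Snapshot and dataset names are placeholders, and retry=10,interval=1 mirror the values in that output.

```sh
#!/bin/sh
# Hypothetical sketch, not syncoid code: print the pipeline for a
# local source and a remote target reached over ssh.
# $1 = ssh target (user@host)      $2 = connect address (IP:PORT)
# $3 = listen address (IP:PORT)    $4 = listen timeout in seconds
# $5 = source snapshot             $6 = target dataset
direct_pipeline() {
    # socat retries because the remote listener is started over ssh at the
    # same time and may not be listening yet; busybox nc gives up after
    # $4 seconds without a connection (firewall, wrong address, ...)
    printf '%s\n' "zfs send $5 | socat - TCP:$2,retry=10,interval=1 | ssh $1 'busybox nc -l $3 -w $4 | zfs receive -s -F $6'"
}

direct_pipeline root@backup 192.168.32.2:4444 192.168.32.2:4444 60 \
    local_pool@snap remote_pool
```

Only the data stream travels over the plain TCP connection; ssh is still used to start the listener and `zfs receive` on the target.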

Fixes #371

@phreaker0 phreaker0 added the review needed Ready for review and testing label Feb 21, 2020
@secabeen
Contributor

This is great! Totally saves wasted CPU cycles on two systems connected by a trusted network doing pointless SSH encrypt/decrypt (especially when sending raw encrypted streams).

@devZer0

devZer0 commented Mar 7, 2020

Sorry, but how can I install busybox netcat cleanly on CentOS 7 (via the rpm command)?

@devZer0

devZer0 commented Mar 7, 2020

Do we really need both busybox nc and socat?

Couldn't we simply use:

mbuffer -W 10 -I 8888

mbuffer: error: watchdog timeout: input stalled; sending SIGINT

and wrap the "mbuffer -O host:port" tries/retries in the syncoid Perl script?
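A minimal sketch of the suggested retry wrapper, written in plain POSIX sh rather than Perl for illustration. `retry` is a hypothetical helper; note that retrying a whole `zfs send | mbuffer -O ...` pipeline restarts zfs send on every attempt.

```sh
#!/bin/sh
# Hypothetical retry wrapper: run a command up to $1 times, sleeping $2
# seconds between failed attempts. Returns 0 on the first success, or the
# exit status of the last attempt if every attempt failed.
retry() {
    attempts=$1
    interval=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        status=$?                 # exit status of the failed attempt
        if [ "$n" -ge "$attempts" ]; then
            return "$status"
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}

# Example (not run here): keep trying until the remote listener is up.
#   retry 10 1 sh -c 'zfs send pool/fs@snap | mbuffer -O 192.168.32.2:8888'
```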

@jimsalterjrs
Owner

@phreaker0 does this start the netcat listener on the remote side and close it again as needed, or just expect to find an always-on listener?

@phreaker0
Collaborator Author

@devZer0 Nice, I hadn't realized that the mbuffer timeout also works for connections that are not yet established; for all the other tools I tested (normal netcat, socat, ...) it didn't. So I guess I can replace busybox netcat with mbuffer (I will test this later). But I still need socat for the connection retry options.
Doing the retry logic in Perl (as I first planned) would be much more difficult: I can't use the single ssh pipe call for send/recv that syncoid currently uses, so I would need to rewrite a lot of syncoid's code, and it would be more error-prone, since I would have to start the server on the target and the client on the source separately and also monitor them somehow, which is difficult without threads.
The code would also be much harder to maintain and likely wouldn't get merged.

@jimsalterjrs it will start the netcat listener on the remote side as needed and will close it again after the replication or on error

@jimsalterjrs
Owner

jimsalterjrs commented Mar 8, 2020 via email

@devZer0

devZer0 commented Mar 9, 2020

I had a short conversation with the socat author/maintainer, asked for a listen timeout feature in socat and convinced him it could be useful. He sent a patch within one day :)

As it will take some time for such an enhancement to find its way into the major distros, I think there are two ways to proceed:

  1. use socat on the sending side and mbuffer on the receiving side
  2. use socat on both sides if socat is >= the version with that new feature

@phreaker0, if you'd like to test the socat patch, I can forward it to you.

Furthermore, I feel uncomfortable that there is a listener on the receiving side which accepts connections from anywhere for the duration of the transfer.

If socat is used on the receiving side, a security option to restrict access could easily be added (see "RANGE option group" in https://linux.die.net/man/1/socat).
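As an illustration of the RANGE option group, a receiving-side socat listener could be restricted to a single client address. `socat_listen_cmd` and the addresses are hypothetical; `bind=` and `range=` are socat options for TCP-LISTEN addresses, and `-u` makes the transfer unidirectional.

```sh
#!/bin/sh
# Hypothetical builder for a receiving-side socat listener.
# $1 = listen IP, $2 = port, $3 = allowed client network (CIDR)
socat_listen_cmd() {
    # -u: copy in one direction only, from the TCP connection to stdout
    # bind=: listen on this address only; range=: reject other clients
    printf '%s\n' "socat -u TCP-LISTEN:$2,bind=$1,range=$3 -"
}

socat_listen_cmd 192.168.32.2 4444 192.168.32.1/32
```

Without a listen timeout, such a listener would still wait indefinitely if the sender never connects, which is the gap the socat patch mentioned above addresses.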

@devZer0

devZer0 commented Mar 10, 2020

BTW, I get the following warning:

Use of uninitialized value $sourcehost in string ne at ./syncoid line 128.

@TheLQ

TheLQ commented Mar 25, 2020

I get a "Use of uninitialized value $sourcehost in string ne at ./syncoid line 128." warning when using

./syncoid --create-bookmark -r --compress=none --insecure-direct-connection host2:4343 big10 root@host2:big8/big10

--insecure-direct-connection should be able to take just a port and pull the host from the destination argument. That makes LAN backups simpler. Ideally it could automatically pick a free port, but that seems non-trivial with busybox nc.
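A sketch of the suggested behaviour: derive the host from a remote target argument of the form [user@]host:dataset and combine it with a port. Both helpers are hypothetical and not syncoid options; they assume a local dataset name contains no colon and do not handle IPv6 literals.

```sh
#!/bin/sh
# Hypothetical helpers, not syncoid options.
# target_host: print the host part of "[user@]host:dataset".
target_host() {
    case $1 in
        *:*) ;;            # remote target
        *) return 1 ;;     # local dataset: no host to connect to
    esac
    hostpart=${1%%:*}                # strip ":dataset"
    printf '%s\n' "${hostpart#*@}"   # strip an optional "user@"
}

# direct_arg: build a HOST:PORT value from a port and the target argument.
direct_arg() {
    host=$(target_host "$2") || return 1
    printf '%s:%s\n' "$host" "$1"
}

direct_arg 4343 root@host2:big8/big10
```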

It could also use a command check for socat; if socat doesn't exist, the commands just fail repeatedly.

Otherwise it works well for me.

@phreaker0
Collaborator Author

@TheLQ warnings are fixed, command checks are in place
@devZer0 I'm now using mbuffer for the listening socket instead of busybox nc, but it's nice that socat will have a listen timeout in the future as well. The reason mbuffer didn't work for me at first was the order of arguments: with mbuffer -I 8888 -W 10 it will not time out if there is no connection; it only does so if -W comes before the -I flag. This is documented in the manpage.

@jimsalterjrs I don't see the point in also supporting busybox nc if mbuffer can do the job; you can test now.

@phreaker0
Collaborator Author

Hmm, it doesn't work on my servers (I only tested with local addresses on my machine); I need to investigate.

@phreaker0
Collaborator Author

So, mbuffer behaves quite differently from the other listening tools. The address provided to mbuffer isn't used as the listening address but as a source address whitelist; mbuffer listens on all network interfaces.

Therefore I switched back to busybox nc as the default and added an option to switch to mbuffer (in which case the specified listen address is used as an IP filter).
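The argument-order pitfall and the filter semantics described above can be captured in a small builder. `mbuffer_listen_cmd` is hypothetical; the buffer options mirror those in the debug output later in this thread, and the description of the address as a source filter is taken from this thread rather than checked against the mbuffer manpage.

```sh
#!/bin/sh
# Hypothetical builder for the mbuffer listening side.
# $1 = allowed source IP (per this thread, mbuffer uses it as a filter and
#      listens on all interfaces), $2 = port, $3 = watchdog timeout (s)
mbuffer_listen_cmd() {
    # -W must come before -I: with "-I ... -W ..." mbuffer does not time
    # out while it is still waiting for the first connection
    printf '%s\n' "mbuffer -q -s 128k -m 16M -W $3 -I $1:$2"
}

mbuffer_listen_cmd 192.168.32.1 4444 120
```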

I also increased the default timeout to 60 seconds and made it configurable: for some of my datasets with tiny files on rust and lots of metadata changes, zfs send can be so slow that the timeout is triggered.

Command checks for busybox nc and mbuffer are done according to the provided options.

Examples:

busybox nc, target and listen IP are the same (no NAT):
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444 local_pool root@backup:remote_pool

busybox nc, target and listen IPs differ (NAT):
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,10.0.2.4:3333 local_pool root@backup:remote_pool

busybox nc, target and listen IP are the same (no NAT), timeout of 120 seconds:
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,192.168.32.2:4444,120 local_pool root@backup:remote_pool

mbuffer TCP (192.168.32.1 is the allowed source address), target and listen IP are the same, timeout of 120 seconds:
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,192.168.32.1:4444,120,mbuffer local_pool root@backup:remote_pool
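A sketch of a parser for the four comma-separated fields shown in these examples. It treats the fields strictly by position, which also sidesteps the ambiguity raised in the review below. `parse_direct` and its output variables are hypothetical; the default timeout of 60 seconds comes from the comment above, and this is not syncoid's actual Perl parsing code.

```sh
#!/bin/sh
# Hypothetical parser for IP:PORT[,IP:PORT[,TIMEOUT[,mbuffer]]].
# Fields are interpreted strictly by position. On success it sets
# CONNECT, LISTEN, TIMEOUT and USE_MBUFFER; it returns 1 on bad input.
parse_direct() {
    CONNECT= LISTEN= TIMEOUT=60 USE_MBUFFER=0
    old_ifs=$IFS
    IFS=,
    set -f               # no pathname expansion while splitting
    set -- $1            # split the argument on commas
    set +f
    IFS=$old_ifs
    [ "$#" -ge 1 ] && [ "$#" -le 4 ] || return 1
    CONNECT=$1
    LISTEN=${2:-$1}      # listen address defaults to the connect address
    for addr in "$CONNECT" "$LISTEN"; do
        case $addr in
            *:*) ;;
            *) return 1 ;;   # every address needs a :PORT part
        esac
    done
    if [ "$#" -ge 3 ]; then
        case $3 in
            ''|*[!0-9]*) return 1 ;;   # timeout must be all digits
        esac
        TIMEOUT=$3
    fi
    if [ "$#" -eq 4 ]; then
        [ "$4" = mbuffer ] || return 1
        USE_MBUFFER=1
    fi
    return 0
}

if parse_direct 192.168.32.2:4444,192.168.32.1:4444,120,mbuffer; then
    echo "connect=$CONNECT listen=$LISTEN timeout=$TIMEOUT mbuffer=$USE_MBUFFER"
fi
```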

@asche77

asche77 commented May 30, 2020

Would love to see this for LAN syncs ...

@geudrik

geudrik commented Jul 16, 2020

This looks awesome, looking forward to seeing it merged

@danvatca danvatca left a comment

Also, adding some tests in tests/syncoid/... would help ensure the code does not break existing types of transfers, and that changes down the line will not break them. At the very least I would add one test for secure and another for insecure transfers: just the happy path.

@@ -262,6 +262,11 @@ As of 1.4.18, syncoid also automatically supports and enables resume of interrup

Use specified identity file as per ssh -i.

+ --insecure-direct-connection=IP:PORT[,IP:PORT,[TIMEOUT,[mbuffer]]]

Why not use separate parameters for each of these 4 "sub-parameters"? If a user wants to change the timeout while keeping the second parameter at its default, they will be confused.

@@ -1911,6 +1986,7 @@ Options:
--sshport=PORT Connects to remote on a particular port
--sshcipher|c=CIPHER Passes CIPHER to ssh to use a particular cipher set
--sshoption|o=OPTION Passes OPTION to ssh for remote usage. Can be specified multiple times
--insecure-direct-connection=IP:PORT[,IP:PORT] WARNING: DATA IS NOT ENCRYPTED. First address pair is for connecting to the target and the second for listening at the target

This is not consistent with the README.md and the actual parameters.

print("CRITICAL: invalid insecure-direct-connection argument!\n");
pod2usage(2);
exit 127;
} elsif (scalar @parts >= 2) {

What is $parts[1] when you have two parameters? It could be the $directtimeout if it is a single number? Or $directmbuffer if it is a string? Or $directlisten if it matches HOST:PORT?

if ($directmbuffer) {
$remotecmd .= " $mbuffercmd $args{'target -bwlimit'} -W $directtimeout -I " . $directlisten . " $mbufferoptions |";
} elsif (length $directlisten) {
$remotecmd .= " busybox nc -l " . $directlisten . " -w $directtimeout |";

Extract busybox nc to something like $bbnccmd (similar to $socatcmd and the rest), and check its availability.
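Following this suggestion, an availability check could run up front so that a missing tool produces one clear error instead of repeated failures. `require_cmds` is a hypothetical helper built on POSIX `command -v`; it only checks that a binary is on PATH, not that, for example, busybox was built with the nc applet.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if every named command is on PATH,
# reporting each missing one on stderr.
require_cmds() {
    missing=0
    for c in "$@"; do
        if ! command -v "$c" >/dev/null 2>&1; then
            printf 'CRITICAL: required command not found: %s\n' "$c" >&2
            missing=1
        fi
    done
    return "$missing"
}

# e.g. on the sending side:   require_cmds socat   || exit 1
# and on the receiving side:  require_cmds busybox || exit 1
```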

Use of uninitialized value in concatenation (.) or string at ./syncoid line 1317.
Just an assumption on my part, considering that the other uses of the same variable don't have a space in that location.
@jim-perkins
Contributor

jim-perkins commented Aug 10, 2020

Just wanted to say that I have tested the insecure connection and it 'works for me' using 10Gb SFP+. I was limited to about 200 MB/s of network transfer with the ssh cipher 'aes128-ctr', and only about 150 MB/s with the default ssh cipher. Using the insecure connection, I was able to sustain almost 400 MB/s. The data was being transferred between SSDs on both ends.

BTW, I just copied syncoid from https://github.com/phreaker0/sanoid/blob/direct-connection/syncoid and dropped it into /usr/local/bin.

Here is the command line I tested:

/usr/local/bin/syncoid  --recursive --no-sync-snap --compress=none --insecure-direct-connection=192.168.1.30:4444,192.168.1.20:4444,120,mbuffer server_ssd root@serverx:serverx_ssd/backup

@TheLQ

TheLQ commented Mar 12, 2021

I'm using this and it works fine for me. Merged with latest master and fixed the minor conflict from a new option being added, still no problems.

@devZer0

devZer0 commented Mar 23, 2021

Regarding #513 (comment): since version 1.7.4.0, socat supports the option "accept-timeout", which makes it suitable for sanoid/syncoid as well.

http://www.dest-unreach.org/socat/doc/socat.html#OPTION_ACCEPT_TIMEOUT

accept-timeout=<timeval>
End waiting for a connection after <timeval> with error status.
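With socat 1.7.4.0 or newer, the receiving side could use accept-timeout instead of relying on busybox nc or mbuffer for the listen timeout. A sketch of the version gate, using a hypothetical `version_ge` helper that compares purely numeric dotted versions; the way of extracting the version from `socat -V` shown in the comment is an assumption about its output format.

```sh
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2.
# Only purely numeric components are supported (no "-beta" suffixes);
# missing components count as 0, so 1.7.4 equals 1.7.4.0.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; x=${x:-0}     # leading component of each version
        y=${b%%.*}; y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac   # drop that component
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0                      # all components equal
}

# Assumed way to read the installed version (depends on socat -V output):
#   socat_version=$(socat -V 2>/dev/null | awk '/socat version/ { print $3; exit }')
# If version_ge "$socat_version" 1.7.4.0 holds, a listener could look like:
#   socat -u TCP-LISTEN:4444,accept-timeout=60 - | zfs receive ...
```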

http://www.dest-unreach.org/socat/doc/CHANGES

New option accept-timeout (listen-timeout)
Test: ACCEPTTIMEOUT
Proposed by Roland

@tinsami1

tinsami1 commented Mar 4, 2022

What else is needed for this to be included in the next release? Or, at least, merged to master?

@jimsalterjrs jimsalterjrs merged commit 55c5e0e into jimsalterjrs:master Apr 27, 2023
@mailinglists35

Does this only work local to remote? Or am I using it wrong?

I'm trying remote to local and it fails like this (it resumes an interrupted transfer because I started it via ssh and then Ctrl-C'd it, but it fails the same way on a clean send).

BTW, here is -w in action from nmap-ncat on OL8, @phreaker0, continuing the previous discussion that I started in the wrong place. I changed $directtimeout from 60 to 10 in syncoid, and you can see that nmap-ncat -w times out in listening mode after 10 seconds:

[remoteuser@dell810 sanoid]$ ./syncoid --debug --insecure-direct-connection=192.168.70.11:12345 --no-sync-snap --sendoptions="-L" --recvoptions="-vu" --compress=none --source-bwlimit=900m --target-bwlimit=900m remoteuser@olvm2:olvm2/vm_uri hddolvm/vm_uri | sed "s/^/$(date '+[%Y-%m-%d %H:%M:%S]') /"
[2023-06-26 17:07:32] DEBUG: SSHCMD: ssh
[2023-06-26 17:07:32] DEBUG: compression forced off from command line arguments.
[2023-06-26 17:07:32] DEBUG: checking availability of socat on source...
[2023-06-26 17:07:32] DEBUG: checking availability of busybox (for nc) on target...
[2023-06-26 17:07:32] DEBUG: checking availability of mbuffer on source...
[2023-06-26 17:07:32] DEBUG: checking availability of mbuffer on target...
[2023-06-26 17:07:32] DEBUG: checking availability of pv on local machine...
[2023-06-26 17:07:32] DEBUG: checking availability of zfs resume feature on source...
[2023-06-26 17:07:32] DEBUG: checking availability of zfs resume feature on target...
[2023-06-26 17:07:32] DEBUG: syncing source olvm2/vm_uri to target hddolvm/vm_uri.
[2023-06-26 17:07:32] DEBUG: getting current value of syncoid:sync on olvm2/vm_uri...
[2023-06-26 17:07:32] ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 sudo zfs get -H syncoid:sync ''"'"'olvm2/vm_uri'"'"''
[2023-06-26 17:07:32] DEBUG: checking to see if hddolvm/vm_uri on  is already in zfs receive using  ps -Ao args= ...
[2023-06-26 17:07:32] DEBUG: checking to see if target filesystem exists using " sudo zfs get -H name 'hddolvm/vm_uri' 2>&1 |"...
[2023-06-26 17:07:32] DEBUG: getting current value of receive_resume_token on hddolvm/vm_uri...
[2023-06-26 17:07:32]  sudo zfs get -H receive_resume_token 'hddolvm/vm_uri'
[2023-06-26 17:07:32] DEBUG: got receive resume token: 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e:
[2023-06-26 17:07:32] DEBUG: getting estimated transfer size from source -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 using "ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 sudo zfs send  -nvP -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e 2>&1 |"...
[2023-06-26 17:07:32] DEBUG: sendsize = 24138379648
[2023-06-26 17:07:32] Resuming interrupted zfs send/receive from olvm2/vm_uri to hddolvm/vm_uri (~ 22.5 GB remaining):
[2023-06-26 17:07:32] DEBUG: ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 'sudo zfs send  -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e | mbuffer -R 900m -q -s 128k -m 16M | socat - TCP:192.168.70.11:12345,retry=10,interval=1' |  nc -l 192.168.70.11:12345 -w 10 | mbuffer -r 900m -q -s 128k -m 16M | pv -p -t -e -r -b -s 24138379648 | sudo zfs receive -v -u  -s -F 'hddolvm/vm_uri' 2>&1
Ncat: Could not resolve hostname "192.168.70.11:12345": Name or service not known. QUITTING.
0.00 B 0:00:00 [0.00 B/s] [>                                                                                                     ]  0%
[2023-06-26 17:07:32] cannot receive: failed to read from stream
2023/06/26 17:07:43 socat[28059] E connect(5, AF=2 192.168.70.11:12345, 16): Connection refused
mbuffer: error: outputThread: error writing to <stdout> at offset 0x10000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR: ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 'sudo zfs send  -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e | mbuffer -R 900m -q -s 128k -m 16M | socat - TCP:192.168.70.11:12345,retry=10,interval=1' |  nc -l 192.168.70.11:12345 -w 10 | mbuffer -r 900m -q -s 128k -m 16M | pv -p -t -e -r -b -s 24138379648 | sudo zfs receive -v -u  -s -F 'hddolvm/vm_uri' 2>&1 failed: 256 at ./syncoid line 629.
[remoteuser@dell810 sanoid]$ grep directtimeout syncoid
my $directtimeout = 10;

@phreaker0
Collaborator Author

@mailinglists35 checking your output, ncat is exiting immediately:
'Ncat: Could not resolve hostname "192.168.70.11:12345": Name or service not known. QUITTING.'

and socat retries 10 times at 1-second intervals and then gives up.

@mailinglists35

This comment was marked as outdated.

@mailinglists35

Oh, so $directtimeout is for socat, not for nc? It seems to be used by both nc and socat, though.

@mailinglists35

Oh, sorry, nmap-ncat does not accept -l IP:PORT :)

@mailinglists35

@phreaker0 I see you have $directmbuffer hardcoded; is it usable if I set it to 1, and how? Will that bypass nc?

@mailinglists35

OK, I modified my local copy of syncoid to understand nmap-ncat, since there is no busybox in the EL9 repos...

@phreaker0 phreaker0 mentioned this pull request Sep 28, 2023
@phreaker0 phreaker0 deleted the direct-connection branch April 26, 2024 06:27
@MrRinkana

For reference for people searching (I'm not sure where else to put this), if you are getting "nc: timed out" or "mbuffer: error: watchdog timeout: input stalled; sending SIGINT":

socat seems to ignore your IP routes for some reason, so if, like me, you have set up a gateway on your LAN (for example WireGuard) that port-forwards to the target at the other end of the tunnel, you cannot specify the routed IP; you must use the IP of the gateway directly.

Example:

LAN at source is: 168.192.1.1/24
target is: 10.10.10.2
gateway is: 168.192.1.2 on LAN and 10.10.10.1 in tunnel/other interface
ip route is: 10.10.10.2 via 168.192.1.2 ... ...

then the commands
--insecure-direct-connection=10.10.10.2:4444,..blabla... ... will fail.
--insecure-direct-connection=168.192.1.2:4444,..blabla... ... will succeed!

Took me a good while to stumble into 😅

@griznog

griznog commented Jan 5, 2025

I tried using this today, and after a bit of head-scratching I finally got it to work by modifying syncoid to build a busybox nc command that looked like:

busybox nc -l  -p 9043 -w 60

I could not get it to work with the "HOST:PORT" arguments it was producing. This is on Rocky 8, so maybe I have an older/different version of busybox.
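The listener forms reported in this thread differ by netcat flavour. A hypothetical builder collecting them; the flavour names are made up for the sketch, the nmap-ncat form follows mailinglists35's report that it wants host and port as separate arguments, and the `-l -p PORT` busybox form listens on all interfaces, so it loses the listen-address restriction.

```sh
#!/bin/sh
# Hypothetical builder collecting the listener forms from this thread.
# $1 = flavour, $2 = listen IP, $3 = port, $4 = timeout (s)
nc_listen_cmd() {
    case $1 in
        busybox)      printf '%s\n' "busybox nc -l $2:$3 -w $4" ;;  # form syncoid generates
        busybox-port) printf '%s\n' "busybox nc -l -p $3 -w $4" ;;  # reported to work on Rocky 8
        ncat)         printf '%s\n' "nc -l -w $4 $2 $3" ;;          # nmap-ncat: host and port separate
        *)            return 1 ;;
    esac
}

nc_listen_cmd ncat 192.168.70.11 12345 10
```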

Labels: review needed (Ready for review and testing)
Projects: None yet
Development: successfully merging this pull request may close these issues:
Request: NC support