Feature Request: Direct mbuffer transfer #15

Closed
devopstales opened this issue Jan 10, 2020 · 19 comments

@devopstales

Currently a dataset can be sent through ssh with mbuffer, but mbuffer can also listen on a TCP port for a data stream. If we send the data directly to this TCP port, without the compression and encryption overhead of ssh, it is much faster than going through ssh. For this to work, mbuffer must be started on the destination before we start sending data.

mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF remote-zfs/my-vm
zfs send local-zfs/my-vm | mbuffer -s 128k -m 1G -O remote-server:9090
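
The ordering described above (receiver first, then sender) could be sketched in Python roughly as follows. This is only an illustration and not part of zfs-autobackup: the helper name `mbuffer_commands` and its parameters are made up here, while the mbuffer and zfs flags are the ones from the two commands above.

```python
import shlex


def mbuffer_commands(src_dataset, dst_dataset, dst_host, port,
                     block_size="128k", buffer_mem="1G"):
    """Return (receiver_cmd, sender_cmd) shell strings for a direct transfer.

    Hypothetical helper: the receiver command must already be running on the
    destination before the sender command is started on the source.
    """
    buffer_opts = f"-s {block_size} -m {buffer_mem}"
    # Destination side: listen on the TCP port and feed the stream to zfs receive.
    receiver_cmd = (f"mbuffer {buffer_opts} -I {int(port)} | "
                    f"zfs receive -vF {shlex.quote(dst_dataset)}")
    # Source side: pipe zfs send into mbuffer, which connects to the listener.
    sender_cmd = (f"zfs send {shlex.quote(src_dataset)} | "
                  f"mbuffer {buffer_opts} -O {shlex.quote(dst_host)}:{int(port)}")
    return receiver_cmd, sender_cmd


receiver_cmd, sender_cmd = mbuffer_commands(
    "local-zfs/my-vm", "remote-zfs/my-vm", "remote-server", 9090)
print(receiver_cmd)  # run this on the destination first
print(sender_cmd)    # then run this on the source
```

Note that this stream is unencrypted and unauthenticated, so it only makes sense on a trusted network.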
@psy0rz
Owner

psy0rz commented Jan 19, 2020

This is non-trivial to implement in a clean way. Perhaps it would help if you specify a faster cipher like arcfour in ~/.ssh/config. (https://github.com/psy0rz/zfs_autobackup#specifying-ssh-port-or-options)

(If you're a company and want us to implement it, contact us for a quote)

@psy0rz psy0rz closed this as completed Jan 19, 2020
@psy0rz
Owner

psy0rz commented May 18, 2021

This has become more trivial with the latest changes. If anyone wants this, vote it up and I might add it.

@devopstales
Author

+1

@digitalsignalperson
Contributor

+1 upvote for this and possibly other transport options. An option for fast, secure transport on 10G+ networks could be

@digitalsignalperson
Contributor

hey @psy0rz thoughts on opening this issue back up, or I'd be happy to open a new one for discussion

@psy0rz psy0rz reopened this Jan 28, 2022
@psy0rz psy0rz added this to the 3.3 milestone Jan 28, 2022
@psy0rz
Owner

psy0rz commented Jan 28, 2022

Yes, because of the process handling extensions made for zfs-autoverify, this should be doable.

However, are socat and spiped with encryption faster than regular ssh pipes?
(I understand that direct mbuffer over plain TCP is, of course.)

@digitalsignalperson
Contributor

I'll see if I can do some tests on my 10Gbit setup to see how they compare

mbuffer and netcat can also do the same job; I'm not sure which of the two performs better. In this syncoid PR there is some back and forth between mbuffer and nc, and there may be some arguments in favor of nc: jimsalterjrs/sanoid#513

However, even without using SSL, socat may in fact be faster than netcat. A benchmark here (albeit very old, from 2008) found socat faster than netcat: https://wiki.atlas.aei.uni-hannover.de/ATLAS/ZFSBenchmarkTest

@digitalsignalperson
Contributor

either way, regardless of the tool, they would all have similar setups and usages

  • ssh to node to set up the port and whichever tool (netcat, mbuffer, socat, spiped, ...) and initiate the zfs send
  • from the other side pipe the data via the tool (netcat, mbuffer, socat, spiped, ...)

maybe there is a flexible way to define scripts that can plug in any of the options
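
One way to picture such a pluggable setup is a small table of command templates, one pair per tool. The sketch below is hypothetical and not an existing zfs-autobackup feature: the names `TRANSPORTS` and `transport_commands` are invented for illustration, the mbuffer flags are taken from the first comment in this thread, and the netcat listen syntax (`-l -p`) differs between netcat variants, so treat it as an assumption.

```python
# Hypothetical table of transports. Each entry holds a sender and a receiver
# command template; {host} is the receiving machine, {port} a free TCP port.
TRANSPORTS = {
    "netcat": {
        "send": "nc {host} {port}",
        "recv": "nc -l -p {port}",  # some netcat variants want "nc -l {port}"
    },
    "mbuffer": {
        "send": "mbuffer -s 128k -m 1G -O {host}:{port}",
        "recv": "mbuffer -s 128k -m 1G -I {port}",
    },
    "socat": {
        "send": "socat - TCP:{host}:{port}",
        "recv": "socat TCP-LISTEN:{port},reuseaddr -",
    },
}


def transport_commands(tool, host, port):
    """Return (send_cmd, recv_cmd) for the chosen tool; KeyError if unknown."""
    template = TRANSPORTS[tool]
    return (template["send"].format(host=host, port=port),
            template["recv"].format(host=host, port=port))


send_cmd, recv_cmd = transport_commands("socat", "server_name", 8023)
print(send_cmd)
print(recv_cmd)
```

Adding another tool then only means adding one more entry to the table.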

@psy0rz
Owner

psy0rz commented Jan 28, 2022

It might even already be doable via --send-pipe and --recv-pipe, but maybe hackish and problematic with buildup/teardown.

mbuffer is universally supported in operating systems, but we can support multiple tools.

@wishdev

wishdev commented Feb 18, 2022

Just wanted to add that, indeed, --send-pipe and --recv-pipe work for this concept.

I used netcat but the following options worked for me - not an ssh connection in sight for send/recv and no issues with setup or any processes hanging around. Seemed very clean.

--send-pipe "nc server_name 8023"
--recv-pipe "nc -l -p 8023"

@psy0rz
Owner

psy0rz commented Feb 18, 2022

awesome, thanks for the info!

@dberlin

dberlin commented Feb 23, 2022

One of the tricky parts of this is finding an unused port that is safe to listen on and tell the other machine to connect to.
Particularly in a portable way. It's doable, but annoying and possibly slow (IE it degrades to trying to listen on every port starting at until the kernel gives you one).

If that was implemented in zfs-autobackup in python, and made available as a variable to send/recv pipe (IE as $FREEPORT or something), that would make things like nc/mbuffer a lot easier and safer
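
For what it's worth, picking a free port in Python does not have to degrade to probing ports one by one: binding to port 0 asks the kernel to assign an unused one. A minimal sketch, assuming a helper named `find_free_port` (hypothetical, not something zfs-autobackup provides):

```python
import socket


def find_free_port(host=""):
    """Ask the kernel for a currently unused TCP port on `host`.

    Binding to port 0 makes the kernel choose a free port. The socket is
    closed again before returning, so another process could in principle
    grab the port before nc/mbuffer starts listening on it (a small race).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port("127.0.0.1")
print(port)
```

This has to run on the machine that will listen, and the chosen port still has to be reachable from the other side.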

@digitalsignalperson
Contributor

digitalsignalperson commented Jun 7, 2022

fwiw did some testing on localhost of transport options throughput (not yet piped through zfs_autobackup)

  • plain netcat ~1.0GB/sec
  • spiped ~120MB/sec
  • socat with openssl encryption option ~650MB/sec
  • netcat with gpg ~150MB/sec
  • netcat with age ~750MB/sec
  • ssh ~500MB/sec

Impressed with the simplicity and speed of netcat + age

Didn't test mbuffer because it's not in the arch linux repos

Edit: Added ssh. Not that bad.
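
A rough way to get a baseline figure for comparison is to push a fixed number of bytes through a plain TCP connection on localhost and time it. The sketch below is not the method used for the numbers above; the function name and default sizes are assumptions, and it measures Python's own socket overhead rather than netcat or ssh.

```python
import socket
import threading
import time


def measure_loopback_throughput(total_bytes=64 * 1024 * 1024,
                                chunk_size=128 * 1024):
    """Send total_bytes over loopback TCP; return (bytes_received, MB_per_sec)."""
    received = [0]  # mutable cell so the receiver thread can report its count

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
        server.listen(1)
        port = server.getsockname()[1]

        def receiver():
            conn, _ = server.accept()
            with conn:
                while True:
                    data = conn.recv(chunk_size)
                    if not data:  # sender closed the connection
                        break
                    received[0] += len(data)

        thread = threading.Thread(target=receiver)
        thread.start()

        payload = b"\0" * chunk_size
        start = time.perf_counter()
        with socket.create_connection(("127.0.0.1", port)) as client:
            sent = 0
            while sent < total_bytes:
                size = min(chunk_size, total_bytes - sent)
                client.sendall(payload[:size])
                sent += size
        thread.join()  # wait until the receiver has drained everything
        elapsed = time.perf_counter() - start

    return received[0], received[0] / elapsed / 1e6


if __name__ == "__main__":
    count, rate = measure_loopback_throughput()
    print(f"{count} bytes at {rate:.0f} MB/sec")
```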

@digitalsignalperson
Contributor

digitalsignalperson commented Jun 7, 2022

actually I think ssh wins after all (unless you want unencrypted, then plain netcat)

Checking my supported ciphers with ssh -Q cipher

Default chacha20-poly1305@openssh.com getting ~500MB/sec
Trying aes128-ctr getting ~850MB/sec

Always good to measure!! I just assumed based on what I read that ssh was gonna be slow...

@psy0rz
Owner

psy0rz commented Jun 7, 2022

ssh is pretty ok nowadays. :)

@badamson001

Hi, I'm new to this project and was trying to get the netcat example above working but running into some confusion.

  1. Is it necessary to specify ssh-source or ssh-target if I'm using the --send-pipe and --recv-pipe?
  2. Would someone be kind enough to post a full --send-pipe --recv-pipe usage so I can see what I am misunderstanding?

@psy0rz
Owner

psy0rz commented Nov 18, 2022

  1. Yes, you still need ssh-source or ssh-target just like you normally would. All the other stuff is still done via ssh, as well as setting up the nc.

  2. Just get zfs-autobackup to work correctly in the regular way, after that try adding something like:

--send-pipe "nc server_name 8023"
--recv-pipe "nc -l -p 8023"

The server_name is the name of the target machine; the port is an arbitrary free port you choose. Make sure there isn't any firewall in the way.
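
Put together, a complete invocation for a push to a remote target might be assembled as in the sketch below. The helper `autobackup_argv` is hypothetical, and the backup name `offsite1` and target path `backup/pool` are placeholders; only `--ssh-target`, `--send-pipe` and `--recv-pipe` come from this thread. A pull setup with `--ssh-source` is not covered here.

```python
import shlex


def autobackup_argv(backup_name, target_path, target_host, port):
    """Build a zfs-autobackup command line that moves the data over netcat.

    ssh is still used for control (via --ssh-target); only the zfs stream
    itself goes through the nc pipes.
    """
    return [
        "zfs-autobackup",
        "--ssh-target", target_host,
        "--send-pipe", f"nc {target_host} {port}",  # runs on the source
        "--recv-pipe", f"nc -l -p {port}",          # runs on the target
        backup_name,
        target_path,
    ]


argv = autobackup_argv("offsite1", "backup/pool", "server_name", 8023)
print(shlex.join(argv))  # printable form; pass argv to subprocess.run to execute
```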

@psy0rz
Owner

psy0rz commented Nov 18, 2022

> One of the tricky parts of this is finding an unused port that is safe to listen on and tell the other machine to connect to. Particularly in a portable way. It's doable, but annoying and possibly slow (IE it degrades to trying to listen on every port starting at until the kernel gives you one).
>
> If that was implemented in zfs-autobackup in python, and made available as a variable to send/recv pipe (IE as $FREEPORT or something), that would make things like nc/mbuffer a lot easier and safer

No, that's too hackish and too much feature creep for this project, I think, because you would still have issues with firewalls or port forwards, for example. Too much magic isn't good. :)

It's best to let the admin choose a fixed port and make sure that this port is reachable from the source.

@psy0rz
Owner

psy0rz commented Feb 26, 2023

@psy0rz psy0rz closed this as completed Feb 26, 2023