sshutil: prioritize aes128-gcm@openssh.com when AES acceleration is available (roughly 60% faster on Intel Mac) #299

Merged
merged 1 commit into from
Oct 8, 2021


Member

@AkihiroSuda AkihiroSuda commented Oct 7, 2021

By default, ssh chooses chacha20-poly1305@openssh.com, even when AES acceleration is available.
(OpenSSH_8.1p1, macOS 11.6, MacBookPro 2020, Core i7-1068NG7)

AES acceleration is available on almost all recent Intel and AMD processors, but not on all ARM processors.
Probably available on Apple M1 too. (#299 (comment))
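
A minimal Go sketch of the idea described above, not the code in this PR: when AES acceleration is detected, pass a `Ciphers` option whose list starts with `^`, which (in OpenSSH versions that support it) moves the named ciphers to the head of the default set instead of replacing it. The `cipherOption` helper, the `hasAES` flag, and the inclusion of aes256-gcm@openssh.com are assumptions made for illustration; real detection would need a CPU-feature check.

```go
package main

import (
	"fmt"
	"strings"
)

// preferredAESCiphers lists the ciphers to move to the front of the client's
// default list. aes128-gcm@openssh.com is the one named in this PR;
// aes256-gcm@openssh.com is added here as an illustrative assumption.
var preferredAESCiphers = []string{
	"aes128-gcm@openssh.com",
	"aes256-gcm@openssh.com",
}

// cipherOption (hypothetical helper) returns the value to pass to `ssh -o`,
// or "" when the default cipher order should be left untouched.
func cipherOption(hasAES bool) string {
	if !hasAES {
		return ""
	}
	// A leading '^' asks ssh to place these ciphers at the head of the
	// default set rather than replacing the set.
	return "Ciphers=^" + strings.Join(preferredAESCiphers, ",")
}

func main() {
	hasAES := true // assumption: stands in for the result of CPU feature detection
	if opt := cipherOption(hasAES); opt != "" {
		fmt.Println("ssh -o", opt)
	}
}
```

Prepending rather than replacing keeps chacha20-poly1305@openssh.com available as a fallback if the server does not offer the AES-GCM ciphers.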


Benchmark

sshfs becomes roughly 60% faster on Intel Mac.

Before

suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m34.332s
user    0m1.791s
sys     0m7.152s
suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m15.637s
user    0m1.323s
sys     0m3.977s
suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m12.478s
user    0m1.258s
sys     0m2.829s
suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m13.237s
user    0m1.295s
sys     0m2.959s

After

suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m20.687s
user    0m2.473s
sys     0m4.218s
suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m8.318s
user    0m1.175s
sys     0m2.095s
suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m8.389s
user    0m1.336s
sys     0m1.866s
suda@lima-default:/Users/suda/gopath/src/github.com/Homebrew/homebrew-core$ time git diff

real    0m8.411s
user    0m1.287s
sys     0m1.982s
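
The "roughly 60%" figure can be sanity-checked from the wall-clock times above. The sketch below makes two assumptions that are not stated in the PR: the first run of each series is treated as a cold-cache outlier and dropped, and the median of the remaining runs is used. Under those assumptions the ratio comes out at about 1.58×.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying the input slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// speedup is the ratio of median "before" time to median "after" time.
func speedup(before, after []float64) float64 {
	return median(before) / median(after)
}

func main() {
	// "real" times in seconds, with the first (cold) run of each series omitted.
	before := []float64{15.637, 12.478, 13.237}
	after := []float64{8.318, 8.389, 8.411}
	fmt.Printf("speedup: %.2fx\n", speedup(before, after))
}
```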

Host info

  • OS: macOS 11.6
  • Model: MacBookPro 2020
machdep.cpu.brand_string: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
machdep.cpu.core_count: 4
machdep.cpu.cores_per_package: 8
machdep.cpu.logical_per_package: 16
machdep.cpu.thread_count: 8
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 FDPEO SMEP BMI2 ERMS INVPCID FPU_CSDS AVX512F AVX512DQ RDSEED ADX SMAP AVX512IFMA CLFSOPT IPT AVX512CD SHA AVX512BW AVX512VL AVX512VBMI UMIP PKU GFNI VAES VPCLMULQDQ AVX512VNNI AVX512BITALG AVX512VPOPCNTDQ RDPID SGXLC FSREPMOV MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
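
For anyone checking their own host: a small sketch of testing a `sysctl` feature string for the `AES` flag. It matches whole tokens, since a substring search would also hit `VAES` in `machdep.cpu.leaf7_features`. The `hasFeature` helper is hypothetical, and the strings in `main` are abbreviated excerpts of the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// hasFeature reports whether flag appears as a whole whitespace-separated
// token in a sysctl feature list.
func hasFeature(features, flag string) bool {
	for _, f := range strings.Fields(features) {
		if f == flag {
			return true
		}
	}
	return false
}

func main() {
	features := "SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE" // excerpt of machdep.cpu.features
	leaf7 := "GFNI VAES VPCLMULQDQ"                         // excerpt of machdep.cpu.leaf7_features
	fmt.Println("AES in features:", hasFeature(features, "AES"))
	fmt.Println("AES in leaf7_features:", hasFeature(leaf7, "AES"))
}
```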

@AkihiroSuda AkihiroSuda changed the title from "sshutil: prioritize aes128-gcm@openssh.com when AES-NI is available" to "sshutil: prioritize aes128-gcm@openssh.com when AES-NI is available (roughly 60% faster on Intel Mac)" Oct 7, 2021
@AkihiroSuda AkihiroSuda added this to the v0.7.0 milestone Oct 7, 2021
@AkihiroSuda AkihiroSuda force-pushed the aesni branch 2 times, most recently from 15535c3 to d1c08eb Compare October 7, 2021 10:49
@jandubois
Member

Unfortunately I cannot reproduce the speedup. Maybe the CPU is too old, but it does support AES instructions:

machdep.cpu.brand_string: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
machdep.cpu.core_count: 4
machdep.cpu.thread_count: 8
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 HLE AVX2 SMEP BMI2 ERMS INVPCID RTM FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT MDCLEAR TSXFA IBRS STIBP L1DF SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI

I've manually verified that ssh will use chacha20-poly1305@openssh.com by default:

debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

and can switch to aes128-gcm@openssh.com on request:

debug1: kex: server->client cipher: aes128-gcm@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: aes128-gcm@openssh.com MAC: <implicit> compression: none
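
The manual check above can be scripted. This is a sketch that assumes the `ssh -v` output has already been captured as a string; it extracts the negotiated cipher for each direction from the `debug1: kex:` lines shown above. The `negotiatedCiphers` helper is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// kexCipherRE matches the per-direction cipher lines printed by `ssh -v`.
var kexCipherRE = regexp.MustCompile(`^debug1: kex: (server->client|client->server) cipher: (\S+)`)

// negotiatedCiphers maps each direction to the cipher named in the log.
func negotiatedCiphers(log string) map[string]string {
	out := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if m := kexCipherRE.FindStringSubmatch(line); m != nil {
			out[m[1]] = m[2]
		}
	}
	return out
}

func main() {
	log := "debug1: kex: server->client cipher: aes128-gcm@openssh.com MAC: <implicit> compression: none\n" +
		"debug1: kex: client->server cipher: aes128-gcm@openssh.com MAC: <implicit> compression: none\n"
	for dir, cipher := range negotiatedCiphers(log) {
		fmt.Println(dir, cipher)
	}
}
```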

For comparison I've been copying a 200MB file inside /tmp/lima. It takes about 1 to 1.5s on the host:

$ time cp a0 a1
real	0m1.268s
user	0m0.008s
sys	0m0.085s
$ time cp a0 a2
real	0m0.953s
user	0m0.009s
sys	0m0.083s
$ time cp a0 a3
real	0m1.505s
user	0m0.009s
sys	0m0.089s

Inside the default instance, it takes about 25s (I've double-checked that the sshfs commands don't specify a cipher):

jan@lima-default:/tmp/lima$ time cp a0 a1
real	0m25.017s
user	0m0.024s
sys	0m15.393s
jan@lima-default:/tmp/lima$ time cp a0 a2
real	0m25.242s
user	0m0.012s
sys	0m15.750s
jan@lima-default:/tmp/lima$ time cp a0 a3
real	0m23.573s
user	0m0.013s
sys	0m14.892s

Running one more time with a build compiled from this PR (I've verified that the ciphers are requested in the sshfs commands) shows identical times within the margin of error:

jan@lima-default:/tmp/lima$ time cp a0 a1
real	0m25.865s
user	0m0.013s
sys	0m15.533s
jan@lima-default:/tmp/lima$ time cp a0 a2
real	0m24.179s
user	0m0.000s
sys	0m15.009s
jan@lima-default:/tmp/lima$ time cp a0 a3
real	0m23.917s
user	0m0.012s
sys	0m14.851s

And I did rm a[123] between tests. So while the PR looks fine to me, it doesn't seem to have any effect on my machine.

I guess I need to do the git diff comparison, as that tests a different set of file system operations beyond raw data moving.

@jandubois
Member

> I guess I need to do the git diff comparison, as that tests a different set of file system operations beyond raw data moving.

Unfortunately same results. If anything the AES version is slightly slower (using a different git repo):

$ time git diff
real	0m0.047s
user	0m0.019s
sys	0m0.097s
$ time git diff
real	0m0.051s
user	0m0.020s
sys	0m0.110s
$ time git diff
real	0m0.050s
user	0m0.021s
sys	0m0.079s

# on master
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m11.487s
user	0m2.130s
sys	0m2.910s
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m9.700s
user	0m1.847s
sys	0m1.964s
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m9.769s
user	0m1.824s
sys	0m2.226s
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m9.726s
user	0m1.879s
sys	0m1.967s

# on aesni branch
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m19.773s
user	0m2.035s
sys	0m4.217s
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m10.239s
user	0m1.620s
sys	0m2.571s
jan@lima-default:/Users/jan/git/perl$ time git diff
real	0m10.232s
user	0m1.851s
sys	0m2.180s

Member

@jandubois jandubois left a comment


Even though I cannot reproduce the performance gains, the PR looks correct to me. I assume that the AES code makes use of other, newer CPU features that my older CPU lacks, even though it does support AES. At least it doesn't seem to get slower on my machine.

@jandubois
Member

Still rather sad that file copy is 15× and git diff more than 100× slower inside the VM. 😢

@AkihiroSuda
Member Author

AkihiroSuda commented Oct 8, 2021

Some folks are collecting `openssl speed` benchmark results at https://gist.github.com/voluntas/fd279c7b4e71f9950cfd4a5ab90b722b

aes-128-gcm seems faster than chacha20-poly1305, even on M1 Mac?

MacBook Air (M1)

https://gist.github.com/voluntas/fd279c7b4e71f9950cfd4a5ab90b722b#gistcomment-3535685

I used OpenSSL 1.1.1 installed via MacPorts.

aes-128-gcm 1085507.93k 3046557.93k 4649582.59k 4583631.53k 4490816.30k 4548853.76k

chacha20-poly1305 381908.61k 582010.79k 1081650.60k 1740476.13k 1710656.17k 1737026.22k

Mac mini (M1)

https://gist.github.com/voluntas/fd279c7b4e71f9950cfd4a5ab90b722b#gistcomment-3784432

OpenSSL 3.0.0-beta1 17 Jun 2021 (Library: OpenSSL 3.0.0-beta1 17 Jun 2021)

AES-128-GCM 1056088.81k 3003472.98k 4813335.64k 6380920.72k 6714979.67k 6740770.82k

ChaCha20-Poly1305 379347.91k 588236.42k 1172095.15k 1838369.39k 1830761.81k 1809356.12k
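
To put a number on the gap, here is a sketch that parses two `openssl speed` result lines and compares throughput in the last column. It assumes the values are in thousands of bytes per second and that the columns run from the smallest to the largest block size (typically 16 to 16384 bytes), which is the usual `openssl speed` layout but is not stated in the gist excerpts above. For the MacBook Air (M1) lines the ratio works out to roughly 2.6×.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSpeedLine splits a line such as "aes-128-gcm 1085507.93k ..." into the
// algorithm name and its throughput values (units of 1000 bytes per second).
func parseSpeedLine(line string) (string, []float64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", nil, fmt.Errorf("short line: %q", line)
	}
	vals := make([]float64, 0, len(fields)-1)
	for _, f := range fields[1:] {
		v, err := strconv.ParseFloat(strings.TrimSuffix(f, "k"), 64)
		if err != nil {
			return "", nil, err
		}
		vals = append(vals, v)
	}
	return fields[0], vals, nil
}

// lastColumnRatio divides the last-column throughput of line a by that of line b.
func lastColumnRatio(a, b string) (float64, error) {
	_, av, err := parseSpeedLine(a)
	if err != nil {
		return 0, err
	}
	_, bv, err := parseSpeedLine(b)
	if err != nil {
		return 0, err
	}
	return av[len(av)-1] / bv[len(bv)-1], nil
}

func main() {
	// MacBook Air (M1) figures quoted above.
	aes := "aes-128-gcm 1085507.93k 3046557.93k 4649582.59k 4583631.53k 4490816.30k 4548853.76k"
	chacha := "chacha20-poly1305 381908.61k 582010.79k 1081650.60k 1740476.13k 1710656.17k 1737026.22k"
	r, err := lastColumnRatio(aes, chacha)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("aes-128-gcm / chacha20-poly1305 (last column): %.2fx\n", r)
}
```

These are raw cipher throughput numbers; as the sshfs timings earlier in this thread show, end-to-end gains can be much smaller or absent.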

@AkihiroSuda AkihiroSuda changed the title from "sshutil: prioritize aes128-gcm@openssh.com when AES-NI is available (roughly 60% faster on Intel Mac)" to "sshutil: prioritize aes128-gcm@openssh.com when AES acceleration is available (roughly 60% faster on Intel Mac)" Oct 8, 2021
…ailable

By default, `ssh` chooses chacha20-poly1305@openssh.com, even when AES accelerator is available.
(OpenSSH_8.1p1, macOS 11.6, MacBookPro 2020, Core i7-1068NG7)

AES accelerator is available on almost all recent Intel and AMD processors, but not on all ARM processors.
Probably available on Apple M1 too.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Member Author

Updated PR to prioritize aes128-gcm@openssh.com on Apple M1 too.

@AkihiroSuda AkihiroSuda merged commit 8091d05 into lima-vm:master Oct 8, 2021