
Optimize lib.ipsec.esp #845

Merged
merged 8 commits into snabbco:ipsec on Apr 26, 2016
Conversation

eugeneia
Member

This optimizes the encapsulation and decapsulation routines provided by lib.ipsec.esp:

  • The routines now use raw buffer mutations instead of packet.append, and no longer allocate a new packet; instead they mutate the input packet in place. For this I added a function packet.resize, which is essentially a setter for packet.length that additionally checks for overflow. (A sketch of the in-place shape follows this list.)
  • Following the awesome JIT tracology research by @alexandergall, the branch-heavy sequence number tracking is now implemented as a C function. (This is actually not a caveat; it would probably have to become a C function anyway in the course of implementing efficient anti-replay functionality.)
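For illustration, here is a minimal sketch of the in-place shape (the function body, ESP field sizes, and all names other than packet.resize are illustrative, not the actual lib.ipsec.esp code):

local packet = require("core.packet")

-- Illustrative ESP overhead for AES-128-GCM: 8-byte header (SPI plus
-- sequence number), 8-byte IV, 2-byte pad-length/next-header trailer,
-- 16-byte ICV (variable padding omitted for brevity).
local ESP_OVERHEAD = 8 + 8 + 2 + 16

local function encapsulate_in_place (p)
   -- Grow the input packet in place instead of allocating a fresh packet
   -- and copying into it with packet.append().
   packet.resize(p, p.length + ESP_OVERHEAD)
   -- ... write header/IV/trailer and encrypt directly in p.data here ...
   return p
end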

Benchmark results:

$ sudo ./snabb snabbmark esp 1e7 1024 encapsulate
Encapsulation (packet size = 1024): 13.78 Gbit/s
$ sudo ./snabb snabbmark esp 1e7 1024 decapsulate
Decapsulation (packet size = 1024): 14.23 Gbit/s

Cc @lukego

@eugeneia eugeneia self-assigned this Mar 24, 2016
@eugeneia eugeneia mentioned this pull request Apr 20, 2016
function resize (p, len)
   assert(len <= max_payload, "packet payload overflow")
   p.length = len
end

lukego
Member

This resize function seems potentially error-prone to me: if you resize the packet but don't fill in all of the data then you will be leaking unknown information from the previous user of the packet.

One alternative would be for encapsulation/encryption to use a new packet.pad(p, n) that always fills with zeros and for decapsulation/decryption to use a new function akin to packet.from_pointer() to set payload and size at the same time.
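To make that concrete, here is a minimal sketch of the two proposed helpers (neither exists at this point; the names follow the suggestion above, and max_payload is assumed to be core.packet's existing constant):

local ffi = require("ffi")

-- Grow the packet by n bytes, always zero-filling the new region so that
-- no stale data from a previous user of the buffer can leak.
function pad (p, n)
   assert(p.length + n <= max_payload, "packet payload overflow")
   ffi.fill(p.data + p.length, n)
   p.length = p.length + n
end

-- Set payload and length at the same time, akin to packet.from_pointer():
-- every byte up to the new length is overwritten, so nothing can leak.
function set_payload (p, ptr, len)
   assert(len <= max_payload, "packet payload overflow")
   ffi.copy(p.data, ptr, len)
   p.length = len
end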

eugeneia
Member Author

I see two options:

  1. Divide resize into grow and shrink (makes the code slightly awkward, and the API kind of encourages branching, although not in this specific case):
--- Set packet data length.
-function resize (p, len)
-   assert(len <= max_payload, "packet payload overflow")
-   p.length = len
+-- Grow packet data length.
+function grow (p, o)
+   local len = p.length
+   local new_len = len + o
+   assert(new_len <= max_payload, "packet payload overflow")
+   ffi.fill(p.data+len, o)
+   p.length = new_len
+end
+
+-- Shrink packet data length.
+function shrink (p, o)
+   local len = p.length
+   local new_len = len-o
+   assert(new_len >= 0, "packet payload underflow")
+   p.length = new_len
+end
  2. Always zero packet memory in free (gives pretty solid protection against information leaks, but what is the performance impact?):
 function free (p)
    counter.add(engine.frees)
    counter.add(engine.freebytes, p.length)
+   -- Zero packet data
+   ffi.fill(p.data, p.length)
 [...]

@lukego
Member

lukego commented Apr 21, 2016

Looks like solid optimization work, and awesome performance :-).

@@ -102,6 +102,7 @@ function length (p) return p.length end
 -- Set packet data length.
 function resize (p, len)
    assert(len <= max_payload, "packet payload overflow")
+   ffi.fill(p.data + p.length, math.max(0, len - p.length))
    p.length = len
 end
eugeneia
Member Author

@lukego:

This resize function seems potentially error-prone to me: if you resize the packet but don't fill in all of the data then you will be leaking unknown information from the previous user of the packet.

How about this: I take a slight performance hit (still >13 Gbit/s with the changes), which is somewhat amortized because the explicit call to ffi.fill for zeroing the padding is now obsolete.

Member

Sounds good to me. Can be further optimized in a more complicated way if/when necessary.

Side note: a "Gbps" figure is only meaningful when you also say which processor you are using. I don't know if you are testing a 2GHz CPU or a 4GHz CPU or something in between. This is something we need to be careful of so that users don't deploy on a 2GHz machine and expect the same per-core performance that we have measured on a 4GHz machine. Just something to keep in mind.

Member

Side note on side note: I wonder if bits per cycle (aka Gbps per GHz) could be a handy metric for this kind of thing. For example if you are seeing 13 Gigabits per second on a 3.5 GHz lugano server then you could say you are seeing 3.7 bits-per-cycle with 1KB packet size on Haswell.
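(The arithmetic, as a trivial Lua one-liner using the numbers above:)

-- bits per cycle = throughput (Gbit/s) / clock rate (GHz)
print(13 / 3.5) --> 3.7142857142857, i.e. ~3.7 bits per cycle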

I will also be interested in seeing comparative performance between Haswell (v3), Ivy Bridge (v2), and Broadwell (v4). Haswell's AES-NI hardware is supposed to be twice as fast as Ivy Bridge's for AES-GCM, and I am not sure whether this improves again in Broadwell. Also, Intel provide two reference implementations, one optimized for Ivy Bridge and one for Haswell; we have only implemented the Haswell-oriented one and not checked how it performs on Ivy Bridge. (Intel say the Haswell code works on older CPUs but with lower performance -- I am assuming the penalty is very slight, but I have not quantified it.)

Member

(These performance measurement activities are perhaps best addressed by just making sure that make benchmarks covers IPsec and waiting until we have Hydra running that on all available CPU generations. Morally we should not have to be doing this kind of testing by hand these days.)

eugeneia
Member Author

This was indeed tested on lugano-1 (Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz). I think the most interesting part is actually the (annotated) profiler output:

$ sudo lock ./snabb snabbmark esp 1e7 1024 encapsulate Fpv
locking /var/lock/lab.. flock: getting lock took 0.000002 seconds
81%  lib/ipsec/aes_128_gcm.lua:encrypt
  <- 100%  TRACE  24          ->loop     
 4%  lib/ipsec/esp.lua:encapsulate
  <- 92%  TRACE  24          ->loop     
  <-  8%  JIT Compiler
 3%  program/snabbmark/snabbmark.lua:esp
  <- 100%  Garbage Collector
Encapsulation (packet size = 1024): 13.15 Gbit/s
$ sudo lock ./snabb snabbmark esp 1e7 1024 decapsulate Fpv
locking /var/lock/lab.. flock: getting lock took 0.000002 seconds
83%  lib/ipsec/aes_128_gcm.lua:decrypt
  <- 100%  TRACE   7          ->loop     
 5%  lib/ipsec/esp.lua:decapsulate
  <- 100%  TRACE   7          ->loop     
 3%  program/snabbmark/snabbmark.lua:esp
  <- 94%  Garbage Collector
  <-  6%  TRACE   7          ->loop     
 3%  core/packet.lua:clone
  <- 100%  TRACE   7          ->loop     
Decapsulation (packet size = 1024): 13.86 Gbit/s

We see 80% of the time used by our AES DynASM routines (“a” for annotate shows this more definitively). I read this as 20% overhead due to ESP on 1KB packets (vs plain AES-GCM).

@eugeneia eugeneia merged commit 41de2a2 into snabbco:ipsec Apr 26, 2016
eugeneia added a commit that referenced this pull request Apr 26, 2016
dpino pushed a commit to dpino/snabb that referenced this pull request Jun 28, 2017
Add choice statement printer for YANG data