feature: reusable fingerprinting interface #4628

lrstewart · 2024-06-25T15:07:37Z

Description of changes:

Adds a new fingerprint struct. The reasoning behind this struct is:

Reuse memory. Currently JA3 only needs to allocated memory for the s2n_hash_state, but JA4 will also need memory in order to sort lists.
More flexibility / future proofing. We now have two datapoints for what fingerprinting algorithms will look like, but they're very different. A struct will let us add new necessary inputs or outputs independently.

The new methods behave like the old methods, except:

The new "get_hash" method returns a hex string instead of raw bytes. JA4 isn't just one hash (it looks like prefix_hash_hash), so it can't be represented by raw bytes. A string is also more flexible for future methods.
I changed some customer-facing null checks from S2N_ERR_NULL (an internal error) to S2N_ERR_INVALID_ARGUMENT (a usage error).

This change is backwards compatible and should not break anything, even though fingerprinting is unstable, because we don't NEED to break anything yet. We may want to remove the legacy methods someday though.

Call-outs:

This PR is a little big, but I didn't want to split the Rust and C APIs because it's important that our C API can translate to reasonable, usable Rust. However, if it's difficult to review, I can split it.

Testing:

New unit tests for the new methods. The s2n_fingerprint_ja3_test continues to pass, as do the wireshark comparison fingerprint tests in the pcap crate.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lrstewart · 2024-06-26T04:38:05Z

tls/s2n_fingerprint.h

+    /* Originally we represented the hash as the raw bytes of the digest.
+     * However, the JA4 hash is composed of a string prefix and two digests,
+     * so cannot reasonably be represented as raw bytes.
+     * Keep the old behavior for the old method.
+     */
+    unsigned int legacy_hash_format : 1;


I don't love this. The other options I considered:

Remove the old method. After all, it is marked unstable. But we know it's currently in use.

Translate the hex back to raw bytes in the old method. That seemed messy, since we'd translate to hex just to immediately translate back, but maybe that's better than these flags.

Hmm, personally I like the second option? I prefer complicating the legacy stuff to complicating the new stuff.

I think I agree, but I'd prefer to fix it in a follow-up PR.

Customers are actively using the legacy stuff, but not actively using the new stuff. snprintf / sscanf are a quick way to handle hex without complicating the code, but they're not exactly efficient. With a quick timing test, I've got the same number of operations taking 12s with the extra unnecessary snprintf + sscanf that took 9s before. Not a crazy slowdown, but also really not great.

I can get it down to 10s if I use the stuffer hex methods (+ LTO), but the stuffer hex methods are currently only for testing. Making them available for non-testing would complicate this PR. I can also actually keep the timing at 9s if I inline my own hex conversion logic, so I'd also want to investigate whether the stuffer hex methods are doing something inefficient.

lrstewart · 2024-06-26T04:40:46Z

api/unstable/fingerprint.h

+/**
+ * Calculates the raw string for a fingerprint.
+ *
+ * @param fingerprint The fingerprint to be used for the raw string
+ * @param max_output_size The maximum size of data that may be written to `output`.
+ * If `output` is too small, an S2N_ERR_T_USAGE error will occur.
+ * @param output The location that the requested hash will be written to.
+ * @param output_size Output variable to be set to the actual size of the data
+ * written to `output`.
+ * @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
+ */
+S2N_API int s2n_fingerprint_get_raw(struct s2n_fingerprint *fingerprint,
+        uint32_t max_output_size, uint8_t *output, uint32_t *output_size);


I've gone back and forth on including or not including the raw / full string. Currently no customers are using it, but customers did originally ask for it. With the s2n_fingerprint_hash struct I don't think calculating it adds much complexity, but it does still add some complexity. If we want to remove the complexity, we also have to remove the old version of the method, but again this API is marked unstable.

It seems very possible that people might have some new usecase for this in the future, so I like having it included now to ensure that it always fits in well with the feature.

lrstewart · 2024-06-26T04:42:50Z

tls/s2n_fingerprint_ja3.c

    .hash = S2N_HASH_MD5,
+    /* The hash string is a single MD5 digest represented as hex */
+    .hash_str_size = MD5_DIGEST_LENGTH * S2N_HEX_PER_BYTE,


#4578 (comment) So I now remember why I had the hash size on the struct instead of the hash itself :) But two separate fields works just fine.

bindings/rust/s2n-tls/src/fingerprint.rs

api/unstable/fingerprint.h

jmayclin · 2024-06-28T17:10:36Z

api/unstable/fingerprint.h

+/**
+ * Calculates the raw string for a fingerprint.
+ *
+ * @param fingerprint The fingerprint to be used for the raw string
+ * @param max_output_size The maximum size of data that may be written to `output`.
+ * If `output` is too small, an S2N_ERR_T_USAGE error will occur.
+ * @param output The location that the requested hash will be written to.
+ * @param output_size Output variable to be set to the actual size of the data
+ * written to `output`.
+ * @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
+ */
+S2N_API int s2n_fingerprint_get_raw(struct s2n_fingerprint *fingerprint,
+        uint32_t max_output_size, uint8_t *output, uint32_t *output_size);


It seems very possible that people might have some new usecase for this in the future, so I like having it included now to ensure that it always fits in well with the feature.

jmayclin · 2024-06-28T17:11:38Z

tls/s2n_fingerprint.h

+    /* Originally we represented the hash as the raw bytes of the digest.
+     * However, the JA4 hash is composed of a string prefix and two digests,
+     * so cannot reasonably be represented as raw bytes.
+     * Keep the old behavior for the old method.
+     */
+    unsigned int legacy_hash_format : 1;


Hmm, personally I like the second option? I prefer complicating the legacy stuff to complicating the new stuff.

bindings/rust/s2n-tls/src/fingerprint.rs

tls/s2n_fingerprint.c

bindings/rust/s2n-tls/src/fingerprint.rs

jmayclin · 2024-07-10T17:38:26Z

bindings/rust/s2n-tls/src/fingerprint.rs

+    }
+
+    #[test]
+    fn raw_may_allocate_memory() -> Result<(), Box<dyn Error>> {


ooo, these tests are fun, I like them.

jmayclin · 2024-07-10T17:45:52Z

api/unstable/fingerprint.h

+S2N_API int s2n_fingerprint_set_client_hello(struct s2n_fingerprint *fingerprint, struct s2n_client_hello *ch);
+
+/**
+ * Get the size of the fingerprint hash.


I think I'd prefer short over hash? It's less descriptive but I'd expect that to be to our benefit as we add more fingerprint types, e.g. if the short representation of some fingerprint in the future doesn't actually rely on a hash.

bindings/rust/s2n-tls/src/fingerprint.rs

api/unstable/fingerprint.h

tls/s2n_fingerprint.c

tls/s2n_fingerprint.h

tests/unit/s2n_fingerprint_test.c

bindings/rust/s2n-tls/src/fingerprint.rs

bindings/rust/s2n-tls/src/client_hello.rs

bindings/rust/s2n-tls/src/fingerprint.rs

github-actions bot added the s2n-core team label Jun 25, 2024

lrstewart force-pushed the ja4_2 branch 2 times, most recently from 409288b to f595aa8 Compare June 26, 2024 01:57

lrstewart commented Jun 26, 2024

View reviewed changes

lrstewart marked this pull request as ready for review June 26, 2024 04:43

lrstewart requested review from goatgoose, camshaft and jmayclin June 26, 2024 04:43

jmayclin reviewed Jun 28, 2024

View reviewed changes

bindings/rust/s2n-tls/src/fingerprint.rs Outdated Show resolved Hide resolved

lrstewart requested a review from jmayclin June 30, 2024 22:14

jmayclin reviewed Jul 10, 2024

View reviewed changes

bindings/rust/s2n-tls/src/fingerprint.rs Show resolved Hide resolved

lrstewart requested a review from jmayclin July 10, 2024 21:42

jmayclin approved these changes Jul 10, 2024

View reviewed changes

goatgoose reviewed Jul 11, 2024

View reviewed changes

lrstewart requested a review from goatgoose July 11, 2024 06:30

goatgoose approved these changes Jul 11, 2024

View reviewed changes

camshaft reviewed Jul 11, 2024

View reviewed changes

bindings/rust/s2n-tls/src/fingerprint.rs Show resolved Hide resolved

lrstewart added 7 commits July 11, 2024 11:52

feature: reusable fingerprinting

638d392

Rework Rust interface

69cc328

C doc / naming fixes

238aa4c

Allow pcap test to use deprecated methods for now

d3d9a46

Add multiple connection test

76554ef

PR comments

2d2a23b

newlines in comments

702213b

lrstewart force-pushed the ja4_2 branch from 50af7ab to 702213b Compare July 11, 2024 18:56

lrstewart requested a review from jmayclin July 11, 2024 18:58

jmayclin approved these changes Jul 11, 2024

View reviewed changes

camshaft approved these changes Jul 11, 2024

View reviewed changes

lrstewart enabled auto-merge (squash) July 11, 2024 19:58

lrstewart merged commit 2c4d012 into aws:main Jul 11, 2024
34 checks passed

lrstewart deleted the ja4_2 branch July 11, 2024 21:40

BrewTestBot mentioned this pull request Jul 19, 2024

s2n 1.4.18 Homebrew/homebrew-core#177876

Merged

lrstewart mentioned this pull request Jul 23, 2024

refactor: switch JA3 to use stuffer hex methods #4662

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature: reusable fingerprinting interface #4628

feature: reusable fingerprinting interface #4628

lrstewart commented Jun 25, 2024 •

edited

Loading

lrstewart Jun 26, 2024 •

edited

Loading

jmayclin Jun 28, 2024 •

edited

Loading

lrstewart Jun 30, 2024 •

edited

Loading

lrstewart Jun 26, 2024 •

edited

Loading

jmayclin Jun 28, 2024

lrstewart Jun 26, 2024

jmayclin Jun 28, 2024

jmayclin Jun 28, 2024 •

edited

Loading

jmayclin Jul 10, 2024

jmayclin Jul 10, 2024

feature: reusable fingerprinting interface #4628

feature: reusable fingerprinting interface #4628

Conversation

lrstewart commented Jun 25, 2024 • edited Loading

Description of changes:

Call-outs:

Testing:

lrstewart Jun 26, 2024 • edited Loading

Choose a reason for hiding this comment

jmayclin Jun 28, 2024 • edited Loading

Choose a reason for hiding this comment

lrstewart Jun 30, 2024 • edited Loading

Choose a reason for hiding this comment

lrstewart Jun 26, 2024 • edited Loading

Choose a reason for hiding this comment

jmayclin Jun 28, 2024

Choose a reason for hiding this comment

lrstewart Jun 26, 2024

Choose a reason for hiding this comment

jmayclin Jun 28, 2024

Choose a reason for hiding this comment

jmayclin Jun 28, 2024 • edited Loading

Choose a reason for hiding this comment

jmayclin Jul 10, 2024

Choose a reason for hiding this comment

jmayclin Jul 10, 2024

Choose a reason for hiding this comment

lrstewart commented Jun 25, 2024 •

edited

Loading

lrstewart Jun 26, 2024 •

edited

Loading

jmayclin Jun 28, 2024 •

edited

Loading

lrstewart Jun 30, 2024 •

edited

Loading

lrstewart Jun 26, 2024 •

edited

Loading

jmayclin Jun 28, 2024 •

edited

Loading