Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: reusable fingerprinting interface #4628

Merged
merged 7 commits into from
Jul 11, 2024
Merged

Conversation

lrstewart
Copy link
Contributor

@lrstewart lrstewart commented Jun 25, 2024

Description of changes:

Adds a new fingerprint struct. The reasoning behind this struct is:

  • Reuse memory. Currently JA3 only needs to allocated memory for the s2n_hash_state, but JA4 will also need memory in order to sort lists.
  • More flexibility / future proofing. We now have two datapoints for what fingerprinting algorithms will look like, but they're very different. A struct will let us add new necessary inputs or outputs independently.

The new methods behave like the old methods, except:

  • The new "get_hash" method returns a hex string instead of raw bytes. JA4 isn't just one hash (it looks like prefix_hash_hash), so it can't be represented by raw bytes. A string is also more flexible for future methods.
  • I changed some customer-facing null checks from S2N_ERR_NULL (an internal error) to S2N_ERR_INVALID_ARGUMENT (a usage error).

This change is backwards compatible and should not break anything, even though fingerprinting is unstable, because we don't NEED to break anything yet. We may want to remove the legacy methods someday though.

Call-outs:

This PR is a little big, but I didn't want to split the Rust and C APIs because it's important that our C API can translate to reasonable, usable Rust. However, if it's difficult to review, I can split it.

Testing:

New unit tests for the new methods. The s2n_fingerprint_ja3_test continues to pass, as do the wireshark comparison fingerprint tests in the pcap crate.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label Jun 25, 2024
@lrstewart lrstewart force-pushed the ja4_2 branch 2 times, most recently from 409288b to f595aa8 Compare June 26, 2024 01:57
Comment on lines +30 to +35
/* Originally we represented the hash as the raw bytes of the digest.
* However, the JA4 hash is composed of a string prefix and two digests,
* so cannot reasonably be represented as raw bytes.
* Keep the old behavior for the old method.
*/
unsigned int legacy_hash_format : 1;
Copy link
Contributor Author

@lrstewart lrstewart Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this. The other options I considered:

  • Remove the old method. After all, it is marked unstable. But we know it's currently in use.
  • Translate the hex back to raw bytes in the old method. That seemed messy, since we'd translate to hex just to immediately translate back, but maybe that's better than these flags.

Copy link
Contributor

@jmayclin jmayclin Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, personally I like the second option? I prefer complicating the legacy stuff to complicating the new stuff.

Copy link
Contributor Author

@lrstewart lrstewart Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree, but I'd prefer to fix it in a follow-up PR.

Customers are actively using the legacy stuff, but not actively using the new stuff. snprintf / sscanf are a quick way to handle hex without complicating the code, but they're not exactly efficient. With a quick timing test, I've got the same number of operations taking 12s with the extra unnecessary snprintf + sscanf that took 9s before. Not a crazy slowdown, but also really not great.

I can get it down to 10s if I use the stuffer hex methods (+ LTO), but the stuffer hex methods are currently only for testing. Making them available for non-testing would complicate this PR. I can also actually keep the timing at 9s if I inline my own hex conversion logic, so I'd also want to investigate whether the stuffer hex methods are doing something inefficient.

Comment on lines 120 to 146
/**
* Calculates the raw string for a fingerprint.
*
* @param fingerprint The fingerprint to be used for the raw string
* @param max_output_size The maximum size of data that may be written to `output`.
* If `output` is too small, an S2N_ERR_T_USAGE error will occur.
* @param output The location that the requested hash will be written to.
* @param output_size Output variable to be set to the actual size of the data
* written to `output`.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_get_raw(struct s2n_fingerprint *fingerprint,
uint32_t max_output_size, uint8_t *output, uint32_t *output_size);
Copy link
Contributor Author

@lrstewart lrstewart Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth on including or not including the raw / full string. Currently no customers are using it, but customers did originally ask for it. With the s2n_fingerprint_hash struct I don't think calculating it adds much complexity, but it does still add some complexity. If we want to remove the complexity, we also have to remove the old version of the method, but again this API is marked unstable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems very possible that people might have some new usecase for this in the future, so I like having it included now to ensure that it always fits in well with the feature.

Comment on lines 223 to +225
.hash = S2N_HASH_MD5,
/* The hash string is a single MD5 digest represented as hex */
.hash_str_size = MD5_DIGEST_LENGTH * S2N_HEX_PER_BYTE,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4578 (comment) So I now remember why I had the hash size on the struct instead of the hash itself :) But two separate fields works just fine.

@lrstewart lrstewart marked this pull request as ready for review June 26, 2024 04:43
bindings/rust/s2n-tls/src/fingerprint.rs Show resolved Hide resolved
bindings/rust/s2n-tls/src/fingerprint.rs Outdated Show resolved Hide resolved
bindings/rust/s2n-tls/src/fingerprint.rs Outdated Show resolved Hide resolved
bindings/rust/s2n-tls/src/fingerprint.rs Outdated Show resolved Hide resolved
api/unstable/fingerprint.h Show resolved Hide resolved
Comment on lines 120 to 146
/**
* Calculates the raw string for a fingerprint.
*
* @param fingerprint The fingerprint to be used for the raw string
* @param max_output_size The maximum size of data that may be written to `output`.
* If `output` is too small, an S2N_ERR_T_USAGE error will occur.
* @param output The location that the requested hash will be written to.
* @param output_size Output variable to be set to the actual size of the data
* written to `output`.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_get_raw(struct s2n_fingerprint *fingerprint,
uint32_t max_output_size, uint8_t *output, uint32_t *output_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems very possible that people might have some new usecase for this in the future, so I like having it included now to ensure that it always fits in well with the feature.

Comment on lines +30 to +35
/* Originally we represented the hash as the raw bytes of the digest.
* However, the JA4 hash is composed of a string prefix and two digests,
* so cannot reasonably be represented as raw bytes.
* Keep the old behavior for the old method.
*/
unsigned int legacy_hash_format : 1;
Copy link
Contributor

@jmayclin jmayclin Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, personally I like the second option? I prefer complicating the legacy stuff to complicating the new stuff.

bindings/rust/s2n-tls/src/fingerprint.rs Outdated Show resolved Hide resolved
bindings/rust/s2n-tls/src/fingerprint.rs Show resolved Hide resolved
tls/s2n_fingerprint.c Outdated Show resolved Hide resolved
@lrstewart lrstewart requested a review from jmayclin June 30, 2024 22:14
}

#[test]
fn raw_may_allocate_memory() -> Result<(), Box<dyn Error>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooo, these tests are fun, I like them.

S2N_API int s2n_fingerprint_set_client_hello(struct s2n_fingerprint *fingerprint, struct s2n_client_hello *ch);

/**
* Get the size of the fingerprint hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer short over hash? It's less descriptive but I'd expect that to be to our benefit as we add more fingerprint types, e.g. if the short representation of some fingerprint in the future doesn't actually rely on a hash.

@lrstewart lrstewart requested a review from jmayclin July 10, 2024 21:42
api/unstable/fingerprint.h Outdated Show resolved Hide resolved
tls/s2n_fingerprint.c Show resolved Hide resolved
tls/s2n_fingerprint.h Show resolved Hide resolved
tests/unit/s2n_fingerprint_test.c Outdated Show resolved Hide resolved
tests/unit/s2n_fingerprint_test.c Outdated Show resolved Hide resolved
tests/unit/s2n_fingerprint_test.c Outdated Show resolved Hide resolved
tests/unit/s2n_fingerprint_test.c Show resolved Hide resolved
bindings/rust/s2n-tls/src/fingerprint.rs Show resolved Hide resolved
bindings/rust/s2n-tls/src/client_hello.rs Show resolved Hide resolved
@lrstewart lrstewart requested a review from goatgoose July 11, 2024 06:30
@lrstewart lrstewart enabled auto-merge (squash) July 11, 2024 19:58
@lrstewart lrstewart merged commit 2c4d012 into aws:main Jul 11, 2024
34 checks passed
@lrstewart lrstewart deleted the ja4_2 branch July 11, 2024 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants