feature: reusable fingerprinting interface #4628

Merged
merged 7 commits into from
Jul 11, 2024
110 changes: 110 additions & 0 deletions api/unstable/fingerprint.h
@@ -35,6 +35,116 @@ typedef enum {
S2N_FINGERPRINT_JA3,
} s2n_fingerprint_type;

struct s2n_fingerprint;

/**
* Create a reusable fingerprint structure.
*
* Fingerprinting is primarily used to identify malicious or abusive clients,
* so fingerprinting needs to be efficient and require minimal resources.
* The `s2n_client_hello_get_fingerprint_hash` and `s2n_client_hello_get_fingerprint_string`
* methods may require additional memory to calculate the fingerprint. Reusing
* the same `s2n_fingerprint` structure to calculate multiple fingerprints reduces
* the cost of each individual fingerprint.
*
* @param type The algorithm to use for the fingerprint.
* @returns A pointer to the new s2n_fingerprint structure on success, NULL on failure.
*/
S2N_API struct s2n_fingerprint *s2n_fingerprint_new(s2n_fingerprint_type type);

/**
* Frees the memory allocated by `s2n_fingerprint_new` for a fingerprint structure.
*
* @param fingerprint The s2n_fingerprint structure to be freed.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_free(struct s2n_fingerprint **fingerprint);

/**
* Resets the fingerprint for safe reuse with a different ClientHello.
*
* @param fingerprint The s2n_fingerprint structure to be reset.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_wipe(struct s2n_fingerprint *fingerprint);

/**
* Sets the ClientHello to be fingerprinted.
*
* @param fingerprint The s2n_fingerprint to be modified
* @param ch The client hello to be fingerprinted. It is not copied, so it must
* remain valid for at least as long as this fingerprinting operation.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_set_client_hello(struct s2n_fingerprint *fingerprint, struct s2n_client_hello *ch);

/**
* Get the size of the fingerprint hash.
Contributor

Suggested change
- * Get the size of the fingerprint hash.
+ * Get the size of the hex-encoded fingerprint hash.

I think it's useful to clarify, otherwise I have a tiny question in the back of my mind about whether I need to multiply this by 2 to get enough space for the hex-encoded s2n_fingerprint_get_hash
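
To make the sizing question concrete, here is a minimal sketch (not s2n-tls code) of why a hex-encoded hash needs twice the bytes of the raw digest: a 16-byte MD5 digest becomes 32 characters, which is the length of the JA3 example "c34a54599a1fbaf1786aa6d633545a60". The `hex_encode` helper and its parameter names are hypothetical; they are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of s2n-tls: writes the lowercase hex form of
 * `in` to `out` and returns the number of characters written, or 0 if `out`
 * is too small (or `in_len` is 0). No NUL terminator is written. */
static size_t hex_encode(const uint8_t *in, size_t in_len, char *out, size_t max_out)
{
    static const char digits[] = "0123456789abcdef";
    assert(in != NULL && out != NULL);
    /* Each input byte expands to exactly two hex characters. */
    if (max_out < in_len * 2) {
        return 0;
    }
    for (size_t i = 0; i < in_len; i++) {
        out[2 * i] = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0f];
    }
    return in_len * 2;
}
```

Under this sketch a caller holding a raw 16-byte digest needs a 32-byte buffer for the hex form, so the doc comment should say which of the two sizes `s2n_fingerprint_get_hash_size` reports.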

Contributor Author

That actually brings up a good point that I could use some opinions on:

I don't want to write "hex-encoded" because JA4 is only partially hex encoded. Here's a JA4 example: "t13i310900_e8f1e7e78f70_1f22a2ca17c4". Notice it starts with a prefix that isn't hex, then has two different truncated hashes that are hex.
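
A small sketch of that observation, assuming only the example string above: split the fingerprint on `_` and check which sections are pure lowercase hex. The names `is_lower_hex`, `section_summary`, and `summarize_sections` are hypothetical and not part of s2n-tls or any JA4 library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the `len` characters at `s` are all lowercase hex digits. */
static bool is_lower_hex(const char *s, size_t len)
{
    if (len == 0) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) {
            return false;
        }
    }
    return true;
}

struct section_summary {
    size_t sections;     /* number of '_'-separated sections */
    size_t hex_sections; /* how many of those are pure lowercase hex */
};

/* Walks a NUL-terminated fingerprint string section by section. */
static struct section_summary summarize_sections(const char *fingerprint)
{
    struct section_summary summary = { 0, 0 };
    assert(fingerprint != NULL);
    const char *start = fingerprint;
    for (;;) {
        const char *end = strchr(start, '_');
        size_t len = (end != NULL) ? (size_t) (end - start) : strlen(start);
        summary.sections++;
        if (is_lower_hex(start, len)) {
            summary.hex_sections++;
        }
        if (end == NULL) {
            break;
        }
        start = end + 1;
    }
    return summary;
}
```

For the example string this reports three sections, of which only the last two are hex, which is why "hex-encoded" would describe the JA3 output but not the whole JA4 output.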

But in the same vein, is "hash" even an accurate name for it anymore?! I considered just "fingerprint", but then the C looks like "s2n_fingerprint_get_fingerprint", which is kind of silly. I could leave the C as "s2n_fingerprint_get"? I also considered "short", so "s2n_fingerprint_get_short", but that doesn't seem great either. It doesn't have a name in Wireshark or the documentation: it's just "JA3" or "JA4".

The longer version was called "full" / "fullstring" in JA3 but is now "raw" / "r" in JA4. I went with "raw" because I like it better without strong reasoning, but maybe there are other opinions on that too.

We basically have two values: one which is a known, reasonable length, and one which is variable length and can be unreasonably long. Based on how fingerprints are used, I think that separation will remain constant across methods, even though the exact format and contents of the strings will keep changing. Naming the two fields is just hard.

Contributor

I think I'd prefer short over hash? It's less descriptive but I'd expect that to be to our benefit as we add more fingerprint types, e.g. if the short representation of some fingerprint in the future doesn't actually rely on a hash.

Contributor Author

I'm actually changing my mind here. While trying to think of more, better names for a fixed-size version of a variable length string, Google told me "hash". From Wikipedia: "A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output." I guess we forget that more general definition because we always deal with cryptographic hash functions. But by that definition, even the JA4 string is a "hash", and "hash" is exactly the right name for this.
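
As an illustration of that broader definition, and only as an aside that has nothing to do with how s2n-tls, JA3, or JA4 compute anything, 32-bit FNV-1a is a non-cryptographic hash that maps input of any length to a fixed 4-byte value. The function name `fnv1a_32` is an assumption chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: arbitrary-length input, fixed-size (4-byte) output.
 * Not cryptographically secure; shown only to illustrate the general
 * meaning of "hash". */
static uint32_t fnv1a_32(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t hash = 2166136261u; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u; /* FNV prime; wraps modulo 2^32 */
    }
    return hash;
}
```

Whatever the input length, the result always fits in a `uint32_t`, which is the "known, reasonable length" property the short fingerprint form is meant to have.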

Contributor

I'm still a tiny bit worried that other people also mostly work with cryptographic hashes and would get confused, but if that feedback surfaces publicly it should be pretty easily solved with additional documentation or examples. Since we have the JA3 hash form included in the doc comments, it should be pretty straightforward for any questioning users to unblock themselves.

*
* Fingerprint hashes should be a constant size, but that size will vary based
* on the fingerprinting method used.
*
* @param fingerprint The s2n_fingerprint to be used for the hash
* @param size Output variable to be set to the size of the hash
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_get_hash_size(const struct s2n_fingerprint *fingerprint, uint32_t *size);

/**
* Calculates a fingerprint hash.
lrstewart marked this conversation as resolved.
*
* The output of this method depends on the type of fingerprint.
*
* JA3: A hex-encoded string representing the MD5 hash of the raw string.
* - See https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967
* - Example: "c34a54599a1fbaf1786aa6d633545a60"
*
* @param fingerprint The s2n_fingerprint to be used for the hash
* @param max_output_size The maximum size of data that may be written to `output`.
* If `output` is too small, an S2N_ERR_T_USAGE error will occur.
* @param output The location that the requested hash will be written to.
* @param output_size Output variable to be set to the actual size of the data
* written to `output`.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_get_hash(struct s2n_fingerprint *fingerprint,
uint32_t max_output_size, uint8_t *output, uint32_t *output_size);

/**
* Get the size of the raw fingerprint string.
*
* The size of the raw string depends on the ClientHello and cannot be known
* without calculating the fingerprint. Either `s2n_fingerprint_get_hash` or
* `s2n_fingerprint_get_raw` must be called before this method.
*
* @param fingerprint The s2n_fingerprint to be used for the raw string
* @param size Output variable to be set to the size of the raw string
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_get_raw_size(const struct s2n_fingerprint *fingerprint, uint32_t *size);

/**
* Calculates the raw string for a fingerprint.
*
* The output of this method depends on the type of fingerprint.
*
* JA3: A string consisting of lists of decimal values.
* - See https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967
* - Example: "771,4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-
* 49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-
* 156-61-60-53-47-255,11-10-35-22-23-13-43-45-51,29-23-30-25-24,0-1-2"
*
* @param fingerprint The s2n_fingerprint to be used for the raw string
* @param max_output_size The maximum size of data that may be written to `output`.
* If `output` is too small, an S2N_ERR_T_USAGE error will occur.
* @param output The location that the requested raw string will be written to.
* @param output_size Output variable to be set to the actual size of the data
* written to `output`.
* @returns S2N_SUCCESS on success, S2N_FAILURE on failure.
*/
S2N_API int s2n_fingerprint_get_raw(struct s2n_fingerprint *fingerprint,
uint32_t max_output_size, uint8_t *output, uint32_t *output_size);

/**
* Calculates a fingerprint hash for a given ClientHello.
*
Expand Down
261 changes: 5 additions & 256 deletions bindings/rust/s2n-tls/src/client_hello.rs
@@ -52,7 +52,7 @@ impl ClientHello {
// we know that the get_hash and get_fingerprint methods do not mutate the
// data, and use mut pointers as a matter of convention because it makes
// working with s2n_stuffers and s2n_blobs easier.
- fn deref_mut_ptr(&self) -> *mut s2n_client_hello {
+ pub(crate) fn deref_mut_ptr(&self) -> *mut s2n_client_hello {
&self.0 as *const s2n_client_hello as *mut s2n_client_hello
}

@@ -115,261 +115,6 @@ impl ClientHello {
}
}

#[cfg(feature = "unstable-fingerprint")]
pub use self::fingerprint::*;

// Fingerprinting is an unstable feature. This module can be removed and added
// to the client_hello module once we have settled on an implementation.
#[cfg(feature = "unstable-fingerprint")]
pub mod fingerprint {
use crate::error::{Error, Fallible};
use s2n_tls_sys::*;

use super::ClientHello;

#[non_exhaustive]
#[derive(Copy, Clone)]
pub enum FingerprintType {
JA3,
}

// this is the size of the MD5 hash digest that is used for the JA3 fingerprint
const MD5_HASH_SIZE: u32 = 16;

impl From<FingerprintType> for s2n_tls_sys::s2n_fingerprint_type::Type {
fn from(value: FingerprintType) -> Self {
match value {
FingerprintType::JA3 => s2n_tls_sys::s2n_fingerprint_type::FINGERPRINT_JA3,
}
}
}

impl ClientHello {
/// `fingerprint_hash` calculates the hash, and also returns the size
/// required for the full fingerprint string. The return value can be used
/// to construct a string of appropriate capacity to call
/// `fingerprint_string`. `output` will be extended if necessary to store
/// the full hash.
///
/// ```no_run
/// use s2n_tls::client_hello::{ClientHello, FingerprintType};
/// use s2n_tls::connection::Connection;
/// use s2n_tls::enums::Mode;
///
/// let mut conn = Connection::new(Mode::Server);
/// // handshake happens
/// let mut client_hello: &ClientHello = conn.client_hello().unwrap();
/// let mut hash = Vec::new();
/// let string_size = client_hello.fingerprint_hash(FingerprintType::JA3, &mut hash).unwrap();
/// // hash has been resized so that it can store the fingerprint hash
///
/// let mut string = String::with_capacity(string_size as usize);
/// // string will not be resized, and the method will fail with
/// // ErrorType::UsageError if the string doesn't have enough capacity
/// client_hello.fingerprint_string(FingerprintType::JA3, &mut string).unwrap();
/// ```
pub fn fingerprint_hash(
&self,
hash: FingerprintType,
output: &mut Vec<u8>,
) -> Result<u32, Error> {
let mut hash_size: u32 = 0;
let mut str_size: u32 = 0;
// make sure the vec has sufficient space for the hash
if output.capacity() < MD5_HASH_SIZE as usize {
output.reserve_exact(MD5_HASH_SIZE as usize - output.len());
}
unsafe {
s2n_client_hello_get_fingerprint_hash(
self.deref_mut_ptr(),
hash.into(),
MD5_HASH_SIZE,
output.as_mut_ptr(),
&mut hash_size,
&mut str_size,
)
.into_result()?;
// SAFETY: we wrote to the raw vec (using the mut pointer), and need
// to update the state of the vec to reflect the changes we made.
output.set_len(hash_size as usize);
};
Ok(str_size)
}

/// `fingerprint_string` will try to calculate the fingerprint and store the
/// resulting string in `output`. If `output` does not have sufficient
/// capacity an Error of `ErrorType::UsageError` will be returned.
pub fn fingerprint_string(
&self,
hash: FingerprintType,
output: &mut String,
) -> Result<(), Error> {
let mut output_size = 0;
unsafe {
s2n_tls_sys::s2n_client_hello_get_fingerprint_string(
self.deref_mut_ptr(),
hash.into(),
output.capacity() as u32,
output.as_mut_ptr(),
&mut output_size,
)
.into_result()?;
// SAFETY: update internal state of string to match the data written
// into it.
output.as_mut_vec().set_len(output_size as usize);
};
Ok(())
}
}

#[cfg(test)]
pub mod fingerprint_tests {
use crate::{
client_hello::{
fingerprint::{FingerprintType, MD5_HASH_SIZE},
ClientHello,
},
error::{Error, ErrorType},
security,
testing::TestPair,
};

/// This function is a test fixture used to generate a valid ClientHello so
/// that we don't have to copy and paste the raw bytes for test fixtures
fn get_client_hello_bytes() -> Result<Vec<u8>, crate::error::Error> {
let config = crate::testing::config_builder(&security::DEFAULT_TLS13)
.unwrap()
.build()?;
let mut pair = TestPair::from_config(&config);
pair.handshake()?;
// this doesn't have the handshake header
let client_hello_message = pair.server.client_hello()?.raw_message()?;
// handshake header is {tag: u8, client_hello_length: u24}
let mut client_hello = vec![0; 4];
// As long as the client hello is small, no bit fiddling is required
assert!(client_hello_message.len() < u8::MAX as usize);
// tag for handshake header
client_hello[0] = 1;
client_hello[3] = client_hello_message.len() as u8;
client_hello.extend(client_hello_message.iter());
Ok(client_hello)
}

fn known_test_case(
raw_client_hello: Vec<u8>,
expected_string: &str,
expected_hash_hex: &str,
) -> Result<(), Error> {
let expected_hash: Vec<u8> = hex::decode(expected_hash_hex).unwrap();
let client_hello =
ClientHello::parse_client_hello(raw_client_hello.as_slice()).unwrap();

let mut hash = Vec::new();
let string_size = client_hello
.fingerprint_hash(FingerprintType::JA3, &mut hash)
.unwrap();
assert_eq!(hash, expected_hash);

let mut string = String::with_capacity(string_size as usize);
client_hello
.fingerprint_string(FingerprintType::JA3, &mut string)
.unwrap();
assert_eq!(string, expected_string);
Ok(())
}

pub fn get_client_hello() -> Box<ClientHello> {
// sets up connection and handshakes
let raw_client_hello = get_client_hello_bytes();
ClientHello::parse_client_hello(raw_client_hello.unwrap().as_slice()).unwrap()
}

pub fn client_hello_bytes() -> Vec<u8> {
vec![
0x01, 0x00, 0x00, 0xEC, 0x03, 0x03, 0x90, 0xe8, 0xcc, 0xee, 0xe5, 0x70, 0xa2, 0xa1,
0x2f, 0x6b, 0x69, 0xd2, 0x66, 0x96, 0x0f, 0xcf, 0x20, 0xd5, 0x32, 0x6e, 0xc4, 0xb2,
0x8c, 0xc7, 0xbd, 0x0a, 0x06, 0xc2, 0xa5, 0x14, 0xfc, 0x34, 0x20, 0xaf, 0x72, 0xbf,
0x39, 0x99, 0xfb, 0x20, 0x70, 0xc3, 0x10, 0x83, 0x0c, 0xee, 0xfb, 0xfa, 0x72, 0xcc,
0x5d, 0xa8, 0x99, 0xb4, 0xc5, 0x53, 0xd6, 0x3d, 0xa0, 0x53, 0x7a, 0x5c, 0xbc, 0xf5,
0x0b, 0x00, 0x1e, 0xc0, 0x2b, 0xc0, 0x2f, 0xcc, 0xa9, 0xcc, 0xa8, 0xc0, 0x2c, 0xc0,
0x30, 0xc0, 0x0a, 0xc0, 0x09, 0xc0, 0x13, 0xc0, 0x14, 0x00, 0x33, 0x00, 0x39, 0x00,
0x2f, 0x00, 0x35, 0x00, 0x0a, 0x01, 0x00, 0x00, 0x85, 0x00, 0x00, 0x00, 0x23, 0x00,
0x21, 0x00, 0x00, 0x1e, 0x69, 0x6e, 0x63, 0x6f, 0x6d, 0x69, 0x6e, 0x67, 0x2e, 0x74,
0x65, 0x6c, 0x65, 0x6d, 0x65, 0x74, 0x72, 0x79, 0x2e, 0x6d, 0x6f, 0x7a, 0x69, 0x6c,
0x6c, 0x61, 0x2e, 0x6f, 0x72, 0x67, 0x00, 0x17, 0x00, 0x00, 0xff, 0x01, 0x00, 0x01,
0x00, 0x00, 0x0a, 0x00, 0x0a, 0x00, 0x08, 0x00, 0x1d, 0x00, 0x17, 0x00, 0x18, 0x00,
0x19, 0x00, 0x0b, 0x00, 0x02, 0x01, 0x00, 0x00, 0x23, 0x00, 0x00, 0x00, 0x10, 0x00,
0x0e, 0x00, 0x0c, 0x02, 0x68, 0x32, 0x08, 0x68, 0x74, 0x74, 0x70, 0x2f, 0x31, 0x2e,
0x31, 0x00, 0x05, 0x00, 0x05, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x00, 0x18,
0x00, 0x16, 0x04, 0x03, 0x05, 0x03, 0x06, 0x03, 0x08, 0x04, 0x08, 0x05, 0x08, 0x06,
0x04, 0x01, 0x05, 0x01, 0x06, 0x01, 0x02, 0x03, 0x02, 0x01, 0x00, 0x1c, 0x00, 0x02,
0x40, 0x00,
]
}

// test that a fingerprint can successfully be calculated from ClientHellos
// returned from a connection
#[checkers::test]
fn io_fingerprint_test() {
let config = crate::testing::config_builder(&security::DEFAULT_TLS13)
.unwrap()
.build()
.unwrap();
let mut pair = TestPair::from_config(&config);

// client_hellos can not be accessed before the handshake
assert!(pair.client.client_hello().is_err());
assert!(pair.server.client_hello().is_err());

pair.handshake().unwrap();

let client_hello = pair.server.client_hello().unwrap();
let mut hash = Vec::new();
let fingerprint_size = client_hello
.fingerprint_hash(FingerprintType::JA3, &mut hash)
.unwrap();
let mut string = String::with_capacity(fingerprint_size as usize);
client_hello
.fingerprint_string(FingerprintType::JA3, &mut string)
.unwrap();
}

// known value test case copied from s2n_fingerprint_ja3_test.c
#[checkers::test]
fn valid_client_bytes() {
let raw_client_hello = client_hello_bytes();
let expected_fingerprint = "771,49195-49199-52393-52392-49196-49200-\
49162-49161-49171-49172-51-57-47-53-10,0-\
23-65281-10-11-35-16-5-13-28,29-23-24-25,0";
let expected_hash_hex = "839bbe3ed07fed922ded5aaf714d6842";
known_test_case(raw_client_hello, expected_fingerprint, expected_hash_hex).unwrap();
}

#[test]
fn hash_output_resizing() {
let client_hello = get_client_hello();
let hash_capacities = vec![0, MD5_HASH_SIZE, 1_000];
for initial_size in hash_capacities {
let mut hash = Vec::with_capacity(initial_size as usize);
client_hello
.fingerprint_hash(FingerprintType::JA3, &mut hash)
.unwrap();
assert_eq!(hash.len(), MD5_HASH_SIZE as usize);
}
}

#[test]
fn string_output_too_small() {
let client_hello = get_client_hello();
let mut fingerprint_string = String::with_capacity(0);
let fingerprint_err = client_hello
.fingerprint_string(FingerprintType::JA3, &mut fingerprint_string)
.unwrap_err();
assert_eq!(fingerprint_err.kind(), ErrorType::UsageError);
}
}
}

impl Drop for ClientHello {
fn drop(&mut self) {
let mut client_hello: *mut s2n_client_hello = &mut self.0;
@@ -432,3 +177,7 @@ mod tests {
assert_eq!("incoming.telemetry.mozilla.org".as_bytes(), server_name);
}
}

// Leftover from when fingerprinting was implemented in this module
#[cfg(feature = "unstable-fingerprint")]
pub use crate::fingerprint::FingerprintType;
goatgoose marked this conversation as resolved.