-
Notifications
You must be signed in to change notification settings - Fork 717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: reusable fingerprinting interface #4628
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
638d392
feature: reusable fingerprinting
lrstewart 69cc328
Rework Rust interface
lrstewart 238aa4c
C doc / naming fixes
lrstewart d3d9a46
Allow pcap test to use deprecated methods for now
lrstewart 76554ef
Add multiple connection test
lrstewart 2d2a23b
PR comments
lrstewart 702213b
newlines in comments
lrstewart File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's useful to clarify, otherwise I have a tiny question in the back of my mind about whether I need to multiply this by 2 to get enough space for the hex-encoded
s2n_fingerprint_get_hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually brings up a good point that I could use some opinions on:
I don't want to write "hex-encoded" because JA4 is only partially hex encoded. Here's a JA4 example: "t13i310900_e8f1e7e78f70_1f22a2ca17c4". Notice it starts with a prefix that isn't hex, then has two different truncated hashes that are hex.
But along the same vein, is "hash" even an accurate name for it anymore?! I considered just "fingerprint", but then the C looks like "s2n_fingerprint_get_fingerprint" which is kind of silly. I could leave the C as "s2n_fingerprint_get"? I also considered "short", so "s2n_fingerprint_get_short", but that doesn't seem great either. It doesn't have a name in Wireshark or the documentation-- it's just "JA3" or "JA4".
The longer version was called "full" / "fullstring" in JA3 but is now "raw" / "r" in JA4. I went with "raw" because I like it better without strong reasoning, but maybe there are other opinions on that too.
We basically have two values: one which is a known, reasonable length, and one which is variable length and can be unreasonably long. Based on how fingerprints are used, I think that separation will remain constant across methods, even though the exact format and contents of the strings will keep changing. Naming the two fields is just hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer
short
overhash
? It's less descriptive but I'd expect that to be to our benefit as we add more fingerprint types, e.g. if the short representation of some fingerprint in the future doesn't actually rely on a hash.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually changing my mind here. While trying to think of more, better names for a fixed-size version of a variable length string, Google told me "hash". From wikipedia: "A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output." I guess we forget that more general definition because we always deal with cryptographic hash functions. But by that definition, even the JA4 string is a "hash", and "hash" is exactly the right name for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a tiny bit worried that other people also mostly work with cryptographic hashes and would get confused, but if that feedback surfaces publicly it should be pretty easily solved with additional documentation or examples. Since we have the JA-3 hash form included in the doc comments, it should be pretty straightforward for any questioning users to unblock themselves.