Alternative proposal for Hashspace ID Values #143
Comments
Alternative from me: Logic for Namespace is a specific UUIDv1 Increment by UUID in least significant position.
|
Paul's proposal, TEXT to HEX, is tough because the current hashspace ID labels are at minimum 16 characters and at most 24 characters after encoding in hex. (See the end)
Text to Hex
Edit: I could change the name labels and remove the underscore, but it does not scale nicely unless they can all be 12-15 chars after the encoding, or one must navigate around the Ver/Var hex. |
For my online service I have done the following approach:
which results in the following UUIDs:
What do you think about it? |
I know that, not being a hash function expert, I shouldn't even question the advantage and simplicity of incrementing a base UUID. However, in terms of avalanche effect, which is better: changing only 1 bit or ~~changing 128 bits~~?
I also know that it is not a requirement that an ID produced with a hashspace has a very high probability of not clashing with IDs produced with a different hashspace. SHA-x algorithms are already designed so that changing 1 bit in the input produces a drastically different output, so the probability of a clash must be too low to matter. However, if it is possible to maximize this effect by changing as many input bits as possible, wouldn't that be more desirable?
-- EDIT: I crossed out the "changing 128 bits" phrase because changing 128 bits means inverting all bits, which is the result of an XOR operation. It seems more appropriate to change more than 1 bit, but not all. |
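To make the avalanche question concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the base UUID is a stand-in (the DNS namespace UUID, not a value proposed in this thread), and prepending the hashspace ID bytes to the name before hashing is an assumed construction, not a quote from the draft.

```python
import hashlib
import uuid


def bit_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical base hashspace ID; the DNS namespace UUID is only a stand-in.
base = uuid.NAMESPACE_DNS
# "Incremented" neighbour: the same value plus one in the least significant position.
neighbour = uuid.UUID(int=base.int + 1)

name = b"example.com"  # arbitrary illustrative name

# Assumed construction: the hashspace ID bytes are prepended to the name.
digest_a = hashlib.sha256(base.bytes + name).digest()
digest_b = hashlib.sha256(neighbour.bytes + name).digest()

input_diff = bit_distance(base.bytes, neighbour.bytes)
output_diff = bit_distance(digest_a, digest_b)

print(f"input bits changed:  {input_diff} of 128")
print(f"output bits changed: {output_diff} of 256")
```

For this particular stand-in, the increment flips a single input bit, while the two digests should differ in roughly half of their 256 bits. That is the behaviour the comment above relies on when it says a 1-bit change is already enough.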
Sounds logical to me and easy to describe as you only need to define the namespace UUID that will be used with the UUIDv5 function to generate the hashspaces for each algorithm name. However, it is necessary to use the "canonical name" of each algorithm, which implies text encoding (UTF-8, ASCII etc), case sensitivity (uppercase, lowercase), use (or not) of "non-word" characters (dash, space), etc. |
Yes, the canonical name is indeed my concern, too. Proposal 1 I have the following idea:
Here is a proposal (I have calculated the UUIDv5, but please double-check them):
|
I prefer to keep the previously defined UUIDv4-based hashspaces, but I think this UUIDv5 mechanism is a better way to define pseudo-random or "random-looking" hashspaces, because they can be easily reproduced to define new hashspaces for cryptographic hash functions that could not be included in the document. I just don't know what the current canonical names for the SHA-2 family are. For example, Wikipedia and Java use SHA-256 (with a dash), but not SHA2_256 (with a 2 and an underscore). P.S.: can we use this document as a reference?: https://csrc.nist.gov/files/pubs/fips/180-4/upd1/final/docs/fips180-4-draft-aug2014.pdf |
Proposal 2 There is another method which does not rely on canonical names (or even English language) at all. A lot of hash algorithms are identified by OIDs. Some of them are located in this arc: We could use a UUIDv5 with namespace OID (6ba7b812-9dad-11d1-80b4-00c04fd430c8) Here is my proposal:
(Edit after my initial post: Changed the algorithm names as defined by FIPS180-4 and FIPS202) (Note to self:) Here is a list of Algorithms/OIDs I have found:
|
I think it's way better. Why not use the URN notation in lowercase mode only, e.g. |
@fabiolimace Are you confused about my notation |
Sorry I meant the string "urn:oid:2.16.840.1.101.3.4.2.4" as the name input for the UUIDv5 function. This:
Not this:
But I'm not sure if it's important. |
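Whether it matters can be checked directly. The sketch below uses Python's standard `uuid` module and the SHA-224 OID from this thread to compare the two candidate name inputs. It is not a statement of which form the draft should pick; it only shows that the choice has to be fixed one way or the other.

```python
import uuid

oid = "2.16.840.1.101.3.4.2.4"  # SHA-224 OID as listed in this thread

# Name input as the bare dotted OID string.
bare = uuid.uuid5(uuid.NAMESPACE_OID, oid)

# Name input as the URN form of the same OID.
urn = uuid.uuid5(uuid.NAMESPACE_OID, "urn:oid:" + oid)

print("dotted :", bare)
print("urn:oid:", urn)
print("same UUID?", bare == urn)
```

Because UUIDv5 hashes the name bytes as given, the two spellings yield different UUIDs, so a specification would need to state exactly one of them.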
UUIDv5 requires two parameters: Namespace ID and Payload. So the notation The full notation of Edit: I have changed my proposal to |
Yes, I noticed that the namespace parameter was implicit. |
I completely agree now. ❤️ -- P.S. |
Yes, it breaks some implementations, including mine. But after all, Internet Drafts are supposed to change. :-) |
@danielmarschall, Your proposal of Checking against your list earlier:
RIPEMD may have two and SNEFRU does not have one that I can find?
I can change them to the NIST document items easily enough. I added the "2" so they were somewhat in line with SHA3 from a formatting perspective, and I swapped the "/" char for an underscore. Underscores were used because they matched the underscores used in the namespace items. But I am not partial. I can change them to the following as defined by FIPS180-4 and FIPS202
|
The new names according to FIPS180-4 and FIPS202 look good to me. About algorithms with multiple OIDs, I would try to find the "official" ones. About algorithms without known OID, I think this could be out-of-scope. I am not sure if my proposal 1 (that used algorithm names in a custom namespace, |
I found a list of OIDs here (extracted from github.com/openssl):
Links:
EDIT: GOST OIDs are already in kyzer's list. |
It might be a bit off-topic, but I am very confused about the implementations in PHP.
(Edit: Found the solution)
|
@danielmarschall, personally I like proposal 2 with the OIDs because they are "well formatted", that is, they are a set of numbers and dots. Proposal 1 has the challenge that SHA256, sha256, sha-256, and SHA-256 all produce different hashes, and proposal 2 removes that. Proposal 2 has the challenges I listed, but the points may be moot as many of the items we are discussing are algos nobody will likely ever use...
kydavis@ubuntu-web-server:~$ echo -n "SHA256" | sha256sum
b3abe5d8c69b38733ad57ea75e83bcae42bbbbac75e3a5445862ed2f8a2cd677 -
kydavis@ubuntu-web-server:~$ echo -n "SHA-256" | sha256sum
bbd07c4fc02c99b97124febf42c7b63b5011c0df28d409fbb486b5a9d2e615ea -
kydavis@ubuntu-web-server:~$ echo -n "sha256" | sha256sum
5d5b09f6dcb2d53a5fffc60c4ac0d55fabdf556069d6631545f42aa6e3500f2e -
kydavis@ubuntu-web-server:~$ echo -n "sha-256" | sha256sum
3128f8ac2988e171a53782b144b98a5c2ee723489c8b220cece002916fbc71e2 - |
@kyzer-davis Are you referring to the small discussion(s) about HAVAL and GOST and my long OID list above? Don't worry, they were just part of my personal evaluation process to find out if proposal 1 or proposal 2 are better in regards to the Non-NIST algorithms, because you mentioned missing and ambiguous OIDs, so I was wondering if this is a serious issue or not. I don't propose that GOST, HAVAL, Tiger, ... get added to the RFC. To avoid confusion in this large thread, here is my proposed text (Proposal 2):
Since the lines are too long for RFC, here is a variant with line breaks:
Another format that does not use the
@kyzer-davis If you agree, can you please add one of these to a pull request? Thank you very much! |
Another way to demonstrate the hashspaces is to show a predefined list followed by the pseudocode used to generate the list. I find it (almost) impossible to have doubts about how the list was generated. Separating the list from the steps to generate it takes less "cognitive effort", in my opinion. Predefined list of hashspaces:
Pseudocode to derive hashspaces from message digest OIDs:
# array of message digest OIDs
OID["SHA-224"] = "2.16.840.1.101.3.4.2.4"
OID["SHA-256"] = "2.16.840.1.101.3.4.2.1"
OID["SHA-384"] = "2.16.840.1.101.3.4.2.2"
OID["SHA-512"] = "2.16.840.1.101.3.4.2.3"
OID["SHA-512/224"] = "2.16.840.1.101.3.4.2.5"
OID["SHA-512/256"] = "2.16.840.1.101.3.4.2.6"
OID["SHA3-224"] = "2.16.840.1.101.3.4.2.7"
OID["SHA3-256"] = "2.16.840.1.101.3.4.2.8"
OID["SHA3-384"] = "2.16.840.1.101.3.4.2.9"
OID["SHA3-512"] = "2.16.840.1.101.3.4.2.10"
OID["SHAKE128"] = "2.16.840.1.101.3.4.2.11"
OID["SHAKE256"] = "2.16.840.1.101.3.4.2.12"
# function to derive hashspaces from message digest OIDs
function hashspace(algo) { return UUIDv5(NAMESPACE_OID, OID[algo]) }
Note: the pseudocode is based on AWK syntax. Implementers can simply copy the pseudocode and change it to suit the target language syntax. If I were the implementer, I would appreciate it. |
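As an example of adapting the pseudocode to a target language, here is a sketch in Python using the standard `uuid` module. The OID table is copied from the pseudocode above; `uuid.NAMESPACE_OID` is the standard OID namespace (6ba7b812-9dad-11d1-80b4-00c04fd430c8). It assumes the name input is the bare dotted OID string, and that Python's UTF-8 encoding of that string matches what the final text would specify.

```python
import uuid

# Message digest OIDs, copied from the pseudocode in this thread.
OID = {
    "SHA-224": "2.16.840.1.101.3.4.2.4",
    "SHA-256": "2.16.840.1.101.3.4.2.1",
    "SHA-384": "2.16.840.1.101.3.4.2.2",
    "SHA-512": "2.16.840.1.101.3.4.2.3",
    "SHA-512/224": "2.16.840.1.101.3.4.2.5",
    "SHA-512/256": "2.16.840.1.101.3.4.2.6",
    "SHA3-224": "2.16.840.1.101.3.4.2.7",
    "SHA3-256": "2.16.840.1.101.3.4.2.8",
    "SHA3-384": "2.16.840.1.101.3.4.2.9",
    "SHA3-512": "2.16.840.1.101.3.4.2.10",
    "SHAKE128": "2.16.840.1.101.3.4.2.11",
    "SHAKE256": "2.16.840.1.101.3.4.2.12",
}


def hashspace(algo: str) -> uuid.UUID:
    """Derive a hashspace ID as UUIDv5(NAMESPACE_OID, dotted OID string)."""
    return uuid.uuid5(uuid.NAMESPACE_OID, OID[algo])


# Print the derived list so it can be compared against a predefined one.
for algo in OID:
    print(f"{algo:<12} {hashspace(algo)}")
```

The printed values should be double-checked against an independent UUIDv5 implementation before being copied into any document.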
Got it @fabiolimace and @danielmarschall. Don't worry about formatting, I will get that figured out. Could end up as some ascii, some table, etc. PR will likely happen next week. Finally, depending on how the discussion over in #144 shakes out one could possibly add a new hashspace ID to the IANA registry without needing a full on spec to do so. Just needs to be defined by the way we say in this doc and then added to that table. |
I'm somewhat concerned about this OID + UUIDv5 approach because:
I think v4 based IDs are simpler and safer. |
I've tried a few times, but I always fail miserably because I don't have the statistical knowledge to give an answer. I always end up, in my naive attempts, trusting in the principle of Saint Thomas: seeing is believing. However, I can't see any difference with my eyes.
EDIT: I crossed out the text because I realized I misunderstood the sentence. Please ignore. (but the question still remains) |
The crossed out question is a different topic but I think is a very good question, which neither do I have an answer to. Please take a look at several posts relating to FIPS stuff following my original post about the hash space approach. |
I can understand your concern that UUIDv5 is using a deprecated hash algorithm. But I think it is very useful that the hash space is not just random, but connected with the algorithm. Imagine IANA does not have that hash listed. By using random UUIDv4, someone needs to choose/generate a hash space ID, and IANA needs to add it. Maybe IANA even insists that an RFC is written that defines the hash space ID. But do you think every developer who wants to use a Non-NIST hash will contact IANA or even write an RFC? A lot of algorithms have OIDs. This is important for some technologies like X.509. Having the hash space (optionally) derived from the OID means that two developers can hash using HAVAL-3-128, and since HAVAL-3-128 has OID "1.3.6.1.4.1.18105.2.1.1.1", both implementations output the same UUID. Without writing an RFC, without contacting IANA. |
In my opinion, such a new hash function must be registered through a formal process (by a separate RFC or IANA registry, I don't know) unless the new UUID RFC specifies the algorithm to derive a hash space ID in a normative manner. Otherwise, the de facto hash space IDs crafted by future implementers will be left in an uncertain state. So far, the name-based v8 is just an example of v8 implementation techniques, and we will have no time to put this in the normative section. With this in mind, we shouldn't create any expectations related to future hash space IDs. UUIDv4-based hash space IDs do require a formal process to ratify new hash functions, and accordingly give full control over the UUID specification to the future spec authors to recommend one hashing algorithm and discourage another. |
My naive and scattered thoughts:
Would it be better to identify these hashspaces using v7? |
EDIT: v4 is better because of its randomness. Hash space IDs are passed to another hash function so should be very different from each other. |
I saw the OID proposal above, and I'd like to second that. This would allow 3rd parties to also define new Hashspace UUIDs, if they have an OID they can control (and hand out sub-OIDs from), which they can get from the IANA. It would also allow users of v9 to substitute a v5 UUID in out-of-band transport with simply the OID for the algorithm itself. The main risk of doing this without a centralized registry, in my opinion, is that one algorithm might end up with 2 different OIDs in different contexts. If this route is taken, there should be guidance to avoid such duplicates. |
Since it's v8, any third party can generate a UUID and use it in their application as a hashspace ID for any hash function. Perhaps, we should expand the following statement in Section 6.5 to clarify that any user-defined UUID value may be used as a hashspace ID within an application context. This point is not sufficiently clear in the current draft, despite #132.
Within an implementation, the implementer can do whatever they want, but a standard has to focus on the coordination of such implementations. What if SHA-4 has a parameter that is not expressed in the OID? What if a widespread implementation applies SHA-5 differently than expected? These circumstances may put future interoperability at risk under the OID-based hashspace scheme. Plus, observing such a situation, future RFC authors might even avoid ratifying an OID-based hashspace ID, because officially specifying the meaning of a widely used hashspace ID can break the existing implementations. |
Getting caught up on these longer threads after being out unexpectedly. |
If you do not have a datatracker.ietf.org login, please get one, as you'll need it for the virtual interim. That's the only barrier to participation. Slides uploaded to datatracker would also be appreciated. |
The `GUID.v8()` method is no longer supported due to recent sudden changes in the UUIDv8 discussions. It will be removed when the new RFC is finally published. See the latest discussions about UUIDv8: * ietf-wg-uuidrev/rfc4122bis#143 * ietf-wg-uuidrev/rfc4122bis#144 * ietf-wg-uuidrev/rfc4122bis#147
@mcr "Slides uploaded to datatracker would also be appreciated." yeah, I will get some to the chairs this week! |
if @danielmarschall or others still feel they want v9, then they also need to explain the proposal in a slide or two. |
From Paul Wouters
Current, All random