Add named custom dictionary support #200
Are you overcomplicating this? The named dictionary for base64 has 65 characters because it includes the `=`. It also cannot replicate the `=` padding, even if you provide only the real 64 characters, so the output will still differ for the provided example. Even with the correct dictionary and when no padding is needed, I am not sure it would produce the same output, since it works differently... With 64 characters it is simply an encoding in base 64, not base64.
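To make the "base 64 versus base64" distinction concrete, here is a minimal sketch. It is not the code from this PR: `base64BitGroups` and `positionalBase64` are hypothetical names, and the positional version is just one plausible way a generic "encode in base N" routine could behave. The first function reads the input as a bit stream, 6 bits per character (the RFC 4648 approach); the second treats the whole input as one big number and writes it in base 64.

```javascript
const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// RFC 4648 style: consume the input as a bit stream, 6 bits per output
// character, zero-fill the last group and pad with "=" to a multiple of 4.
function base64BitGroups(bytes) {
  let out = "";
  let buffer = 0; // holds the bits not yet emitted
  let bits = 0; // number of valid bits in buffer
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      out += BASE64[(buffer >> bits) & 63];
    }
    buffer &= (1 << bits) - 1; // keep only the leftover bits
  }
  if (bits > 0) out += BASE64[(buffer << (6 - bits)) & 63];
  while (out.length % 4 !== 0) out += "=";
  return out;
}

// Positional style (assumption, for illustration only): treat the input as
// one big-endian integer and write that integer in base 64.
function positionalBase64(bytes) {
  let n = 0n;
  for (const byte of bytes) n = (n << 8n) | BigInt(byte);
  if (n === 0n) return BASE64[0];
  let out = "";
  while (n > 0n) {
    out = BASE64[Number(n % 64n)] + out;
    n /= 64n;
  }
  return out;
}

const twoBytes = [0x4d, 0x61]; // "Ma"
const threeBytes = [0x4d, 0x61, 0x6e]; // "Man"
console.log(base64BitGroups(twoBytes), positionalBase64(twoBytes)); // TWE= E1h
console.log(base64BitGroups(threeBytes), positionalBase64(threeBytes)); // TWFu TWFu
```

The two agree when the input is an exact multiple of 6 bits with no leading zero bits, and diverge otherwise, which is why removing the `=` from the dictionary alone may not be enough.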
Good catch regarding the dictionary itself - I've fixed and pushed - but that doesn't make any appreciable difference to the output! The padding itself is a non-issue: it's fairly standard in a lot of baseXX encodings, and decoders are generally happy to ignore it. So while strictly correct base64 requires it, the decoder should work without it (and the output should look identical right up to those padding characters) :-) And I agree about the "working differently" part - but that's not how people will expect it to work. At least I feel that using a bit-safe dictionary should make it look identical (so maybe something to choose to pass that bit-size and use I added the When choosing to use base16 with tattoo (chosen for the length) you get 1789 characters, and base64 has 1195 characters - close to 2/3, so it's definitely valid in that respect, and both the I hope that makes sense! :-D
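The "close to 2/3" observation follows from the bits carried per character: 4 for base16 and 6 for base64. A quick sanity check, using a hypothetical helper name and only the two character counts quoted above:

```javascript
// Estimate how many characters the same bit stream needs when each
// character carries targetBits instead of sourceBits.
function estimateEncodedLength(sourceChars, sourceBits, targetBits) {
  return Math.ceil((sourceChars * sourceBits) / targetBits);
}

const estimate = estimateEncodedLength(1789, 4, 6);
console.log(estimate); // 1193
console.log((1195 / 1789).toFixed(3)); // 0.668
```

The estimate lands within a couple of characters of the reported 1195; the sketch does not try to account for that small remainder.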
So your plan is to use That doesn't sound too complicated, I could do that, maybe at the weekend.
That's sort of the fallback plan - it will definitely work, but it introduces two code paths (which it sort of has already). If I get the time, I want to see whether the code can be tweaked to make it more "natural": either yours (I want to check whether a simple transpose of the data would make it "correct", which would be a mathematical solution), or potentially `_compress` itself, to see whether instead of taking only a numeric bit-length it could take either that or a string dictionary (though I think that might be more complicated, it may well be the best future-proof solution!) :-P
The code in the custom dictionary encoders that actually encodes into the dictionary doesn't behave as expected. I've not had enough time to find a solution, but with a known dictionary it should encode the same regardless of the method used. Most specifically, if you use the base64 dictionary (exported from the base64 folder), the output should be identical to direct base64 encoding.
The difficulty is that only some dictionaries have a valid bit-size (i.e., take the `dictionary.length` and see if it's a perfect binary representation, such as base64 being binary `0b1000000`, or 6 bits of data). For these it should be relatively easy to use the built-in `_compress` code and pass that bit length.

When not using a perfect bit size it instead needs to use a rolling value to find the next bit used (as now) - but just like Pythagoras to trigonometry, it should still be compatible.
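The "perfect binary representation" check can be sketched as follows. This is an assumption about how the dispatch could look, not code from this PR, and `bitsPerChar` is a hypothetical name: it returns the bit length that could be handed to `_compress`, or `null` when the dictionary would need the rolling-value path.

```javascript
// Returns the number of bits each character can carry when the dictionary
// length is an exact power of two, otherwise null.
function bitsPerChar(dictionary) {
  const n = dictionary.length;
  // A power of two has exactly one bit set, so n & (n - 1) is 0.
  if (n < 2 || (n & (n - 1)) !== 0) return null;
  return 31 - Math.clz32(n); // position of that single set bit
}

const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

console.log(bitsPerChar(BASE64)); // 6
console.log(bitsPerChar(BASE64 + "=")); // null (65 characters)
console.log(bitsPerChar("0123456789abcdef")); // 4
```

Note that the 65-character dictionary (with `=` included) fails the check, so it would silently take the non-power-of-two path.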
This adds all the wrappers for it, and we need final code that allows these two to match (ignoring any `=` padding characters), which will also match the content of `test/data/tattoo/base64.bin`.
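A padding-insensitive comparison for that check could look like the sketch below. The helper names are hypothetical and nothing here calls the library itself; the strings passed in would be the two encoder outputs (or the fixture contents) in a real test.

```javascript
// Remove trailing "=" padding so that padded and unpadded output compare equal.
function stripPadding(encoded) {
  return encoded.replace(/=+$/, "");
}

function matchesIgnoringPadding(a, b) {
  return stripPadding(a) === stripPadding(b);
}

console.log(matchesIgnoringPadding("TWE=", "TWE")); // true
console.log(matchesIgnoringPadding("TWE=", "TWF=")); // false
```

Only trailing `=` characters are stripped, so a stray `=` in the middle of either string still counts as a mismatch.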