The multibase base2 prefix is the character 0
. Canonical multibase-base2
encoded data will always be 8k+1
characters long.
A byte array is encoded to multibase base2 by prefixing its base2 encoding with
the character 0
.
A byte array is encoded to base2 by concatenating the base2 representations of each byte in the array, in order.
The base2 representation of a byte is a byte array of length 8 in big-endian
order, where each byte of the array is set to the character 1
, if the
corresponding bit in the byte is set, and the character 0
if the corresponding
bit is unset.
For example, [0x58, 0x59, 0x5a]
can be converted to multibase base2 as
follows:
map each byte to the base2 representation:
["01011000", "01011001", "01011010"]
concatenate:
"010110000101100101011010"
prefix with '0':
"0010110000101100101011010"
The canonical format of multibase base2 encoding is decoded by dropping the
multibase prefix (which must be the character 0
), dividing the remaining
character sequence into k sections of length 8, then converting each section
into it's corresponding byte value.
The byte value of each section is obtained by mapping each '0' or '1' character to a 0 or 1 bit in the resulting byte (MSB first).
For example, the byte value of "00110111"
is 0x37
.
The process, with input "00100000101000010"
:
drop the leading zero:
"0100000101000010"
divide to sections:
["01000001", "01000010"]
allocate output buffer:
[0x00, 0x00]
set the correct bits:
[0x41, 0x42] or "AB"
A non-canonical encoding of multibase base2 whose length is not equal to 8k+1 characters for some k may be canonicalized by left-padding the string with '0' characters (until the length is 8k+1). The string can then be decoded normally.
For example, the non-canonical encoding "011010"
may be canonicalized to
"000011010"
(decoded as [0x1a]
). A standard multibase implementation must
output base2 in the canonical encoding only.