This file describes how data is encoded to binary and explains the rationale behind some of the choices.
All multi-byte values are big-endian.
An unsigned integer, stored in a variable number of bytes depending on its value. This lets small values (like `17`) fit in one byte while still supporting (almost) 64-bit integers. Another advantage is that the user doesn't need to choose a fixed field size.
The downsides of this design are:
- a more complex encoding/decoding process
- loss of compatibility with most tools, since this encoding is rather rare.
The first matching rule from the list below should be used. This means, for example, that encoding `0` with 16 bits is invalid.
- Integers greater than or equal to `0` and less than `2^7 = 128` are encoded as `uint8`: `0xxx xxxx` (each character is a bit; `x` is either 0 or 1)
- Integers less than `2^14 = 16384` are encoded as `uint16`, but with the first bit set and the second unset: `10xx xxxx xxxx xxxx`
- Integers less than `2^29 = 536870912` are encoded as `uint32`, but with the first 2 bits set and the third unset: `110x xxxx xxxx xxxx xxxx xxxx xxxx xxxx`
- Integers less than `2^61 = 2305843009213693952` are encoded as `uint64`, but with the first 3 bits set: `111x xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx`
- Any other value should be treated as an error
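The rules above can be sketched in Python as follows. This is an illustrative reading of the list, not a reference implementation: the names `encode_uint` and `decode_uint` are hypothetical, and rejecting non-minimal encodings on decode is an assumption drawn from the "first matching rule" requirement.

```python
def encode_uint(value: int) -> bytes:
    """Encode a non-negative integer with the first (shortest) matching rule."""
    if value < 0:
        raise ValueError("a uint cannot be negative")
    if value < 1 << 7:
        return value.to_bytes(1, "big")                          # 0xxx xxxx
    if value < 1 << 14:
        return (value | 0x8000).to_bytes(2, "big")               # 10xx xxxx ...
    if value < 1 << 29:
        return (value | 0xC0000000).to_bytes(4, "big")           # 110x xxxx ...
    if value < 1 << 61:
        return (value | 0xE000000000000000).to_bytes(8, "big")   # 111x xxxx ...
    raise ValueError("value does not fit in a uint")


def decode_uint(data: bytes, offset: int = 0) -> tuple:
    """Return (value, next_offset); the leading bits select the width."""
    first = data[offset]
    if first & 0x80 == 0:
        size, mask = 1, 0x7F
    elif first & 0x40 == 0:
        size, mask = 2, 0x3FFF
    elif first & 0x20 == 0:
        size, mask = 4, 0x1FFFFFFF
    else:
        size, mask = 8, 0x1FFFFFFFFFFFFFFF
    chunk = data[offset:offset + size]
    if len(chunk) < size:
        raise ValueError("truncated uint")
    value = int.from_bytes(chunk, "big") & mask
    # Assumption: a value stored in more bytes than needed is invalid.
    if len(encode_uint(value)) != size:
        raise ValueError("non-minimal uint encoding")
    return value, offset + size
```

For instance, `encode_uint(17)` yields the single byte `0x11`, while `encode_uint(128)` needs two bytes, `0x80 0x80`.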
A signed integer, stored in a variable number of bytes.
The first matching rule from the list below should be used. This means, for example, that encoding `0` with 16 bits is invalid.
- Integers greater than or equal to `-2^6 = -64` and less than `2^6` are encoded as `int8`, but with the first bit unset: `0xxx xxxx` (each character is a bit; `x` is either 0 or 1)
- Integers greater than or equal to `-2^13 = -8192` and less than `2^13` are encoded as `int16`, but with the first bit set and the second unset: `10xx xxxx xxxx xxxx`
- Integers greater than or equal to `-2^28 = -268435456` and less than `2^28` are encoded as `int32`, but with the first 2 bits set and the third unset: `110x xxxx xxxx xxxx xxxx xxxx xxxx xxxx`
- Integers greater than or equal to `-2^60 = -1152921504606846976` and less than `2^60` are encoded as `int64`, but with the first 3 bits set: `111x xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx`
- Any other value should be treated as an error
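A possible reading of the signed rules, sketched in Python. It assumes the payload bits are the low bits of the value's two's-complement representation, which is how "encoded as `int8`, but with the first bit unset" is interpreted here. The function names are hypothetical, and this decoder does not check for non-minimal encodings.

```python
def encode_int(value: int) -> bytes:
    """Encode a signed integer with the first (shortest) matching rule."""
    if -(1 << 6) <= value < 1 << 6:
        return (value & 0x7F).to_bytes(1, "big")                  # 0xxx xxxx
    if -(1 << 13) <= value < 1 << 13:
        return ((value & 0x3FFF) | 0x8000).to_bytes(2, "big")     # 10xx xxxx ...
    if -(1 << 28) <= value < 1 << 28:
        return ((value & 0x1FFFFFFF) | 0xC0000000).to_bytes(4, "big")
    if -(1 << 60) <= value < 1 << 60:
        return ((value & 0x1FFFFFFFFFFFFFFF) | 0xE000000000000000).to_bytes(8, "big")
    raise ValueError("value does not fit in an int")


def decode_int(data: bytes, offset: int = 0) -> tuple:
    """Return (value, next_offset), sign-extending the payload bits."""
    first = data[offset]
    if first & 0x80 == 0:
        size, bits = 1, 7
    elif first & 0x40 == 0:
        size, bits = 2, 14
    elif first & 0x20 == 0:
        size, bits = 4, 29
    else:
        size, bits = 8, 61
    chunk = data[offset:offset + size]
    if len(chunk) < size:
        raise ValueError("truncated int")
    raw = int.from_bytes(chunk, "big") & ((1 << bits) - 1)
    if raw >= 1 << (bits - 1):      # top payload bit set: negative value
        raw -= 1 << bits
    return raw, offset + size
```

Under this reading, `-1` becomes the single byte `0x7f` and `64`, the first value that no longer fits in 7 bits, becomes `0x80 0x40`.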
A 64-bit floating-point number, often referred to as `double`, as specified in IEEE 754.
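As a small illustration, Python's standard `struct` module can produce this representation directly. The `>` prefix selects big-endian, following the byte order stated at the top of this file. The function names are hypothetical.

```python
import struct


def encode_float(value: float) -> bytes:
    """Pack a float as an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", value)


def decode_float(data: bytes, offset: int = 0) -> tuple:
    """Return (value, next_offset)."""
    (value,) = struct.unpack_from(">d", data, offset)
    return value, offset + 8
```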
A Unicode text string. It should first be converted into bytes (as specified by UTF-8) and then encoded as a `Buffer` (see below).
A sequence of octets (bytes). First, the Buffer length in bytes, `len`, is encoded as a `uint` (see above) and appended to the result. After that, `len` bytes follow (the Buffer content):
<uint_length> <buffer_data>
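A minimal sketch of the length-prefixed layout, covering both Buffers and strings. The helper names are hypothetical, and `encode_uint` is a compact restatement of the unsigned integer rules so that the example runs on its own.

```python
def encode_uint(value: int) -> bytes:
    """Variable-length unsigned integer: (byte size, payload bits, prefix)."""
    for size, bits, prefix in ((1, 7, 0x00), (2, 14, 0x80), (4, 29, 0xC0), (8, 61, 0xE0)):
        if 0 <= value < 1 << bits:
            raw = bytearray(value.to_bytes(size, "big"))
            raw[0] |= prefix            # set the leading marker bits
            return bytes(raw)
    raise ValueError("value does not fit in a uint")


def encode_buffer(data: bytes) -> bytes:
    """<uint_length> <buffer_data>"""
    return encode_uint(len(data)) + data


def encode_string(text: str) -> bytes:
    """UTF-8 bytes, then encoded as a Buffer."""
    return encode_buffer(text.encode("utf-8"))
```

Note that the prefix counts bytes, not characters: `encode_string("é")` starts with a length of 2 because `é` takes two bytes in UTF-8.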
Either `true`, encoded as the byte `0x01`, or `false`, encoded as `0x00`.
Any JSON-compatible data. First, the value is transformed into a string by a JSON serialization algorithm (like `JSON.stringify`). The resulting string is then encoded as a `string` (see above).
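Sketched in Python, with `json.dumps` standing in for `JSON.stringify`. The compact separators are a choice made here to mimic the output of `JSON.stringify`; the text above does not mandate any particular serializer. `encode_uint` and `encode_json` are hypothetical names.

```python
import json


def encode_uint(value: int) -> bytes:
    """Variable-length unsigned integer: (byte size, payload bits, prefix)."""
    for size, bits, prefix in ((1, 7, 0x00), (2, 14, 0x80), (4, 29, 0xC0), (8, 61, 0xE0)):
        if 0 <= value < 1 << bits:
            raw = bytearray(value.to_bytes(size, "big"))
            raw[0] |= prefix
            return bytes(raw)
    raise ValueError("value does not fit in a uint")


def encode_json(value) -> bytes:
    """Serialize to JSON text, then encode that text as a string."""
    text = json.dumps(value, separators=(",", ":"))
    payload = text.encode("utf-8")
    return encode_uint(len(payload)) + payload
```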
A MongoDB ObjectId, composed of 12 bytes. No encoding is actually needed: the 12 bytes are simply appended to the final result.
A JS-compatible regular expression, composed of:
- `source`: the regex source as a string (as returned by the `source` property of a `RegExp` instance);
- `flags`: a subset of `{g, i, m}`. That is, each of those 3 flags is either active or not.
First, the `source` is encoded as a `string`. After that, the flag byte is appended. The flag byte is a bit-mask: `0000 0mig`.
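The layout can be illustrated as follows. Reading the mask `0000 0mig` from right to left gives `g` the lowest bit, then `i`, then `m`. The names `FLAG_BITS` and `encode_regex` are hypothetical, and passing the flags as a string such as `"gi"` is a convenience of this sketch.

```python
FLAG_BITS = {"g": 0x01, "i": 0x02, "m": 0x04}   # 0000 0mig


def encode_uint(value: int) -> bytes:
    """Variable-length unsigned integer: (byte size, payload bits, prefix)."""
    for size, bits, prefix in ((1, 7, 0x00), (2, 14, 0x80), (4, 29, 0xC0), (8, 61, 0xE0)):
        if 0 <= value < 1 << bits:
            raw = bytearray(value.to_bytes(size, "big"))
            raw[0] |= prefix
            return bytes(raw)
    raise ValueError("value does not fit in a uint")


def encode_regex(source: str, flags: str = "") -> bytes:
    """Source encoded as a string, followed by one flag byte."""
    mask = 0
    for flag in flags:
        if flag not in FLAG_BITS:
            raise ValueError("unsupported flag: " + flag)
        mask |= FLAG_BITS[flag]
    payload = source.encode("utf-8")
    return encode_uint(len(payload)) + payload + bytes([mask])
```

With this mapping, the JavaScript literal `/a+/gi` would be written as `encode_regex("a+", "gi")`, ending in the flag byte `0x03`.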
A date value, represented by a UNIX timestamp in milliseconds, encoded as a `uint`.
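A brief sketch of the conversion, assuming timezone-aware `datetime` inputs. The names are hypothetical. Because the timestamp is stored as an unsigned integer, dates before 1970 cannot be represented, and this sketch raises `ValueError` for them.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_uint(value: int) -> bytes:
    """Variable-length unsigned integer: (byte size, payload bits, prefix)."""
    for size, bits, prefix in ((1, 7, 0x00), (2, 14, 0x80), (4, 29, 0xC0), (8, 61, 0xE0)):
        if 0 <= value < 1 << bits:
            raw = bytearray(value.to_bytes(size, "big"))
            raw[0] |= prefix
            return bytes(raw)
    raise ValueError("value does not fit in a uint")


def encode_date(moment: datetime) -> bytes:
    """Whole milliseconds since the UNIX epoch, encoded as a uint."""
    millis = (moment - EPOCH) // timedelta(milliseconds=1)   # exact integer math
    return encode_uint(millis)
```

Present-day timestamps are larger than `2^29` milliseconds, so they always take the 8-byte form.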
A compound type is an ordered sequence of `fields`. Each `field` has three properties:
- its type
- whether it's optional or not
- whether it's an array or a single value
For each `field` (in order):
- if it's optional
  - if the value is `empty` (see below)
    - append the `boolean` `false`
    - continue to the next field
  - else
    - append the `boolean` `true`
- if it's a single value
  - append `value` encoded as defined by the field's `type`
  - continue to the next field
- get the array length `len`
- append `len` encoded as a `uint`
- append each `value` in the array, encoded as defined by the field's `type`
A value is said to be `empty` if it's equivalent to `undefined` or `null`. Empty strings, empty arrays, empty Buffers, empty objects, zeros, NaN, Infinity, etc. are NOT considered empty.
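The per-field procedure for compound types can be sketched as below. The `Field` class, its attribute names, and `encode_compound` are hypothetical. Python's `None` stands in for `undefined`/`null`, and each field carries an encoder function in place of a type name.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


def encode_uint(value: int) -> bytes:
    """Variable-length unsigned integer: (byte size, payload bits, prefix)."""
    for size, bits, prefix in ((1, 7, 0x00), (2, 14, 0x80), (4, 29, 0xC0), (8, 61, 0xE0)):
        if 0 <= value < 1 << bits:
            raw = bytearray(value.to_bytes(size, "big"))
            raw[0] |= prefix
            return bytes(raw)
    raise ValueError("value does not fit in a uint")


def encode_boolean(value: bool) -> bytes:
    return b"\x01" if value else b"\x00"


@dataclass
class Field:
    encode: Callable[[Any], bytes]   # encoder for the field's type
    optional: bool = False
    is_array: bool = False


def encode_compound(fields: Sequence[Field], values: Sequence[Any]) -> bytes:
    out = bytearray()
    for field, value in zip(fields, values):
        if field.optional:
            if value is None:                 # "empty": only None, never 0 or []
                out += encode_boolean(False)
                continue
            out += encode_boolean(True)
        if not field.is_array:
            out += field.encode(value)
            continue
        out += encode_uint(len(value))        # array length prefix
        for item in value:
            out += field.encode(item)
    return bytes(out)
```

With `fields = [Field(encode_uint), Field(encode_uint, optional=True), Field(encode_uint, is_array=True)]`, the values `[5, None, [1, 2]]` encode to `05 00 02 01 02`, while `[5, 0, []]` encodes to `05 01 00 00`: the zero and the empty array are both present, not empty.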