Skip to content

v6.10.0

Compare
Choose a tag to compare
@github-actions github-actions released this 19 Nov 17:49
2e86183

6.10.0 (2024-11-18)

The MongoDB Node.js team is pleased to announce version 6.10.0 of the bson package!

Release Notes

BSON Binary Vector Support!

The Binary class has new helpers to assist with using the newly minted Vector sub_type of Binary sub_type == 9 🎉! For more on how these types can be used with MongoDB take a look at How to Ingest Quantized Vectors!

Here's a summary of the API:

class Binary {
  toInt8Array(): Int8Array;
  toFloat32Array(): Float32Array;
  toPackedBits(): Uint8Array;

  static fromInt8Array(array: Int8Array): Binary;
  static fromFloat32Array(array: Float32Array): Binary;
  static fromPackedBits(array: Uint8Array, padding: number = 0): Binary;
}

Relatively self-explanatory: each one supports converting to and constructing from a native Javascript data type that corresponds to one of the three vector types: Int8, Float32, PackedBit.

Vector Bytes Format

When a Binary is sub_type 9 the first two bytes are set to important metadata about the vector.

  • binary.buffer[0] - The datatype that indicates what the following bytes are.
  • binary.buffer[1] - The padding amount, a value 0-7 that indicates how many bits to ignore in a PackedBit vector.

Packed Bits 📦

static fromPackedBits(array: Uint8Array, padding: number = 0)

When handling packed bits, the last byte may not be entirely used. For example, a PackedBit vector = [0xFF, 0xF0] with padding = 4 ignores those last four 0s making the bit vector logically equal to 12 ones.

    F    F    F    0
[1111 1111 1111]   // ignored: the four 0s are padding

Important

When using the fromPackedBits method to set your padding amount to avoid inadvertently extending your bit vector.

Unpacking Bits 🧳

Packed bits get special treatment with two styles of conversion methods to suit your vector-y needs. toBits will return individually addressable bits shifted apart into an array. fromBits takes the same format in reverse and packs the bits into bytes.

Notice there is no argument to set the padding. That is because it can be determined by the array's length. Recall those 12 ones from the previous example, well, the padding has to be 4 to reach a multiple of 8.

class Binary {
  toBits(): Int8Array;
  static fromBits(bits: ArrayLike<number>): Binary;
}

Caution

We highly encourage using ONLY these methods to interact with vector data and avoid operating directly on the byte format. Other Binary class methods (put(), write() read(), and value()) and direct access of data in a Binary's buffer beyond the 1st index should only be used in exceptional circumstances and with extreme caution after closely consulting the BSON Vector specification.

Details to keep in mind

  • A javascript engine's endianness is platform dependent whereas BSON is always in little-endian format so if viewing bytes as Float32s take care to re-order bytes as needed.
  • Int8 vectors are signed bytes but read() always returns unsigned bytes.
  • The vector data begins at offset 2.

Binary's read() returns a view of Binary.buffer

Binary's read() return type claimed it would return number[] or Uint8Array which was true in previous BSON versions that didn't always store a Uint8Array on the buffer property like Binary does today.

read()'s length parameter did not respect the position value allowing reading bytes beyond the data that is actually stored in the Binary. This has been corrected.

Additionally, this method returned a view in Node.js environments and a copy in Web environments. it has been fixed to always return a view.

Features

Bug Fixes

Documentation

We invite you to try the bson library immediately, and report any issues to the NODE project.