Commit

Support embeds in Text (#27)
* upgrade sparse-array-rled

* add embeds

* update serialized form docs

* folder reorg

* embed tests

* patch upgrade sparse-array-rled

* rerun benchmarks

* proofread: embed docs and tests
mweidner037 authored Aug 27, 2024
1 parent 0ac5470 commit 38d33fd
Showing 22 changed files with 442 additions and 245 deletions.
25 changes: 15 additions & 10 deletions README.md
@@ -336,12 +336,17 @@ A total order on Positions, independent of any specific assignment of values.

An Order manages metadata (bunches) for any number of Lists, Texts, Outlines, and AbsLists. You can also use an Order to create Positions independent of a List (`createPositions`), convert between Positions and AbsPositions (`abs` and `unabs`), and directly view the tree of bunches (`getBunch`, `getBunchFor`).

#### `Text`
#### `Text<E>`

A list of characters, represented as an ordered map with Position keys.

Text is functionally equivalent to a `List<string>` with single-char values, but it uses strings internally and in bulk methods, instead of arrays of single chars. This reduces memory usage and the size of saved states.

The list may also contain embedded objects of type `E`.
Each embed takes the place of a single character. You can use embeds to represent
non-text content, like images and videos, that may appear inline in a text document.
If you do not specify the generic type `E`, it defaults to `never`, i.e., no embeds are allowed.
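
The following is a minimal, self-contained sketch of the idea described above; it does not use the library and is not its implementation. The names `Item`, `itemLength`, and `ImageEmbed`, the `extends object` constraint, and the `cat.png` URL are all assumptions made for illustration. It shows how a default type parameter of `never` rules out embeds, and how each embed occupies exactly one slot.

```typescript
// Hypothetical model: a run of text is either a string of chars or one embed.
// With the default E = never, Item collapses to plain `string`.
type Item<E extends object = never> = string | E;

// Length in list slots: one per char in a string, exactly one per embed.
function itemLength<E extends object>(item: Item<E>): number {
  return typeof item === "string" ? item.length : 1;
}

// Hypothetical embed type for inline images.
interface ImageEmbed {
  kind: "image";
  url: string;
}

const doc: Item<ImageEmbed>[] = [
  "Hello ",
  { kind: "image", url: "cat.png" },
  "world",
];

// 6 chars + 1 embed + 5 chars
const totalLength = doc.reduce((n, item) => n + itemLength(item), 0);
console.log(totalLength);

// Without a type argument, only strings type-check:
const plain: Item[] = ["abc"];
// const bad: Item[] = [{ kind: "image", url: "cat.png" }]; // compile error
console.log(plain.length);
```

Defaulting the embed type to `never` keeps code that never uses embeds free of extra type arguments, while still letting the compiler reject objects in a text-only list.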

#### `Outline`

An `Outline` is like a List but without values. Instead, you tell the Outline which Positions are currently present, then use it to convert between Positions and their current indices.
@@ -376,7 +381,7 @@ AbsList's API is a hybrid between `Array<T>` and `Map<AbsPosition, T>`. Use `ins
The library also comes with _unordered_ collections:

- `PositionMap<T>`: A map from Positions to values of type `T`, like `List<T>` but without ordering info.
- `PositionCharMap`: A map from Positions to characters, like `Text` but without ordering info.
- `PositionCharMap<E>`: A map from Positions to characters (or embeds), like `Text<E>` but without ordering info.
- `PositionSet`: A set of Positions, like `Outline` but without ordering info.

These collections do not support in-order or indexed access, but they also do not require managing metadata, and they are slightly more efficient.
@@ -401,7 +406,7 @@ Saved states: Each class lets you save and load its internal states in JSON form

- `ListSavedState<T>`
- `OrderSavedState`
- `TextSavedState`
- `TextSavedState<E>`
- `OutlineSavedState`
- `AbsListSavedState<T>`

@@ -482,16 +487,16 @@ Each benchmark applies the [automerge-perf](https://github.com/automerge/automer

Results for an op-based/state-based text CRDT built on top of a Text + PositionSet, on my laptop:

- Sender time (ms): 655
- Sender time (ms): 722
- Avg update size (bytes): 92.7
- Receiver time (ms): 369
- Receiver time (ms): 416
- Save time (ms): 11
- Save size (bytes): 599817
- Load time (ms): 10
- Save time GZIP'd (ms): 42
- Save size GZIP'd (bytes): 87006
- Save size (bytes): 598917
- Load time (ms): 11
- Save time GZIP'd (ms): 40
- Save size GZIP'd (bytes): 86969
- Load time GZIP'd (ms): 30
- Mem used estimate (MB): 1.8
- Mem used estimate (MB): 2.0

For more results, see [benchmark_results.md](./benchmark_results.md).

102 changes: 51 additions & 51 deletions benchmark_results.md
@@ -13,47 +13,47 @@ For perspective on the save sizes: the final text (excluding deleted chars) is 1
Use `List` and send updates directly over a reliable link (e.g. WebSocket).
Updates and saved states use JSON encoding, with optional GZIP for saved states.

- Sender time (ms): 623
- Sender time (ms): 671
- Avg update size (bytes): 86.8
- Receiver time (ms): 342
- Save time (ms): 9
- Save size (bytes): 804020
- Load time (ms): 14
- Save time GZIP'd (ms): 54
- Save size GZIP'd (bytes): 89118
- Load time GZIP'd (ms): 36
- Receiver time (ms): 384
- Save time (ms): 8
- Save size (bytes): 803120
- Load time (ms): 17
- Save time GZIP'd (ms): 55
- Save size GZIP'd (bytes): 89013
- Load time GZIP'd (ms): 37
- Mem used estimate (MB): 2.2

## AbsList Direct

Use `AbsList` and send updates directly over a reliable link (e.g. WebSocket).
Updates and saved states use JSON encoding, with optional GZIP for saved states.

- Sender time (ms): 1504
- Sender time (ms): 1576
- Avg update size (bytes): 216.2
- AbsPosition length stats: avg = 187.4, percentiles [25, 50, 75, 100] = 170,184,202,272
- Receiver time (ms): 739
- Save time (ms): 15
- Save size (bytes): 868579
- Load time (ms): 19
- Save time GZIP'd (ms): 64
- Save size GZIP'd (bytes): 87086
- Load time GZIP'd (ms): 44
- Mem used estimate (MB): 2.1
- Receiver time (ms): 791
- Save time (ms): 14
- Save size (bytes): 867679
- Load time (ms): 21
- Save time GZIP'd (ms): 63
- Save size GZIP'd (bytes): 87108
- Load time GZIP'd (ms): 46
- Mem used estimate (MB): 2.2

## List Direct w/ Custom Encoding

Use `List` and send updates directly over a reliable link (e.g. WebSocket).
Updates use a custom string encoding; saved states use JSON with optional GZIP.

- Sender time (ms): 509
- Sender time (ms): 556
- Avg update size (bytes): 31.2
- Receiver time (ms): 299
- Save time (ms): 8
- Save size (bytes): 804020
- Receiver time (ms): 357
- Save time (ms): 9
- Save size (bytes): 803120
- Load time (ms): 11
- Save time GZIP'd (ms): 49
- Save size GZIP'd (bytes): 89113
- Save time GZIP'd (ms): 47
- Save size GZIP'd (bytes): 89021
- Load time GZIP'd (ms): 36
- Mem used estimate (MB): 2.2

@@ -62,64 +62,64 @@ Updates use a custom string encoding; saved states use JSON with optional GZIP.
Use `Text` and send updates directly over a reliable link (e.g. WebSocket).
Updates and saved states use JSON encoding, with optional GZIP for saved states.

- Sender time (ms): 619
- Sender time (ms): 693
- Avg update size (bytes): 86.8
- Receiver time (ms): 389
- Receiver time (ms): 444
- Save time (ms): 5
- Save size (bytes): 493835
- Save size (bytes): 492935
- Load time (ms): 8
- Save time GZIP'd (ms): 36
- Save size GZIP'd (bytes): 73737
- Load time GZIP'd (ms): 22
- Mem used estimate (MB): 1.3
- Save time GZIP'd (ms): 35
- Save size GZIP'd (bytes): 73709
- Load time GZIP'd (ms): 24
- Mem used estimate (MB): 1.4

## Outline Direct

Use `Outline` and send updates directly over a reliable link (e.g. WebSocket).
Updates and saved states use JSON encoding, with optional GZIP for saved states.
Neither updates nor saved states include values (chars).

- Sender time (ms): 587
- Sender time (ms): 648
- Avg update size (bytes): 78.4
- Receiver time (ms): 326
- Save time (ms): 5
- Receiver time (ms): 365
- Save time (ms): 6
- Save size (bytes): 382419
- Load time (ms): 7
- Save time GZIP'd (ms): 24
- Save size GZIP'd (bytes): 39367
- Load time GZIP'd (ms): 14
- Mem used estimate (MB): 1.2
- Save size GZIP'd (bytes): 39364
- Load time GZIP'd (ms): 13
- Mem used estimate (MB): 1.1

## TextCrdt

Use a hybrid op-based/state-based CRDT implemented on top of the library's data structures, copied from [@list-positions/crdts](https://github.com/mweidner037/list-positions-crdts).
This variant uses a Text + PositionSet to store the state and Positions in messages, manually managing BunchMetas.
Updates and saved states use JSON encoding, with optional GZIP for saved states.

- Sender time (ms): 655
- Sender time (ms): 722
- Avg update size (bytes): 92.7
- Receiver time (ms): 369
- Receiver time (ms): 416
- Save time (ms): 11
- Save size (bytes): 599817
- Load time (ms): 10
- Save time GZIP'd (ms): 42
- Save size GZIP'd (bytes): 87006
- Save size (bytes): 598917
- Load time (ms): 11
- Save time GZIP'd (ms): 40
- Save size GZIP'd (bytes): 86969
- Load time GZIP'd (ms): 30
- Mem used estimate (MB): 1.8
- Mem used estimate (MB): 2.0

## ListCrdt

Use a hybrid op-based/state-based CRDT implemented on top of the library's data structures, copied from [@list-positions/crdts](https://github.com/mweidner037/list-positions-crdts).
This variant uses a List of characters + PositionSet to store the state and Positions in messages, manually managing BunchMetas.
Updates and saved states use JSON encoding, with optional GZIP for saved states.

- Sender time (ms): 701
- Sender time (ms): 762
- Avg update size (bytes): 94.8
- Receiver time (ms): 472
- Receiver time (ms): 507
- Save time (ms): 13
- Save size (bytes): 910002
- Load time (ms): 21
- Save time GZIP'd (ms): 64
- Save size GZIP'd (bytes): 102650
- Load time GZIP'd (ms): 35
- Mem used estimate (MB): 2.5
- Save size (bytes): 909102
- Load time (ms): 15
- Save time GZIP'd (ms): 57
- Save size GZIP'd (bytes): 102554
- Load time GZIP'd (ms): 36
- Mem used estimate (MB): 2.6
8 changes: 4 additions & 4 deletions package-lock.json

2 changes: 1 addition & 1 deletion package.json
@@ -36,7 +36,7 @@
"dependencies": {
"lex-sequence": "^2.0.0",
"maybe-random-string": "^1.0.0",
"sparse-array-rled": "^1.0.0"
"sparse-array-rled": "^2.0.1"
},
"devDependencies": {
"@istanbuljs/nyc-config-typescript": "^1.0.2",
20 changes: 10 additions & 10 deletions src/index.ts
@@ -1,13 +1,13 @@
export * from "./abs_list";
export * from "./abs_position";
export * from "./bunch";
export * from "./bunch_ids";
export * from "./lexicographic_string";
export * from "./list";
export * from "./order";
export * from "./outline";
export * from "./position";
export * from "./text";
export * from "./lists/abs_list";
export * from "./lists/list";
export * from "./lists/outline";
export * from "./lists/text";
export * from "./order/abs_position";
export * from "./order/bunch";
export * from "./order/bunch_ids";
export * from "./order/lexicographic_string";
export * from "./order/order";
export * from "./order/position";
export * from "./unordered_collections/position_char_map";
export * from "./unordered_collections/position_map";
export * from "./unordered_collections/position_set";
22 changes: 13 additions & 9 deletions src/internal/item_list.ts
@@ -1,7 +1,7 @@
import type { SparseItems } from "sparse-array-rled";
import { BunchMeta, BunchNode } from "../bunch";
import { Order } from "../order";
import { MAX_POSITION, MIN_POSITION, Position } from "../position";
import { SparseIndices, type SparseItems } from "sparse-array-rled";
import { BunchMeta, BunchNode } from "../order/bunch";
import { Order } from "../order/order";
import { MAX_POSITION, MIN_POSITION, Position } from "../order/position";

export interface SparseItemsFactory<I, S extends SparseItems<I>> {
"new"(): S;
@@ -244,6 +244,8 @@ export class ItemList<I, S extends SparseItems<I>> {

/**
* Returns the [item, offset] at position, or null if it is not currently present.
*
* **Warning**: item is aliased internally! Use immediately and discard.
*/
getItem(pos: Position): [item: I, offset: number] | null {
const data = this.state.get(this.order.getNodeFor(pos));
@@ -254,6 +256,8 @@ export class ItemList<I, S extends SparseItems<I>> {
/**
* Returns the [item, offset] currently at index.
*
* **Warning**: item is aliased internally! Use immediately and discard.
*
* @throws If index is not in `[0, this.length)`.
* Note that this differs from an ordinary Array,
* which would instead return undefined.
@@ -646,11 +650,11 @@ export class ItemList<I, S extends SparseItems<I>> {
const savedState: { [bunchID: string]: number[] } = {};
for (const [node, data] of this.state) {
if (!data.values.isEmpty()) {
savedState[node.bunchID] = data.values
.serialize()
.map((item, i) =>
i % 2 === 0 ? this.itemsFactory.length(item as I) : (item as number)
);
const indices = SparseIndices.new();
for (const [index, item] of data.values.items()) {
indices.set(index, this.itemsFactory.length(item));
}
savedState[node.bunchID] = indices.serialize();
}
}
return savedState;
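
The hunk above replaces a map over the serialized items with a pass that records only each item's length at its index. A rough, library-free sketch of that idea follows. It assumes the resulting `number[]` alternates between counts of present values (even indices) and counts of deleted values (odd indices), which is what the removed `i % 2 === 0` code suggests; the exact output of `SparseIndices.serialize()` is not shown in this diff, so treat that format as an assumption. The function name `serializeLengths` is hypothetical.

```typescript
// Hypothetical helper: turns [startIndex, length] pairs (sorted, non-overlapping)
// into run lengths that alternate present, deleted, present, ...
// ASSUMPTION: even entries count present slots, odd entries count deleted slots.
function serializeLengths(
  items: [index: number, length: number][]
): number[] {
  const out: number[] = [];
  let next = 0; // first slot not yet accounted for
  for (const [index, length] of items) {
    if (length === 0) continue; // nothing to record
    const gap = index - next; // deleted slots before this item
    if (out.length === 0) {
      // The output must start with a present count, so a leading gap needs a 0.
      if (gap > 0) out.push(0, gap, length);
      else out.push(length);
    } else if (gap > 0) {
      out.push(gap, length);
    } else {
      // Adjacent to the previous item: extend the last present run.
      out.push((out.pop() ?? 0) + length);
    }
    next = index + length;
  }
  return out; // trailing deletions are never emitted
}

console.log(serializeLengths([[0, 3], [3, 2], [8, 1]]));
console.log(serializeLengths([[2, 4]]));
```

Because only lengths are stored, the same routine works whether an item is an array of values, a string of chars, or a single embed, as long as the caller can report how many slots the item covers.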
13 changes: 6 additions & 7 deletions src/abs_list.ts → src/lists/abs_list.ts
@@ -1,6 +1,6 @@
import { AbsBunchMeta, AbsPosition, AbsPositions } from "./abs_position";
import { AbsBunchMeta, AbsPosition, AbsPositions } from "../order/abs_position";
import { Order } from "../order/order";
import { List, ListSavedState } from "./list";
import { Order } from "./order";

/**
* A JSON-serializable saved state for an `AbsList<T>`.
@@ -27,15 +27,14 @@ import { Order } from "./order";
* uses a compact JSON representation with run-length encoded deletions, identical to `SerializedSparseArray<T>` from the
* [sparse-array-rled](https://github.com/mweidner037/sparse-array-rled#readme) package.
* It alternates between:
* - arrays of present values (even indices), and
* - numbers (odd indices), representing that number of deleted values.
* - arrays of present values, and
* - numbers, representing that number of deleted indices (empty slots).
*
* For example, the sparse array `["foo", "bar", , , , "X", "yy"]` serializes to
* `[["foo", "bar"], 3, ["X", "yy"]]`.
*
* Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted,
* except that the 0th entry may be an empty array.
* For example, the sparse array `[, , "biz", "baz"]` serializes to `[[], 2, ["biz", "baz"]]`.
* Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted.
* For example, the sparse array `[, , "biz", "baz"]` serializes to `[2, ["biz", "baz"]]`.
*/
export type AbsListSavedState<T> = Array<{
bunchMeta: AbsBunchMeta;
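
The updated doc comment in this hunk describes the new serialized form, in which arrays and numbers are told apart by type rather than by position. As a sketch only, the function below produces that form from a JavaScript sparse array. The name `serializeSparse` is hypothetical, and this is not how sparse-array-rled implements serialization.

```typescript
// Hypothetical helper, for illustration only.
// Arrays hold runs of present values; numbers count deleted slots (holes).
function serializeSparse<T>(
  arr: readonly (T | undefined)[]
): (T[] | number)[] {
  const out: (T[] | number)[] = [];
  let run: T[] = []; // current run of present values
  let holes = 0; // current run of holes
  for (let i = 0; i < arr.length; i++) {
    if (i in arr) {
      // Present slot: flush any pending hole count first.
      if (holes > 0) {
        out.push(holes);
        holes = 0;
      }
      run.push(arr[i] as T);
    } else {
      // Hole: flush any pending run of values first.
      if (run.length > 0) {
        out.push(run);
        run = [];
      }
      holes++;
    }
  }
  if (run.length > 0) out.push(run);
  // Trailing holes are dropped, and empty arrays and 0s are never emitted.
  return out;
}

// The two examples from the doc comment:
console.log(JSON.stringify(serializeSparse(["foo", "bar", , , , "X", "yy"])));
// [["foo","bar"],3,["X","yy"]]
console.log(JSON.stringify(serializeSparse([, , "biz", "baz"])));
// [2,["biz","baz"]]
```

Since each entry's type says whether it is a run of values or a count, a leading empty array is no longer needed when the sparse array starts with holes.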
19 changes: 9 additions & 10 deletions src/list.ts → src/lists/list.ts
@@ -1,9 +1,9 @@
import { SparseArray } from "sparse-array-rled";
import { BunchMeta } from "./bunch";
import { ItemList, SparseItemsFactory } from "./internal/item_list";
import { normalizeSliceRange } from "./internal/util";
import { Order } from "./order";
import { Position } from "./position";
import { ItemList, SparseItemsFactory } from "../internal/item_list";
import { normalizeSliceRange } from "../internal/util";
import { BunchMeta } from "../order/bunch";
import { Order } from "../order/order";
import { Position } from "../order/position";
import { Outline, OutlineSavedState } from "./outline";

const sparseArrayFactory: SparseItemsFactory<
@@ -45,15 +45,14 @@ const sparseArrayFactory: SparseItemsFactory<
* uses a compact JSON representation with run-length encoded deletions, identical to `SerializedSparseArray<T>` from the
* [sparse-array-rled](https://github.com/mweidner037/sparse-array-rled#readme) package.
* It alternates between:
* - arrays of present values (even indices), and
* - numbers (odd indices), representing that number of deleted values.
* - arrays of present values, and
* - numbers, representing that number of deleted indices (empty slots).
*
* For example, the sparse array `["foo", "bar", , , , "X", "yy"]` serializes to
* `[["foo", "bar"], 3, ["X", "yy"]]`.
*
* Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted,
* except that the 0th entry may be an empty array.
* For example, the sparse array `[, , "biz", "baz"]` serializes to `[[], 2, ["biz", "baz"]]`.
* Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted.
* For example, the sparse array `[, , "biz", "baz"]` serializes to `[2, ["biz", "baz"]]`.
*/
export type ListSavedState<T> = {
[bunchID: string]: (T[] | number)[];
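
Reading the format back is the mirror image of writing it. The sketch below expands one bunch's entry of the `(T[] | number)[]` shape into a JavaScript sparse array. The name `deserializeSparse` is hypothetical, and the library's own loading code may work differently.

```typescript
// Hypothetical helper, for illustration only.
// Numbers become that many holes; arrays contribute their values in order.
function deserializeSparse<T>(
  serialized: readonly (T[] | number)[]
): (T | undefined)[] {
  const out: (T | undefined)[] = [];
  for (const entry of serialized) {
    if (typeof entry === "number") {
      out.length += entry; // growing `length` leaves real holes, not undefined values
    } else {
      for (const value of entry) out.push(value);
    }
  }
  return out;
}

const restored = deserializeSparse<string>([2, ["biz", "baz"]]);
console.log(restored.length); // 4
console.log(0 in restored); // false
console.log(restored[2]); // biz
```

The `typeof` check is unambiguous even when `T` is `number`, because present values always arrive wrapped in an array.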