Support embeds in Text (#27)

* upgrade sparse-array-rled
* add embeds
* update serialized form docs
* folder reorg
* embed tests
* patch upgrade sparse-array-rled
* rerun benchmarks
* proofread: embed docs and tests
mweidner037 · Aug 27, 2024 · 38d33fd
1 parent 0ac5470
commit 38d33fd

Showing 22 changed files with 442 additions and 245 deletions.
diff --git a/README.md b/README.md
@@ -336,12 +336,17 @@ A total order on Positions, independent of any specific assignment of values.
 
 An Order manages metadata (bunches) for any number of Lists, Texts, Outlines, and AbsLists. You can also use an Order to create Positions independent of a List (`createPositions`), convert between Positions and AbsPositions (`abs` and `unabs`), and directly view the tree of bunches (`getBunch`, `getBunchFor`).
 
-#### `Text`
+#### `Text<E>`
 
 A list of characters, represented as an ordered map with Position keys.
 
 Text is functionally equivalent to a `List<string>` with single-char values, but it uses strings internally and in bulk methods, instead of arrays of single chars. This reduces memory usage and the size of saved states.
 
+The list may also contain embedded objects of type `E`.
+Each embed takes the place of a single character. You can use embeds to represent
+non-text content, like images and videos, that may appear inline in a text document.
+If you do not specify the generic type `E`, it defaults to `never`, i.e., no embeds are allowed.
+
 #### `Outline`
 
 An `Outline` is like a List but without values. Instead, you tell the Outline which Positions are currently present, then use it to convert between Positions and their current indices.
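Reviewer note on the `Text<E>` hunk above: the sketch below models the idea that each embed takes the place of a single character, and that omitting `E` (defaulting to `never`) leaves only plain characters. It is a standalone illustration, not the library's API: `CharOrEmbed`, `ImageEmbed`, and `logicalLength` are hypothetical names, and the `E extends object` constraint is an assumption based on the README's phrase "embedded objects".

```typescript
// Hypothetical model of "characters or embeds"; not the real Text<E> class.
// Assumption: embeds are objects, so they can be told apart from strings.
type CharOrEmbed<E extends object = never> = string | E;

// Hypothetical embed type for inline images.
interface ImageEmbed {
  kind: "image";
  url: string;
}

/**
 * Counts list slots in a sequence of string runs and embeds.
 * Each string contributes its length (one slot per UTF-16 code unit);
 * each embed contributes exactly one slot.
 */
function logicalLength<E extends object>(content: CharOrEmbed<E>[]): number {
  let n = 0;
  for (const c of content) {
    n += typeof c === "string" ? c.length : 1;
  }
  return n;
}

// With the default E = never, CharOrEmbed is just string.
const plainOnly: CharOrEmbed[] = ["hello"];

// With E = ImageEmbed, embeds may sit between runs of characters.
const withImage: CharOrEmbed<ImageEmbed>[] = [
  "ab",
  { kind: "image", url: "cat.png" },
  "c",
];
```

Here `logicalLength(withImage)` counts two characters, one embed, and one more character, so an embed shifts later indices by exactly one, the same as a single char would.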
@@ -376,7 +381,7 @@ AbsList's API is a hybrid between `Array<T>` and `Map<AbsPosition, T>`. Use `ins
 The library also comes with _unordered_ collections:
 
 - `PositionMap<T>`: A map from Positions to values of type `T`, like `List<T>` but without ordering info.
-- `PositionCharMap`: A map from Positions to characters, like `Text` but without ordering info.
+- `PositionCharMap<E>`: A map from Positions to characters (or embeds), like `Text<E>` but without ordering info.
 - `PositionSet`: A set of Positions, like `Outline` but without ordering info.
 
 These collections do not support in-order or indexed access, but they also do not require managing metadata, and they are slightly more efficient.
@@ -401,7 +406,7 @@ Saved states: Each class lets you save and load its internal states in JSON form
 
 - `ListSavedState<T>`
 - `OrderSavedState`
-- `TextSavedState`
+- `TextSavedState<E>`
 - `OutlineSavedState`
 - `AbsListSavedState<T>`
 
@@ -482,16 +487,16 @@ Each benchmark applies the [automerge-perf](https://github.com/automerge/automer
 
 Results for an op-based/state-based text CRDT built on top of a Text + PositionSet, on my laptop:
 
-- Sender time (ms): 655
+- Sender time (ms): 722
 - Avg update size (bytes): 92.7
-- Receiver time (ms): 369
+- Receiver time (ms): 416
 - Save time (ms): 11
-- Save size (bytes): 599817
-- Load time (ms): 10
-- Save time GZIP'd (ms): 42
-- Save size GZIP'd (bytes): 87006
+- Save size (bytes): 598917
+- Load time (ms): 11
+- Save time GZIP'd (ms): 40
+- Save size GZIP'd (bytes): 86969
 - Load time GZIP'd (ms): 30
-- Mem used estimate (MB): 1.8
+- Mem used estimate (MB): 2.0
 
 For more results, see [benchmark_results.md](./benchmark_results.md).
 

diff --git a/benchmark_results.md b/benchmark_results.md
@@ -13,47 +13,47 @@ For perspective on the save sizes: the final text (excluding deleted chars) is 1
 Use `List` and send updates directly over a reliable link (e.g. WebSocket).
 Updates and saved states use JSON encoding, with optional GZIP for saved states.
 
-- Sender time (ms): 623
+- Sender time (ms): 671
 - Avg update size (bytes): 86.8
-- Receiver time (ms): 342
-- Save time (ms): 9
-- Save size (bytes): 804020
-- Load time (ms): 14
-- Save time GZIP'd (ms): 54
-- Save size GZIP'd (bytes): 89118
-- Load time GZIP'd (ms): 36
+- Receiver time (ms): 384
+- Save time (ms): 8
+- Save size (bytes): 803120
+- Load time (ms): 17
+- Save time GZIP'd (ms): 55
+- Save size GZIP'd (bytes): 89013
+- Load time GZIP'd (ms): 37
 - Mem used estimate (MB): 2.2
 
 ## AbsList Direct
 
 Use `AbsList` and send updates directly over a reliable link (e.g. WebSocket).
 Updates and saved states use JSON encoding, with optional GZIP for saved states.
 
-- Sender time (ms): 1504
+- Sender time (ms): 1576
 - Avg update size (bytes): 216.2
 - AbsPosition length stats: avg = 187.4, percentiles [25, 50, 75, 100] = 170,184,202,272
-- Receiver time (ms): 739
-- Save time (ms): 15
-- Save size (bytes): 868579
-- Load time (ms): 19
-- Save time GZIP'd (ms): 64
-- Save size GZIP'd (bytes): 87086
-- Load time GZIP'd (ms): 44
-- Mem used estimate (MB): 2.1
+- Receiver time (ms): 791
+- Save time (ms): 14
+- Save size (bytes): 867679
+- Load time (ms): 21
+- Save time GZIP'd (ms): 63
+- Save size GZIP'd (bytes): 87108
+- Load time GZIP'd (ms): 46
+- Mem used estimate (MB): 2.2
 
 ## List Direct w/ Custom Encoding
 
 Use `List` and send updates directly over a reliable link (e.g. WebSocket).
 Updates use a custom string encoding; saved states use JSON with optional GZIP.
 
-- Sender time (ms): 509
+- Sender time (ms): 556
 - Avg update size (bytes): 31.2
-- Receiver time (ms): 299
-- Save time (ms): 8
-- Save size (bytes): 804020
+- Receiver time (ms): 357
+- Save time (ms): 9
+- Save size (bytes): 803120
 - Load time (ms): 11
-- Save time GZIP'd (ms): 49
-- Save size GZIP'd (bytes): 89113
+- Save time GZIP'd (ms): 47
+- Save size GZIP'd (bytes): 89021
 - Load time GZIP'd (ms): 36
 - Mem used estimate (MB): 2.2
 
@@ -62,64 +62,64 @@ Updates use a custom string encoding; saved states use JSON with optional GZIP.
 Use `Text` and send updates directly over a reliable link (e.g. WebSocket).
 Updates and saved states use JSON encoding, with optional GZIP for saved states.
 
-- Sender time (ms): 619
+- Sender time (ms): 693
 - Avg update size (bytes): 86.8
-- Receiver time (ms): 389
+- Receiver time (ms): 444
 - Save time (ms): 5
-- Save size (bytes): 493835
+- Save size (bytes): 492935
 - Load time (ms): 8
-- Save time GZIP'd (ms): 36
-- Save size GZIP'd (bytes): 73737
-- Load time GZIP'd (ms): 22
-- Mem used estimate (MB): 1.3
+- Save time GZIP'd (ms): 35
+- Save size GZIP'd (bytes): 73709
+- Load time GZIP'd (ms): 24
+- Mem used estimate (MB): 1.4
 
 ## Outline Direct
 
 Use `Outline` and send updates directly over a reliable link (e.g. WebSocket).
 Updates and saved states use JSON encoding, with optional GZIP for saved states.
 Neither updates nor saved states include values (chars).
 
-- Sender time (ms): 587
+- Sender time (ms): 648
 - Avg update size (bytes): 78.4
-- Receiver time (ms): 326
-- Save time (ms): 5
+- Receiver time (ms): 365
+- Save time (ms): 6
 - Save size (bytes): 382419
 - Load time (ms): 7
 - Save time GZIP'd (ms): 24
-- Save size GZIP'd (bytes): 39367
-- Load time GZIP'd (ms): 14
-- Mem used estimate (MB): 1.2
+- Save size GZIP'd (bytes): 39364
+- Load time GZIP'd (ms): 13
+- Mem used estimate (MB): 1.1
 
 ## TextCrdt
 
 Use a hybrid op-based/state-based CRDT implemented on top of the library's data structures, copied from [@list-positions/crdts](https://github.com/mweidner037/list-positions-crdts).
 This variant uses a Text + PositionSet to store the state and Positions in messages, manually managing BunchMetas.
 Updates and saved states use JSON encoding, with optional GZIP for saved states.
 
-- Sender time (ms): 655
+- Sender time (ms): 722
 - Avg update size (bytes): 92.7
-- Receiver time (ms): 369
+- Receiver time (ms): 416
 - Save time (ms): 11
-- Save size (bytes): 599817
-- Load time (ms): 10
-- Save time GZIP'd (ms): 42
-- Save size GZIP'd (bytes): 87006
+- Save size (bytes): 598917
+- Load time (ms): 11
+- Save time GZIP'd (ms): 40
+- Save size GZIP'd (bytes): 86969
 - Load time GZIP'd (ms): 30
-- Mem used estimate (MB): 1.8
+- Mem used estimate (MB): 2.0
 
 ## ListCrdt
 
 Use a hybrid op-based/state-based CRDT implemented on top of the library's data structures, copied from [@list-positions/crdts](https://github.com/mweidner037/list-positions-crdts).
 This variant uses a List of characters + PositionSet to store the state and Positions in messages, manually managing BunchMetas.
 Updates and saved states use JSON encoding, with optional GZIP for saved states.
 
-- Sender time (ms): 701
+- Sender time (ms): 762
 - Avg update size (bytes): 94.8
-- Receiver time (ms): 472
+- Receiver time (ms): 507
 - Save time (ms): 13
-- Save size (bytes): 910002
-- Load time (ms): 21
-- Save time GZIP'd (ms): 64
-- Save size GZIP'd (bytes): 102650
-- Load time GZIP'd (ms): 35
-- Mem used estimate (MB): 2.5
+- Save size (bytes): 909102
+- Load time (ms): 15
+- Save time GZIP'd (ms): 57
+- Save size GZIP'd (bytes): 102554
+- Load time GZIP'd (ms): 36
+- Mem used estimate (MB): 2.6
diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -36,7 +36,7 @@
   "dependencies": {
     "lex-sequence": "^2.0.0",
     "maybe-random-string": "^1.0.0",
-    "sparse-array-rled": "^1.0.0"
+    "sparse-array-rled": "^2.0.1"
   },
   "devDependencies": {
     "@istanbuljs/nyc-config-typescript": "^1.0.2",

diff --git a/src/index.ts b/src/index.ts
@@ -1,13 +1,13 @@
-export * from "./abs_list";
-export * from "./abs_position";
-export * from "./bunch";
-export * from "./bunch_ids";
-export * from "./lexicographic_string";
-export * from "./list";
-export * from "./order";
-export * from "./outline";
-export * from "./position";
-export * from "./text";
+export * from "./lists/abs_list";
+export * from "./lists/list";
+export * from "./lists/outline";
+export * from "./lists/text";
+export * from "./order/abs_position";
+export * from "./order/bunch";
+export * from "./order/bunch_ids";
+export * from "./order/lexicographic_string";
+export * from "./order/order";
+export * from "./order/position";
 export * from "./unordered_collections/position_char_map";
 export * from "./unordered_collections/position_map";
 export * from "./unordered_collections/position_set";
diff --git a/src/internal/item_list.ts b/src/internal/item_list.ts
@@ -1,7 +1,7 @@
-import type { SparseItems } from "sparse-array-rled";
-import { BunchMeta, BunchNode } from "../bunch";
-import { Order } from "../order";
-import { MAX_POSITION, MIN_POSITION, Position } from "../position";
+import { SparseIndices, type SparseItems } from "sparse-array-rled";
+import { BunchMeta, BunchNode } from "../order/bunch";
+import { Order } from "../order/order";
+import { MAX_POSITION, MIN_POSITION, Position } from "../order/position";
 
 export interface SparseItemsFactory<I, S extends SparseItems<I>> {
   "new"(): S;
@@ -244,6 +244,8 @@ export class ItemList<I, S extends SparseItems<I>> {
 
   /**
    * Returns the [item, offset] at position, or null if it is not currently present.
+   *
+   * **Warning**: item is aliased internally! Use immediately and discard.
    */
   getItem(pos: Position): [item: I, offset: number] | null {
     const data = this.state.get(this.order.getNodeFor(pos));
@@ -254,6 +256,8 @@ export class ItemList<I, S extends SparseItems<I>> {
   /**
    * Returns the [item, offset] currently at index.
    *
+   * **Warning**: item is aliased internally! Use immediately and discard.
+   *
    * @throws If index is not in `[0, this.length)`.
    * Note that this differs from an ordinary Array,
    * which would instead return undefined.
@@ -646,11 +650,11 @@ export class ItemList<I, S extends SparseItems<I>> {
     const savedState: { [bunchID: string]: number[] } = {};
     for (const [node, data] of this.state) {
       if (!data.values.isEmpty()) {
-        savedState[node.bunchID] = data.values
-          .serialize()
-          .map((item, i) =>
-            i % 2 === 0 ? this.itemsFactory.length(item as I) : (item as number)
-          );
+        const indices = SparseIndices.new();
+        for (const [index, item] of data.values.items()) {
+          indices.set(index, this.itemsFactory.length(item));
+        }
+        savedState[node.bunchID] = indices.serialize();
       }
     }
     return savedState;

diff --git a/src/abs_list.ts b/src/lists/abs_list.ts
@@ -1,6 +1,6 @@
-import { AbsBunchMeta, AbsPosition, AbsPositions } from "./abs_position";
+import { AbsBunchMeta, AbsPosition, AbsPositions } from "../order/abs_position";
+import { Order } from "../order/order";
 import { List, ListSavedState } from "./list";
-import { Order } from "./order";
 
 /**
  * A JSON-serializable saved state for an `AbsList<T>`.
@@ -27,15 +27,14 @@ import { Order } from "./order";
  * uses a compact JSON representation with run-length encoded deletions, identical to `SerializedSparseArray<T>` from the
  * [sparse-array-rled](https://github.com/mweidner037/sparse-array-rled#readme) package.
  * It alternates between:
- * - arrays of present values (even indices), and
- * - numbers (odd indices), representing that number of deleted values.
+ * - arrays of present values, and
+ * - numbers, representing that number of deleted indices (empty slots).
  *
  * For example, the sparse array `["foo", "bar", , , , "X", "yy"]` serializes to
  * `[["foo", "bar"], 3, ["X", "yy"]]`.
  *
- * Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted,
- * except that the 0th entry may be an empty array.
- * For example, the sparse array `[, , "biz", "baz"]` serializes to `[[], 2, ["biz", "baz"]]`.
+ * Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted.
+ * For example, the sparse array `[, , "biz", "baz"]` serializes to `[2, ["biz", "baz"]]`.
  */
 export type AbsListSavedState<T> = Array<{
   bunchMeta: AbsBunchMeta;

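Reviewer note on the updated doc comment above: the new serialized form alternates arrays of present values with counts of empty slots, omits trivial entries, and no longer emits a leading `[]`. The sketch below reproduces both documented examples. It is a standalone illustration, not the code from sparse-array-rled: `serializeSparse` and `Serialized` are hypothetical names, and treating `undefined` as an empty slot is a simplifying assumption.

```typescript
// Hypothetical helper; not part of list-positions or sparse-array-rled.
type Serialized<T> = (T[] | number)[];

/**
 * Serializes a sparse array into alternating runs:
 * arrays of present values and counts of empty slots.
 * Assumption: `undefined` (or a hole) marks an empty slot.
 */
function serializeSparse<T>(sparse: (T | undefined)[]): Serialized<T> {
  const out: Serialized<T> = [];
  let run: T[] = []; // current run of present values
  let holes = 0; // current run of empty slots

  for (let i = 0; i < sparse.length; i++) {
    const value = sparse[i];
    if (value !== undefined) {
      // A present value ends any pending run of empty slots.
      if (holes > 0) {
        out.push(holes);
        holes = 0;
      }
      run.push(value);
    } else {
      // An empty slot ends any pending run of present values.
      if (run.length > 0) {
        out.push(run);
        run = [];
      }
      holes++;
    }
  }
  // Flush the last run of values; trailing empty slots are omitted.
  if (run.length > 0) out.push(run);
  return out;
}

const a = serializeSparse(["foo", "bar", undefined, undefined, undefined, "X", "yy"]);
const b = serializeSparse([undefined, undefined, "biz", "baz"]);
console.log(JSON.stringify(a)); // [["foo","bar"],3,["X","yy"]]
console.log(JSON.stringify(b)); // [2,["biz","baz"]]
```

Because entries are distinguished by type (array vs. number) rather than by even/odd index, a state that starts with empty slots can begin directly with a number, which is why the `[[], 2, ...]` form from the old docs is no longer needed.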
diff --git a/src/list.ts b/src/lists/list.ts
@@ -1,9 +1,9 @@
 import { SparseArray } from "sparse-array-rled";
-import { BunchMeta } from "./bunch";
-import { ItemList, SparseItemsFactory } from "./internal/item_list";
-import { normalizeSliceRange } from "./internal/util";
-import { Order } from "./order";
-import { Position } from "./position";
+import { ItemList, SparseItemsFactory } from "../internal/item_list";
+import { normalizeSliceRange } from "../internal/util";
+import { BunchMeta } from "../order/bunch";
+import { Order } from "../order/order";
+import { Position } from "../order/position";
 import { Outline, OutlineSavedState } from "./outline";
 
 const sparseArrayFactory: SparseItemsFactory<
@@ -45,15 +45,14 @@ const sparseArrayFactory: SparseItemsFactory<
  * uses a compact JSON representation with run-length encoded deletions, identical to `SerializedSparseArray<T>` from the
  * [sparse-array-rled](https://github.com/mweidner037/sparse-array-rled#readme) package.
  * It alternates between:
- * - arrays of present values (even indices), and
- * - numbers (odd indices), representing that number of deleted values.
+ * - arrays of present values, and
+ * - numbers, representing that number of deleted indices (empty slots).
  *
  * For example, the sparse array `["foo", "bar", , , , "X", "yy"]` serializes to
  * `[["foo", "bar"], 3, ["X", "yy"]]`.
  *
- * Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted,
- * except that the 0th entry may be an empty array.
- * For example, the sparse array `[, , "biz", "baz"]` serializes to `[[], 2, ["biz", "baz"]]`.
+ * Trivial entries (empty arrays, 0s, & trailing deletions) are always omitted.
+ * For example, the sparse array `[, , "biz", "baz"]` serializes to `[2, ["biz", "baz"]]`.
  */
 export type ListSavedState<T> = {
   [bunchID: string]: (T[] | number)[];