This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add developer docs to explain pagination tokens #12305

Closed
MadLittleMods opened this issue Mar 26, 2022 · 1 comment · Fixed by #12317
Assignees
Labels
A-Docs things relating to the documentation T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@MadLittleMods
Contributor

MadLittleMods commented Mar 26, 2022

Add developer docs to explain pagination tokens.

The comment docs explain the pagination tokens well in general, but when I was confronted with s2633508_17_338_6732159_1082514_541479_274711_265584_1, it wasn't obvious to me how to decipher it. I knew the stream_ordering (s2633508) part, but I was fuzzy on what the other numbers were, and the comment docs don't explain that part. I only really figured it out while drafting this issue and looking at the code more.

Relevant code:

Relevant endpoints:

  • /sync
    • ?since
    • next_batch
    • prev_batch
  • /messages
    • ?from
    • ?to
    • start
    • end
  • others

Live tokens (stream_ordering)

synapse/synapse/types.py

Lines 436 to 437 in 3c41d87

Live tokens start with an "s" followed by the "stream_ordering" id of the
event it comes after. Historic tokens start with a "t" followed by the

ex.

  • s2633508_17_338_6732159_1082514_541479_274711_265584_1
    1. room_key: s2633508 -> 2633508 stream_ordering
    2. presence_key: 17
    3. typing_key: 338
    4. receipt_key: 6732159
    5. account_data_key: 1082514
    6. push_rules_key: 541479
    7. to_device_key: 274711
    8. device_list_key: 265584
    9. groups_key: 1
  • s1_33_0_1_1_1_1_7_1
  • s843_0_0_0_0_0_0_0_0

The keys are concatenated in this order:

synapse/synapse/types.py

Lines 636 to 649 in 3c41d87

async def to_string(self, store: "DataStore") -> str:
    return self._SEPARATOR.join(
        [
            await self.room_key.to_string(store),
            str(self.presence_key),
            str(self.typing_key),
            str(self.receipt_key),
            str(self.account_data_key),
            str(self.push_rules_key),
            str(self.to_device_key),
            str(self.device_list_key),
            str(self.groups_key),
        ]
    )

Each key represents the position of the corresponding field in the /sync response:

{
  "next_batch": "s12_4_0_1_1_1_1_4_1",
  "presence": {
    "events": [
      {
        "type": "m.presence",
        "sender": "@the-bridge-user:hs1",
        "content": {
          "presence": "offline",
          "last_active_ago": 103
        }
      }
    ]
  },
  "device_lists": {
    "changed": [
      "@alice:hs1"
    ]
  },
  "device_one_time_keys_count": {
    "signed_curve25519": 0
  },
  "org.matrix.msc2732.device_unused_fallback_key_types": [],
  "device_unused_fallback_key_types": [],
  "rooms": {
    "join": {
      "!QrZlfIDQLNLdZHqTnt:hs1": {
        "timeline": {
          "events": [],
          "prev_batch": "s10_4_0_1_1_1_1_4_1",
          "limited": false
        },
        "state": {
          "events": []
        },
        "account_data": {
          "events": []
        },
        "ephemeral": {
          "events": []
        },
        "unread_notifications": {
          "notification_count": 1,
          "highlight_count": 0
        },
        "summary": {},
        "org.matrix.msc2654.unread_count": 1
      }
    }
  }
}
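
To make the field order concrete, here is a minimal sketch that labels each part of a live token. It assumes the separator is "_" (as the examples above suggest) and that the order matches the to_string() snippet quoted earlier. `split_stream_token` and `STREAM_TOKEN_FIELDS` are hypothetical names for illustration, not part of Synapse.

```python
from typing import Dict

# Field order copied from the to_string() snippet quoted in this issue.
STREAM_TOKEN_FIELDS = (
    "room_key",
    "presence_key",
    "typing_key",
    "receipt_key",
    "account_data_key",
    "push_rules_key",
    "to_device_key",
    "device_list_key",
    "groups_key",
)


def split_stream_token(token: str) -> Dict[str, str]:
    """Label each "_"-separated part of a token with its field name.

    Hypothetical helper for illustration only; not Synapse's own parser.
    """
    parts = token.split("_")
    if len(parts) != len(STREAM_TOKEN_FIELDS):
        raise ValueError(
            f"expected {len(STREAM_TOKEN_FIELDS)} parts, got {len(parts)}"
        )
    return dict(zip(STREAM_TOKEN_FIELDS, parts))


fields = split_stream_token(
    "s2633508_17_338_6732159_1082514_541479_274711_265584_1"
)
print(fields["room_key"])     # s2633508
print(fields["receipt_key"])  # 6732159
```

The room_key part is left as a string here because its format differs between live, historic and min-position tokens.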

Historic tokens (topological_ordering/depth)

synapse/synapse/types.py

Lines 437 to 439 in 3c41d87

event it comes after. Historic tokens start with a "t" followed by the
"topological_ordering" id of the event it comes after, followed by "-",
followed by the "stream_ordering" id of the event it comes after.

  • t175-530_0_0_0_0_0_0_0_0
    1. topological_ordering: t175 -> 175 (depth)
    2. stream_ordering: 530
    3. presence_key: 0
    4. typing_key: 0
    5. receipt_key: 0
    6. account_data_key: 0
    7. push_rules_key: 0
    8. to_device_key: 0
    9. device_list_key: 0
    10. groups_key: 0
    • You will probably see this from /messages, because that endpoint is scoped to a room and so is depth
    • topological_ordering is the same as depth in Synapse
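
As a rough illustration of the two room_key shapes described so far, the sketch below splits the first "_"-separated part of a token into its orderings. `RoomKey` and `parse_room_key` are hypothetical names, and the logic is inferred only from the comment docs quoted in this issue ("s" + stream_ordering, or "t" + topological_ordering + "-" + stream_ordering); it does not handle the "m" form.

```python
from typing import NamedTuple, Optional


class RoomKey(NamedTuple):
    """Hypothetical container for the parsed room_key part of a token."""

    topological: Optional[int]  # None for live ("s") tokens
    stream: int


def parse_room_key(room_key: str) -> RoomKey:
    """Parse "s<stream>" or "t<topological>-<stream>" (illustration only)."""
    if room_key.startswith("s"):
        return RoomKey(None, int(room_key[1:]))
    if room_key.startswith("t"):
        topological, _, stream = room_key[1:].partition("-")
        return RoomKey(int(topological), int(stream))
    raise ValueError(f"unsupported room_key: {room_key!r}")


# The room_key is the first "_"-separated part of the full token.
historic = "t175-530_0_0_0_0_0_0_0_0".split("_")[0]
print(parse_room_key(historic))    # RoomKey(topological=175, stream=530)
print(parse_room_key("s2633508"))  # RoomKey(topological=None, stream=2633508)
```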

Min-position tokens

This one seems pretty well explained by the comment docs already:

ex. m56~2.58~3.59

synapse/synapse/types.py

Lines 441 to 461 in 3c41d87

There is also a third mode for live tokens where the token starts with "m",
which is sometimes used when using sharded event persisters. In this case
the events stream is considered to be a set of streams (one for each writer)
and the token encodes the vector clock of positions of each writer in their
respective streams.

The format of the token in such case is an initial integer min position,
followed by the mapping of instance ID to position separated by '.' and '~':

    m{min_pos}~{writer1}.{pos1}~{writer2}.{pos2}. ...

The `min_pos` corresponds to the minimum position all writers have persisted
up to, and then only writers that are ahead of that position need to be
encoded. An example token is:

    m56~2.58~3.59

Which corresponds to a set of three (or more writers) where instances 2 and
3 (these are instance IDs that can be looked up in the DB to fetch the more
commonly used instance names) are at positions 58 and 59 respectively, and
all other instances are at position 56.
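
The format described in the comment docs can be decoded with a few lines of string splitting. This is only a sketch based on that description. `parse_min_pos_token` and `position_of` are hypothetical helper names, and instance IDs are assumed to be integers as in the example token.

```python
from typing import Dict, Tuple


def parse_min_pos_token(room_key: str) -> Tuple[int, Dict[int, int]]:
    """Split "m{min_pos}~{id}.{pos}~..." into (min_pos, {id: pos}).

    Illustration only; follows the format described in the comment docs.
    """
    if not room_key.startswith("m"):
        raise ValueError(f"not a min-position token: {room_key!r}")
    parts = room_key[1:].split("~")
    min_pos = int(parts[0])
    positions: Dict[int, int] = {}
    for part in parts[1:]:
        instance_id, _, pos = part.partition(".")
        positions[int(instance_id)] = int(pos)
    return min_pos, positions


def position_of(instance_id: int, min_pos: int, positions: Dict[int, int]) -> int:
    """Writers not listed in the token are at min_pos."""
    return positions.get(instance_id, min_pos)


min_pos, positions = parse_min_pos_token("m56~2.58~3.59")
print(min_pos, positions)                    # 56 {2: 58, 3: 59}
print(position_of(1, min_pos, positions))    # 56
```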

@MadLittleMods MadLittleMods added the A-Docs things relating to the documentation label Mar 26, 2022
@MadLittleMods MadLittleMods changed the title from "Add developer docs to explain position tokens" to "Add developer docs to explain pagination tokens" Mar 26, 2022
@anoadragon453
Member

Thank you for collecting this information. Adding some contextual information about this under the Development -> Internal Documentation section of the docs website would be useful! These tokens are used in several places, so it may be useful to have a single Pagination Tokens page to document them, and then have pages about other systems that use them (Sync) point to this page if the reader wants to learn more about the tokens.

I am wary of duplicating content between code comments and the documentation though. If we could refine the docstrings in the code, and then point to them in the documentation with a less in-depth explanation, that may strike a good balance?

@anoadragon453 anoadragon453 added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Mar 28, 2022
@MadLittleMods MadLittleMods self-assigned this Mar 29, 2022

2 participants