-
Notifications
You must be signed in to change notification settings - Fork 385
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1598 from matrix-org/rav/proposals/id_grammar
MSC 1597: Better spec for matrix identifiers
- Loading branch information
Showing
1 changed file
with
273 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,273 @@ | ||
# Grammars for identifiers in the Matrix protocol | ||
|
||
## Background | ||
|
||
Matrix uses client- or server-generated identifiers in a number of | ||
places. Historically the grammars for these have been underspecified, which | ||
leads to confusion about what is or is not a valid identifier with the | ||
possibility of incompatability between implementations. | ||
|
||
This proposal presents tightly-specified grammars for a number of | ||
identifiers. | ||
|
||
## Common Identifiers | ||
|
||
[Spec](https://matrix.org/docs/spec/appendices.html#common-identifier-format) | ||
|
||
Proposal: | ||
|
||
> `localpart` may not include `:`. When parsing a Common Identifier, it should | ||
> be split at the *leftmost* `:`. | ||
Rationale: server names may contain multiple `:`s (think IPv6 literals), so the | ||
first colon is the only sane place to split them. This is a Known Thing, but I | ||
don't think we spell it out anywhere in the spec. | ||
|
||
## User IDs | ||
|
||
User IDs are | ||
[well-specified](https://matrix.org/docs/spec/appendices.html#user-identifiers), | ||
however we should consider dropping `/` from the list of allowed characters, | ||
because HTTP proxies might rewrite | ||
`/_matrix/client/r0/profile/@foo%25bar:matrix.org/displayname` to | ||
`/_matrix/client/r0/profile/@foo/bar:matrix.org/displayname`, messing things | ||
up. | ||
|
||
History: `/` was introduced with the intention of acting as a hierarchical | ||
namespacing character, particularly with consideration to the gitter protocol | ||
which uses it as a hierarchical separator. However, this was not as effective | ||
as hoped because `@foo/bar:example.com` looks like the ID is partitioned into | ||
`@foo` and `bar:example.com`. | ||
|
||
Proposal: | ||
|
||
> Remove `/` from the list of allowed characters in User IDs. | ||
`/` will of course be maintained under the grammar of "historical user | ||
IDs". Sorting out that mess is a longer-term project. | ||
|
||
## Room IDs and Event IDs | ||
|
||
[Issue](https://github.com/matrix-org/matrix-doc/issues/667) | ||
[Spec](https://matrix.org/docs/spec/appendices.html#room-ids-and-event-ids) | ||
|
||
These currently have similar formats, though it is likely that event ids will | ||
be replaced with something else due to | ||
[#1127](https://github.com/matrix-org/matrix-doc/issues/1127). | ||
|
||
Currently they are both specified as ``?opaque_id:domain``, without clues as to | ||
what the opaque_id should be. | ||
|
||
Synapse uses: `[A-Za-z]{18}`. | ||
[Dendrite](https://github.com/matrix-org/dendrite/blob/b71d922/src/github.com/matrix-org/dendrite/clientapi/routing/createroom.go#L125) | ||
uses (I think) `[A-Za-z0-9]{16}` via | ||
[json.go](https://github.com/matrix-org/util/blob/master/json.go#L185). However, | ||
some server implementations/forks are known to generate event IDs (and possibly | ||
room IDs) using a wide alphabet, which means that there exist rooms that | ||
include unusual event IDs. | ||
|
||
Proposal: | ||
|
||
> The opaque_id part must not be empty, and must consist entirely of the | ||
> characters `[0-9a-zA-Z.=_-]`. | ||
> | ||
> The total length (including sigil and domain) must not exceed 255 characters. | ||
> | ||
> This is only enforced for v2 rooms - servers and clients wishing to support | ||
> v1 rooms should be more tolerant. | ||
|
||
## Key IDs (for federation, e2e, and identity servers) | ||
|
||
These are always of the form `<algorithm>:<tok>`. | ||
|
||
Valid algorithms are defined at | ||
https://matrix.org/docs/spec/client_server/unstable.html#key-algorithms, though | ||
we should define the alphabet for future algorithms. | ||
|
||
Proposal: | ||
|
||
> Future algorithm identifiers will be assigned from the alphabet `[a-z0-9_.]` | ||
> and will be at most 31 characters in length. | ||
For federation keys, | ||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/config/key.py#L159) | ||
generates key ids as `ed25519:a_[A-Za-z]{4}`, though an HS admin can configure | ||
them manually to be anything without whitespace. | ||
|
||
Key IDs end up in an Authorization header which looks like `X-Matrix | ||
origin=origin.example.com,key="keyId",sig="ABCDEF..."`. The Synapse | ||
implementation splits on `,` and `=` without regard to quoting so this | ||
currently precludes the use of `,` or `=` in a key ID. | ||
|
||
For e2e, device keys have a `tok` corresponding to the device id, whilst | ||
one-time keys are generated by libolm, which uses a base64-encoded 32-bit int, ie | ||
`[A-Za-z0-9+/]{6}`. | ||
|
||
A key ID needs to be unique over the lifetime of the server (for federation) or | ||
the device (for e2e). However, they are used fairly widely, so making them long | ||
is unattractive as they could significantly increase the amount of data being | ||
transmitted. Let's limit the 'tok' part of the key to 31 characters too. | ||
|
||
Proposal: | ||
|
||
> Key IDs use the following BNF grammar: | ||
> | ||
> ``` | ||
> key_id = algorithm ":" tok | ||
> | ||
> algorithm = 1*31 alg_chars | ||
> | ||
> tok = 1*31 tok_chars | ||
> | ||
> alg_chars = %x61-7a / %30-39 / "_" / "." | ||
> ; a-z 0-9 _ . | ||
> | ||
> tok_chars = ALPHA / DIGIT / "." / "=" / "_" / "-" | ||
> ; A-Z a-z 0-9 . = _ - | ||
> ``` | ||
> | ||
Note that enforcing this grammar will mean: | ||
* Making sure that synapse handles "=" characters in key IDs (easy). | ||
* Making libolm not put + and / characters in key IDs (easy enough, but there | ||
will be a bunch of malformed unique keys out there in the wild. Possibly they | ||
would just get thrown away. Servers may need to continue to tolerate `+` and | ||
`/` in e2e keys for a while.) | ||
## Opaque IDs | ||
[Issue](https://github.com/matrix-org/matrix-doc/issues/666) | ||
This is a class of identifier types where nobody is really meant to parse any | ||
part of the ID - they are just unique identifiers (with varying scopes of | ||
uniqueness). See below for discussion on what is currently in use. | ||
I propose to specify the almost the same grammar for all of these, for | ||
simplicity and consistency. | ||
Proposal: | ||
> Opaque IDs must be strings consisting entirely of the characters | ||
> `[0-9a-zA-Z.=_-]`. Their length must not exceed 255 characters and they must | ||
> not be empty. | ||
For almost all of the current implementations I have looked at (listed below), | ||
the grammar above is a superset of the generated identifiers, and a subset of | ||
the understood identifiers. There should therefore be no | ||
backwards-compatibility problems with its introduction. | ||
The exception is transaction IDs generated by some clients. I think that we'll | ||
just have to fix those clients and accept that old versions may not work with | ||
future servers. | ||
### Call IDs | ||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#m-call-invite) | ||
These are only used within the body of `m.call.*` events, as far as I am | ||
aware. They should be unique within the lifetime of a room. (Some | ||
implementations currently treat them as globally unique, but that is considered | ||
an implementation bug.) | ||
[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/4d310cd4618db4e98a8e6b5eb812480102ee4dee/src/webrtc/call.js#L72) uses `c[0-9.]{32}`. | ||
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/5c6f785e53632e7b6fb3f3859a90c3d85b040e7f/matrix-sdk/src/main/java/org/matrix/androidsdk/call/MXWebRtcCall.java#L221) uses `c[0-9]{13}`. | ||
Additional proposal: | ||
> Call IDs should be long enough to make clashes unlikely. | ||
### Media IDs | ||
[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#id67) | ||
These are generated by the server on upload, and then embedded in `mxc://` URIs | ||
and used in the C-S API and the S-S API. | ||
They must be URI-safe to be sensibly embedded in `mxc://` URIs. | ||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/media_repository.py#L153) | ||
uses `[A-Za-z]{24}`, though it also uses `[0-9A-Za-z_-]{27}` for | ||
[URL | ||
previews](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/preview_url_resource.py#L285). | ||
[matrix-media-repo](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/controllers/upload_controller/upload_controller.go#L50) | ||
uses `[A-Za-z0-9]{32}`, via [random.go](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/util/random.go#L18-L27). | ||
### Filter IDs | ||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#post-matrix-client-r0-user-userid-filter) | ||
These are generated by the server and then used in the CS API. They are only | ||
required to be unique for a given user. `{` is already forbidden by the spec. | ||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/storage/filtering.py#L70-L73) | ||
uses a stringified int. | ||
### Auth Session IDs | ||
[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#user-interactive-authentication-api) | ||
These are generated by the server during auth, and then used in the CS | ||
API. However, they need to be unique for a given server. | ||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/auth.py#L494) uses `[A-Za-z]{24}`. | ||
### Transaction IDs (for federation) | ||
[Spec](https://matrix.org/docs/spec/server_server/unstable.html#put-matrix-federation-v1-send-txnid) | ||
Generated by sending server. Needs to be unique for a given pair of servers. | ||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/federation/transaction_queue.py#L593) uses a stringified int and accepts pretty much anything. | ||
### Transaction IDs (for C-S API) | ||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#put-matrix-client-r0-rooms-roomid-send-eventtype-txnid) | ||
These are generated by the client. They only need to be unique within the | ||
context of a single access_token/device. | ||
Synapse doesn't appear to do any sanity-checking here currently. | ||
[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/c6b500bc09994ab5924ef8aab9bd10fc7ded5dae/src/base-apis.js#L123) | ||
uses `m[0-9]{13}.[0-9]{1,}`. | ||
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/088414fb187cae341690c3a01493b87d97f0169f/matrix-sdk/src/main/java/org/matrix/androidsdk/rest/model/Event.java#L503) | ||
uses a room ID plus a timestamp, hence kinda could be anything, but certainly | ||
will include a `!`. | ||
### Device IDs | ||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#relationship-between-access-tokens-and-devices) | ||
These are normally generated by the server on login. It's possible for clients | ||
to present their own device_ids, but we're not aware of this feature being | ||
widely used. | ||
They are used between users and across federation for E2E and to-device | ||
messages. They need to be unique for a particular user. They also appear in key | ||
IDs and must therefore be a subset of that grammar. | ||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/device.py#L89) | ||
generates device IDs with `[A-Z]{10}`. It appears to do little sanity-checking | ||
of client-generated device IDs currently. | ||
Additional proposal: | ||
> Device IDs must not exceed 31 characters in length. | ||
### Message IDs | ||
These are used in the server-server API for | ||
[Send-to-device messaging](https://matrix.org/docs/spec/server_server/unstable.html#send-to-device-messaging). | ||
Synapse uses `[A-Za-z]{16}`, and accepts anything that fits in a postgres TEXT | ||
field. Ref: [devicemessage.py](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/devicemessage.py#L102). | ||
## Room Aliases | ||
These are a complex topic and are discussed in [MSC | ||
1608](https://github.com/matrix-org/matrix-doc/issues/1608). |