Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make the event txnId mapping rely on the device_id instead of the access_token_id. #13083

Conversation

sandhose
Copy link
Member

@sandhose sandhose commented Jun 16, 2022

Fixes #13064

This changes the event txnId mapping to rely on the device_id instead of the access_token_id.
Note that there are access tokens created via the admin API which don't have a device_id. Those tokens would loose:

  • the txnId echo in /sync when they are sending an event to a room
  • idempotency when sending an event to a room (in case of a network failure)

Note that I decided to keep the token_id (and still writing to it) in the event_txn_id table to allow rolling back, but I'm not reading on it.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@sandhose sandhose requested a review from a team as a code owner June 16, 2022 13:19
@reivilibre
Copy link
Contributor

Some brief thoughts.

Note that I decided to keep the token_id (and still writing to it) in the event_txn_id table to allow rolling back, but I'm not reading on it.

I think there's some fiddling of SCHEMA_VERSION/SCHEMA_COMPAT_VERSION to be done (and adding to the comments around those) when you make a column 'written but not read'.
If you can't figure it out from the example, let me/someone know (and I will try to figure out how that works again).
https://matrix-org.github.io/synapse/latest/development/database_schema.html#synapse-schema-versions also describes this process.

Otherwise:

  • handling of access tokens without device ID: I think I would rather uphold idempotency and the /sync feedback, even if that means we have to pretend all no-device-ID 'devices' are the same. Happy to hear comments from others.
  • This should be specified. I believe you've already found that the spec contradicts this change (:sob:) — whether that can be classed as a spec bug or needs an MSC, I don't know.
  • What should happen when you delete a device and create a new one with the same device ID? (Or alternatively: replace a device by logging in with the same ID?).
    • I would find it surprising if that means you inherit the pool of 'consumed' transaction IDs. Am I wrong to be surprised?
    • Regardless of this, I think it would be good to have this specced since it's an easy disagreement that homeserver implementations could have, that could also lead to subtle bugs. (Many clients use a timestamp in their ID and wouldn't care, but not all of them do this)

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general theme: please make sure you include clear comments about what is going on. People shouldn't have to have an extensive knowledge of the entire Synapse codebase to understand what individual bits of code are for.

Comment on lines +202 to +203
device_id: DictProperty[str] = DictProperty("device_id")
token_id: DictProperty[Optional[int]] = DefaultDictProperty("token_id", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments please. What do these fields mean? What does it mean for them to be absent, or None?

Comment on lines +202 to +203
device_id: DictProperty[str] = DictProperty("device_id")
token_id: DictProperty[Optional[int]] = DefaultDictProperty("token_id", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we use a DefaultDictProperty for token_id when we do not do so for any of the other properties (all of which are optional)

Comment on lines +365 to +373
if config.device_id is not None and config.device_id == getattr(
e.internal_metadata, "device_id", None
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments here would be helpful. Why do we need to getattr rather than just read e.internal_metadata.device_id directly?

(just because the code was under-commented before doesn't mean you need to maintain that! What is this block actually doing?)


DROP INDEX event_txn_id_txn_id;
CREATE UNIQUE INDEX event_txn_id_txn_id ON event_txn_id (room_id, user_id, device_id, txn_id);
-- Keep this (non-unique) index in case we're rolling back
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, what does "in case we're rolling back" mean?

* See the License for the specific language governing permissions and
* limitations under the License.
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments please. What is this delta doing? why?

(see

-- We're going to stop populating event_edges.room_id and event_edges.is_state,
for example, though I now wonder why I used -- comments rather than a /* one.

-- Keep this (non-unique) index in case we're rolling back
CREATE INDEX event_txn_id_txn_id_token_id ON event_txn_id (room_id, user_id, token_id, txn_id);

-- Make the device_id NOT NULL and remove the NOT NULL constraint from the token_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a foreign key constraint on devices ?

Comment on lines +26 to +28
DROP INDEX event_txn_id_event_id;
DROP INDEX event_txn_id_txn_id;
DROP INDEX event_txn_id_ts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will happen automatically when you drop the table?

Comment on lines +30 to +33
CREATE UNIQUE INDEX event_txn_id_event_id ON event_txn_id2(event_id);
CREATE UNIQUE INDEX event_txn_id_txn_id ON event_txn_id2(room_id, user_id, device_id, txn_id);
-- Keep this index in case we're rolling back
CREATE INDEX event_txn_id_txn_id_token_id ON event_txn_id2(room_id, user_id, token_id, txn_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's normally quicker to create the index once all the inserts are done.

* limitations under the License.
*/

CREATE TABLE event_txn_id2 (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seem to be missing the foreign key constraints here?

@clokep
Copy link
Member

clokep commented Jan 25, 2023

@sandhose Is this still needed? Looks like there's some outstanding feedback.

sandhose added a commit to sandhose/complement that referenced this pull request Feb 15, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
sandhose added a commit to sandhose/complement that referenced this pull request Feb 15, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
@DMRobertson DMRobertson added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Feb 16, 2023
sandhose added a commit to sandhose/complement that referenced this pull request Feb 16, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
sandhose added a commit to sandhose/complement that referenced this pull request Feb 16, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
… `access_token_id`.

Signed-off-by: Quentin Gliech <quenting@element.io>
@sandhose
Copy link
Member Author

Replaced by #15318

@sandhose sandhose closed this Mar 24, 2023
hughns pushed a commit to sandhose/complement that referenced this pull request Apr 18, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bind the event TXN ID to the device ID instead of the access token ID
5 participants