Remove database differentiation in `_get_state_groups_from_groups_txn` - SQLite supports recursive queries #14531

MadLittleMods · 2022-11-23T04:24:33Z

Remove database differentiation (SQLite vs Postgres specific code) in `_get_state_groups_from_groups_txn` (`_get_state_groups_from_groups`). We can use recursive queries across our whole supported SQLite version range.

Follow-up to #14527

Dev notes

Testing queries against the matrix.org database: https://gitlab.matrix.org/new-vector/internal/-/wikis/Matrix.org-Synapse-ops#access-matrixorg-synapses-database

Query test dummy data

CREATE TABLE state_groups_state (
    state_group BIGINT NOT NULL,
    room_id TEXT NOT NULL,
    type TEXT NOT NULL,
    state_key TEXT NOT NULL,
    event_id TEXT NOT NULL
);
INSERT INTO state_groups_state
    (state_group, room_id, type, state_key, event_id)
VALUES
  (1, 'room1', 'm.room.create', '', '$abc-create'),
  (1, 'room1', 'm.room.member', '@madlittlemods:matrix.org', '$abc-member'),
  (1, 'room1', 'm.room.history_visibility', '', '$abc-history'),
  (2, 'room1', 'm.room.member', '@madlittlemods:matrix.org', '$abc-member2'),
  (2, 'room1', 'm.room.history_visibility', '', '$abc-history2'),
  (3, 'room1', 'm.room.member', '@madlittlemods:matrix.org', '$abc-member3'),
  (3, 'room1', 'm.room.history_visibility', '', '$abc-history3');

CREATE TABLE state_group_edges (
    state_group BIGINT NOT NULL,
    prev_state_group BIGINT NOT NULL
);

INSERT INTO state_group_edges
    (state_group, prev_state_group)
VALUES
  (3, 2),
  (2, 1);

Postgres vs SQLite:

Recursive queries work just fine

Recursive queries are supported in "SQLite 3.8.3 or higher". Our minimum supported SQLite version is 3.27.0, so they are available on every version we support.
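A minimal sketch of a recursive query running on SQLite, using Python's built-in `sqlite3` module and the `state_group_edges` dummy data from above. The helper name `get_state_group_chain` and the CTE name `sgs` are made up for this illustration and are not the actual Synapse query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_group_edges (
        state_group BIGINT NOT NULL,
        prev_state_group BIGINT NOT NULL
    );
    INSERT INTO state_group_edges (state_group, prev_state_group)
    VALUES (3, 2), (2, 1);
    """
)


def get_state_group_chain(conn, state_group):
    """Hypothetical helper: walk the edges from `state_group` back to the root."""
    rows = conn.execute(
        """
        WITH RECURSIVE sgs(state_group) AS (
            -- Start from the state group we were asked about
            VALUES(?)
            UNION ALL
            -- Then keep following prev_state_group until there are no more edges
            SELECT e.prev_state_group
            FROM state_group_edges AS e
            JOIN sgs ON e.state_group = sgs.state_group
        )
        SELECT state_group FROM sgs
        """,
        (state_group,),
    ).fetchall()
    return [row[0] for row in rows]


chain = get_state_group_chain(conn, 3)
print(chain)
```

Starting from state group 3, the chain contains state groups 3, 2 and 1.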

SQLite doesn't support `?::bigint`

Solution: Use cast(? as bigint) ✔️
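A quick check of the two spellings against an in-memory SQLite database, using Python's built-in `sqlite3` module. The variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The Postgres-only `?::bigint` shorthand is rejected by SQLite
try:
    conn.execute("SELECT ?::bigint", (42,))
    postgres_cast_ok = True
except sqlite3.Error:
    postgres_cast_ok = False

# The standard CAST syntax is accepted; in SQLite, `bigint` maps to integer affinity
portable = conn.execute("SELECT CAST(? AS bigint)", ("42",)).fetchone()[0]

print(postgres_cast_ok, portable)
```

`CAST(? AS bigint)` is also valid Postgres syntax, so the same query text can be shared between both databases.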

SQLite doesn't support `SELECT DISTINCT ON`

These nice `SELECT DISTINCT ON (a, b) a, b, c` queries that work in Postgres are not supported in SQLite, which makes our lives difficult.

There are many context-specific ways to work around or ignore these problems. One way is to skip the distinct results, sort the rows instead, and only care about the first row for each pair. But this means you're transporting a lot of duplicate pairs from the database back to your app just to ignore them.

If you're lucky, maybe you can use `GROUP BY` if the columns you group by are the same ones you want to select. That doesn't work in our case since we want to group by `(type, state_key)` but select `type, state_key, event_id`.

Or maybe you want to complicate things with a bunch of sub-queries.
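To give an idea of what the sub-query route looks like, here is a sketch against the `state_groups_state` dummy data from above, using Python's built-in `sqlite3` module. It is not the query from this PR. It assumes "first row" means the row with the highest `state_group` for each `(type, state_key)` pair, and it hard-codes the state groups `(1, 2, 3)` instead of deriving them from the edges table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_groups_state (
        state_group BIGINT NOT NULL,
        room_id TEXT NOT NULL,
        type TEXT NOT NULL,
        state_key TEXT NOT NULL,
        event_id TEXT NOT NULL
    );
    INSERT INTO state_groups_state
        (state_group, room_id, type, state_key, event_id)
    VALUES
      (1, 'room1', 'm.room.create', '', '$abc-create'),
      (1, 'room1', 'm.room.member', '@madlittlemods:matrix.org', '$abc-member'),
      (1, 'room1', 'm.room.history_visibility', '', '$abc-history'),
      (2, 'room1', 'm.room.member', '@madlittlemods:matrix.org', '$abc-member2'),
      (2, 'room1', 'm.room.history_visibility', '', '$abc-history2'),
      (3, 'room1', 'm.room.member', '@madlittlemods:matrix.org', '$abc-member3'),
      (3, 'room1', 'm.room.history_visibility', '', '$abc-history3');
    """
)

# Emulates:
#   SELECT DISTINCT ON (type, state_key) type, state_key, event_id
#   ... ORDER BY type, state_key, state_group DESC
rows = conn.execute(
    """
    SELECT s.type, s.state_key, s.event_id
    FROM state_groups_state AS s
    JOIN (
        -- Pick the newest state group for each (type, state_key) pair
        SELECT type, state_key, MAX(state_group) AS state_group
        FROM state_groups_state
        WHERE state_group IN (1, 2, 3)
        GROUP BY type, state_key
    ) AS newest
      ON newest.type = s.type
     AND newest.state_key = s.state_key
     AND newest.state_group = s.state_group
    ORDER BY s.type, s.state_key
    """
).fetchall()

for row in rows:
    print(row)
```

This returns one row per `(type, state_key)` pair without shipping duplicates back to the app, at the cost of reading `state_groups_state` twice.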

Other references:

"SELECT DISTINCT ON was WONTFIXed in SQLite: https://code.djangoproject.com/ticket/22696" (https://stackoverflow.com/a/71924314/796832)

SQLite doesn't support parentheses around `UNION` subqueries, which also means no per-subquery `ORDER`/`LIMIT`

SQLite doesn't like parentheses around each clause like `(select_clause_A) UNION (select_clause_B)` -> `sqlite3.OperationalError: near "(": syntax error`

https://stackoverflow.com/questions/4653124/what-does-the-sql-standard-say-about-parentheses-in-sql-union-except-intersect-s

If you're just doing a `SELECT` without `ORDER`/`LIMIT`, then you can simply remove the parentheses. But you're kinda stuck if you want the per-subquery `ORDER`/`LIMIT`. You can `ORDER`/`LIMIT` at the end after the `UNION`, but it's not the same.
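For reference, a small reproduction of the syntax error using Python's built-in `sqlite3` module and the `state_group_edges` dummy data. The second query shows one possible way around it that is not part of this PR: wrapping each side in `SELECT * FROM (...)` so the inner `ORDER BY`/`LIMIT` applies per subquery. Treat it as a sketch and check it against the real queries before relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_group_edges (
        state_group BIGINT NOT NULL,
        prev_state_group BIGINT NOT NULL
    );
    INSERT INTO state_group_edges (state_group, prev_state_group)
    VALUES (3, 2), (2, 1);
    """
)

# Parentheses around each side of the UNION are a syntax error in SQLite
try:
    conn.execute(
        "(SELECT state_group FROM state_group_edges)"
        " UNION "
        "(SELECT prev_state_group FROM state_group_edges)"
    )
    parenthesized_error = None
except sqlite3.OperationalError as exc:
    parenthesized_error = str(exc)

# Possible workaround: make each side a derived table with its own ORDER BY/LIMIT
wrapped = conn.execute(
    """
    SELECT * FROM (
        SELECT state_group FROM state_group_edges
        ORDER BY state_group DESC LIMIT 1
    )
    UNION ALL
    SELECT * FROM (
        SELECT prev_state_group FROM state_group_edges
        ORDER BY prev_state_group ASC LIMIT 1
    )
    """
).fetchall()

print(parenthesized_error)
print(wrapped)
```

With the dummy data, the first subquery yields the highest `state_group` (3) and the second yields the lowest `prev_state_group` (1).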