Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@cachedList can insert None instead of absence of a key #17726

Open
kegsay opened this issue Sep 18, 2024 · 5 comments
Open

@cachedList can insert None instead of absence of a key #17726

kegsay opened this issue Sep 18, 2024 · 5 comments

Comments

@kegsay
Copy link
Contributor

kegsay commented Sep 18, 2024

The docs for this state:

Note that any missing values are cached as None.

This is not semantically the same as absence of a key, and creates very subtle failure modes. I just spent an hour or two trying to figure out how a function could possibly set None and well, it can't. It turns out the cache is doing it because some rooms don't have entries in the DB. The function in question is

@cachedList(cached_method_name="_get_max_event_pos", list_name="room_ids")

Unfortunately, the caller of this function never did a None check causing runtime failures: see element-hq/element-x-ios#3300 (comment)

It is absolutely not clear that adding a cache decorator would change the semantics like this. I suggest we double check every caller of these functions to make sure they None check before usage.

synapse/storage/databases/main/end_to_end_keys.py:855:    @cachedList(
synapse/storage/databases/main/end_to_end_keys.py-856-        cached_method_name="_get_bare_e2e_cross_signing_keys",
--
synapse/storage/databases/main/receipts.py:441:    @cachedList(
synapse/storage/databases/main/receipts.py-442-        cached_method_name="_get_linearized_receipts_for_room",
--
synapse/storage/databases/main/roommember.py:199:    @cachedList(
synapse/storage/databases/main/roommember.py-200-        cached_method_name="get_user_in_room_with_profile", list_name="user_ids"
--
synapse/storage/databases/main/roommember.py:734:    @cachedList(
synapse/storage/databases/main/roommember.py-735-        cached_method_name="get_rooms_for_user",
--
synapse/storage/databases/main/roommember.py:793:    @cachedList(
synapse/storage/databases/main/roommember.py-794-        cached_method_name="does_pair_of_users_share_a_room", list_name="other_user_ids"
--
synapse/storage/databases/main/roommember.py:942:    @cachedList(
synapse/storage/databases/main/roommember.py-943-        cached_method_name="_get_user_id_from_membership_event_id",
--
synapse/storage/databases/main/roommember.py:1284:    @cachedList(
synapse/storage/databases/main/roommember.py-1285-        cached_method_name="_get_membership_from_event_id", list_name="member_event_ids"
--
synapse/storage/databases/main/transactions.py:211:    @cachedList(
synapse/storage/databases/main/transactions.py-212-        cached_method_name="get_destination_retry_timings", list_name="destinations"
--
synapse/storage/databases/main/relations.py:477:    @cachedList(cached_method_name="get_references_for_event", list_name="event_ids")
synapse/storage/databases/main/relations.py-478-    async def get_references_for_events(
--
synapse/storage/databases/main/relations.py:532:    @cachedList(cached_method_name="get_applicable_edit", list_name="event_ids")  # type: ignore[synapse-@cached-mutable]
synapse/storage/databases/main/relations.py-533-    async def get_applicable_edits(
--
synapse/storage/databases/main/relations.py:619:    @cachedList(cached_method_name="get_thread_summary", list_name="event_ids")  # type: ignore[synapse-@cached-mutable]
synapse/storage/databases/main/relations.py-620-    async def get_thread_summaries(
--
synapse/storage/databases/main/relations.py:793:    @cachedList(cached_method_name="get_thread_participated", list_name="event_ids")
synapse/storage/databases/main/relations.py-794-    async def get_threads_participated(
--
synapse/storage/databases/main/keys.py:137:    @cachedList(
synapse/storage/databases/main/keys.py-138-        cached_method_name="_get_server_keys_json", list_name="server_name_and_key_ids"
--
synapse/storage/databases/main/keys.py:207:    @cachedList(
synapse/storage/databases/main/keys.py-208-        cached_method_name="get_server_key_json_for_remote", list_name="key_ids"
--
synapse/storage/databases/main/presence.py:253:    @cachedList(
synapse/storage/databases/main/presence.py-254-        cached_method_name="_get_presence_for_user",
--
synapse/storage/databases/main/events_worker.py:1624:        # @cachedList chomps lots of memory if you call it with a big list, so
synapse/storage/databases/main/events_worker.py-1625-        # we break it down. However, each batch requires its own index scan, so we make
--
synapse/storage/databases/main/events_worker.py:1639:    @cachedList(cached_method_name="have_seen_event", list_name="event_ids")
synapse/storage/databases/main/events_worker.py-1640-    async def _have_seen_events_dict(
--
synapse/storage/databases/main/events_worker.py:2329:    @cachedList(cached_method_name="is_partial_state_event", list_name="event_ids")
synapse/storage/databases/main/events_worker.py-2330-    async def get_partial_state_events(
--
synapse/storage/databases/main/events_worker.py:2352:        # convert the result to a dict, to make @cachedList work
synapse/storage/databases/main/events_worker.py-2353-        partial = {r[0] for r in result}
--
synapse/storage/databases/main/stream.py:1495:    @cachedList(cached_method_name="_get_max_event_pos", list_name="room_ids")
synapse/storage/databases/main/stream.py-1496-    async def _bulk_get_max_event_pos(
--
synapse/storage/databases/main/signatures.py:41:    @cachedList(
synapse/storage/databases/main/signatures.py-42-        cached_method_name="get_event_reference_hash", list_name="event_ids", num_args=1
--
synapse/storage/databases/main/push_rule.py:262:    @cachedList(cached_method_name="get_push_rules_for_user", list_name="user_ids")
synapse/storage/databases/main/push_rule.py-263-    async def bulk_get_push_rules(
--
synapse/storage/databases/main/devices.py:1076:    @cachedList(
synapse/storage/databases/main/devices.py-1077-        cached_method_name="get_device_list_last_stream_id_for_remote",
--
synapse/storage/databases/main/room.py:1361:    @cachedList(cached_method_name="is_partial_state_room", list_name="room_ids")
synapse/storage/databases/main/room.py-1362-    async def is_partial_state_room_batched(
--
synapse/storage/databases/main/state.py:314:    @cachedList(cached_method_name="get_room_type", list_name="room_ids")
synapse/storage/databases/main/state.py-315-    async def bulk_get_room_type(
--
synapse/storage/databases/main/state.py:393:    @cachedList(cached_method_name="get_room_encryption", list_name="room_ids")
synapse/storage/databases/main/state.py-394-    async def bulk_get_room_encryption(
--
synapse/storage/databases/main/state.py:605:    @cachedList(
synapse/storage/databases/main/state.py-606-        cached_method_name="_get_state_group_for_event",
--
synapse/storage/databases/main/user_erasure_store.py:48:    @cachedList(cached_method_name="is_user_erased", list_name="user_ids")
synapse/storage/databases/main/user_erasure_store.py-49-    async def are_users_erased(self, user_ids: Iterable[str]) -> Mapping[str, bool]:
@clokep
Copy link
Contributor

clokep commented Sep 18, 2024

It could probably use a sentinel instead of None to know if it uncached?

@clokep
Copy link
Contributor

clokep commented Sep 18, 2024

Maybe the mypy plugin is missing this coercion?

@clokep
Copy link
Contributor

clokep commented Sep 18, 2024

@MadLittleMods
Copy link
Collaborator

MadLittleMods commented Sep 18, 2024

It could probably use a sentinel instead of None to know if it uncached?

That's what I had to do for bulk_get_room_type and bulk_get_room_encryption in #17450

Related discussion: #17450 (comment)

devonh pushed a commit that referenced this issue Sep 18, 2024
…ooms (#17727)

Fixes element-hq/element-x-ios#3300

Some rooms are missing from `sliding_sync_joined_rooms`. When this
happens, the first call will succeed, but any subsequent calls for this
room ID will cause the cache to return `None` for the room ID, rather
than not having the key at all. This then causes the `<=` check to
throw.

Root cause: #17726

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [ ] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [ ] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
@erikjohnston
Copy link
Member

I think the historical context for this is that it was very much expected that a value was returned for every key, i.e. the bulk fetch function should return the exact same thing as just calling the non-bulk function in a loop. The common case for store functions that couldn't find a row is to return None, and so that is likely why we treated missing values as None.

However, there are now quite a few functions that don't actually have a non-bulk equivalent, and so it could make sense for those functions not to return everything.

Agreed this is a footgun, and we probably should make @cachedList error if the function doesn't return an entry for each item. I'm not super keen on adding sentinel values in there by default, as it breaks the idea that @cachedList should behave like calling @cached function in a loop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants