-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Speed up state res in rare case we don't have all events #16116
Conversation
If we don't have all the auth events in a room then not all state events will have a chain cover index. Even so, we can still use the chain cover index on the events that do have it, rather than bailing and using the slower functions. This situation should not arise for newly persisted rooms, as we check we have the full auth chain for each event, but can happen for existing rooms. c.f. #15245
6db50ed
to
cbaadc0
Compare
for event_id in state_set: | ||
chain_id, seq_no = chain_info[event_id] | ||
for state_id in state_set: | ||
chain_id, seq_no = chain_info[state_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just to make mypy happy, as the changes above caused the type of event_id
to change :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grr at how we don't get Rust's shadowing rules in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quite intricate and tricky but I think I get it
# We pull out those events with their auth events, which gives us enough | ||
# information to construct the auth chain of an event up to auth events | ||
# that have the chain cover index. | ||
sql = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I'm following along properly, this pulls out all (event_id, authing_event_id)
pairs with event_id
being in the set of chain-cover-uncalculated events for this room,
then annotates these (event_id, authing_event_id)
pairs with whether the auth event has a chain cover...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then annotates these
(event_id, authing_event_id)
pairs with whether the auth event has a chain cover...
I'm not sure what you mean by "annotate" here? We look at all the events we've pulled out to see if they are indexed and mark them as such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'annotate' as in labels that keypair with a bool value in a map ;-)
processing = set(auth_ids) | ||
to_add = set() | ||
while processing: | ||
auth_id = processing.pop() | ||
to_add.add(auth_id) | ||
|
||
sub_auth_ids = event_to_auth_ids.get(auth_id) | ||
if sub_auth_ids is None: | ||
continue | ||
|
||
processing.update(sub_auth_ids - to_add) | ||
|
||
event_id_to_partial_auth_chain[event_id] = to_add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so event_id_to_partial_auth_chain[event_id]
is the transitive auth-event closure of event_id
?
We say 'partial auth chain' here... what's the partial in respect to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't pull out the full auth chain, we stop when the auth events are indexed.
This modifies `state_sets` so that they only include events that have a | ||
chain cover index, and returns a set of event IDs that are part of the | ||
auth difference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, here's the hidden mut
keyword :p
Took me a moment to notice that modifying state_sets
is part of the result but not sure I have a better suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's icky, but here we are. I mostly wanted to avoid needlessly copying the state sets when its easier to just mutate tehm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, reasonable. Half tempted to suggest a _mut
suffix convention for this type of thing but probably gets overbearing soon enough. Meh, it will do.
# degree, we "fork" execution and run the algorithm for each node in the | ||
# zero degree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. for each node that has no remaining incoming connections, we give it a chance to be the next in the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, indeed.
T = TypeVar("T") | ||
|
||
|
||
def get_all_topologically_sorted_orders( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at what point do you start writing tests for your test helpers haha, but looks good to me — if two humans agree on it, it must be right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, quite
for ordering in all_topological_orderings: | ||
ordering.reverse() | ||
|
||
for idx in range(len(ordering)): | ||
graph_subsets.add(frozenset(ordering[:idx])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in each topological ordering then we just choose an arbitrary cut-off point, 'forking' as you put it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup!
return new_paths | ||
|
||
|
||
def get_all_topologically_consistent_subsets( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test helper also LGTM
for event_id in state_set: | ||
chain_id, seq_no = chain_info[event_id] | ||
for state_id in state_set: | ||
chain_id, seq_no = chain_info[state_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grr at how we don't get Rust's shadowing rules in Python
Co-authored-by: reivilibre <oliverw@matrix.org>
Thanks for the review @reivilibre! Do the answers make sense? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answers make sense, thanks!
If we don't have all the auth events in a room then not all state events will have a chain cover index. Even so, we can still use the chain cover index on the events that do have it, rather than bailing and using the slower functions.
This situation should not arise for newly persisted rooms, as we check we have the full auth chain for each event, but can happen for existing rooms.
c.f. #15245