Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

No forward extremities left! #6708

Closed
jomat opened this issue Jan 14, 2020 · 7 comments
Closed

No forward extremities left! #6708

jomat opened this issue Jan 14, 2020 · 7 comments

Comments

@jomat
Copy link

jomat commented Jan 14, 2020

Description

This error (might be related to #5090) currently happens every second:

2020-01-14 21:28:17,535 - synapse.handlers.federation - 2243 - ERROR - PUT-1882184-$157900951562qJwrB:maclemon.at-$15788466572150988AvrIE:matrix.org-$15788465832150363FmtGN:matrix.org-$1578514266146638SnaKo:matrix.org-$1578423771156626YCsrc:matrix.org-$1578423734156249zGcBc:matrix.org-$1578422107140849CyrAX:matrix.org-$1578421306132795NvFmz:matrix.org-$1578351387218871XQHeR:matrix.org-$15782476022528767FKcYw:matrix.org-$15781560252036164NpEUl:matrix.org-$15781555392033049gcICc:matrix.org-$15780051521191302FzugJ:matrix.org-$15780030251176028oICrD:matrix.org-$15779908961079656FqoiU:matrix.org-$15779902471074539YxueT:matrix.org-$1577661474909073OJWVA:matrix.org-$1577659180894695fwsRo:matrix.org-$1577555484323561NNCku:matrix.org-$1577555452323365FYsVa:matrix.org-$15774668092588410SJivB:matrix.org-$15774667642588010uNyeT:matrix.org-$15774665192586146OYUyD:matrix.org-$15774663232584634ADKOD:matrix.org-$15772287461314535zsLkK:matrix.org-$1577118517616949riXKr:matrix.org-$1576807475364860YluNL:matrix.org-$1576807460364697kwPqr:matrix.org-$1576795288285934MRCWo:matrix.org-$1576793110264870jHWlb:matrix.org-$1576788210223357domQE:matrix.org-$1576783540182016JKcNh:matrix.org-$15757222581105945dsBlB:matrix.org-$1575575671676480oJnzS:matrix.org-$157530204249657edALc:matrix.org - Failed to get auth chain
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/handlers/federation.py", line 2235, in _update_auth_events_and_context_for_auth
    yield self._handle_new_event(origin, e, auth_events=auth)
twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'AssertionError'>: No forward extremities left!
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py:654:_runCallbacks
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py:1475:gotResult
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
--- <exception caught here> ---
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/persist_events.py:147:handle_queue_loop
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/persist_events.py:257:persisting_queue
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/internet/defer.py:1418:_inlineCallbacks
/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/persist_events.py:334:_persist_events
]]

Version information

@jomat
Copy link
Author

jomat commented Jan 15, 2020

Currently the DB grows ~30 GB per day which might be related to this.
Also some messages from other homeservers are missing since days.

@richvdh
Copy link
Member

richvdh commented Jan 15, 2020

that's not ideal.

These events are in room !rJBKGYMiDDhQwjWQLd:matrix.xwiki.com. What do you see for

select * from event_forward_extremities where room_id='!rJBKGYMiDDhQwjWQLd:matrix.xwiki.com';

@jomat
Copy link
Author

jomat commented Jan 15, 2020

synapse=> select * from event_forward_extremities where room_id='!rJBKGYMiDDhQwjWQLd:matrix.xwiki.com';
              event_id              |               room_id                
------------------------------------+--------------------------------------
 $15710505611767569NWgbz:matrix.org | !rJBKGYMiDDhQwjWQLd:matrix.xwiki.com
(1 row)

The insane disk space consumption stopped some hours ago, but the "Failed to get auth chain" followed by "No forward extremities left!" errors still spam the log every second.
I called the purge_history API on !QtykxKocfZaZOUrTwp:matrix.org (status: failed) and !cURbafjkfsMDVwdRDQ:matrix.org (status: complete), deleted a duplicate entry in user_directory_search, did a vacuum full analyze and restarted synapse in the meantime - I don't know if anything of that had to do with it, but I will keep you updated if the high disk space usage starts again.

@jomat
Copy link
Author

jomat commented Jan 15, 2020

!rJBKGYMiDDhQwjWQLd:matrix.xwiki.com = #cryptpad:matrix.xwiki.com

I joined this room a while ago, the last event I see is on Oct 18 2019:

{
  "event_id": "$15714017940PjWjS:dou.bet",
  "sender": "@daniel:dou.bet",
  "content": {
    "body": "Hello",
    "msgtype": "m.text"
  },
  "type": "m.room.message",
  "origin_server_ts": 1571401794094,
  "unsigned": {
    "age": 7529748652
  },
  "room_id": "!rJBKGYMiDDhQwjWQLd:matrix.xwiki.com"
}

I just joined with another account on matrix.org and realized there where conversations until today which I am missing on my HS.

@richvdh
Copy link
Member

richvdh commented Jan 15, 2020

I think I need better logs to investigate this. Can you grep your logs for PUT-1882184 (or another request which results in the No forward extremities exception) and paste what you see somewhere?

@jomat
Copy link
Author

jomat commented Jan 15, 2020

The log already rotated out, but I saved another one here: https://usercontent.asra.gr/homeserver.log.5

@richvdh
Copy link
Member

richvdh commented Jan 16, 2020

I've spent a while working with @jomat to try to figure out what was going on here.

We found that, in room !zZxICgDdJhCObEHSQs:matrix.org, his database had $157140868037190ZYwHi:matrix.org (depth 60507) as the sole forward extremity, even though he also has $1571423829181264BZYeu:matrix.org (its successor, depth 60508) as a valid event. This leads to the question of why $1571423829181264BZYeu:matrix.org was not a forward extremity.

Indeed, he also had $157143015536248bOTgi:matrix.org (depth 60512), so we can also wonder why that was not a forward extremity.

It's worth noting that $1571423829181264BZYeu:matrix.org was originally received as an outlier and later de-outliered, which at least explains why $157140868037190ZYwHi:matrix.org was still an extremity (#6607).

It appears that jomat has been running one of the DELETE queries from #1760, which I think has been deleting forward extremities it shouldn't have. In particular, the query only preserves the most recent forward extremity based on stream_ordering, and due to reasons, his homeserver received $157140868037190ZYwHi:matrix.org after either of the other two, so gave it a later stream_ordering.

In short: $157140868037190ZYwHi:matrix.org shouldn't really have been a forward extremity, and was caused by #6607. Once jomat ran the DELETE query from #1760, it became the only forward extremity, so the only forward extremity in the room was invalid, leading to the error message in the subject of this bug.

TLDR: this is an unfortunate interaction between #6607 and the DELETE queries on #1760.


To put it back together, we did (the equivalent of):

insert into event_forward_extremities (event_id, room_id)
select e.event_id, e.room_id from events e join event_json ej using (event_id) left join rejections r using (event_id) where 
e.room_id='!zZxICgDdJhCObEHSQs:matrix.org' 
and not e.outlier and r.event_id is null
and ej.internal_metadata::json->'soft_failed' is null
order by topological_ordering desc limit 1;

... and restarted synapse.

@richvdh richvdh closed this as completed Jan 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants