Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sent SyncRequest not arriving at peer #391

Closed
sandreae opened this issue May 26, 2023 · 6 comments · Fixed by #387
Closed

Sent SyncRequest not arriving at peer #391

sandreae opened this issue May 26, 2023 · 6 comments · Fixed by #387

Comments

@sandreae
Copy link
Member

No description provided.

@sandreae sandreae changed the title Duplicate replication session requests cause locked state (and no replication) Duplicate replication session requests cause locked state (no replication) May 26, 2023
@sandreae
Copy link
Member Author

Adding more error types and logging to try and observe the flow for handling duplicate sessions better.

@sandreae sandreae linked a pull request May 26, 2023 that will close this issue
4 tasks
@sandreae
Copy link
Member Author

sandreae commented May 26, 2023

Replication between two peers where one has schema + entries the other doesn't.

PEER A:

[2023-05-26T13:13:19Z INFO  aquadoggo::manager] Start network service
[2023-05-26T13:13:19Z INFO  aquadoggo::network::swarm] Network service peer ID: 12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::behaviour] AutoNAT network behaviour enabled
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::behaviour] mDNS network behaviour enabled
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Network service is ready
[2023-05-26T13:13:19Z INFO  aquadoggo::manager] Start replication service
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Listening on /ip4/127.0.0.1/udp/2022/quic-v1
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Listening on /ip4/192.168.1.87/udp/2022/quic-v1
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Listening on /ip4/172.17.0.1/udp/2022/quic-v1
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::service] IncomingConnection: /ip4/0.0.0.0/udp/2022/quic-v1 /ip4/192.168.1.87/udp/2023/quic-v1
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::service] IncomingConnection: /ip4/0.0.0.0/udp/2022/quic-v1 /ip4/192.168.1.87/udp/2023/quic-v1
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] ConnectionEstablished: 12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx Listener { local_addr: "/ip4/0.0.0.0/udp/2022/quic-v1", send_back_addr: "/ip4/192.168.1.87/udp/2023/quic-v1" } 1
[2023-05-26T13:13:19Z INFO  aquadoggo::replication::service] Connection established with peer: 12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx
[2023-05-26T13:13:19Z DEBUG aquadoggo::replication::service] Re-initiate replication with: 12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx
[2023-05-26T13:13:19Z INFO  aquadoggo::replication::manager] Initiate outbound replication session with peer PeerId("12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx")
[2023-05-26T13:13:19Z TRACE aquadoggo::replication::service] Sending message on service channel: SentReplicationMessage(PeerId("12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx"), SyncMessage(0, SyncRequest(Naive, TargetSet([SchemaDefinition(1), SchemaFieldDefinition(1), Application(SchemaName("band"), DocumentViewId([OperationId(Hash("00207c0dec3322d78efc419432c4a215b3635cb17b3cf20674ff47ec3aed0fa3f24e"))])), Application(SchemaName("event"), DocumentViewId([OperationId(Hash("0020c3560ffcfa50f14ab01740cd8f832e0fb35035eeb8cd4aca124bac2d42a47a2a"))])), Application(SchemaName("venue"), DocumentViewId([OperationId(Hash("0020ca1d2b85567443b2db42bba286c07fc64b1a46f2d3e67ba819550b09e18eccd0"))]))]))))
[2023-05-26T13:13:19Z TRACE aquadoggo::replication::service] Received SyncMessage: SyncMessage(0, SyncRequest(Naive, TargetSet([SchemaDefinition(1), SchemaFieldDefinition(1)])))
[2023-05-26T13:13:19Z INFO  aquadoggo::replication::manager] Initiate inbound replication session with peer PeerId("12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx")
^C[2023-05-26T13:13:23Z INFO  aquadoggo::manager] Received shutdown signal

PEER B

[2023-05-26T13:13:19Z INFO  aquadoggo::manager] Start network service
[2023-05-26T13:13:19Z INFO  aquadoggo::network::swarm] Network service peer ID: 12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::behaviour] AutoNAT network behaviour enabled
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::behaviour] mDNS network behaviour enabled
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Network service is ready
[2023-05-26T13:13:19Z INFO  aquadoggo::manager] Start replication service
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Listening on /ip4/127.0.0.1/udp/2023/quic-v1
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::service] mDNS discovered a new peer: 12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh
[2023-05-26T13:13:19Z DEBUG aquadoggo::network::service] mDNS discovered a new peer: 12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Listening on /ip4/192.168.1.87/udp/2023/quic-v1
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] Listening on /ip4/172.17.0.1/udp/2023/quic-v1
[2023-05-26T13:13:19Z INFO  aquadoggo::network::service] ConnectionEstablished: 12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh Dialer { address: "/ip4/192.168.1.87/udp/2022/quic-v1", role_override: Dialer } 1
[2023-05-26T13:13:19Z INFO  aquadoggo::replication::service] Connection established with peer: 12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh
[2023-05-26T13:13:19Z DEBUG aquadoggo::replication::service] Re-initiate replication with: 12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh
[2023-05-26T13:13:19Z INFO  aquadoggo::replication::manager] Initiate outbound replication session with peer PeerId("12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh")
[2023-05-26T13:13:19Z TRACE aquadoggo::replication::service] Sending message on service channel: SentReplicationMessage(PeerId("12D3KooWHj2DTSmVHNoNi7qzkKzvbDtEW5NggWq3gH5d3z1N6Voh"), SyncMessage(0, SyncRequest(Naive, TargetSet([SchemaDefinition(1), SchemaFieldDefinition(1)]))))
^C[2023-05-26T13:13:22Z INFO  aquadoggo::manager] Received shutdown signal

@sandreae
Copy link
Member Author

Seems like this:

[2023-05-26T13:13:19Z TRACE aquadoggo::replication::service] Sending message on service channel: SentReplicationMessage(PeerId("12D3KooWFz7bKfrhMtPbJnSA6Z8TXTbLYwVayWXBxAjuZqHxWCrx"), SyncMessage(0, SyncRequest(Naive, TargetSet([SchemaDefinition(1), SchemaFieldDefinition(1), Application(SchemaName("band"), DocumentViewId([OperationId(Hash("00207c0dec3322d78efc419432c4a215b3635cb17b3cf20674ff47ec3aed0fa3f24e"))])), Application(SchemaName("event"), DocumentViewId([OperationId(Hash("0020c3560ffcfa50f14ab01740cd8f832e0fb35035eeb8cd4aca124bac2d42a47a2a"))])), Application(SchemaName("venue"), DocumentViewId([OperationId(Hash("0020ca1d2b85567443b2db42bba286c07fc64b1a46f2d3e67ba819550b09e18eccd0"))]))]))))

Never arrives at peer B....

@sandreae
Copy link
Member Author

sandreae commented May 26, 2023

Ah!! The problem is in the diffing logic 👏 👏 👏 👏 👏 👏

There was an issue in the diffing logic, but solving that didn't fix this issue.

@sandreae
Copy link
Member Author

Fairly consistently getting this connection error for both peers, I'm assuming the session is timing out cos no messages are being sent:

[2023-05-26T17:06:29Z INFO  aquadoggo::replication::service] Connection established with peer: 12D3KooWKXm5p5JgU12tX9y3NXcYB2yPiNU1UBM3L3fzL5oerepn
[2023-05-26T17:06:29Z WARN  aquadoggo::replication::service] Duplicate established connection encountered
[2023-05-26T17:06:29Z DEBUG aquadoggo::replication::service] Initiate replication with: 12D3KooWKXm5p5JgU12tX9y3NXcYB2yPiNU1UBM3L3fzL5oerepn
[2023-05-26T17:06:29Z INFO  aquadoggo::replication::manager] Initiate outbound replication session with peer PeerId("12D3KooWKXm5p5JgU12tX9y3NXcYB2yPiNU1UBM3L3fzL5oerepn")
[2023-05-26T17:06:29Z WARN  aquadoggo::replication::service] Replication error: Duplicate session error: Tried to initialise duplicate outbound replication session for existing target set TargetSet([SchemaDefinition(1), SchemaFieldDefinition(1)])
[2023-05-26T17:06:34Z WARN  aquadoggo::network::service] OutgoingConnectionError: None Transport([("/ip4/172.17.0.1/udp/2022/quic-v1", Other(Custom { kind: Other, error: HandshakeTimedOut }))])
[2023-05-26T17:06:34Z WARN  aquadoggo::network::service] IncomingConnectionError: /ip4/0.0.0.0/udp/2023/quic-v1 /ip4/192.168.1.87/udp/2022/quic-v1 Transport(Other(Custom { kind: Other, error: HandshakeTimedOut }))
[2023-05-26T17:07:09Z INFO  aquadoggo::network::service] ConnectionClosed: 12D3KooWKXm5p5JgU12tX9y3NXcYB2yPiNU1UBM3L3fzL5oerepn Listener { local_addr: "/ip4/0.0.0.0/udp/2023/quic-v1", send_back_addr: "/ip4/192.168.1.87/udp/2022/quic-v1" } 1 Some(KeepAliveTimeout)
[2023-05-26T17:07:09Z INFO  aquadoggo::replication::service] Connection closed: remove sessions with peer: 12D3KooWKXm5p5JgU12tX9y3NXcYB2yPiNU1UBM3L3fzL5oerepn

@sandreae sandreae changed the title Duplicate replication session requests cause locked state (no replication) Sent SyncRequest not arriving at peer May 29, 2023
@sandreae sandreae linked a pull request May 31, 2023 that will close this issue
4 tasks
@sandreae
Copy link
Member Author

The problem was that the message was failing when being deserialized on the receiving node and errors were being swallowed so we didn't notice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant