CassandraSinkCluster lost messages #1843

rukai · 2024-11-26T01:52:57Z

Lost messages can occur as a result of shotover closing outgoing connections as part of its logic for handling USE statements.
Cassandra USE statements set per-connection state. To avoid an incoming connection being backed by outgoing connections in different states, we close all outgoing connections and reopen them with the new USE keyspace.
Running USE on the existing connections, without closing them, would be better, but it would require significant refactors to CassandraSinkCluster to avoid returning duplicate responses to the client.
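
To illustrate why the keyspace has to be kept consistent across outgoing connections, here is a minimal sketch. `Outgoing`, `reopen_all_with_keyspace` and `all_share_keyspace` are hypothetical names made up for this example and are not shotover's actual types; the sketch only models the "close everything and reopen with the new keyspace" approach described above.

```rust
/// Hypothetical model of one outgoing connection.
/// The active keyspace is tracked per connection, as USE sets per-connection state.
#[derive(Debug, Clone, PartialEq)]
struct Outgoing {
    keyspace: Option<String>,
}

/// Hypothetical helper: close every outgoing connection and replace it with a
/// fresh one opened on `keyspace`.
fn reopen_all_with_keyspace(connections: &mut Vec<Outgoing>, keyspace: &str) {
    let count = connections.len();
    // Dropping the old entries stands in for closing the connections.
    connections.clear();
    connections.extend((0..count).map(|_| Outgoing {
        keyspace: Some(keyspace.to_string()),
    }));
}

/// Returns true when every connection is using the same keyspace.
fn all_share_keyspace(connections: &[Outgoing]) -> bool {
    connections
        .windows(2)
        .all(|pair| pair[0].keyspace == pair[1].keyspace)
}
```

If USE were applied to only one of several connections, `all_share_keyspace` would report a mismatch, which is the inconsistent state the close-and-reopen logic is trying to avoid.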

Consider the following scenario, which is causing intermittent test failures in CI and locally:

client sends use statement
client sends prepare
shotover duplicates the prepare request; the use statement now sits in between some of the duplicated prepare requests.
shotover sends one prepare request.
shotover clears open connections for the use statement; this closes the connection on which we are currently waiting for a response.
shotover sends the other 2 prepare requests.
shotover only receives 2/3 of the prepare responses, so it never responds to the client with the combined prepare response.
the client eventually times out after 10s.
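
The sequence above can be modelled with a small sketch. Everything here (`ConnectionPool`, `OutgoingConnection`, the node names and request ids) is hypothetical and exists only to show the ordering problem; it is not how CassandraSinkCluster is actually implemented.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one outgoing Cassandra connection.
#[derive(Default)]
struct OutgoingConnection {
    /// Ids of requests that were sent but have not been answered yet.
    in_flight: Vec<u32>,
}

/// Hypothetical stand-in for the pool of outgoing connections, keyed by node.
#[derive(Default)]
struct ConnectionPool {
    connections: HashMap<&'static str, OutgoingConnection>,
}

impl ConnectionPool {
    /// Send a request to a node, opening a connection if none exists.
    fn send(&mut self, node: &'static str, request_id: u32) {
        self.connections
            .entry(node)
            .or_default()
            .in_flight
            .push(request_id);
    }

    /// Models the USE handling: drop every connection immediately.
    /// Returns the ids of requests whose responses can no longer arrive.
    fn close_all(&mut self) -> Vec<u32> {
        let mut lost = Vec::new();
        for (_, mut conn) in self.connections.drain() {
            lost.append(&mut conn.in_flight);
        }
        lost.sort();
        lost
    }

    /// Pretend every node answers all requests still in flight.
    fn receive_all(&mut self) -> Vec<u32> {
        let mut responses = Vec::new();
        for conn in self.connections.values_mut() {
            responses.append(&mut conn.in_flight);
        }
        responses.sort();
        responses
    }
}

/// Replays the scenario: returns (number of responses received, lost request ids).
fn simulate_lost_prepare() -> (usize, Vec<u32>) {
    let mut pool = ConnectionPool::default();
    pool.send("node1", 1); // first duplicated prepare request
    let lost = pool.close_all(); // USE handling closes the connection we are waiting on
    pool.send("node2", 2); // remaining duplicated prepare requests
    pool.send("node3", 3);
    let responses = pool.receive_all();
    (responses.len(), lost)
}
```

In this model only two of the three prepare responses ever arrive, so a caller waiting to combine all three would wait until the client times out.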

On my local machine I can reproduce this scenario within 10 tries by running `cargo nextest run cassandra_int_tests::cassandra_5_cluster::case_2_cdrs` in a loop.

But it should be possible to reproduce the issue by simply doing:

send query
send use statement
shotover sends query to a connection
shotover kills all outgoing connections as per the use logic.
the response to the query is lost.
client times out waiting for response to query

Possible solution

The simplest possible solution is to flush the outgoing connections before killing them as part of the USE statement logic.
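
A rough sketch of what "flush before killing" could look like, under stated assumptions: `FlushableConnection`, `recv_response`, `flush_and_close` and `handle_use` are hypothetical names invented for this example, and the real connections are async, so an actual fix would await responses rather than pop them from a queue.

```rust
use std::collections::VecDeque;

/// Hypothetical outgoing connection with responses still owed by the node.
struct FlushableConnection {
    /// Ids of requests that were sent and are awaiting a response.
    in_flight: VecDeque<u32>,
}

impl FlushableConnection {
    /// Build a connection that already has the given requests in flight.
    fn with_in_flight(ids: &[u32]) -> Self {
        FlushableConnection {
            in_flight: ids.iter().copied().collect(),
        }
    }

    /// Stand-in for waiting on the next response from the node.
    fn recv_response(&mut self) -> Option<u32> {
        self.in_flight.pop_front()
    }

    /// Flush: collect every outstanding response, then close the connection.
    /// Taking `self` by value means the connection is dropped when this returns.
    fn flush_and_close(mut self) -> Vec<u32> {
        let mut responses = Vec::new();
        while let Some(id) = self.recv_response() {
            responses.push(id);
        }
        responses
    }
}

/// Models the USE handling with the proposed change: flush each connection
/// before it is closed, so no in-flight response is lost.
fn handle_use(connections: Vec<FlushableConnection>) -> Vec<u32> {
    connections
        .into_iter()
        .flat_map(|conn| conn.flush_and_close())
        .collect()
}
```

With this ordering, a query sent just before the use statement still gets its response delivered. A real implementation would probably also want a timeout on the flush so that an unresponsive node cannot stall the use statement indefinitely; that is an assumption, not something decided in this issue.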


rukai added the bug label Nov 26, 2024

rukai mentioned this issue Nov 26, 2024

CassandraSinkCluster fix lost messages #1845

Merged

rukai closed this as completed in #1845 Nov 27, 2024
