Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client/core: OrderBook update notifications races #1

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

norwnd
Copy link
Owner

@norwnd norwnd commented Mar 7, 2023

This PR supersedes decred#2203.

Targeting 2 separate things:

  • 1st commit is mostly about applying suggestions from client/dex: candles race decred/dcrdex#2188 (passing immutable data over chan when possible)
  • 2nd commit is about dc.booksMtx, clarifying a lot of non-trivial stuff around it and handling sever subscribe/resubscribe/suspend flow such that server order book is observed in consistent state all the time by dexc client (and also feed receivers that subscribe to dexc order book feeds, which is just as important); that means it should fix order book notifications being dropped occasionally
  • 3rd, ... commits are just minor clarifications and will be squashed into 2nd commit eventually

Comment on lines 1174 to 1180
// Acquiring dc.booksMtx here in dc.bookie serves as a barrier for notifications that
// come after we sent new subscription request (e.i. notes that need to be applied
// on top of snapshot we get from new subscription request will wait on that mutex
// here). Note, we don't acquire dc.booksMtx for the whole handleBookOrderMsg
// duration; that leaves some room for notifications from previous subscription to
// come and apply concurrently with us applying snapshot we just got from new
// subscription request.
// TODO: I'm not sure at the moment whether is OK to execute this notification from
// old subscription while Resetting order book concurrently. Could this result into
// mixed OrderBook state (for orders, candles cache, or something else) ?
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to know answer to that TODO question (that's in handleEpochReportMsg func).

Copy link
Owner Author

@norwnd norwnd Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see anything in epoch_report notification that would need to be atomic with respect to current orders in the book or seq number, so probably don't even need this "barrier" here (and are locking dc.booksMtx in dc.bookie just to get the bookie). Acted on TODO from above accordingly.

Comment on lines 1205 to 1210
// Acquiring dc.booksMtx here in dc.bookie serves as a barrier for notifications that
// come after we sent new subscription request (e.i. notes that need to be applied
// on top of snapshot we get from new subscription request will wait on that mutex
// here). Note, we don't acquire dc.booksMtx for the whole handleBookOrderMsg
// duration; that leaves some room for notifications from previous subscription to
// come and apply concurrently with us applying snapshot we just got from new
// subscription request.
// TODO: I'm not sure at the moment whether is OK to execute this notification from
// old subscription while Resetting order book concurrently. Could this result into
// mixed OrderBook state (for orders, candles cache, or something else) ?
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to know answer to that TODO question (that's in handleEpochOrderMsg func).

Copy link
Owner Author

@norwnd norwnd Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epoch orders don't seem to be interfering with outstanding orders, so looks like epoch_order notification is meant to be executing concurrently with the rest order book updates (which it currently is both on master and in this PR).

Copy link
Owner Author

@norwnd norwnd Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On closer inspection, it seems epoch orders (epoch_orders notification and other related stuff such as epochQueues, currentEpoch and proofedEpoch) are kinda half-baked now:

  • it doesn't have anything to do with orders or seq, but for some reason seq is incremented with each epoch_orders notification; we should probably remove seq usage by epoch_orders notification
  • but similar to orders<->seq atomic relationship epochQueues<-> currentEpoch<->proofedEpoch could benefit from atomicity because currently these fall out of sync quite often, which results in spamming something like this in logs (and what's more important long term, does not allow client to differentiate between server sending invalid match proof note and client's view on epoch orders being out of sync = non-atomic):
2023-03-09 05:28:53.450 [ERR] CORE: Route 'match_proof' notification handler error (DEX dex.decred.org:7232): match proof validation failed: epoch 167833983 match proof note references 1 orders, but local epoch queue is empty
...
2023-03-09 14:41:50.809 [ERR] CORE: Route 'match_proof' notification handler error (DEX dex.decred.org:7232): match proof validation failed: epoch 167837301 match proof note references 3 orders, but local epoch queue has 2

that atomicity isn't trivial to implement though ^ (and whether it should be done with current subscribe/resubscribe mechanisms or have a completely separate parallel machinery working is something to investigate) and it's not high priority any time soon; so I'm downgrading log level to Trace in this PR and supply relevant comments there too since it isn't useful to anyone to see as error the way it is currently implemented. Acted on TODO from above accordingly too.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, saw relevant discussions somewhere around decred#278 (review), but nothing conclusive.

// Return any non-nil error, but still revoke purged orders (done already at
// this point) and clear the book.
Copy link
Owner Author

@norwnd norwnd Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to revoke all active orders before we Reset order book ? We might want to do it to not hold dc.booksMtx longer than necessary.

@norwnd
Copy link
Owner Author

norwnd commented Mar 7, 2023

I didn't test this change extensively just yet.

@norwnd
Copy link
Owner Author

norwnd commented Mar 7, 2023

I think this is also likely fixing a reason (if not The reason) for decred#1543

if we get order book snapshot right after order qty got updated to 0 here and drop unbook_order during resubscribe (note, update_remaining can come before order book snapshot is taken, in which case it will apply to the previous order book version, which is gonna be erased by the new snapshot that's coming since we issued resubscribe, but unbook_order can drop next anyway) ... then you get an order (that should have been unbooked) hanging with 0 qty in order book.

When testing I saw ghost order of non-zero quantity, meaning when resubscribe finished I saw 3 orders in the book on 1st dexc client (in UI) when on the 2nd client there were only 2 left (since I successfully canceled one, while 1st client was resubscribing). But the idea would be same. Could try to reproduce on master by adding simple delays in relevant places: decred@c2cfea2.

@norwnd
Copy link
Owner Author

norwnd commented Mar 8, 2023

So far, this PR solves missing update notifications caused by insufficient dc.booksMtx locking. But it does not prevent "old notifications = from previous subscription" from being applied on top of new snapshot we get when resubscribing. And #2 doesn't help with that either because (as is currently implemented) we don't just drop those old notifications (even though they have seq <= current OrderBook.seq value), we log it as an error and apply state change anyway (except for updating seq).

This seems like a bug to me. I think state change shouldn't apply in that case, not just be logged as error, shall I change that (probably better to split it off #4) ? The comments/changes I've added in this PR so far assume just drop too.

@norwnd norwnd force-pushed the client-dex-book-update-notification-races branch from edf407d to 6bbb728 Compare March 8, 2023 11:27
@norwnd norwnd changed the title client/dex: OrderBook update notifications races client/core: OrderBook update notifications races Mar 8, 2023
This is caused by insufficient dc.booksMtx locking, this change takes care of:
- clarifying a lot of non-trivial stuff around dc.booksMtx
- handling client->sever subscribe/resubscribe/suspend flow such that server order book is observed in consistent state all the time by dexc client (and also feed receivers that subscribe to dexc order book feeds, which is just as important); that means it should fix order book notifications being dropped occasionally
@norwnd norwnd force-pushed the client-dex-book-update-notification-races branch from 6bbb728 to 87e1108 Compare March 9, 2023 05:32
@norwnd
Copy link
Owner Author

norwnd commented Mar 9, 2023

Just moved some unrelated comments out of this PR somewhere else in that last force-push ^.

@norwnd norwnd force-pushed the client-dex-book-update-notification-races branch from c1bcd0d to cee0a95 Compare March 9, 2023 16:13
@norwnd norwnd force-pushed the master branch 3 times, most recently from e3c487c to 333e9f6 Compare January 18, 2025 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant