
core: don't fail match on misses and redeem errors #772

Merged
merged 5 commits into master from failerr
Oct 27, 2020

Conversation

buck54321
Member

buck54321 commented Oct 22, 2020

dexc

  • Set default log level to debug.

core

  • Retry failed swaps until the broadcast timeout has expired.
  • Retry failed redemptions frequently until the broadcast timeout, and less frequently afterwards until resolution.
  • Do not group swaps or redemptions for retries.
  • Retries of failed swaps and redeems are metered so that they don't run every tick, but reconfiguring a wallet clears the tick metering.

@jholdstock
Member

@buck54321 made this change because of an issue I encountered - my redeem contract did not broadcast successfully because my wallet was locked. I unlocked the wallet, but dexc never retried the broadcast. Running with the changes in this PR resolved the issue - dexc retried the broadcast until it succeeded.

Thanks for the support!

@chappjc
Member

chappjc commented Oct 22, 2020

Thanks jholdstock. You had a peculiar issue with bitcoind reporting unlocked via getwalletinfo, which seems like a bitcoind issue, but we'll consider ways to be more robust. Encrypting an unencrypted wallet while dexc is running seems to be trouble that we'll work around too.

buck54321 force-pushed the failerr branch 2 times, most recently from 4b92346 to 9c92b98, October 25, 2020
client/core/trade.go: 3 review threads (outdated, resolved)
buck54321 marked this pull request as ready for review October 25, 2020
chappjc added this to the 0.1.1 milestone Oct 26, 2020
chappjc changed the base branch from release-0.1 to master October 26, 2020
@chappjc
Member

chappjc commented Oct 26, 2020

Just changed the base branch to master. We'll cherry-pick from master into release-0.1 instead of the other way around. The conflict will go away when rebasing on master.

@@ -162,6 +179,16 @@ func (t *trackedTrade) broadcastTimeout() time.Duration {
return time.Millisecond * time.Duration(t.dc.cfg.BroadcastTimeout)
}

// delayTicks sets the tickGovernor to prevent retrying to quickly after an

too quickly

Comment on lines 1141 to 1152
c.swapMatchGroup(t, []*matchTracker{m}, errs)
}
if len(groupables) > 0 {
c.swapMatchGroup(t, groupables, errs)

I'm wondering whether we should have the groupables go first, since these all run in series now. If one of the suspect ones takes too long (something is funny with them after all, and the server request or something else could be slower than usual), the OK ones in groupables might take too long to get to.

Comment on lines 1250 to 1257
lastActionTime := match.matchTime()
if match.Match.Side == order.Taker {
// It is possible that AuditStamp could be zero if we're
// recovering during startup. The implications of this is that
// if we are 1) already in start up recovery, and 2) we fail
// again, we will not try again even if we have time before the
// broadcast timeout.
lastActionTime = encode.UnixTimeMilli(int64(match.MetaData.Proof.Auth.AuditStamp))

during startup -> after reconnect? or just "if we missed the audit request"

The audit request may be missed while disconnected, and we'd get here after reconnect->match_status resolution, not just a restart of dexc. I think we just observed this, although in the maker-redeem situation, where the audit was of the taker's contract.

Comment on lines 1259 to 1273
if time.Since(lastActionTime) < t.broadcastTimeout() {
t.delayTicks(match, t.dc.tickInterval*3/4)

If they missed the audit request, this would jump to setting swapErr even though there's possibly still time to try again.

If we don't know the audit timestamp, we'd ideally figure out the time that the counterparty's contract reached swapConf. That's not trivial or free of backend RPCs, though, so we could at least set lastActionTime to match.matchTime() + something, or even time.Now - something, if we don't have an AuditStamp.

Another idea is to always allow at least one retry if AuditStamp is zero. To do that might require putting the bogus value mentioned above in AuditStamp, though.

Comment on lines 1440 to 1441
// we will not try again for an hour even if we have time before the
// broadcast timeout.

for an hour -> 5 minutes

Comment on lines 1475 to 1476
(m.suspectSwap && t.wallets.fromAsset.ID == assetID ||
(m.suspectRedeem && t.wallets.toAsset.ID == assetID)) {

Missing parentheses around the m.suspectSwap && t.wallets.fromAsset.ID == assetID.

Perhaps this would read better like:

	isFromAsset := t.wallets.fromAsset.ID == assetID
	for _, m := range t.matches {
		if m.tickGovernor != nil &&
			((m.suspectSwap && isFromAsset) || (m.suspectRedeem && !isFromAsset)) {

chappjc merged commit bd3d828 into decred:master Oct 27, 2020
4 participants