-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
State Recovery Swapd Implementation #485
Conversation
Codecov Report
@@ Coverage Diff @@
## main #485 +/- ##
=======================================
- Coverage 12.3% 11.5% -0.8%
=======================================
Files 34 34
Lines 9033 9631 +598
=======================================
+ Hits 1111 1112 +1
- Misses 7922 8519 +597
Continue to review full report at Codecov.
|
4fecc22
to
61b6efb
Compare
I had a successful state restoration with the latest commit. I made and took an offer, funded it, terminated farcasterd, started farcasterd, restored the checkpoint and eventually got the amount refunded again. I'm going to keep on polishing and testing and on this pull request while I wait for review and merge of the others. |
a881cba
to
344a9ec
Compare
b7cae58
to
21046fe
Compare
93610cd
to
31adf58
Compare
31adf58
to
eaebb4f
Compare
eaebb4f
to
6304670
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are proper tests written already testing the added functionality, I would like to explore and test more on a fork of this branch
I guess I understood enough of the overall architecture, and the most of the intends on the code
ServiceBus::Msg, | ||
ServiceId::Farcasterd, | ||
ServiceId::Swap(swap_id.clone()), | ||
Request::Hello, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced reusing Request::Hello is better than creating a new Request::NewVariant, that inherits no further semantics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
farcasterd being the broker will send (or only receive?) this Hello request on the background as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can add a ConnectionTest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
let CheckpointEntry { | ||
public_offer, | ||
trade_role, | ||
.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the way we launch swapd is appropriate on the context of state recovery
passing public_offer
and trade_role
made sense previously, but should probably go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to have these, so you can retrieve them when you do GetInfo
on the swap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is not about not having them. It's about not passing them as command line args, passing them through the Ctl bus instead
let _child = launch( | ||
"swapd", | ||
&[ | ||
swap_id.to_hex(), | ||
public_offer.to_string(), | ||
trade_role.to_string(), | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unconvinced weather public_offer
and trade_role
should be command line args here
self.txs.remove(&TxLabel::Buy); | ||
self.txs.remove(&TxLabel::Cancel); | ||
self.txs.remove(&TxLabel::Punish); | ||
} | ||
TxLabel::Buy | ||
if self.temporal_safety.final_tx(*confirmations, Coin::Bitcoin) | ||
&& self.state.a_refundsig() | ||
&& self.state.a_buy_published() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this being removed? related to recovering state? and not yet knowing that you published buy? alice must checkpoint before publishing buy and after, so if a_buy_published()
returns false after recovery, there is a bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get it. Checkpoint is not updated after checkpoint, but please confirm!
Not sure that is a bug. I was trying to be super strict on state transitions, but for state recovery it may justify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case, I find it more correct to set buy_published to true before checkpointing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get it. Checkpoint is not updated after checkpoint, but please confirm!
Yeah, that is it.
I need to understand what we are protecting against by checking a_buy_published
? How can this case be hit in the first place without it being published?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does not look like a_buy_published is safety critical
It especially prevents us from publishing buy. the a_buy_published() == false
protects us from not doing stuff out of order
How can this case be hit in the first place without it being published?
For me if it happens that we find Buy tx and are in a state that a_buy_published == false, we have a bug that must be addressed, because the protocol state and swapd are out of sync. This is why I enjoy swapd being very strict on its state evolution. But I agree this one specifically can go.
Thank you for the great review zkao. I will attempt to address your comments first in writing and will follow-up with some code later. |
307d529
to
18a9d92
Compare
8a89085
to
8660617
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, continue deving in other PRs
self.txs.remove(&TxLabel::Buy); | ||
self.txs.remove(&TxLabel::Cancel); | ||
self.txs.remove(&TxLabel::Punish); | ||
} | ||
TxLabel::Buy | ||
if self.temporal_safety.final_tx(*confirmations, Coin::Bitcoin) | ||
&& self.state.a_refundsig() | ||
&& self.state.a_buy_published() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does not look like a_buy_published is safety critical
It especially prevents us from publishing buy. the a_buy_published() == false
protects us from not doing stuff out of order
How can this case be hit in the first place without it being published?
For me if it happens that we find Buy tx and are in a state that a_buy_published == false, we have a bug that must be addressed, because the protocol state and swapd are out of sync. This is why I enjoy swapd being very strict on its state evolution. But I agree this one specifically can go.
let res: Result<usize, strict_encoding::Error> = | ||
self.txs.iter().try_fold(len, |mut acc, (key, val)| { | ||
acc += key.strict_encode(&mut e).map_err(|err| { | ||
strict_encoding::Error::DataIntegrityError(format!("{}", err)) | ||
})?; | ||
acc += val.strict_encode(&mut e).map_err(|err| { | ||
strict_encoding::Error::DataIntegrityError(format!("{}", err)) | ||
})?; | ||
Ok(acc) | ||
}); | ||
len = match res { | ||
Ok(val) => Ok(val), | ||
Err(err) => Err(strict_encoding::Error::DataIntegrityError(format!( | ||
"{}", | ||
err | ||
))), | ||
}?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my understanding that we are not be doing the (high level) data structure encoding manually, just encode the low level types, and derive the high level ones using the derive macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
high level here being HashMap and the low level the keys and values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess the data type of interest is List<T>
from request.rs that is just a wrapper over vectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find a derive for HashMap, so I rolled this myself. I guess we can create a wrapper type for a HashMap, implement the strict encoding for it and then use it here.
This adds swapd state serialization, checkpointing and recovery.
The current workflow for restoration is only for cases where farcasterd is restarted. Peer reconnection is also not handled.
I left out of scope for now checking if we are overwriting a later state on recovery. For the current workflow it does not add much value and I don't want this pull request to have an even larger scope.