Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State Recovery Swapd Implementation #485

Merged
merged 33 commits into from
Jun 30, 2022

Conversation

TheCharlatan
Copy link
Member

@TheCharlatan TheCharlatan commented Jun 9, 2022

This adds swapd state serialization, checkpointing and recovery.

The current workflow for restoration is only for cases where farcasterd is restarted. Peer reconnection is also not handled.

I left out of scope for now checking if we are overwriting a later state on recovery. For the current workflow it does not add much value and I don't want this pull request to have an even larger scope.

@TheCharlatan TheCharlatan marked this pull request as draft June 9, 2022 13:11
@codecov-commenter
Copy link

codecov-commenter commented Jun 9, 2022

Codecov Report

Merging #485 (18a9d92) into main (a2f8071) will decrease coverage by 0.8%.
The diff coverage is 0.5%.

@@           Coverage Diff           @@
##            main    #485     +/-   ##
=======================================
- Coverage   12.3%   11.5%   -0.8%     
=======================================
  Files         34      34             
  Lines       9033    9631    +598     
=======================================
+ Hits        1111    1112      +1     
- Misses      7922    8519    +597     
Impacted Files Coverage Δ
src/farcasterd/runtime.rs 0.0% <0.0%> (ø)
src/rpc/request.rs 15.2% <0.0%> (-0.1%) ⬇️
src/swapd/runtime.rs 0.0% <0.0%> (ø)
src/swapd/swap_state.rs 0.0% <0.0%> (ø)
src/swapd/syncer_client.rs 0.0% <0.0%> (ø)
src/swapd/temporal_safety.rs 0.0% <0.0%> (ø)
src/walletd/runtime.rs 0.0% <0.0%> (ø)
src/databased/runtime.rs 26.9% <3.7%> (-3.4%) ⬇️
src/rpc/mod.rs 22.2% <100.0%> (-2.8%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2f8071...18a9d92. Read the comment docs.

@TheCharlatan TheCharlatan force-pushed the state_recovery3 branch 2 times, most recently from 4fecc22 to 61b6efb Compare June 9, 2022 21:16
@TheCharlatan
Copy link
Member Author

I had a successful state restoration with the latest commit. I made and took an offer, funded it, terminated farcasterd, started farcasterd, restored the checkpoint and eventually got the amount refunded again. I'm going to keep on polishing and testing and on this pull request while I wait for review and merge of the others.

@TheCharlatan TheCharlatan force-pushed the state_recovery3 branch 2 times, most recently from a881cba to 344a9ec Compare June 15, 2022 09:25
@TheCharlatan TheCharlatan force-pushed the state_recovery3 branch 6 times, most recently from b7cae58 to 21046fe Compare June 22, 2022 08:52
This was referenced Jun 23, 2022
@TheCharlatan TheCharlatan marked this pull request as ready for review June 26, 2022 14:26
@TheCharlatan TheCharlatan linked an issue Jun 26, 2022 that may be closed by this pull request
src/walletd/runtime.rs Outdated Show resolved Hide resolved
src/rpc/request.rs Outdated Show resolved Hide resolved
Copy link
Member

@zkao zkao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are proper tests written already testing the added functionality, I would like to explore and test more on a fork of this branch

I guess I understood enough of the overall architecture, and the most of the intends on the code

ServiceBus::Msg,
ServiceId::Farcasterd,
ServiceId::Swap(swap_id.clone()),
Request::Hello,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced reusing Request::Hello is better than creating a new Request::NewVariant, that inherits no further semantics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

farcasterd being the broker will send (or only receive?) this Hello request on the background as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can add a ConnectionTest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Comment on lines +968 to +971
let CheckpointEntry {
public_offer,
trade_role,
..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the way we launch swapd is appropriate on the context of state recovery

passing public_offer and trade_role made sense previously, but should probably go

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to have these, so you can retrieve them when you do GetInfo on the swap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is not about not having them. It's about not passing them as command line args, passing them through the Ctl bus instead

Comment on lines +1014 to +1020
let _child = launch(
"swapd",
&[
swap_id.to_hex(),
public_offer.to_string(),
trade_role.to_string(),
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unconvinced weather public_offer and trade_role should be command line args here

src/swapd/runtime.rs Show resolved Hide resolved
src/swapd/runtime.rs Show resolved Hide resolved
self.txs.remove(&TxLabel::Buy);
self.txs.remove(&TxLabel::Cancel);
self.txs.remove(&TxLabel::Punish);
}
TxLabel::Buy
if self.temporal_safety.final_tx(*confirmations, Coin::Bitcoin)
&& self.state.a_refundsig()
&& self.state.a_buy_published() =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this being removed? related to recovering state? and not yet knowing that you published buy? alice must checkpoint before publishing buy and after, so if a_buy_published() returns false after recovery, there is a bug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get it. Checkpoint is not updated after checkpoint, but please confirm!

Not sure that is a bug. I was trying to be super strict on state transitions, but for state recovery it may justify it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, I find it more correct to set buy_published to true before checkpointing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get it. Checkpoint is not updated after checkpoint, but please confirm!

Yeah, that is it.

I need to understand what we are protecting against by checking a_buy_published? How can this case be hit in the first place without it being published?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not look like a_buy_published is safety critical

It especially prevents us from publishing buy. the a_buy_published() == false protects us from not doing stuff out of order

How can this case be hit in the first place without it being published?

For me if it happens that we find Buy tx and are in a state that a_buy_published == false, we have a bug that must be addressed, because the protocol state and swapd are out of sync. This is why I enjoy swapd being very strict on its state evolution. But I agree this one specifically can go.

src/swapd/runtime.rs Outdated Show resolved Hide resolved
src/swapd/runtime.rs Show resolved Hide resolved
src/walletd/runtime.rs Show resolved Hide resolved
src/walletd/runtime.rs Show resolved Hide resolved
@TheCharlatan
Copy link
Member Author

Thank you for the great review zkao. I will attempt to address your comments first in writing and will follow-up with some code later.

@zkao zkao force-pushed the state_recovery3 branch from c47b0e4 to c9c7d62 Compare June 29, 2022 14:46
Copy link
Member

@zkao zkao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, continue deving in other PRs

src/walletd/runtime.rs Show resolved Hide resolved
src/swapd/runtime.rs Show resolved Hide resolved
src/swapd/runtime.rs Show resolved Hide resolved
self.txs.remove(&TxLabel::Buy);
self.txs.remove(&TxLabel::Cancel);
self.txs.remove(&TxLabel::Punish);
}
TxLabel::Buy
if self.temporal_safety.final_tx(*confirmations, Coin::Bitcoin)
&& self.state.a_refundsig()
&& self.state.a_buy_published() =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not look like a_buy_published is safety critical

It especially prevents us from publishing buy. the a_buy_published() == false protects us from not doing stuff out of order

How can this case be hit in the first place without it being published?

For me if it happens that we find Buy tx and are in a state that a_buy_published == false, we have a bug that must be addressed, because the protocol state and swapd are out of sync. This is why I enjoy swapd being very strict on its state evolution. But I agree this one specifically can go.

Comment on lines +275 to +291
let res: Result<usize, strict_encoding::Error> =
self.txs.iter().try_fold(len, |mut acc, (key, val)| {
acc += key.strict_encode(&mut e).map_err(|err| {
strict_encoding::Error::DataIntegrityError(format!("{}", err))
})?;
acc += val.strict_encode(&mut e).map_err(|err| {
strict_encoding::Error::DataIntegrityError(format!("{}", err))
})?;
Ok(acc)
});
len = match res {
Ok(val) => Ok(val),
Err(err) => Err(strict_encoding::Error::DataIntegrityError(format!(
"{}",
err
))),
}?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my understanding that we are not be doing the (high level) data structure encoding manually, just encode the low level types, and derive the high level ones using the derive macros

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high level here being HashMap and the low level the keys and values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess the data type of interest is List<T> from request.rs that is just a wrapper over vectors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find a derive for HashMap, so I rolled this myself. I guess we can create a wrapper type for a HashMap, implement the strict encoding for it and then use it here.

src/swapd/runtime.rs Show resolved Hide resolved
@zkao zkao merged commit dcc9b06 into farcaster-project:main Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

state management implementation
3 participants