Logbook 2024 H1


September 2024

2024-09-20

SB on incremental commits off-chain

  • Yesterday evening after work I realized what could be the problem with the e2e test for the recover. As I mentioned, we try to balance the tx twice; I should only return the transaction from the handlers and then make the hydra-node post it on client input.

  • Ok, I was right and now I got the recover e2e green! Yee-haw!

  • I'll include the recover workflow also as part of this pr which makes me happy.

  • A question arises: should I also observe the recover transaction and then have a way to clean up the local hydra state? I think so, since we can't rely on doing a cleanup only when posting.

  • For sure, even our protocol.md diagram says so.

  • I'll use the opportunity to clean up unused variables etc. while I add the recover observation.

  • One thing though: how to get the head id from the recover transaction? Currently there seems to be no way, which means we would need to check the recovered UTxO against whatever we have in the state related to deposits; if the two UTxO sets match, we could say the recover is approved/finalized.

  • I ended up looking for the head id in the resolved recover inputs, in the deposit datum, and this worked nicely.
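
    To make that concrete, here is a minimal sketch of the lookup - stand-in types and names, not the actual observation code (the real datum and observation logic live in hydra-tx / hydra-node):

    ```haskell
    import Data.Map (Map)
    import qualified Data.Map as Map

    -- Stand-in types; the real ones live in the hydra codebase.
    newtype HeadId = HeadId String deriving (Eq, Show)

    type TxIn = String

    newtype TxOut = TxOut {txOutDatum :: Maybe DepositDatum}

    newtype DepositDatum = DepositDatum {depositHeadId :: HeadId}

    -- Resolve the recover tx inputs against our known UTxO set and pull the
    -- head id out of the first deposit datum we can decode.
    observeRecoverHeadId :: Map TxIn TxOut -> [TxIn] -> Maybe HeadId
    observeRecoverHeadId utxo inputs =
      case [d | i <- inputs, Just (TxOut (Just d)) <- [Map.lookup i utxo]] of
        DepositDatum hid : _ -> Just hid
        [] -> Nothing
    ```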

  • Tests are green and now it is time for a major cleanup.

2024-09-19

SB on incremental commits off-chain

  • I almost came to the end of the recover implementation but got uncaught exception: JSONException, which was surprising.

  • Debugging a bit to see where this happens - it seems we sometimes get this error at the API level, but I don't see exactly why this would be happening.

  • Realized I've changed what I return from the recover endpoint - it is no longer the transaction but just the string OK. The recover transaction is now posted by the hydra-node, not the user.

  • After making sure all the pieces are in the correct place so that I can recover a deposit, I get an ErrNoFuelUTxOFound error when trying to post the recover transaction.

  • Very weird since my wallet UTxO has plenty of ada.

  • I found out that this is happening in the mkChange function, which is supposed to calculate the change and error out in case the inputs are lower than or equal to the outputs - but there is a fat FIXME in there saying we need to account for the minimum UTxO value too.
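
    For reference, the shape of that check - a minimal sketch with illustrative names and plain-integer lovelace amounts, not the actual hydra-node wallet code:

    ```haskell
    type Lovelace = Integer

    -- Compute the change value; errors out when inputs don't cover outputs
    -- plus fee. The FIXME mirrors the one mentioned above: the change output
    -- itself must also carry at least the minimum UTxO value.
    mkChange :: Lovelace -> Lovelace -> Lovelace -> Either String Lovelace
    mkChange totalInputs totalOutputs fee
      -- FIXME: should also require change >= minimum UTxO value
      | change <= 0 = Left "ErrNotEnoughFunds"
      | otherwise = Right change
     where
      change = totalInputs - totalOutputs - fee
    ```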

  • Realizing this is just the outcome of some other problem I have (most probably in the Handlers, where we construct and balance the recover transaction).

  • Recover is trying to spend the deposit output and produce as outputs whatever was deposited (the outputs present in the deposit datum).

  • I need to think more deeply about the recover tx and how it plays with the internal wallet, since I need to check whether all inputs are properly resolved and how come there is an ErrNotEnoughFunds error coming out of the wallet.

  • It is time to regain some sanity, I think. One question: how come I didn't see this error before when working with the recover transaction? The answer might be that I previously passed the UTxO the user wants to commit to draftRecoverTx, and now I rely on the datum in the deposit output.

  • In the wallet code it seems that the input balance is lower than the output balance by 6072 lovelace:

> 24687860 - 24681788
6072

What am I not accounting for here?

  • Could it be that the UTxOs are duplicated, and I need to remove the deposit and commit UTxO from the spendable UTxO before giving the tx to the wallet for balancing?
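
    If that turns out to be the case, the de-duplication itself is simple - a sketch with stand-in types:

    ```haskell
    import Data.Map (Map)
    import qualified Data.Map as Map
    import Data.Set (Set)

    type TxIn = String

    -- Drop the inputs the recover tx already spends (deposit/commit outputs)
    -- from the wallet's spendable set, so balancing cannot select them again.
    withoutAlreadySpent :: Set TxIn -> Map TxIn txOut -> Map TxIn txOut
    withoutAlreadySpent spentByRecover spendable =
      spendable `Map.withoutKeys` spentByRecover
    ```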

  • Good thing is that the amount seems to be always the same!

  • I tried several things and I think the bottom line is that we balance the recover transaction and then post it, which does the balancing all over again. In my mind this operation should be idempotent, but perhaps it is not?

  • If I just try to submit the recover transaction directly after assembling it, I don't see any errors, but I fail to query the wallet address to make sure the recover worked.

  • I really feel under the weather right now, so I'll leave dealing with recovers for tomorrow or next time, while I try to get the rest of the codebase into some reviewable state for the team to look at.

2024-09-18

SB on incremental commits off-chain

  • We did a nice session this morning to review the current state of the PR and discover the next steps.

  • These are the steps I wanted to outline that users would need to test out this new feature:

     ### Prepare the UTxO to deposit
     
     - How to make the deadline configurable for our internal purposes like testing?
     
     ### How to issue a http request to kick off the deposit process
     
     - Can we copy/paste the json utxo representation directly from one of the explorers?
     
     
     ### Sign and submit the deposit transaction 
      
     - cardano-cli example most probably
     
     ### Query the Head to see the deposited UTxO present on L2
      
     - Issue `GetUTxO` call to view the L2 UTxO state 
     
     
     ## How to recover deposited funds 
     
     
     ### Follow the deposit steps  
     
     - How to demonstrate recover if hydra-node will pick up the deposit as soon as it is observed?
     
     - Play around with the network so one participant doesn't see the new snapshot request?
     
     ### Once we make sure deposit is not observed we can issue a recover request 
     
     - How to make sure the user keeps the committed UTxO representation around so that they can issue the API request with the same JSON?
     
     - They also need to provide the TxIn, which complicates things. How could we make it simpler - could they query the hydra-node to get this information?



  • The plan right now is to get the PR into shape as much as possible and outline in the PR what is outstanding, which will need to be implemented as a series of follow-up PRs.

  • I have a good understanding of where the main problems are and where we are currently, so this should not be an issue at all.

  • As a concrete next step I'll try to make the recover e2e test pass by shutting down one of the nodes so that the increment is not posted, which lets me exercise the recover workflow nicely (right now the recover tx gets rejected by the cardano-node because of the lower validity bound).

  • If I don't get it to work in a reasonable time, I will leave the complete recover feature as something we think about and implement as one of the next steps, since in the end we do need to think more about recover and the optimal way of doing it.

  • Even before any of this: I should not expect users to provide the deposit script UTxO (actually any UTxO, since we should have it around in our chain state) and should take a TxId instead of a TxIn at the recover endpoint. Also use the path instead of the query string (just remembering all of the stuff we talked about this morning).

  • I removed the unneeded UTxOs from the recover endpoint request and altered the e2e test to spin up two nodes, observe the Head opened and the commit approved, and then shut down one of the nodes in order to be able to do a recover instead of observing the increment.

  • Looking at the recover code - you need to know the deadline to be able to construct the deposit datum so that one can spend the deposit output, but how can you know the deadline if that is something the hydra-node decides on?

  • The e2e recover tests are failing with the validator error related to an invalid commit hash, so I'm currently debugging the UTxOs to see where the discrepancy is (between the commit UTxO I am sending to the hydra-node API and the local UTxO we get from observing head transactions).

  • Realizing I just need to decode the datum of the deposit script output in order to get all parameters I need for the recover.

2024-09-17

SB on incremental commits off-chain

  • I looked at the increment observation tests and it turns out they are flaky: not all of them fail, but sometimes we experience some weird behavior. I decided to leave this problem for later, since the e2e is green and this should be a problem somewhere in how we wire things together (increment depends on the deposit output too).

  • The most relevant next chunk of work should be connecting the recover endpoint to the handlers.

  • I slowly progressed, one thing following another, towards the e2e test that will prove the recover transaction worked.

  • This took some time - I decided to also encode the query string (TxIn) as a base16 string, since the hash in the rendered transaction input doesn't play nice with query strings.
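
    The encoding itself is straightforward with base16-bytestring (a sketch; the exact decode signature varies between library versions, and the rendered "txid#ix" form is my assumption about how the TxIn is shown):

    ```haskell
    import qualified Data.ByteString.Base16 as Base16
    import qualified Data.ByteString.Char8 as BS8

    -- Hex-encode the rendered tx input (e.g. "9fdc...#4") so the '#' does
    -- not interfere with the query string; decoding reverses it server-side.
    encodeTxInParam :: String -> String
    encodeTxInParam = BS8.unpack . Base16.encode . BS8.pack

    decodeTxInParam :: String -> Either String String
    decodeTxInParam = fmap BS8.unpack . Base16.decode . BS8.pack
    ```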

  • Now I finally get the unknown input error on the recover transaction - progress!

  • After making sure I use the correct UTxO for the recover request (utxoFromTx depositTx) I am getting ErrScriptExecutionFailed D04, which is related to the deposit deadline. Yee-haw, our validator is kicking in! I'll make sure to pass in the proper deadline. How can we make the user's life easier in terms of these recover inputs they need to provide?

  • I commented out the validator check for the deadline to see if the locked commits are matching; I'll worry about the deadline later on. Oh nice... I get D05, so the locked commits are also not valid. Debug time!

  • Looking at the issue again, I realize users don't need to provide the deadline at all, so I'm removing the field from the request type.

  • If I just remove true from the recover validator I am getting the NoFuelUTxOFound error. I think we modeled the recover as if we would just run it as a standalone CLI tool, but to have the hydra-node wallet balance these transactions we need more. What I think will work is to provide the UTxO which is locked at the deposit script, so that we can satisfy the validator. Then we would also need the deposit script UTxO, since we need to spend it, so it must be given to the hydra-node wallet.

  • Success! The validator doesn't run currently, but at least our internal wallet is happy. Now turning the validator checks back on so that I can construct a properly valid request and see the recover working. Feeling pretty close to success.

  • Ok, adding the appropriate slot lower bound made the deadline check pass. I need to make sure these values make sense.
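
    Spelled out, the relation being checked is tiny - a sketch, with POSIXTime standing in for Plutus' millisecond-based time and the D03/D04 codes matching the deposit validator errors mentioned in these notes:

    ```haskell
    newtype POSIXTime = POSIXTime Integer deriving (Eq, Ord, Show)

    -- A recover is only valid when the tx declares a lower validity bound
    -- and that bound lies after the deposit deadline.
    deadlinePassed :: POSIXTime -> Maybe POSIXTime -> Bool
    deadlinePassed deadline mLowerBound =
      case mLowerBound of
        Nothing -> False           -- D03: no lower bound defined on the tx
        Just lb -> lb > deadline   -- D04: lower bound must be past the deadline
    ```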

  • The check for locked outputs is naturally ok once I've provided the correct UTxO to the recover tx so that's satisfying.

  • As the final step for this test let's make sure the deposited UTxO is returned back to the user and is not in the Head.

  • For some reason the test is hanging - can the user actually submit the tx the node created?

  • I added the recover client input, since we need the hydra-node signature on the recover tx, so it is easier if the node just does it.

  • Adding the RecoverApproved message so users can have some sort of insight into what is happening inside of the Head.

  • I feel burned out a bit, there are a LOT of changes needed.

2024-09-17

SN on using etcd for networking

  • After fixing the Authenticate network module to not drop messages, I ran into an issue where offline mode would not open the head when configured with other --hydra-verification-key parties.

  • Fixing manually run offline mode nodes to communicate

    • Need to simulate other parties' OnCommitTx
    • Still seeing MessageDropped?
    • Disabled signature verification for now
    • Sending the ReqSn now seems not to arrive on the other end. Use distinct keys per node (again).
    • Seeing InvalidMultisignature now:

    BTW: at first the InvalidMultisignature content was tripping me up because it only contains a single vkey when one signature fails

    • Each node is seeing the other node's snapshot signature in AckSn as invalid. I suspect something's wrong with offline-mode.
    • Re-generating keys did not help.
    • Found it: the headId in the snapshot is offline-alice and offline-bob so we can't have networked offline heads because of that.
  • Side note: Is cabal bench hydra-cluster single mode broken?

  • Lots of our tests depend on PeerConnected messages. How would we deal with "connectivity" information in case of etcd? It's not really that we are connected to node x or y, but rather whether we are connected to the majority or not.

  • Etcd config initial-cluster requires a consistent list of peers across all instances. Right now I derive it from the given --host, --port and --peer cli options, but this is very fragile.
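
    For illustration, this is roughly how such a list could be derived from the CLI options - a sketch with made-up names; etcd's `initial-cluster` format is `name=peer-url,...`, and as noted it is fragile because every node must compute the exact same string:

    ```haskell
    import Data.List (intercalate)

    data PeerConfig = PeerConfig
      { peerName :: String -- must be stable and agreed upon by all nodes
      , peerHost :: String
      , peerPort :: Int
      }

    -- Render the value of etcd's --initial-cluster flag from the peer list.
    initialCluster :: [PeerConfig] -> String
    initialCluster peers =
      intercalate
        ","
        [ peerName p <> "=http://" <> peerHost p <> ":" <> show (peerPort p)
        | p <- peers
        ]
    ```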

  • Problem: submitting messages is not possible while we are not connected to the majority or leader election is ongoing?

  • After giving etcd a bit of time in the beginning, the benchmarks work reliably. However, since I switched to watch it got slower? Maybe using the command line parsing is not the fastest way to do this (and it depends on the internal loop of etcdctl watch).

2024-09-16

SB on incremental commits off-chain

  • Continuing where I left off yesterday - see how to fix the increment tx posting after changes in the deposit observation.

  • It seems like I need to regain some sanity here, so let's break things down.

  • I made changes to the deposit observation, so now what we return from the observation is not just the deposit UTxO but also what we locked inside of the deposit script; the code around observing deposits / posting the increment needed to change to accommodate this.

  • Now that e2e is green again, and I also added the GetUTxO call at the end to assert that deposited funds ended up in the head on L2, it is time to consolidate and try to get somebody involved with this work, since there are too many things in scope and we want to get this work merged asap.

  • One area in which I need to grow is learning to ask for help! Right now I think the fastest thing to do is keep coding, since I know I would lose time explaining what needs to be done, but at the same time this puts pressure on me to deliver everything in time.

  • So what is outstanding: implement the recover e2e, implement an API endpoint to list pending deposits, and remove the increment parts (observation mainly) but leave the transaction construction, since we want to keep the unimplemented validators returning False if we want to merge this thing. Also write a tutorial explaining the steps to deposit/recover.

  • I've looked at the increment observation tests and it seems we will need to find a way to alter these tests so that we can return some UTxO that works nicely with increment (it needs to contain both the head and deposit scripts). The main thing I see is that we can't find the deposit datum in the observed UTxO.

2024-09-15

SB on incremental commits off-chain

  • After changing the deposit observation (reminder: still need to implement the increment observation tests) the e2e test is broken now, since what I return from the observation is the input UTxO locked in the deposit script, not the actual UTxO we saw coming in from L1. There is some work needed to get this right, but nothing too hard to do.

2024-09-14

SB on incremental commits off-chain

  • As I realized before (and SN confirmed), the observations are not done correctly. They should be pretty similar to the commit ones, since they even use the same type Commit.

  • I've made sure to implement the deposit observation in the same manner as the commit one (since they are pretty much the same) but didn't have the strength to continue further (too tired), so I'll make it work on Monday.

2024-09-12

SB on incremental commits off-chain

  • After adding the increment mutation I want to make sure that the increment tx is valid as a starting point.

  • Did a small change to provide the correct UTxO as the argument to incrementTx. What I need here is the output from a deposit tx. Right now I get:

         Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
         The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["PT5"])),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorMissingScript (ScriptWitnessIndexTxIn 1) (ResolvablePointers ShelleyBasedEraConway (fromList [(ConwaySpending (AsIx {unAsIx = 0}),(ConwaySpending (AsItem {unAsItem = TxIn (TxId {unTxId = SafeHash "913afe731ffea84a50b0b 

so it seems I didn't attach the deposit script properly (index 1 here) and the head script (index 0 here) failed with PT5 (which is expected since the validator returns False).

  • After returning True from the increment validator I can see that the head script got attached properly, while the deposit one is still missing (need to look at the deposit datum, which is missing).

  • Found out this was because of two issues (the shape of the fix is sketched after this list):

    • I used mkScriptReference, but we are not referencing the deposit script, so it should be mkScriptWitness
    • I was attaching the headScript instead of the deposit script, derp
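
    The shape of the fix, reduced to a toy sketch (the real code goes through cardano-api script witnesses; these types and names are purely illustrative):

    ```haskell
    data Script = DepositScript | HeadScript deriving (Show)

    -- A witness either embeds the script itself or points at a published
    -- reference script.
    data Witness = ScriptWitness Script | ReferenceScriptWitness Script
      deriving (Show)

    -- Was: ReferenceScriptWitness HeadScript - wrong on both counts.
    -- The deposit input must carry the deposit script itself:
    depositInputWitness :: Witness
    depositInputWitness = ScriptWitness DepositScript
    ```
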
  • I quickly added a test to make sure we don't observe deposits from a different head.

  • Now I am able to connect the deposit snapshot signing and posting of the increment tx as a result of it. Posting from head logic and connecting the handlers to be able to actually post the increment tx.

  • Wrote some Handler code to be able to post increment on deposit observation. Also added a type for observing the increment. We need this so that we can say the commit is finalized once the increment is observed.

  • I can try the e2e workflow now to make sure everything works up to observing the increment. It seems the deposit was not observed back after hitting the API, so I'm investigating.

  • Turns out that after balancing we end up with a deposit tx that has two inputs and two outputs. One of these belongs to the hydra-node wallet, as it's the one responsible for paying the fees, so we need its input; and we also end up with an output that holds the change from balancing the deposit tx.

  • Debugging to see why we don't observe the deposit back. It turns out we need utxoFromTx on the deposit tx observation in order to be able to find the correct script output.

  • What I notice is that we never issue AckSn but that is not so important right now.

  • Figured it out: I shouldn't include the deposit tx in localTxs on ReqSn. This got the head stuck waiting to apply the deposit transaction, which is never applicable (perhaps I am misunderstanding something, since my explanation is fragile).

  • We get to the next error which happens on posting the actual increment:

ConwayUtxowFailure (NotAllowedSupplementalDatums (fromList [SafeHash \"3135e9c31e91873b8b3a7d2922038e934785abd68401589bc67f40591af137c3\"
  • This happened because the increment tx construction was wrong: I should have used InlineScriptDatum when constructing the deposit witness.
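
    In cardano-api terms the fix looks roughly like this (constructor names as in cardano-api, though the exact module layout varies across versions): the deposit output carries an inline datum, so the spending witness must reference it as such instead of re-supplying the datum, which the ledger then flags as a not-allowed supplemental datum.

    ```haskell
    import Cardano.Api (ScriptDatum (..), WitCtxTxIn)

    -- wrong: ScriptDatumForTxIn depositDatum  -- datum counted as "supplemental"
    -- right: tell the ledger to read the datum inline from the spent output
    depositDatumWitness :: ScriptDatum WitCtxTxIn
    depositDatumWitness = InlineScriptDatum
    ```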

  • Progressing further: now I get to see the CommitApproved message, and all that is left for the e2e to work is the increment observation! Nice, I am happy things are progressing nicely and fast, although there really are a lot of changes needed.

  • Adding the increment observation now.

  • After the needed changes I get to observe the increment transaction, but the committedTxOuts are not matching the local deposit UTxO. This should be easy to fix.

  • I've decided to return the UTxO directly from the observation and compare it to the local one.

  • BAM! e2e is green now!

2024-09-11

SN on Raft consensus as network

  • Need to change semantics of broadcast as it should only "deliver" once the majority of the network saw a message. Going to give the callback a name too.
  • While NetworkCallback{deliver} makes things more consistent with the literature and easier to read in the components, it's also more verbose (see the sketch after this list). Need to ask the team about this trade-off.
  • When disabling the broadcast shortcut, the NodeSpec tests failed first, but only because they were asserting on an additional message being submitted to ourselves. The mocked, recording network is not short-cutting and does not need to, so simply adding some more received messages and asserting less is fine here.
  • The BehaviorSpec tests were eating the whole memory until I removed the filter for own party in the broadcast implementation (to follow the expectation). Probably some busy loop waiting for a message which never came.
  • After fixing the ModelSpec to also broadcast to self (in createMockNetwork), I wondered.. which test is actually covering that short-cutting (or non-existence of it)?
  • Turns out only the hydra-cluster end-to-end tests are testing this integration. Anyhow, making the lowest-level network component (Hydra.Network.Ouroboros) self-deliver after enqueuing a fire-and-forget should restore the previous behavior.
  • Not seeing the network layer operational right now (hard to tell through only hydra-cluster tests). I suspect it's because the network stack is filtering when seeing a message delivered at the lowest layer.
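
    The renamed callback record, as a sketch of the idea (the actual definition in the codebase may differ): `deliver` is only invoked once a message may be handed to the logic layer, e.g. after a majority of the network has seen it.

    ```haskell
    -- A consumer of the network provides this callback; the network layer
    -- calls the named `deliver` instead of an anonymous function, matching
    -- the "broadcast/deliver" wording from the literature.
    newtype NetworkCallback msg m = NetworkCallback
      { deliver :: msg -> m ()
      }
    ```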

SB on incremental commits off-chain

  • Ok, so after adding a field to the Snapshot that holds the information on what will be committed to a Head, it is time to move further and make the HeadLogic correct upon observing a deposit transaction.

  • Looking at the issue it seems that deposit observation was a separate task but I did it already so I updated the issue.

  • Added some code to wait on an unresolved decommit before issuing a ReqSn on deposit observation.

  • Looking into the spec to figure out if there is any validation needed before including a deposited UTxO into a Head. Seems like there is no validation necessary, which makes sense: if you managed to lock a UTxO with a deposit transaction, then this UTxO can't be spent twice, so it is safe to include it in the Head UTxO.

  • I added a check upon observing ReqSn to make sure there is no other de/commit in flight before including the deposit UTxO into the active UTxO set and asking for a snapshot signature.

  • Also added the commitUTxO to the local CoordinatedHeadState to keep track of it.

  • Seems like the time has come to actually construct the increment transaction from the observed deposit (and the recover one too). Perhaps it is better to make sure all tests are green first and implement the recover workflow. The increment transaction is the most interesting work, but we need to build everything else around it, since it would be the last step in this PR.

  • Made some logging tests green; now looking into failing deposit observations in hydra-chain-observer and hydra-node. Basic observation seems to work, but plugging in the deposit generation + observation seems to fail.

  • Fixing this required me to also pass in utxoFromTx so that the deposit can be observed, since the head UTxO doesn't contain the necessary datum, as these transactions are not tied to the Head ones.

  • Making almost everything green (apart from one test in HTTPServer) paved the way directly into increment transaction construction and observation (I could do the observation in a separate PR).

  • Wondering: if we merge this (incomplete) branch to master, are we harming regular hydra-node operation in any way? I'd assume not, since users should only rely on the released versions anyway.

  • Detected the place in the HeadLogic code where the posting of the increment transaction should be triggered - most probably on snapshot acknowledgement.

  • I was right, we can plug the increment posting easily on AckSn. Let's build the transaction first and then think about the plugNplay :)

  • Doing this, I soon realize we will need to add the deposit to the script registry. BUT on the other hand I have the deposit UTxO, so I can find the deposit input.

2024-09-09

SB on incremental commits off-chain

  • I tried running the end-to-end test for the incremental commits and get PPViewHashesDontMatch, which is surprising.

  • Seems like our wallet code assumes we always have scripts in the Head transactions, but deposit is different: it produces a script output but doesn't run any scripts.

  • Tracing some code in the Hydra wallet module to figure out why we can't have a transaction without any scripts.

  • I did a conditional attaching of the script integrity hash with scriptIntegrityHashTxL, and noticed we also do recomputeIntegrityHash after tx construction again.

  • In case this hash is SNothing I just return the already-constructed tx body, so hopefully, if there are no scripts, the wallet will still be able to construct the correct transaction.
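
    The conditional attachment, sketched with stand-in types (the real code goes through cardano-ledger's scriptIntegrityHashTxL lens):

    ```haskell
    -- Minimal stand-ins for StrictMaybe and the tx body field.
    data StrictMaybe a = SNothing | SJust a

    newtype TxBody = TxBody {scriptIntegrityHash :: StrictMaybe String}

    -- Only set the integrity hash when there is one to set; a tx without
    -- scripts (like deposit) keeps its body untouched.
    attachIntegrityHash :: StrictMaybe String -> TxBody -> TxBody
    attachIntegrityHash mHash body =
      case mHash of
        SNothing -> body
        SJust h -> body{scriptIntegrityHash = SJust h}
    ```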

  • If I just remove recomputeIntegrityHash I get success on the deposit, but the question is whether we actually need this function for other transactions. Why did we need it in the first place?

  • Decommit works just fine without recomputeIntegrityHash; let's see about the other transactions, but they should also be fine, I think.

  • So it turns out that recomputeIntegrityHash was a leftover that needed to be removed, and all of a sudden the deposit tx is valid again. Nice!

  • Moving on to the observation code now.

  • Writing the simple observation didn't seem hard at all. Let's see if it actually works.

  • It worked after some tweaks - we look at the input UTxO and don't try to resolve the inputs, as we are only interested in whether we can decode the datum to the correct one. Should we look at the deadline here too?

  • Moving into the changes in the HeadLogic now. I need to add some code to request a snapshot once a deposit tx is observed.

  • I stubbed out a function to be called on an observed deposit and added a new field to hold commit UTxO data in the ReqSn.

  • Added utxoToCommit to the Snapshot and alphaUTxOHash to the Close datum. This required me to build a lot of modules. Tired now, continuing tomorrow.

SB on incremental commits

  • Moving forward: I got the hydra-tx changes in, so I want to see the deposit end-to-end workflow working up to the API level.

  • Connected the hydra-tx/depositTx to our API and now doing the same for recoverTx

  • Ok, the recover is connected too. I am itching to try out if deposit could be imported into a wallet but I need to work further :)

2024-09-06

SB on missing datum hashes in the hydra txs

  • Continuing where I left off - making the script output that contains the datum hash, to prove that we either have a problem (most likely) or hydra decommit/fanout works as expected.

  • Realized that the createOutputAtAddress is just providing a pub key witness (using makeShelleyKeyWitness). Somehow I need to also add the ScriptWitness, but I'm not sure exactly how in this setup - and why is this needed now? Shouldn't I add the witness when trying to spend? ... Ok yeah, the error is coming from a hydra-node that tries to spend this script output, so it makes sense.

  • It is actually funny since I was the one implementing the commit feature. I need to send a transaction that contains the script witness for the script I am trying to spend :)

  • Giving it a shot, and the test is green! hmmm.....

SB on implement deposit/recover validators and off-chain code

  • A failing test related to snapshot signature verification. This is surprising, since I didn't touch the related code.

  • Trying out removing the validation of some snapshot fields to see which field trips the validation.

  • It is the utxoToDecommit field that causes this.

  • I can get a valid signature if I make sure to also produce a hash for mempty in case this field is not present, and I don't see that any code changed at all, which is surprising.

  • It has to be something related to hashing the empty UTxO using serializeData $ toBuiltinData mempty. Previously we had Maybe UTxO as the argument and now we have Maybe Hash, but I am not sure if I should spend time exploring this.

  • Making the signature accept a Hash instead of a Maybe Hash and fixing all the call sites. This should mostly introduce changes to our on-chain code, while we would still keep Maybe UTxO in the Snapshot.

  • This yields some changes I would need to think about:

case contestRedeemer of
  Head.ContestCurrent{} ->
    -- TODO: think of the consequences of this
    toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit
  _ -> toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit

  • so the check `isNothing deltaUTxOHash` should be replaced with a check against the empty hash `hashTxOuts mempty`.
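
    Spelled out as a sketch (hashTxOuts here is a stand-in for the real hashing function):

    ```haskell
    newtype Hash = Hash String deriving (Eq, Show)

    -- Stand-in for the real hashing of a list of tx outputs.
    hashTxOuts :: [txOut] -> Hash
    hashTxOuts = Hash . show . length

    -- "No decommit pending" is now represented by the hash of an empty UTxO
    -- rather than by a Nothing value.
    noDecommitPending :: Hash -> Bool
    noDecommitPending deltaUTxOHash = deltaUTxOHash == hashTxOuts ([] :: [()])
    ```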

  • Close mutations are affected by these changes. H30 is now happening in the mutations for CloseUsed.

  • Turns out that in the CloseOutdated redeemer we need to make sure that utxoToDecommit is not present. CloseOutdated means we are closing while the decommit is pending, but I am not entirely sure why utxoToDecommit needs to be Nothing then?

  • This highlights the fact that this knowledge needs to be captured somewhere (it was in my head). Most likely spec has the answer to this - checking.

  • The spec uses different terminology: Initial, UnusedDec, UsedDec. UnusedDec reminds me of our CloseOutdated, but this is not the case where we expect utxoToDecommit to be bottom! It is actually the filename of the mutation test that points to the correct redeemer on-chain!

  • We need to make the connection between the mutations and the spec/on-chain code obvious, to avoid spending much time on this in the future and to make it obvious which is which!

  • Working on incremental commits will highlight this, since we will need to add more redeemers for the close!

2024-09-05

SB on incremental decommits

  • I realized I called the deposit drafting draftIncrementalCommitTx, so renaming that and connecting the chain handle to our API server.
  • The next step would be to actually construct and serve the deposit transaction, but since I still don't have the code to do it, I will move to the observation.
  • I will not be able to complete the OnDepositTx observation of course, but I can hook things up.
  • After adding OnDepositTx I had to add a corresponding type to ChainTransition - but should this really be here? Stubbed out the observation code with undefineds and errors here and there
  • Stubbed out the on-chain increment validator too, and now, since I got the green light on the PR, I'll start using the new code and finally kick off incremental commits for real!

2024-09-04

SB on incremental commits off-chain

  • Continuing with the off-chain changes for incremental commits
  • Separating the recover flow to use REST DELETE, so the POST /commit route will be responsible for the deposit flow as well as normal commits
  • The Commit client input now takes a UTxO instead of a tx
  • For the next stage I would need the deposit PR merged, since I'd like to create and balance the deposit tx and return it to the user for signing/submitting.
  • I could quickly specify a DELETE /commit route for recovering a pending deposit
  • A question arises: should the hydra-node pay for the recover tx balancing, or should we serve the tx to the user to balance/sign/submit?
  • It is a bit better UX to make the hydra-node do it, since we don't have an easy way (yet) to balance the tx using cardano-cli, and we are not sure if importing the tx and balancing it in the user's wallet would even work (Eternl can import the tx, but we're not sure about other wallets)
  • Let's make the node balance the transactions, since we already mentioned a plan for a feature where only the requester's hydra-node pays for the transactions once they are the next leader
  • Tackling the initial code to handle DELETE /commits/tx-id so that the user is also able to recover the deposit

SB on PR changes in deposit/recover workflow

  • Moving to this since I got a review and I need this work to continue on incremental commits

  • Not too bad, I can get this done quickly it seems

  • I should get the deadline and head id from the decoded datum of the recover UTxO

  • Recover mutations seem to be missing here too, so I'll need to spend some time adding them

  • One mutation should be one that mutates the deadline of the recover transaction. I expect to see DepositDeadlineNotReached (D04).

  • In order to do that I need to mutate the input datum deadline.

  • If I just remove the lower bound validity I should get DepositNoLowerBoundDefined (D03)

  • It took some sweet time to compile the mutation that changes the input datum deadline, and I still don't get the wanted failure. This happens because of slot/time conversion, so I am looking into that right now.

  • I noticed I was using the same deadline for deposit and recover, so that's one problem. I got the mutation working but broke the recover transaction, so that's a bit puzzling. Working with time is always a headache, especially in a pure Haskell setting like this.

  • Debugging the current seed:

input datum deadline is 7364703000
recover datum deadline is 7364703001
tx lower validity slot is 7364703

and we get D04, which means we tried to recover before the deadline has passed - or, said differently, the recover lower validity slot is not after the deadline.

  • Ok, I was re-using the recover slot, so I found the bug. I really like logging the work, since it helps me immensely.

2024-09-03

SB on incremental commits

  • I could start working on implementing the off-chain parts of incremental commits first, since the deposit/recover PR is now in review. I'll check with Noon where they left off, since they started looking at this initially.
  • Ended up looking at Haddock builds and our website's Haskell documentation. Turns out you can't build Haddocks if they are in the testlib (at least it seems that way).
  • Ok, the initial hydra-tx work is merged, the deposit/recover validators and tx building are PR'd, so I'll continue further and see where things stand on the initial commit branch
  • First I'll make everything compile again, and then focus on the mermaid graphs and make changes, which should already be well specified in our Hydra spec
  • I removed the mermaid diagram in protocol.md, which I suppose was added as a possible alternative, and kept the one we are currently implementing.
  • Did a small refactor to resolve a TODO in HTTPServerSpec, and now looking into the issue itself to make sure I don't miss something important before diving in.
  • Added a new Chain handle to tackle the new request (could reuse draftCommitTx too, but wanted to be verbose at the beginning at least), and I'm looking around the API server code to determine what changes are needed.
  • Seems like we need two more types in order to support deposit/recover drafting. A possible code solution:
...
  | IncrementalCommitDepositRequest
      { utxo :: UTxOType tx
      , deadline :: UTCTime
      }
  | IncrementalCommitRecoverRequest
      { utxo :: UTxOType tx
      , deposittedTxIn :: Text -- TODO: This should be a TxIn but in the API server we are not yet using the Tx (Cardano) type.
      , deadline :: UTCTime
      , recoverStart :: Natural
      }
 
  • Slightly problematic: at the API server level we are still abstract over the transaction type, so using e.g. TxIn as the argument to recover has to be done differently. We can always just try to parse the request field into the type we expect, of course.
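
    The parsing idea, sketched at the level of the "txid#ix" textual rendering (the real TxIn parser lives in cardano-api; this only shows the shape):

    ```haskell
    import Data.Text (Text)
    import qualified Data.Text as T
    import Text.Read (readMaybe)

    -- Split a rendered tx input back into its id and index parts, failing
    -- with a readable error when the request field doesn't parse.
    parseTxInText :: Text -> Either String (Text, Word)
    parseTxInText t =
      case T.splitOn "#" t of
        [txId, ix] | Just i <- readMaybe (T.unpack ix) -> Right (txId, i)
        _ -> Left ("cannot parse TxIn: " <> T.unpack t)
    ```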

  • I wanted to explore first to gain understanding, and so that we all agree on the execution plan, in order to minimize implementation time.

  • Will we call hydra-tx directly from the API server? Do we even need new Chain handle?

2024-09-02

SB on wrapping things up in hydra-tx/recover/deposit

  • I wanted to add the number of recover outputs to the recover redeemer

  • While doing this I realized the recover script witness is not properly attached

  • To fix the healthcheck for the recover tx I need to do exactly what I mentioned a couple of days ago - build a depositTx and obtain the resulting UTxO, so I can create the arguments to the recover tx from it.

  • Ok, after all this is done my mutations still reveal issue D05, which means the locked UTxO hash does not match the outputs.

  • I realized I actually create a script output from the recover transaction. Ouch. What I should do is decode the datum off-chain and produce whatever was locked in the datum.

  • I like the current work on this, as I feel I am gaining a better understanding - but it could also be that I was too distracted when I implemented this originally?

  • Looking at what we did to unlock the committed outputs - the logic should be exactly the same

  • Ok, so I ended up decoding the datum from the user-provided TxOut, and had a minor bug with the number of hashed outputs, which is why the mutations turned out red

August 2024

2024-08-29

SB on Increment transaction and mutation

  • I have stubbed out the deposit and recover transactions and now it is time to do the same for the increment one.

  • Working on increment transaction should reveal missing parts in the deposit/recover (if any) and allow us to write mutation tests to test out the validators.

  • I drafted the increment tx for the mutation tests but it is not yet valid:

Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left
(ScriptErrorTranslationError (RedeemerPointerPointsToNothing (AlonzoSpending
(AsIx {unAsIx = 1}))))),(ScriptWitnessIndexTxIn 1,Left
(ScriptErrorRedeemerPointsToUnknownScriptHash (ScriptWitnessIndexTxIn 1)))] 
  • It seems I didn't attach the deposit script properly, but from what I can see all is good

  • Since the deposit datum needs to match in the increment transaction, I could assemble a deposit tx and grab a UTxO from it to pass to the increment transaction. Maybe this is the cause of the increment not being valid?

  • Ok, this seems to produce nicer errors

Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Right (ExecutionUnits
{executionSteps = 209565180, executionMemory =
672406})),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError
An error has occurred: The machine terminated because of an error, either from
a built-in function or from an explicit use of 'error'.) ["C02","PT5"]))]

  • By the way, C01 means the PT token is missing, and if I disregard this for now I get to see the D02 error, which means the deposit doesn't have an upper bound defined. Nice, we are making progress on the deposit tx now! Mutation tests like this are really valuable, since they allow us to make sure all parts of the assembled transactions are as we expect.

  • If I want to add an upper validity bound to the deposit transaction I need a slot, and currently we accept UTCTime in the hydra-tx command line arguments.

  • If I choose to accept slots, how hard would it be to build a deposit datum?

  • I couldn't find appropriate functions for now to go from SlotNo -> POSIXTime, since that is the type that gets set in the deposit datum.
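
    A back-of-the-envelope version of the conversion, assuming a known system start and a fixed one-second slot length (true for recent Cardano eras, but the honest conversion goes through the era history):

    ```haskell
    newtype SlotNo = SlotNo Integer

    -- Milliseconds since the Unix epoch, as in Plutus.
    newtype POSIXTime = POSIXTime Integer deriving (Show)

    slotToPOSIXTime :: POSIXTime -> SlotNo -> POSIXTime
    slotToPOSIXTime (POSIXTime systemStartMs) (SlotNo slot) =
      POSIXTime (systemStartMs + slot * 1000)
    ```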

  • Turns out I was looking at outdated tx traces 🤦 The spec PR contains the new changes related to these validators.

2024-08-16

SB on implement deposit validators and off-chain code

  • Starting to work on https://github.com/cardano-scaling/hydra/issues/1574
  • We want to have deposit (both $\mu_{deposit}$ and $\nu_{deposit}$) and refund validators in order to be able to commit to a Head, or refund to the user in case of an unsuccessful commit
  • I am looking at the tx trace graphs and mermaid diagrams for these workflows
  • $\mu_{deposit}$ is parameterized by the seed (the outputs the user wants to commit) and the $\nu_{deposit}$ validator hash
  • It seems easiest to start with $\nu_{deposit}$ and specify the needed mutations first, but in order to specify mutations we need the off-chain tx construction together with the on-chain code, so I'll start with drafting the needed types/functions for $\nu_{deposit}$ instead, and then $\mu_{deposit}$
  • I added checks for before/after the deadline for the Claim/Recover redeemers, a check that the head id is correct, and that the deposit token is burned (not complete yet)

2024-08-16

SN on magic-nix

2024-08-04

SB on prevent decommit with the same version and different utxo to decommit

  • When looking at the BehaviorSpec tests related to the fix we wanted to do, I noticed an actor is requesting a decommit with UTxO they don't own.

  • Another problem (and this is the original issue we started to look at) is "can process transactions while decommit pending". Here the problem is that we still have the decommit transaction around, which was snapshotted but not observed (so it is still both in the new snapshot and the confirmed one), and we don't have a way of detecting whether the next snapshot (an L2 tx in this case) should ignore the pending decommit and just sign the snapshot.

  • I am tempted to add a field to ReqSn to detect if the request for a snapshot is just an L2 tx, and in this case not check for a decommit tx in the snapshot.

  • Would this approach somehow also influence requests for snapshots that have some decommit tx? Not sure at this point, but I don't think so.

  • So it is possible to have multiple snapshots with the same decommit tx in case there is a pending decommit, but this decommit tx can't be altered until the decommit is settled. Only then can you request a new decommit, and the local txs would still continue to work with the pending decommit.

2024-08-01

SN exploring peerbit

SB on new transactions work even if decommit is pending


  • The fix I did to make this work enables us to not apply the decommit transaction twice, by looking at the previous confirmed snapshot.
  • We may want to exclude utxoToDecommit in the next snapshot if this is the case.
  • I will draw transaction traces on the miro board to see possible scenarios
  • There is another problem with the PR: by adding the threadDelay to make sure we observe transactions on the next block, we broke some logging test.
  • Perhaps I can insert this thread delay only if the caller says so?
  • Ended up wrapping withSimulatedChainAndNetwork so it takes an extra parameter that determines if there should be a delay when posting/observing a tx.
  • The other main issue is that we are not sure what the consequences are of keeping/removing utxoToDecommit in the following snapshot(s).
  • It seems that we want to make sure that we do not sign a snapshot if the version is the same as the confirmed one and the utxoToDecommit is different. In that case we should reject the ReqSn request, and in the case where both version and utxoToDecommit are the same we just want to skip applying it.
  • There are tests already that help us reason about these changes, and I'll focus next on trying to come up with a design for both spec and code that we care to maintain. Currently it feels fragile, and it seems we can't predict the exact behaviour of the decommit feature, since we came a long way just to realize this problem right now.
  • Looking at "can only process one decommit at once" in the BehaviorSpec:
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":13,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":13,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"inputId":13,"tag":"EndInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":14,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"input":{"chainEvent":{"newChainState":{"slot":7},"observedTx":{"distributedOutputs":[42],"headId":"31323334","newVersion":0,"tag":"OnDecrementTx"},"tag":"Observation"},"tag":"ChainInput"},"inputId":14,"tag":"BeginInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effectId":0,"inputId":14,"tag":"EndEffect"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"inputId":14,"tag":"EndInput"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"outcome":{"error":{"message":"decrement observed but no decommit pending","tag":"AssertionFailed"},"tag":"Error"},"tag":"LogicOutcome"}
  • Seems like another issue we have is that we somehow lose the local decommit tx before the observation happens?
  • I see 4 observations of OnDecommitTx, so on the first occurrence we delete the local decommit tx, but afterwards we experience the error?
  • Then afterwards we see a new DecommitRequested and a ReqSn:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"effects":[{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"}],"stateChanges":[{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"newLocalUTxO":[],"tag":"DecommitRecorded"}],"tag":"Continue"},"tag":"LogicOutcome"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"},"effectId":0,"inputId":15,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":15,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"message":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"snapshotNumber":2,"snapshotVersion":0,"tag":"ReqSn","transactionIds":[]},"tag":"NetworkEffect"},"effectId":1,"inputId":15,"tag":"BeginEffect"}

  • And an eventual failure because of a non-bumped snapshot version:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"lastSeenSv":0,"requestedSv":0,"tag":"ReqSvNumberInvalid"},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}

July 2024

2024-07-30

SB on No processing of transactions during a decommit

  • I'm starting with an already-failing test case where we try to decommit, wait to see the decommit approved, and then do a new tx and expect it to be snapshotted correctly

  • Snapshot number 1 is confirmed with utxo = 2 and utxoToDecommit = 42

  • Then we see TxValid/ReqTx by both nodes; after ReqSn one node can't apply the next snapshot:

{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"error":{"reason":"cannot apply transaction"},"requestedSn":2,"tag":"SnapshotDoesNotApply","txid":1},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}
  • So we have a utxoToDecommit in the next snapshot and it is not applicable, since the utxoToDecommit was already removed from the confirmedUTxO, but we also didn't observe the decommit tx.
  • How would we know, when the next snapshot comes, if we already applied the decommit tx or not?
  • One idea I had is to check before trying to apply the decommit tx whether the local confirmed snapshot already has the same UTxO; if that's the case, we would not try to apply the decommit transaction again (sketched below).
  • This works, but we need to think about whether this strategy will work in all the different scenarios we can have with L2 transactions while the decommit tx is still kept locally/not observed.
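
The guard from the idea above, sketched with a stand-in UTxO representation:

```haskell
type UTxO = [Int] -- stand-in; the real type is the Cardano UTxO set

-- Skip re-applying the decommit tx when the locally confirmed snapshot
-- already reflects its effect.
shouldApplyDecommitTx :: UTxO -> UTxO -> Bool
shouldApplyDecommitTx confirmedUTxO utxoAfterDecommit =
  confirmedUTxO /= utxoAfterDecommit
```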

2024-07-25

SB on prevent empty decommit

  • Turns out the ledger rules actually prevent empty inputs/outputs in transactions. In case you don't provide any inputs you get TxBodyEmptyTxIns, since it doesn't make sense to accept a transaction with empty inputs, and in case there are no outputs, ValueNotConservedUTxO is thrown.

  • I'll add one test that asserts this is true for the decommit transaction and asserts on the error, so we can call this non-issue done.

SN on hydra-doom

  • Exploring hydra-doom watch mode
  • Start by adding command line flags to individually enable -send-hydra and -recv-hydra
  • Looking around for command line arguments and found the -wss and -connect args from doom websockets.. should we integrate as a net_module too?
  • The doom-workers repo is a nice, small, vanilla JS application to take inspiration from (good shortcuts like display)
  • When only watching with -hydra-recv I realized that the datum decoding was actually not working.
  • Disabling the demo makes it nice and idle if ?watch is set
  • After figuring out encoding and decoding the redeemer, submitting ticcmd_t pieces was quite easy. Now the question is.. how would we get a synchronized view/start of the game? How does this work in multiplayer?

2024-07-22

SB on Reviewing final PR for incremental decommits

  • Everything looks good to me. Nice job, I am glad this is finally done.

SB on Add Snapshot type to the spec

  • I introduced new snapshot type in our transaction traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764594997541131&cot=14
  • There can be Normal, Inc or Dec types of snapshots, each signaling their purpose. Normal ones are reserved for new L2 transactions, an Inc snapshot includes some new UTxO into the Head, and a Dec one takes something back to L1.
  • For the spec changes I decided on the letter $\kappa$ to represent this new snapshot type, for no specific reason: $\kappa \in \{Normal, Inc, Dec\}$
  • The tricky part is to explain all of the possible decisions we can make in close transaction based on this snapshot type and the type of a close redeemer.
  • In the end it wasn't that bad to update the close transaction. I just had to specify 4 different close current/outdated cases with respect to the Inc and Dec snapshot types, and also update the figure to include the snapshot type $\kappa$ in the close redeemer.

2024-07-19

Draw transaction traces for out-of-the-ordinary scenarios (de/commit)

  • I updated the tx traces we added yesterday to depict the flow with the increment/decrement transactions
  • I think we gained some clarity together with Franco
  • Updated the drawing with the new snapshot type I propose and the close redeemer addition to specify this type
  • Now close needs to distinguish what to do based on the snapshot type: Normal, Inc or Dec and the redeemer type: CloseCurrent, CloseOutdated, CloseInitial
  • It is time to update the spec with these proposed changes
  • Changes in the spec mainly need to consist of explaining the new snapshot type where appropriate and using the new type in the increment/decrement transactions.
  • Validators need to contain checks for the conditions I explained in the tx traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764595074776662&cot=14 which take into account the new snapshot type together with the close redeemer type
  • This snapshot type will be propagated through the close redeemer too

Open a Head on preview

  • Seems hard to find information on when the head was initialized on-chain?
  • I used the slot/block from when I published the hydra scripts for that, but how would you find it otherwise? Answer: we could search for tx metadata - all hydra init transactions contain HydraV1/InitTx.
  • Do we need to expand on the documentation for these corner cases in order to provide more information to our users?

2024-07-11

SN on incremental decommit cleanup

  • After putting in the close redeemer casing, I had also aligned the "outdated" case signature verification to use version - 1 -> this made tests fail!?
  • Using version instead of version - 1 makes mutation tests green. This is wrong?
  • EndToEnd tests now also fail with "MustNotChangeVersion" -> this is connected!
  • Fixing the version used on close to the last known open state version (offChainVersion), makes E2E pass
  • While the "can decommit utxo" was green right away, "can close with pending decommit utxo" was red first. After adding a delay of 2 secs after submitting the decrement it was green -> race condition in hydra-node when trying to close while decrement was not yet observed back. I think this is expected and we need to manage the UX here (e.g. try close again).
  • In any case, the pending commit E2E test is redundant.. it's not strictly a pending decommit and the test is the same as the one above
  • After mutation and e2e tests are fine, model tests still choke on new MustNotChangeVersion
  • Found an original bug (or current inconsistency?) in HeadLogic: the version is incremented whenever the snapshot leader sees the decrement, but not to what the chain observation includes.. could be a race condition (model tests fail)
  • When trying to tie the knot correctly in onDecrementTx, more and more abstract functions are needed. What is the most common denominator?
  • Testing correct observation of decrement is not easy as all our tests use "pure transactions" and only e2e tests spot misalignment on "real" transactions (with fees paid, change output etc.)
  • Removing snapshot number check from decrement: should motivate not having number in open state datum. Running mutation tests, tx trace and model tests after dropping it.
  • Only UseDifferentSnapshotNumber failed, and fails with a different reason: SignatureVerificationFailed.
  • This is expected as the snapshot number to verify in decrement txs is still taken from the datum instead of the redeemer. Let's change that now.
  • Also an interesting TxTrace failure with a single Decrement of snapshot number 0: it was expected to fail and does not fail now (is this a problem?)
  • Sub-typing all datums and redeemers likely makes sense. At what cost?
  • When fixing the TxTraces to not expect failure on decrement with snapshot 0, I run into another issue where contestTx is called with version 0 or otherwise too old version, resulting in a "should not happen" case.
  • I realize there are pros and cons about utxoToDecommit and deltaUTxOHash being a Maybe UTxO and Maybe Hash respectively.

2024-07-05

SB on incremental decommit

  • Maybe it is worthwhile exploring the idea of how to trick the Head protocol into decommitting twice.

  • So we would ask for a decommit after which there would be no further snapshots.

  • Then we would close, but the contestation should try to contest with the snapshot we used for the decommit.

  • Would the fanout re-distribute already decommitted funds again?

June 2024

2024-06-27

SB & FT on failing decommit test

  • Franco wrote an e2e test that sends a decommit request, waits to see it approved, but then closes instead.

  • The test was failing because of SignatureVerificationFailed, and it was happening because of a version mismatch.

  • I fixed this by updating the version check:

-  snapshotVersion = alreadyBumpedVersion - 1
+  snapshotVersion = max (version - 1) 0
  • This way we don't go below 0 which was causing the snapshot validation to fail.

  • This worked and got us to another interesting problem. Close transaction was failing with:

InsufficientCollateral (DeltaCoin 0) (Coin 2588312)

  • This is interesting, since the collateral input had around 21 ADA available, which should be more than enough.

  • I was looking at the cardano-node logs, and what seems to happen is that the posted decommit tx got into the mempool just before the close one. The close tx then failed with the error above, but this doesn't explain the error I was seeing - what I would have expected is the close tx failing with BadInputsUTxO.

  • Our suspicion is that since the hydra-node never observed the DecrementTx taking place on-chain, our local view of the wallet UTxO is invalid. This is likely causing the collateral error we are experiencing.

  • We decided to leave exploring this for now since it seemed like a race condition between the decommit and close tx but still I'd like to get to the bottom of it.

2024-06-17

SB on decommit demo

  • I wanted to prepare a small demo for the team and demonstrate that decommits are working as expected.

  • In the process of decommitting I had small problems submitting the decommit tx, since the balancing is on the user side, so setting the fees is a smallish annoyance.

  • After successfully decommitting (a single UTxO of 10 ADA) the head is still open, but there are no funds inside.

  • Posting a close yields:

An error happened while trying to post a transaction on-chain: FailedToPostTx {failureReason = "TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraBabbage (ApplyTxError (UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers [AlonzoSpending (AsIx {unAsIx = 0})])) :| [UtxowFailure (AlonzoInBabbageUtxowPredFailure (PPViewHashesDontMatch (SJust (SafeHash \"0e96eb70e55ac2cfa0b9d93aa5ef3fd57dff54de704d72b7d355896581e70a85\")) (SJust (SafeHash \"2862039cf16f77fc21cb3673a4cb6e9185636c9b6a3e519f3ec8416a4af469e5\")))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO (MaryValue (Coin 0) (MultiAsset (fromList []))) (MaryValue (Coin 9982912420) (MultiAsset ( fromList [(PolicyID {policyID = ScriptHash \"ff0fc67587997fc3330fb29dff6c388166baaf0d62f934882deee38d\"},fromList [(\"4879647261486561645631\",1),(\"f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d\",1)])])))))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 0),TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 2)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure NoCollateralInputs)),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (DeltaCoin 0) (Coin 2586728))))])))"}

  • So we get InsufficientCollateral errors, and now I want to see why the hydra-node's internal wallet would yield this error.

  • If there are no funds in the head, this should not be related to what the hydra-node's internal wallet owns.

  • Ok, the error above seems to be just the outcome of trying to Close multiple times without waiting for a close tx to be observed BUT fanout yields two UTxO with the same amount!

{
    "11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2#1": {
        "address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
        "datum": null,
        "datumhash": null,
        "inlineDatum": null,
        "referenceScript": null,
        "value": {
            "lovelace": 9836259
        }
    },
    "cddf3345a486396a8516418244c54161aab511efb99234580836114fa0f5b6cc#0": {
        "address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
        "datum": null,
        "datumhash": null,
        "inlineDatum": null,
        "referenceScript": null,
        "value": {
            "lovelace": 9836259
        }
    }
}

which means we decommitted 9836259 lovelace, then closed with the same ConfirmedSnapshot, which still had the utxoToDecommit field populated, and that ended up creating another output with the same amount! A BUG!
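
A minimal sketch of the invariant this bug points at, using a drastically simplified Snapshot type (field and function names here are hypothetical; the real type carries more): once the decrement is observed on-chain, the pending decommit must be cleared, so a later close/fanout can no longer re-distribute those funds.

```haskell
-- Simplified stand-in for the confirmed snapshot; the UTxO type is left abstract.
data Snapshot utxo = Snapshot
  { snapshotUTxO :: utxo
  , utxoToDecommit :: Maybe utxo
  }

-- To be applied when the DecrementTx is observed on-chain: closing with a
-- snapshot that still carries utxoToDecommit lets fanout pay the
-- decommitted value out a second time.
onDecrementObserved :: Snapshot utxo -> Snapshot utxo
onDecrementObserved snapshot = snapshot{utxoToDecommit = Nothing}
```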

May 2024

2024-05-17

SB on Decommit model spec

  • We went through the decommit model for testing and determined we can't just randomly produce some UTxO to snapshot. We need to maintain the head's UTxO set if we want to be able to decommit something without altering the head value.
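
A sketch of what evolving that set could look like, with plain Data.Map standing in for the real UTxO type (applyDecrement is a hypothetical name):

```haskell
import qualified Data.Map as Map

-- Evolve the modelled head UTxO on decrement: remove exactly the
-- decommitted entries, so the head value shrinks by precisely the
-- decommitted value and nothing else.
applyDecrement :: Ord txIn => Map.Map txIn txOut -> Map.Map txIn txOut -> Map.Map txIn txOut
applyDecrement headUTxO decommitted = headUTxO `Map.difference` decommitted
```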

  • I did a spike yesterday and tried to evolve the head utxo set in the model but decided to drop it since I noticed even before this change there were some flaky tests.

  • Now I want to make sure everything is green all the time before I try to add decommits again.

  • I ended up removing the Augmented snapshot number constructor. We didn't account for it in validFailingAction, and it was causing weird behavior in the presence of negative testing. Now all non-decrement actions are green, and I need to come up with a way of evolving the head UTxO and make sure there is some value to decrement before actually attempting to decrement (TxBodyOutputNegative errors).

  • I enabled Decrement actions again and made sure that the head utxo value contains the initial utxo value at the start. Then, when decrementing I make sure to pick some utxo from the spendableUTxO and decrement it. This way all is green but we also need to make sure to test actions with various snapshots.

  • This is not so straightforward to do, and I need to think about the easiest way to do it.

  • Perhaps keep the seen snapshots in the model and occasionally pick one of the old ones?
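
A quick sketch of that idea, assuming a hypothetical Model that tracks snapshot numbers as plain Ints:

```haskell
{-# LANGUAGE NamedFieldPuns #-}

import Test.QuickCheck (Gen, elements, frequency)

-- Hypothetical model state: snapshot numbers produced so far.
data Model = Model {seenSnapshots :: [Int], latestSnapshot :: Int}

-- Mostly act on the latest snapshot, but occasionally replay an old one.
genSnapshotNumber :: Model -> Gen Int
genSnapshotNumber Model{seenSnapshots, latestSnapshot} =
  frequency $
    (4, pure latestSnapshot)
      : [(1, elements seenSnapshots) | not (null seenSnapshots)]
```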

  • Are the negative actions we produce enough to say the decrement is working properly?

  • Seems like everything is green only because we adjustUTxO and never actually remove anything from the head UTxO.

  • I want to add a check to make sure the value in the Head is reduced right after a decrement.
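
One way to make that check precise (a sketch; values are flattened to plain lovelace integers here, names hypothetical):

```haskell
-- Post-condition sketch: after a decrement, the lovelace locked in the
-- head output must have decreased by exactly the decommitted amount.
decrementReducesHeadValue :: Integer -> Integer -> Integer -> Bool
decrementReducesHeadValue valueBefore valueAfter decommittedValue =
  decommittedValue > 0 && valueAfter == valueBefore - decommittedValue
```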

2024-05-16

SN on hydra-doom

  • Lots of pain with emscripten and typescript, but forgot to keep track of log items... anyhow:
  • WebSocket addEventListener registrations were leaking heavily; with removeEventListener things seem fine. I wonder though, is there a way to re-use "subscription infra" on the frontend and put events into a queue of some sort?
  • When running the scenario for a while, the log of the hydra-node grew heavily (we should control log density). So I removed the file from disk (the file descriptor likely still existed) .. and perceived performance got a bump!?

2024-05-14

SB on Decommit model spec

  • The last thing I remember is that we sometimes experienced CannotFindHeadOutput errors when running the TxTraceSpec tests. There is some flakiness here, but most of the time the tests are green, which is not something I would expect. We wanted to see failures in close/fanout related to the Uω (U omega) addition/checks.

  • Sprinkled some tracing in the close/contest/fanout tx creation to see what kind of UTxO is there and detect why Head output is not found.

    "Close: 9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace\n52f744b6ac#71 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14"
    
    "Contest: 1d3367a1b0#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace"
    
    "Fanout: f5110cd941#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace"
    
    
  • From the traces above I don't clearly see the head output with some PTs (participation tokens)

  • I disabled the Decommit action generation and the same problem still exists

    • so this is not tied to the changes I added for the decommit action. I'll try to go back a few commits to check that tests were green before the decommit snapshot addition.
  • Going back just leads to errors about the decommit UTxO missing from the snapshot.

  • Tracing in the State module surprisingly reveals the head UTxO is actually there???

  • Ok, removing shrinking with qc-max-shrinks=0 reveals the problem again.

       Assertion failed (after 4 tests):
         do action $ Close {actor = Alice, snapshot = Augmented 3}
            failingAction $ Fanout {snapshot = Normal 3}
            action $ Fanout {snapshot = Augmented 3}
            pure ()
         Model {headState = Closed, latestSnapshot = Augmented 3, alreadyContested = []}
         Fanout {snapshot = Augmented 3}
         Failed to construct transaction: CannotFindHeadOutputToFanout

  • We always seem to produce something to decrement (decommitUTxO in the snapshot can be Just mempty), which doesn't make sense (there was a comment on the PR about the semantics of Just mempty vs Nothing). Let's fix this first since I think it is tied to the missing head UTxO problem above.
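
A sketch of one way to remove the ambiguity (not necessarily the fix we'll apply): normalize at construction time so an empty decommit is always Nothing.

```haskell
-- Normalize so "nothing to decommit" has exactly one representation:
-- an empty decommit UTxO becomes Nothing, never Just mempty.
normalizeDecommit :: (Eq u, Monoid u) => Maybe u -> Maybe u
normalizeDecommit (Just u) | u == mempty = Nothing
normalizeDecommit other = other
```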

  • After directly producing a snapshot with something to decrement, we get the expected TxBodyOutputNegative (negative output) problem: we just generate snapshots without taking care that the actual UTxO is part of the head UTxO.

  • I think the next step is to make sure whatever we decrement is part of the head UTxO.
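
A sketch of a generator that enforces this, again with a generic map standing in for the UTxO type (genToDecommit is a hypothetical name):

```haskell
import qualified Data.Map as Map
import Test.QuickCheck (Gen, sublistOf, suchThat)

-- Pick a non-empty subset of the head UTxO to decommit, so the snapshot
-- never references outputs the head does not hold (assumes the head UTxO
-- itself is non-empty, otherwise suchThat would loop forever).
genToDecommit :: Ord txIn => Map.Map txIn txOut -> Gen (Map.Map txIn txOut)
genToDecommit headUTxO =
  Map.fromList
    <$> sublistOf (Map.toList headUTxO) `suchThat` (not . null)
```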

  • After this change I see H09 or HeadValueIsNotPreserved error in the decrement tx.

  • This means that the head input value does not match the head output value, since we added the decommit outputs.

  • If I update the UTxO state in the run model with the new UTxO containing the decommit outputs then I get H24 which is FannedOutUtxoHashNotEqualToClosedUtxoHash.

  • This points to the fact that we don't do a good job of maintaining the UTxO set between close/fanout in our specs. We wanted to keep the model minimal, which was fine so far, but maybe it is time to add more UTxO modelling to our specs.

2024-05-13

SB on fixing the tx parsing from text envelope

  • We seem to think that the cardano-cli is using the TextEnvelope transaction format. At least this is what we expressed in one of our tests (accepts transactions produced via cardano-cli).

  • This test was generating an arbitrary transaction and then serialise it to the text envelope format and check if we can parse it back.

  • This test now fails because the tx type we expect is not correct. The type string depends on whether the tx is signed or not ("Un/Witnessed Tx BabbageEra"), but serialiseToTextEnvelope produces just "Tx BabbageEra", which makes our test red.

  • I tried assembling a tx using cardano-cli and noticed that what we get is the "Ledger CDDL" format instead, with the proper string for the tx type ("Un/Witnessed Tx BabbageEra").

  • So it seems there is some discrepancy between what we expect and what we actually get as the cardano-cli tx format.

2024-05-10

SB on fixing Auxiliary data hash in commit tx

  • Progress: I can now see the Blockfrost submit error by following the instructions provided by MLabs.

  • Seems like the correct hash for the commit tx is detected (12e5af821e4510d6651e57185b4dae19e8bd72bf90a8c1e6cd053606cbc46514), but for whatever reason it doesn't hash to 61f765d1abd17495366a286ab2da9dbcb95d34729928f0f2117f1e5ad9f9e57b, which was the expected hash.

SB on Model spec for decommits

  • Last thing I did in the new model tests for decommits is to add the actual UTxO to decommit to our modelling of the Snapshots.

  • This led to H35/H34 validator errors, which mean the snapshot signature is invalid; in turn, it means the outputs we want to decommit are incomplete in the snapshot and/or not properly signed.

  • Tracing the produced Snapshot with something to decommit reveals that decommit output is actually present in the decommit transaction. Does this mean the snapshot is not properly signed?

  • Indeed this is the case. We create a snapshot and the signatures but don't set utxoToDecommit. Afterwards we set utxoToDecommit before calling the decrement function, but by then the signatures no longer match.
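
The ordering issue in miniature, with purely illustrative toy types and show standing in for the real multi-signature:

```haskell
-- Drastically simplified snapshot; `sign` covers the whole snapshot,
-- so it is stale if utxoToDecommit is set afterwards.
data Snapshot = Snapshot {number :: Int, utxoToDecommit :: Maybe [Int]}
  deriving (Eq, Show)

sign :: Snapshot -> String
sign = show

-- Wrong: sign first, set utxoToDecommit later; the signature is stale.
signedTooEarly :: Snapshot -> (Snapshot, String)
signedTooEarly s = (s{utxoToDecommit = Just [10]}, sign s)

-- Right: set utxoToDecommit first, then sign the final snapshot.
signedAfterSetting :: Snapshot -> (Snapshot, String)
signedAfterSetting s = let s' = s{utxoToDecommit = Just [10]} in (s', sign s')
```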

  • After re-signing the proper snapshot including the decommit output we get further, and the model spec now fails on an invalid Decrement action, which is expected to fail.

  • Turns out the problem was that I was removing the decommit outputs from the UTxO set, which is not what we want since the snapshot UTxO is just generated and was never actually committed.

  • Now we come to a problem where the head output is not found for the contest/fanout txs.

2024-05-09

SB on fixing Auxiliary data hash in commit tx

  • Our users MLabs are having issues with the commit tx produced by hydra-node. The issue is that the auxiliary data hash is not correct.

  • Our current tests are all green, and we explored the wallet code to figure out at which point the auxiliary data hash becomes invalid, but without success.

  • There are both property tests and the tests that actually complete the whole workflow and return the signed commit tx to the user but nothing reveals the issue.

  • The last attempt was to recalculate the aux hash in the conversion from/to the ledger tx, but this turned out not to be a proper fix.

  • After this I tried to assert the following, for different combinations:

    • Aux data and its hash are valid when the tx is created via the commitTx function, checked against draftCommitTx_
    • Aux data and its hash are valid when the tx is created via the draftCommitTx_ function, checked against the signed commitTx tx
    • Aux data and its hash survive an encode/decode roundtrip, making sure the aux data is not lost in the process; and all these tests are green ...
  • Is this somehow Blockfrost-specific? I know MLabs are using Blockfrost to submit this transaction, so maybe this is worth exploring.

  • They have a workaround of simply recalculating the aux hash, since they have a way to re-sign the hydra transaction, but this is not a good solution.

  • I think further steps are to try to reproduce the issue together with MLabs and see if we can find the root cause.

February 2024

2024-02-14

SB on Validating transactions in MockChain

  • When working on reproducing the contest bug we found out that our mock chain layer is not validating transactions properly.

  • When calling the evaluateTx we get Either EvaluationError EvaluationReport but we never check if EvaluationReport contains some ScriptExecutionError
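
A sketch of the missing check, keeping the error and report types generic; it assumes the report maps redeemer pointers to Either ScriptExecutionError ExUnits:

```haskell
import Data.Either (isRight)
import qualified Data.Map as Map

-- A tx only counts as valid if evaluation succeeded overall AND every
-- per-redeemer script evaluation succeeded.
allScriptsSucceeded ::
  Either evalErr (Map.Map rdmrPtr (Either scriptErr exUnits)) -> Bool
allScriptsSucceeded result = case result of
  Left _ -> False
  Right report -> all isRight (Map.elems report)
```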

  • Another problem we found is that addNewBlockToChain updates the slot and block but keeps the UTxO the same.

  • We added a check when trying to submitTx to see if the tx we are trying to submit is actually valid, but with the changes above we observe the H03 error, which is ReimbursedOutputsDontMatch.

  • We error out in checkAbort with this error when the hash of the committed UTxO doesn't match our hashed outputs in the abort.

  • Since the error suggests there is something wrong in the way we simulate commits, I added a check in performCommit to make sure that when we see the Committed output, the UTxO inside matches what we actually committed. I think we were observing other nodes' commits, and that led to the H03 we were seeing.

  • Now we get a completely different failure (seed is 590033147):

TransactionInvalid (TransactionValidityTranslationError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c"}) (TxIx 5))))
  • The only thing is, when printing the tx I actually see the input 2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c with TxIx 5, so WAT is happening.

  • I see that we need to invoke toApi on our UTxO in evaluateTransactionExecutionUnits; this is just a coerce so it shouldn't matter, but I added roundtrip tests anyway just to regain some sanity.

2024-02-14

SN on reproducing the contest bug

  • Worked on the model with some back and forth; succeeded in issuing close transactions with the initialUTxO available in the mock chain (surprised that it is kept around and made available as futureCommits?)
  • The model did not fail and we seemingly can contest and fanout without problems. Let's look back at the logs of the bug issue.
  • State changed events: Both nodes see the HeadClosed at slot 38911656 with deadline 2024-01-18T08:48:47Z
  • After four blocks with TickObserved, the HeadIsReadyToFanout
  • Node 1 stores a ChainRolledBack state changed event, seemingly to slot 38911656 (which would be exactly the point where the head closed)
  • Node 2 has two rollback events, one to slot 38911660 (after close) and then to 38911656 (at close)
  • In the logs, node 1 observes at 2024-01-18T08:47:36.188861Z in slot 38911656 a OnCloseTx, at 2024-01-18T08:47:40.410448Z in slot 38911660 a OnContestTx with snapshot 1, and at 2024-01-18T09:09:22.762856Z in slot 38911660 another OnContestTx of snapshot 1 (Arnaud mentioned he tried to wipe the state, re-observe and try fanout again)
  • Node 1 tried to post the CloseTx twice: the first attempt on 2024-01-18T08:47:04.257801Z succeeded, producing tx 94a7f0b86ee448dedf3667114f968dd825055bee4c90379a4659d8563b2e7d4a; the second attempt at 2024-01-18T08:47:28.565181Z failed with
    TxValidationErrorInMode (ShelleyTxValidationError ShelleyBasedEraBabbage
    (ApplyTxError [UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers
    [RdmrPtr Spend 0])),UtxowFailure (AlonzoInBabbageUtxowPredFailure
    (PPViewHashesDontMatch (SJust (SafeHash
    \"1119bb152dc2c11c54389752f0d6c8398912582a73d2517570e9898c1818bb3e\"))
    (SJust (SafeHash
    \"7bf7ccdce09da7e95d17ed5faf048b52f3d77a9c96928dd7656ad4e1647059b1\")))),UtxowFailure
    (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO
    (MaryValue 0 (MultiAsset (fromList []))) (MaryValue 89629254 (MultiAsset
    (fromList [(PolicyID {policyID = ScriptHash
    \"1f8f4076f72a0427ebe14952eb0709c046a49083baef749920cf6bfa\"},fromList
    [(\"1ad7cb51c9e2d6250bd80395a5c920f6f628adc4b1bd057a81c9be98\",1),(\"dc9f50cac6e292e1aeaa438c647010f7f6f4fc85010092ff6cb21cf3\",1)]),(PolicyID
    {policyID = ScriptHash
    \"3ee6157268c7a3b7d21111b6772bc06aed0f425ef64b285223b308fc\"},fromList
    [(\"2d2da703f0e1fb682f997f121e3f90084d12c11203dc8c1314da16d3\",1),(\"4879647261486561645631\",1),(\"c312067b108b2ddd1a50ad1e757720570100fad3f14d6f88fd64ed42\",1)])])))))),UtxowFailure
    (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn
    (TxId {unTxId = SafeHash
    \"3b6ded925ce15cffc3f47ebee62242e75bb57b63d7dad943397ebe55112c7094\"}) (TxIx
    0),TxIn (TxId {unTxId = SafeHash
    \"b3704d8d7d7067c4b6910e4dd84b2074ffa4713027c14d81f18928ae3700d40d\"}) (TxIx
    1)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure
    NoCollateralInputs)),UtxowFailure (UtxoFailure
    (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (Coin 0) (Coin
    5293200))))]))
    
  • Node 1 did post a ContestTx at 2024-01-18T08:47:36.189167Z with id 495878f0639aa2aed384af4b72bcdea058b339e6fd5fc6f63b5e7238d7297baf
  • Then node 1 did post a FanoutTx at 2024-01-18T08:49:18.378833Z which failed validation with error code H25
  • All further attempts to fanout from node 1 failed for the same reason (2024-01-18T08:50:07.49598Z, 2024-01-18T08:58:38.158696Z, 2024-01-18T09:03:42.313769Z, 2024-01-18T09:17:42.869217Z)
  • Node 2 did close the head with the initial snapshot at 2024-01-18T08:46:49.728882Z with tx id 5a4ed22b4e8137118eff07f26b366e031d880e66ebf11b3fafa130b50fc95441 on slot 38911656
  • All fanout attempts of node 2 failed with error code H24, on 2024-01-18T08:49:18.323083Z, 2024-01-18T09:04:42.427032Z and 2024-01-18T09:18:44.892685Z
  • Node 2 observed OnCloseTx at 2024-01-18T08:47:36.18Z on slot 38911656, OnContestTx at 2024-01-18T08:47:40.478408Z on slot 38911660 and the same ContestTx again at 2024-01-18T09:09:26.563872Z

2024-02-06

SB/SN on debugging the stuck head on preview

  • Unclear what actually caused it
  • Situation: Tx requestor did not sign snapshot, the others (including snapshot leader) did
  • Consequence: No re-requesting of snapshot, original requestor can't do anything now?
    • Can we test this?
    • Could / should simulate this in tests (e.g. by deliberately using wrong hydra keys in BehaviorSpec)
  • Manipulating state is hard, especially as we also have acks and network-messages which need to be consistent
    • Why are we storing non-confirmed data? e.g. things that aggregate into localUTxO and localTxs
  • Coordinated reset https://github.com/input-output-hk/hydra/issues/1284?
    • Not loading in-flight information from persistence would have the same effect
    • Gossip of signed snapshots is related to this as well (and would not require persisting network messages?)
    • Use the chain to checkpoint a snapshot as a coordinated (safe) reset point?
  • Multiple ReqSn for same snapshot (upon new transactions "in the mempool"?)
    • Can't recover from non-consensus
  • Remove localUTxO to at least void "inconsistent" local views and instead do that only upon Snapshot request / validation?
  • This whole thing feels super ad-hoc
  • Concrete next step: Sasha pays for closing and initiates a new head

January 2024

2024-01-22

SB on Importing hydra-node transaction to eternl

  • Tried removing the commit witness in order to test whether the produced cbor is compliant with HW wallets/CIP-21, but we get ScriptFailedInWallet when trying to cover the fee using the internal wallet.

  • Not sure if I really want to pursue this further but at the same time it is interesting to see the needed changes.

  • I might just do this in my free time. One option is to make the user pay for the fees instead of hydra-node wallet.

2024-01-15

SB on Importing hydra-node transaction to eternl

  • Trying to figure out how to produce a commit tx from hydra-node that is compatible with hardware wallets.

  • There is no option to import the tx into trezor-suite and simply test that it works.

  • Pasting here the answer from the Eternl guys just for reference:

The issue comes down to the body map ordering like I said in the beginning of
this thread. I then said that it was a requirement to have all maps sorted
canonically according to CIP21. This is the issue we face with Eternl as well.

Eternl parses the input cbor, extracts the body cbor raw as is, i.e. as it is
sorted in the original cbor.
This raw body cbor is hashed and signed.
Eternl then needs to re-construct a final tx, for this, CSL is used to
de-serialize the tx, inject the witness Eternl creates and then outputs the new
tx cbor.
But when this is done, the body map is sorted in canonical order and thus the
body doesn't match the original one any more. 

So the witness created by Eternl is correct, but the tx cbor returned contains a
different structure for the body.

Possible solutions

- Make sure that tx created follow CIP-21
- After creating hydra tx, pass it through cardano-hw-cli transaction transform
  to reconstruct body to follow CIP-21. Must be done before any witnesses are
  added.
- Change Eternl to put tx back together using the original body cbor and not CSL parsed body.

  • The cardano-hw-cli tool is not really acceptable to us, since we should be able to figure out how to produce an appropriate tx using cardano-api. It seems what we produce is just not compatible with CIP-21.
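
For reference, the canonical ordering CIP-21 inherits from RFC 7049 canonical CBOR can be sketched like this, over already-encoded keys (per the Eternl explanation above, our tx bodies are not in this order, which is why CSL's re-serialisation changes them):

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)
import qualified Data.ByteString as BS

-- Canonical CBOR (RFC 7049 §3.9, as required by CIP-21): map keys are
-- sorted by the length of their encoded form first, then bytewise.
canonicalKeyOrder :: [BS.ByteString] -> [BS.ByteString]
canonicalKeyOrder = sortBy (comparing BS.length <> compare)
```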

SB on Decrement observation

  • Continuing work on incremental decommits I wanted to add the off-chain observation code.

  • Looking at the contest tx, which should be similar to decrement in the sense that it consumes the head output and produces another one.

  • I added the observation code without too many problems but need to revisit it to make sure I didn't skip some expected checks.

  • Now our test fails because of NotEnoughFuel which makes sense since we are actually setting fees to zero in mkSimpleTx.

  • Increasing the amount of seeded lovelace for the hydra internal wallet fixes this issue, and now the only thing that fails is the last check in the test, where we want to assert that the UTxO we just created is adopted and can be queried from cardano-node.
