Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negociation bug appearing when pulling in several steps (https://github.com/mirage/irmin/issues/600) #338

Closed
wants to merge 1 commit into from

Conversation

clecat
Copy link
Contributor

@clecat clecat commented Jan 22, 2019

In this issue, of mirage/irmin, @zshipko found out a bug where the negotiation fails.

Origin of the problem:

During the negotiations, when the client sends several 'have' lines followed by a flush line, the server answers with Ack for every hash he posses too.
If the server has no more hash to acknowledge, he sends a Nak to the clent.
But if he acknowledged every hash, he may send a Nak to the client.

And after investigations, it appears that ocaml-git does not consume the Nak if all the hash were acknowledged. Results this situation:

Client sends:

0031have {hash 1}
...
0031have {hash n}
0000

Server sends back

0038ACK {hash 0} common
...
0038ACK {hash n} common
0008NACK

The client consumes all line excepted the NACK, so when the client sends another number of 'have', the NACK is still in the input buffer:

Client sends:

0031have {hash n + 1}
...
0031have {hash n + m}
0000

Server sends back

0038ACK {hash n + 1} common
...
0038ACK {hash n + m} common
0008NACK

But when the client reads the server response, he gets

0008NACK

And consider that all hashes from hash n + 1 to n + m are not available in the server and stops consuming the input, which leaves

0038ACK {hash n + 1} common
...
0038ACK {hash n + m} common
0008NACK

in the buffer.

After that, considering the negociation has ended, the client attemps to read

0038ACK {hash x}

Instead he gets

0038ACK {hash x} common

Which creates the error

Fatal error: exception (Invalid_argument
  "Sync.pull_exn: {`Smart (`No_assert_predicate #predicate)}")

Solution:

In order to solve the problem, we need to understand when a NAK response may be send by the server in order to consume it if necessary.

After reading the code of the git implementation in C, it appears that a NAK response will always be send by the server after a flush line, if we have the capability multi_ack or multi_ack_detailed (which we have in the issue) which corresponds to the lines 1332 to 1338 of 'smart.ml'

      | `Ack (hash, detail) ->
          let hashes = Hash.Set.remove hash hashes in
          let acks = {acks with acks= (hash, detail) :: acks.acks} in
          if Hash.Set.is_empty hashes then
            k {acks with acks= List.rev acks.acks} decoder
          else p_pkt_line (p_negociation_one ~mode (go hashes acks)) decoder

This case is only accesible when we have the capabilities multi_ack or multi_ack_detailed.
The solution would be to remove the if statement and try to read a new line again in all the cases:

  • if we have finished all the ACKs and receive something else than a NAK, we will have an error anyway.
  • if we have a finished all the ACKs and we receive a NAK, we will consume it.
  • if we receive a premature NAK, the behavior is not changed.

This modification should solve the problem without breaking anything, but I'm not certain. This is why...

Further improvement:

We need some code mocking a server in order to do some integration test and ensure that we keep the right behavior after doing some modifications.

I will try to do this as soon as possible, it should allow a lot of safety in further modifications.

@dinosaure dinosaure self-requested a review January 22, 2019 20:33
Copy link
Member

@dinosaure dinosaure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a NOTE about this PR, it's mostly a great reverse engineering. Git documentation did not have a clear definition of what it should happen. So this is fix #600 but probably hide a bigger problem. When @clecat finsh regression test, I will merge it 👍 ! Great job!

@samoht
Copy link
Member

samoht commented Feb 18, 2019

What's the status of this issue ?

@clecat
Copy link
Contributor Author

clecat commented Feb 19, 2019

Actually, it appeared to me (too late sadly) that this bug was not that easy to reproduce: Indeed, after attempting to mock the problem with a fake server, I was unable to get the problem again.
The way I proceeded I most likely the problem, as it should not be impossible to reproduce, but with the actual tools we have, it may need a lot of efforts for little results.
I think we need a server implementation before going back to this kind of testing as it would simplify everything.
The temporary solution should be to add a test marked as "slow" which actually do the same thing that triggered the problem: cloning directly the repository of irmin

@samoht samoht mentioned this pull request Mar 27, 2019
20 tasks
@hannesm
Copy link
Member

hannesm commented Apr 6, 2019

I observed the following behaviour:

  • an initial clone was always successful
  • subsequent pull were only successful if the remote didn't have any updates
  • if there were new commits, I saw an error
  • once I used this PR (thanks), the problem disappeared and now I can push and pull and pull and pull and push from elsewhere and pull and pull and pull and pull again :D \o/

since this is rather static, a test case could be to dump the (binary tcp stream of the) server, and replay the replies when the client requests them... instead of trying with a real server or implementing a full server.

in any case, thanks for this PR, the lengthy description, and I hope it to be released very soon now :D 🐫 🌵

@dinosaure
Copy link
Member

Closed by #351

@dinosaure dinosaure closed this Apr 10, 2019
dinosaure added a commit to dinosaure/opam-repository that referenced this pull request Jul 12, 2019
CHANGES:

- Move to the last version of `decompress` (@dinosaure, mirage/ocaml-git#366)
- Check order of entries in a tree object (bug found by @samoht, fixed by @dinosaure, mirage/ocaml-git#365)
- Use `mmap` package (@dinosaure, mirage/ocaml-git#347, mirage/ocaml-git#360)
- Update README.md (@tcoopman, @dinosaure, mirage/ocaml-git#337, mirage/ocaml-git#359)
- `trim` the window used to pack (@pqwy, @dinosaure, mirage/ocaml-git#357, mirage/ocaml-git#358)
- Use lastest version of `lru.0.3.0` (@pqwy, @dinosaure, mirage/ocaml-git#352, mirage/ocaml-git#356)
- Fix smart protocol (fixed by @clecat and @dinosaure, feedbacks from @hannesm)
 * Pull-request mirage/ocaml-git#351, mirage/ocaml-git#350, mirage/ocaml-git#338
 * Issues mirage/ocaml-git#335, mirage/ocaml-git#342, mirage/ocaml-git#346

   + regression tests was added (@dinosaure)
   + semantics about negociation was explained (@clecat)
   + end-to-end tests partially done (@hannesm)

- Remove `sexplib` dependency (@samoht, mirage/ocaml-git#349)
- Fix smart protocol to accept empty response from `ls-remote` (bug found by @hannesm, fixed by @dinosaure, mirage/ocaml-git#348)
- Add `io-page-unix` as dependency to tests `git-mirage` (@dinosaure, mirage/ocaml-git#345)
- Remove deprecated `Cstruct.add_len` (replaced by `ke`) (@dinosaure, mirage/ocaml-git#345)
- Use `Uri.user_info` to be able to be authentified by a service like GitHub (@linse, review by @dinosaure, mirage/ocaml-git#341, mirage/ocaml-git#343)
- avoid clash between `digestif.c` and `digestif.ocaml` implementation (same for `checkseum`)
 * remove implementation dependencies on `git-unix` and `git-mirage` (bug found by @hannesm and @linse, fixed by @dinosaure, mirage/ocaml-git#339)

   This update should be fixed by `dune`'s variants and `>= digestif.0.7.2` and `>= checkseum.0.1.0`

- **breaking-change** add `etmp` as already-allocated buffer to encode Git object (@dinosaure, mirage/ocaml-git#336)
 * add `ke.0.3` as new dependency
- consumed inputs for every entries in a tree (bug found by @zspicko, fixed by @dinosaure, mirage/ocaml-git#334)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants