Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Public Inputs and add them to the transcript #134

Merged
merged 22 commits into from
May 13, 2022

Conversation

davidnevadoc
Copy link
Collaborator

In this PR we have added the Public Inputs to the transcript as the first step of the prove and verify algorithms. They have been added as the evaluations of the PI polynomial over the used domain (n field elements).

Close #133

davidnevadoc and others added 2 commits April 14, 2022 12:57
Update `construct_dense_pi_vec` so it returns the evaluation
representation of the PI polynomial over the used domain. This means
that the result vector will always have a power of two length.
Co-authored-by: Joshua Fitzgerald <joshuabfitzgerald@gmail.com>
@davidnevadoc
Copy link
Collaborator Author

This method may not be best way of solving the issue so discussions on alternative solutions are welcome!

plonk-core/src/proof_system/proof.rs Outdated Show resolved Hide resolved
plonk-core/src/proof_system/prover.rs Outdated Show resolved Hide resolved
@ghost
Copy link

ghost commented Apr 14, 2022

Halo 2 also includes the circuit description in the transcript, which I think is another good level of defense in depth. In general, my understanding of the Trail of Bits recommendation is that pretty much everything the verifier has access to should go in the transcript.

@ghost
Copy link

ghost commented Apr 15, 2022

Per Alistair's suggestion on ZK study club telegram, it's not a bad idea to add the first element of the KZG SRS to the transcript as well. This is a bit tricky as it's not generic over the poly commit scheme

davidnevadoc and others added 9 commits April 25, 2022 20:58
Update `construct_dense_pi_vec` so it returns the evaluation
representation of the PI polynomial over the used domain. This means
that the result vector will always have a power of two length.
Co-authored-by: Joshua Fitzgerald <joshuabfitzgerald@gmail.com>
The new implmenentation uses BTreeMap to ensure no duplicates are
introduced and to easily get an ordered representation. This is
particualrly helpuful to get a unique representation to add to the
transcript.
@davidnevadoc davidnevadoc changed the base branch from master to MockProver April 26, 2022 21:16
@davidnevadoc davidnevadoc changed the base branch from MockProver to master April 26, 2022 21:16
@davidnevadoc davidnevadoc changed the title Add Public Inputs to the Transcript Refactor Public Inputs and add them to the transcript Apr 26, 2022
@davidnevadoc davidnevadoc marked this pull request as ready for review April 26, 2022 21:21
@davidnevadoc davidnevadoc force-pushed the add-pi-transcript branch 2 times, most recently from cbc7e97 to b7bc1a4 Compare April 26, 2022 21:59
@CPerezz
Copy link
Contributor

CPerezz commented Apr 28, 2022

@joebebel makes sense.

Note that for the ordering, the hash will take care too. As we store the PI in a BTreeSet so therefore the order is preserved and therefore, ir prevents any order maleability.

@lopeetall
Copy link
Collaborator

lopeetall commented Apr 28, 2022

The only issue I see with serializing the sparse form of the PI is that we would need to provide a spec for the serialization so that other implementations of the Verifier serialize PI in the same way.

Instead, why don't we just use the SRS to get a binding commitment the PI polynomial and add it to the transcript using the existing Merlin API wrapper? It may not be the most efficient solution as it requires doing another MSM, but we can still take advantage of the sparseness of the PI by skipping all the zeros. And since all we need is a binding commitment we won't need to use an FFT before MSM.

This method would use our existing tools without modification and is consistent with the rest of the protocol.

@bhgomes
Copy link
Collaborator

bhgomes commented Apr 28, 2022

For example, there is another problem with the code (besides the Box::leak distraction) - it violates the property that no label is a prefix of another label.

Why is prefixing an issue? The documentation on merlin says that it's a requirement of the transcript being "parseable". What does this mean? We aren't parsing anything here.

@CPerezz
Copy link
Contributor

CPerezz commented Apr 28, 2022

The only issue I see with serializing the sparse form of the PI is that we would need to provide a spec for the serialization so that other implementations of the Verifier serialize PI in the same way.

In the verifier (Usually a Smart Contract) will probably be much more expensive to do the MSM than the hash. That's why I suggested that @lopeetall.

For now, we could start with something easy that solves the bug. And then move on into a better design.

Something like this as purposed by @davidnevadoc in private.

// For each PI != 0
transcript.append(b"pi_pos", &F::from(*pos));
transcript.append(b"pi_val", &F::from(val));

This is not the most optimal. But works for now. And we can discuss the approach we want to go for in a follow-up issue.

@lopeetall
Copy link
Collaborator

Can we reuse those labels pi_pos and pi_val though?

From the Merlin website:

The labels should be fixed, not defined at runtime, as runtime data should be in the message body, while the labels are part of the protocol definition. A sufficient condition for the transcript to be parseable is that the labels should be distinct and none should be a prefix of any other.

@ghost
Copy link

ghost commented Apr 28, 2022

Why is prefixing an issue? The documentation on merlin says that it's a requirement of the transcript being "parseable". What does this mean? We aren't parsing anything here.

The absolute requirement is that the transcript is binding to the messages in it, that is, it is not feasible to find a different transcript with different labels/messages that gives the same challenges. The property that merlin promises is that if labels are prefix free, then the transcript is unambiguously parseable (a mathematical definition - there is no computation to parse the transcript), and if the transcript is parseable then it is message binding.

It could be that the transcript is message binding even if the labels are not prefix-free, but this requires an additional argument and can't be assumed from the guarantees that merlin provides. It's just much simpler to defer to merlin's requirements.

Instead, why don't we just use the SRS to get a binding commitment the PI polynomial and add it to the transcript using the existing Merlin API wrapper? It may not be the most efficient solution as it requires doing another MSM, but we can still take advantage of the sparseness of the PI by skipping all the zeros. And since all we need is a binding commitment we won't need to use an FFT before MSM.

Or, just serialize the PI polynomial? Though I think your original suggestion of serializing the PI directly is probably fine as well. I think the MSM is expensive.

@bhgomes
Copy link
Collaborator

bhgomes commented Apr 28, 2022

Why is prefixing an issue? The documentation on merlin says that it's a requirement of the transcript being "parseable". What does this mean? We aren't parsing anything here.

The absolute requirement is that the transcript is binding to the messages in it, that is, it is not feasible to find a different transcript with different labels/messages that gives the same challenges. The property that merlin promises is that if labels are prefix free, then the transcript is unambiguously parseable (a mathematical definition - there is no computation to parse the transcript), and if the transcript is parseable then it is message binding.

It could be that the transcript is message binding even if the labels are not prefix-free, but this requires an additional argument and can't be assumed from the guarantees that merlin provides. It's just much simpler to defer to merlin's requirements.

If this is the case then why are we letting the user specify any labeling scheme at all, just choose a prefix-free sequence of labels and use each one in turn. From an engineering perspective, I don't want to have to think about these safety requirements if there exists a protocol that takes care of it for me, especially when these labels don't actually help me accomplish anything inside of my application (in this case PLONK). This is the API I actually care about:

trait Transcript {
    fn append<M>(&mut self, message: M);
    fn challenge<C>(&mut self) -> C;
}

In which scenarios is knowing or specifying the labels manually an important feature to have?

@ghost
Copy link

ghost commented Apr 29, 2022

In which scenarios is knowing or specifying the labels manually an important feature to have?

Yeah, you are absolutely correct that from an implementation perspective, it is entirely possible to do without labels, or without 'static lifetimes on the labels, etc etc. It's probably entirely possible to use the exact "pi{}" format labels too.

The reason it's important, though, is because "Frozen Heart" is just the latest in a long history of Fiat-Shamir implementation mistakes (mostly of the "omit data" kind). The idea is that we should write down - very explicitly - the exact Plonk interactive protocol being used, and then duplicate it exactly with merlin, down to the labels and everything. The goal is not only 0 mistakes in converting the interactive proof to non-interactive, but not even the possibility of mistakes.

Public Inputs are serialized and added to the transcript under the label
"pi". The struct PI is responsible of guaranteeing a unique
representation of each possible public input vector.
@davidnevadoc
Copy link
Collaborator Author

davidnevadoc commented Apr 29, 2022

After reading the discussion and learning about the correct usage of labels in merlin I believe that we should open a different PR to tackle this issue, as there are several parts of the code where they are misused. For example we are adding the commitment for the permutation with the label z and also adding the challenge used for lookup table RLC with the label zeta, which breaks the prefix rule. We are also using Box::leak() in the same way I did in the first version of the PR when adding the evaluation of custom gate constraints.

Having the merlin requirements in mind I think the simplest solution is to serialize PI and add it to the transcript with the label "pi". This struct is responsible of holding a one-to-one representation of each possible public input vector. In particular we use BTreeMap to guarantee the inputs are stored in order and we only store non-zero values, as zeros are implicit when no value is provided.

@hdevalence
Copy link

After reading the discussion and learning about the correct usage of labels in merlin I believe that we should open a different PR to tackle this issue, as there are several parts of the code where they are misused.

The docs on Transcript Protocols might be helpful; the recommended way to use the library is for each protocol to define a protocol-specific extension trait that maps the protocol-level transcript actions ("do a domain separation", "append a public key", "append a commitment to whatever", etc) operating on higher-level types into the underlying calls to the byte-oriented Merlin API. So in this case, for instance, it might make sense to have a transcript protocol that handled the serialization and labeling, and presented higher-level methods to the rest of the crate.

@davidnevadoc davidnevadoc requested a review from a user May 6, 2022 07:27
Copy link
Collaborator

@bhgomes bhgomes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few points, and a possible bug, otherwise LGTM.

plonk-core/src/circuit.rs Outdated Show resolved Hide resolved
plonk-core/src/constraint_system/composer.rs Outdated Show resolved Hide resolved
plonk-core/src/constraint_system/helper.rs Outdated Show resolved Hide resolved
plonk-core/src/proof_system/pi.rs Outdated Show resolved Hide resolved
plonk-core/src/proof_system/pi.rs Outdated Show resolved Hide resolved
plonk-core/src/proof_system/pi.rs Show resolved Hide resolved
plonk-core/src/proof_system/pi.rs Show resolved Hide resolved
plonk-core/src/proof_system/pi.rs Outdated Show resolved Hide resolved
plonk-core/src/proof_system/proof.rs Outdated Show resolved Hide resolved
@davidnevadoc davidnevadoc requested a review from lopeetall May 11, 2022 14:09
Copy link
Contributor

@CPerezz CPerezz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I like that refactor of the Public Inputs TBH. Although I think we can do more in the future with that and improve them much more.

As for the concerns of @bhgomes , they're legit. But at the same time, panicking due to a PI re-assignation is not something crazy. As it means you're not building a correct circuit. Which is not worth then compiling or making proofs for.

Would like the approval of @bhgomes and maybe one more before we merge TBH as this topic lead to a broad discussion and lots of members were involved.

Copy link
Contributor

@LukePearson1 LukePearson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to merge

@davidnevadoc davidnevadoc merged commit 5323cd5 into master May 13, 2022
@davidnevadoc davidnevadoc deleted the add-pi-transcript branch May 13, 2022 07:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-bug Type: bug T-refactor Type: cleanup/refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Public Inputs to the transcript.
6 participants