"Empty" validator generates 3KB serialized contracts. #3702
Comments
Small addition, here's the serialised contract (base16-encoded):
What I find interesting is the repetition of some sequences of bytes like:
etc.. which I really have a hard time correlating with the pretty-printed output. I first thought it was maybe something odd with the DeBruijn indices encoding but even if removed (encoding a program with |
What you've posted is the Plutus IR, which isn't what goes on chain. The untyped Plutus Core won't have all the type stuff, which isn't "dead" in PIR.
What specifically are you referring to? Did you serialize the PIR? |
That said, we probably do retain datatype matchers for datatypes which are used only at the type level. That's a problem, but a slightly unusual one. I made https://jira.iohk.io/browse/SCP-2638. |
I am referring to (the serialization of) this https://github.com/input-output-hk/plutus/blob/fa95d9e85d26341addbb34c62556ea86ed4febae/plutus-ledger-api/src/Plutus/V1/Ledger/Scripts.hs#L88-L90 which is said to be If this isn't what ends up on-chain, then what is |
Okay, just checking! |
I'm also surprised it's that large. We know there's some redundancy somewhere, given that compression still works, but it's not obvious where. |
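The "compression still works" observation can be illustrated with a small Python sketch. The byte strings below are made up for illustration and are not the serialized validator from this issue; the point is only that a general-purpose compressor shrinks repetitive input a lot and barely touches input without repetition.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)

# Hypothetical stand-in for a script with repeated byte sequences (~3KB).
repetitive = bytes(range(30)) * 100

# Pseudo-random bytes of the same length, seeded for reproducibility.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

ratio_repetitive = compression_ratio(repetitive)
ratio_noisy = compression_ratio(noisy)
```

Running the same measurement on the real serialized script would give a rough indication of how much repetition it contains, though not where that repetition comes from.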
I know you're probably looking into this (or at least, in this area), so I didn't spend too much time diving into the Flat encoder itself. From a quick look, it looked okay. Still, I've grown quite accustomed to working with serialized binary data over the past years (although mostly CBOR) and, intuitively, the output looks really strange and not like something I'd expect from the Flat encoders as they are declared. Hence this report. If this isn't something you are currently looking into, I could dig in a bit more and maybe provide a better analysis of why it is this big? |
I'm not looking at this in detail right now, so go ahead! The main reason we're using Flat is that it should give us compact encodings of small integers, of which we have a lot. I'd be very interested if you spot any easy means of improving things! |
I'll have a look. When you say small, what's the range you have in mind? Because CBOR is actually pretty good at encoding small numbers 😅 |
0-7 (constructor tags). You can see the results of our previous investigation here: https://github.com/input-output-hk/plutus/blob/master/notes/plutus-core/Serialisation-flat.md |
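To make the size argument for values in the 0-7 range concrete, here is a minimal Python sketch of bit-packing. It assumes a fixed 3-bit width per tag purely for illustration; the function names are hypothetical and the real Flat layout differs (see the linked notes). The comparison point is a byte-aligned format, which needs at least one byte per value (CBOR encodes unsigned integers 0-23 in a single byte).

```python
def pack_tags(tags, width=3):
    """Pack small integers into bytes, `width` bits each, zero-padded at the end.

    Hypothetical layout for illustration only, not the Flat wire format.
    """
    acc = 0
    nbits = 0
    for tag in tags:
        if not 0 <= tag < (1 << width):
            raise ValueError(f"tag {tag} does not fit in {width} bits")
        acc = (acc << width) | tag
        nbits += width
    pad = (-nbits) % 8          # bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")

def unpack_tags(data, count, width=3):
    """Inverse of pack_tags: read `count` values of `width` bits each."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << width) - 1
    return [(acc >> (total_bits - (i + 1) * width)) & mask for i in range(count)]

tags = [0, 1, 7, 3, 2, 5, 6, 4]
packed = pack_tags(tags)        # 8 tags * 3 bits = 24 bits
byte_aligned = bytes(tags)      # one byte per tag
```

With these eight tags the bit-packed form takes 3 bytes against 8 for the byte-aligned form, which is the kind of saving a bit-oriented format is meant to provide when small integers dominate.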
So, I've been really diving into the Still, it does not explain the large size of that empty program and the many repetitions in the output. The PIR looks quite neat, and now that I have familiarized a bit more with the base internal constructs ( I've heard some people mentioning that writing contracts "by hand" led to much smaller contracts; I believe they meant, not using the compiler but directly going for the |
Well, as I said, we retain all the datatype definitions for the types that get used, even if they're only used at the type level. What does it mean to retain a datatype definition? It means that we get some type abstractions (which get erased to force/delay), and some lambda abstractions for the constructors and the pattern matcher. That would all be useful if you actually did something with your |
Is that information of any use for the execution of the script? I'd imagine that any information about types does indeed disappear after compilation, since it is only useful for static analysis. Once that is done, one can certainly produce an untyped program with no type annotations whatsoever, right? |
A datatype has some type-level stuff, but also some term-level stuff: constructors and a pattern matcher. That's the stuff that's needlessly retained. As I said, it's something we could in principle optimize away. It's "just" a matter of writing a smarter compiler. |
@michaelpj Is there anything we can help with to reduce the output script size? We are always ready to tackle these problems. |
Note that there are lots of issues affecting script size, but a lot of it is pretty difficult to improve on. This particular issue is only about totally trivial scripts that don't even use the transaction information. As such, I'm not sure it's worth that much attention. But if anyone is interested, here's what I wrote on two internal tickets about improving this.
Both of these would require doing some relatively complex work on the dead code analysis, but if someone's really keen we could talk more about it. |
@michaelpj Yes, I meant script outputs in general. I'd love to work on both of those directions. We have a few experienced compiler people in-house too. I think all we need is a short description for each task to start. The above is sufficient for now, but anything more will be appreciated. |
Okay, let me make some public github tickets with a bit more detail. |
@michaelpj Awesome, thank you! |
1. Change dependency analysis to account for the fact that the term-level parts can be removed (see note). 2. Simplify datatype bindings into trivial type bindings if all their term-level parts are dead. Had to do a bit of test rearrangement since a lot of the `plutus-tx-plugin` tests for a type T just used a lambda with an unused argument of type T... which gets simplified with this PR! Fixes #4414. Fixes #3702.
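The two steps can be sketched in Python over a toy model of bindings. This is a simplified illustration under stated assumptions: `DatatypeBind`, `TypeBind`, `simplify` and the example names are hypothetical, and the liveness sets are given as inputs rather than computed by a real dependency analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatatypeBind:
    """Full datatype binding: a type plus its term-level parts."""
    type_name: str
    constructors: tuple   # term-level constructor names
    matcher: str          # term-level pattern matcher name

@dataclass(frozen=True)
class TypeBind:
    """Trivial type-only binding: no constructors, no matcher."""
    type_name: str

def simplify(bindings, used_types, used_terms):
    """Keep, shrink, or drop each datatype binding based on what is live."""
    result = []
    for bind in bindings:
        term_names = set(bind.constructors) | {bind.matcher}
        if term_names & used_terms:
            result.append(bind)                      # some term-level part is live
        elif bind.type_name in used_types:
            result.append(TypeBind(bind.type_name))  # only the type is live
        # otherwise the whole binding is dead and is dropped
    return result

bindings = [
    DatatypeBind("ScriptContext", ("ScriptContext",), "ScriptContext_match"),
    DatatypeBind("Bool", ("True", "False"), "Bool_match"),
    DatatypeBind("Unit", ("Unit",), "Unit_match"),
]
simplified = simplify(bindings, used_types={"ScriptContext", "Bool"}, used_terms={"True"})
```

In this toy run `ScriptContext` is only mentioned at the type level, so it shrinks to a type-only binding, `Bool` is kept whole because `True` is used, and `Unit` disappears.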
1. Change dependency analysis to account for the fact that the term-level parts can be removed (see note). 2. Simplify datatype bindings into trivial type bindings if all their term-level parts are dead. Had to do a bit of test rearrangement since a lot of the `plutus-tx-plugin` tests for a type T just used a lambda with an unused argument of type T... which gets simplified with this PR! Fixes #4147. Fixes #3702.
…4289) * SCP-2638: simplify datatypes which are used only at the type-level 1. Change dependency analysis to account for the fact that the term-level parts can be removed (see note). 2. Simplify datatype bindings into trivial type bindings if all their term-level parts are dead. Had to do a bit of test rearrangement since a lot of the `plutus-tx-plugin` tests for a type T just used a lambda with an unused argument of type T... which gets simplified with this PR! Fixes #4147. Fixes #3702. * Comments
Area
[x] Plutus Foundation: Related to the GHC plugin, Haskell-to-Plutus compiler, on-chain code
[ ] Plutus Application Framework: Related to the Plutus application backend (PAB), emulator, Plutus libraries
[ ] Marlowe: Related to Marlowe
[ ] Other: Any other topic (Playgrounds, etc.)
Summary
Empty validators generate serialized scripts of 3KB filled with unused primitives. Perhaps expected at this stage?
Steps to reproduce
Out of curiosity and to better understand where some errors in another contract could come from, I checked the size of an empty validator:
The resulting compiled version is ~3KB and, looking at the pretty-printed compiled code (see below), it seems mostly filled with primitives and base constructions for the `ScriptContext`, even if unused.
As a side note, what I also find interesting is that the pretty-printed code is ~5KB, which isn't that much bigger than the serialized code. Intuitively, I'd expect the serialized code to be at least an order of magnitude smaller 🤔. Plus, doing 'dummy' transformations on the pretty-printed code, like replacing a few of the keywords with shorter identifiers (e.g. vardecl -> v, datatypebind -> dtb, fun -> f ...), rapidly brings the size of the pretty-printed code down to 3KB or less, which suggests that something is wrong with the serialized code?
Expected behavior
System info (please complete the following information):
Screenshots and attachments
Additional context