CIP-33: Reference scripts #161

Merged
merged 2 commits into from
Jan 25, 2022
117 changes: 117 additions & 0 deletions CIP-0033/README.md
@@ -0,0 +1,117 @@
---
CIP: 33
Title: Reference scripts
Authors: Michael Peyton Jones <michael.peyton-jones@iohk.io>
Comments-Summary: No comments
Comments-URI:
Status: Draft
Type: Standards Track
Created: 2021-11-29
License: CC-BY-4.0
Requires: CIP-31
---

# Reference scripts

## Abstract

We propose to allow scripts ("reference scripts") to be attached to outputs, and to allow reference scripts to be used to satisfy script requirements during validation, rather than requiring the spending transaction to supply the scripts itself.
This will allow transactions using common scripts to be much smaller.

Contributor

@paluh paluh Feb 2, 2022

Sorry if this is a stupid question or if I missed this point in the discussion. Do you think this idea could be extended to something like content-addressable imports in Plutus Core, similar to Dhall imports, Unison imports, or Nix's artifact build and caching strategy?
This way we could possibly build an ecosystem of libraries on the chain which would be "cheap" to use.

Contributor Author

No, that's drastically different and more difficult. This proposal is entirely external to the language: it just treats scripts as black boxes and lets you get those boxes from other places. Something that actually intruded into the language itself would be massively more complicated.


## Motivation

Script sizes pose a significant problem. This manifests itself in two ways:
1. Every time a script is used, the transaction which caused the usage must supply the whole script as part of the transaction. This bloats the chain, and passes on the cost of that bloat to users in the form of transaction size fees.
2. Transaction size limits are problematic for users. Even if individual scripts do not hit the limits, a transaction which uses multiple scripts has a proportionally greater risk of hitting the limit.

We would like to alleviate these problems.

The key idea is to use reference inputs and modified outputs which carry actual scripts ("reference scripts"), and allow such reference scripts to satisfy the script witnessing requirement for a transaction.
This means that the transaction which _uses_ the script will not need to provide it at all, so long as it references an output which contains the script.

## Specification

We extend transaction outputs with a new optional field, which contains a script (a "reference script").

The min UTXO value for an output with an additional script field depends on the size of the script, following the `coinsPerUTxOWord` protocol parameter.
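
As a rough illustration of how script size could feed into the minimum value, the sketch below charges `coinsPerUTxOWord` for every 8-byte word of the output, script included. The 8-byte word size, the `base_output_words` input, and the function name are assumptions made for illustration; this text does not fix the exact ledger formula.

```python
WORD_SIZE_BYTES = 8  # assumption: one "word" is 8 bytes


def min_utxo_value(base_output_words: int, script_size_bytes: int,
                   coins_per_utxo_word: int) -> int:
    """Hypothetical min-UTXO value for an output carrying a reference script."""
    # Round the script size up to whole words using integer arithmetic.
    script_words = (script_size_bytes + WORD_SIZE_BYTES - 1) // WORD_SIZE_BYTES
    return (base_output_words + script_words) * coins_per_utxo_word


# Example with made-up parameter values: a 4001-byte script occupies 501 words.
without_script = min_utxo_value(29, 0, 1000)
with_script = min_utxo_value(29, 4001, 1000)
```

Under these assumptions the deposit grows linearly with script size, which is the property the min UTXO mechanism relies on.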

When we are validating a transaction and we look for the script corresponding to a script hash, in addition to the scripts provided in the transaction witnesses, we also consider any reference scripts from the outputs referred to by the inputs of the transaction.
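
The lookup described above can be sketched as follows. This is a simplified model, not ledger code: `Output`, `resolve_script`, and `script_hash` are hypothetical names, and the hash shown is a stand-in (a plain BLAKE2b-224 over the script bytes) that ignores any detail of how the ledger actually hashes scripts.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def script_hash(script_bytes: bytes) -> bytes:
    # Assumption: stand-in hash used only for this sketch.
    return hashlib.blake2b(script_bytes, digest_size=28).digest()


@dataclass(frozen=True)
class Output:
    address: str
    reference_script: Optional[bytes] = None  # the new optional field


def resolve_script(wanted_hash: bytes,
                   witness_scripts: List[bytes],
                   resolved_input_outputs: List[Output]) -> Optional[bytes]:
    """Find the script for a hash among witnesses and reference scripts."""
    # Start from the scripts supplied in the transaction witnesses.
    candidates = list(witness_scripts)
    # Then add reference scripts carried by outputs that the inputs refer to.
    candidates += [o.reference_script for o in resolved_input_outputs
                   if o.reference_script is not None]
    for script in candidates:
        if script_hash(script) == wanted_hash:
            return script
    return None


# The transaction supplies no witness script; an input's output carries it.
validator = b"serialized-validator"
found = resolve_script(script_hash(validator), [],
                       [Output("addr1"), Output("addr2", validator)])
```

In this model the spending transaction only needs to point at an output that carries the script; the script bytes themselves never appear in the transaction.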

### Script context

Scripts are passed information about transactions via the script context.
We propose to augment the script context to include some information about reference scripts.

Changing the script context will require a new Plutus language version in the ledger to support the new interface.
The change is: a new optional field is added to outputs and inputs to represent reference scripts.
Reference scripts are represented by their hash in the script context.

Old versions of the language will retain the old interface.
We do not propose to include information about reference scripts in the old interface.
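
A minimal sketch of the versioned interface, assuming a hypothetical `LedgerOutput` type and assuming the new interface arrives with language version 2 (the actual version number is not stated here). Scripts running under the old version see the old shape; scripts under the new version additionally see the reference script hash.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

NEW_LANGUAGE_VERSION = 2  # assumption: version that introduces the new interface


@dataclass(frozen=True)
class LedgerOutput:
    address: str
    value: int
    datum_hash: Optional[bytes] = None
    reference_script_hash: Optional[bytes] = None  # hash only, not the script body


def output_for_script_context(output: LedgerOutput,
                              language_version: int) -> Dict[str, Any]:
    """Build the view of an output that a script of the given version sees."""
    view: Dict[str, Any] = {
        "address": output.address,
        "value": output.value,
        "datum_hash": output.datum_hash,
    }
    # Only the new interface exposes reference script information.
    if language_version >= NEW_LANGUAGE_VERSION:
        view["reference_script_hash"] = output.reference_script_hash
    return view


out = LedgerOutput("addr1", 2_000_000, reference_script_hash=b"\x01" * 28)
old_view = output_for_script_context(out, 1)
new_view = output_for_script_context(out, 2)
```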

### CDDL

The CDDL for transaction outputs will change as follows to reflect the new field.
```
transaction_output =
[ address
, amount : value
, ? datum : $hash32
, ? ref_script : plutus_script
]
```
TODO: can we use a more generic type that allows _any_ script in a forwards-compatible way?

You could do something like this:

any_script = [ 1, timelock_script ] / [ 2, plutus_script ] ;    ... / [ n, other_script ]

Our future selves would also be grateful if the tags matched the ones used for the script buckets in either the auxiliary data or the witness sets (which don't match each other 😓)
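
A sketch of how a decoder might dispatch on such a tag, assuming the CBOR item has already been decoded into a two-element Python sequence. The tag values follow the suggestion above; `TaggedScript` and `decode_any_script` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Sequence

# Tags as in the suggested production; further kinds would be added here.
SCRIPT_KINDS = {1: "timelock", 2: "plutus"}


@dataclass(frozen=True)
class TaggedScript:
    kind: str
    body: Any


def decode_any_script(item: Sequence[Any]) -> TaggedScript:
    """Decode a [tag, script] pair into a script of a known kind."""
    if len(item) != 2:
        raise ValueError("expected [tag, script]")
    tag, body = item
    if tag not in SCRIPT_KINDS:
        # Unknown tags are rejected rather than guessed at.
        raise ValueError(f"unknown script tag: {tag!r}")
    return TaggedScript(SCRIPT_KINDS[tag], body)


decoded = decode_any_script([2, b"\x4d\x01"])
```

Because the tag decides the interpretation, a new script kind only needs a new entry in the table, which is the forwards-compatibility the TODO asks about.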

Contributor

Why not just allow attaching arbitrary data to the UTXO? This would allow it to be used for other purposes too, besides scripts. It does overlap a bit with having inline datums, but I think it's important to consider that the datum might also contain information necessary only for consuming it, and not relevant for the referencer. By having this extended to arbitrary data, we could choose to ignore the datum (which would contain irrelevant information) and only use the information we are interested in.

Contributor Author

@redxaxder I'd be very happy to just use a production for a discriminated union of scripts if we wanted to promote one (of the two that we apparently have...). Let me know which one!

@L-as The "data" here is used for a specific purpose in the ledger. Just having the field contain arbitrary data has the same issue as putting it in the datum: we then need rules for when it "counts" as a reference script for the purposes of the ledger, and when it's "just" uninterpreted data. Having a field specifically for this purpose avoids any confusion.

On a practical note, the issue of functional confusion between referencing and spending is much less problematic for datums. You can just put a pair in the datum and only look at the first component when spending and the second when referencing. A trick like this isn't viable for reference scripts themselves, where we need the serialized data to exactly match the expected hash.
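
The contrast can be sketched as follows, using BLAKE2b-224 as a stand-in hash and made-up byte strings; none of these values come from the ledger.

```python
import hashlib


def digest(data: bytes) -> bytes:
    # Assumption: stand-in hash used only for this sketch.
    return hashlib.blake2b(data, digest_size=28).digest()


# Datum: a pair works, because each consumer can read the half it cares about.
datum = ("state-for-spending", "info-for-referencing")
spending_part, referencing_part = datum

# Script: the hash covers the exact serialized bytes, so bundling anything
# else alongside the script yields bytes that no longer match the hash.
script = b"serialized-script"
expected_hash = digest(script)
bundled = script + b"extra-data"

script_matches = digest(script) == expected_hash
bundled_matches = digest(bundled) == expected_hash
```

Here `script_matches` holds while `bundled_matches` does not, which is why the pairing trick does not carry over to reference scripts.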

I suggest using the same indices as the auxiliary data, since they're not interleaved with other stuff

Contributor

@michaelpj I don't see why you'd need to know when to "count" it as a reference script or uninterpreted data. Were it arbitrary data, to get the data for a hash specified in a transaction, you would in addition to the attached scripts, txInfoData, etc., also check the "reference data" in each input and output. Perhaps there's an issue with this I'm missing?

Contributor Author

What's the type of the extra_txout_data field?

Is it Data? Well, scripts are an incompatible format with Data, so we'd have to e.g. parse the Data and look for a nested bytestring or something.

Is it just a raw ByteString? Now we have an inelegant situation where we just try to match it up with various kinds of thing, and use that to decide how to deserialise it. Which is quite tricky and also not how we treat any other part of the transaction.

I think you're suggesting the latter?

Contributor Author

A raw bytestring would also be much less useful if the output contains data that isn't supposed to be a hash pre-image for something in the transaction, and just contains extra data you might want to look at. Scripts would then have no easy way of interpreting it (unless we provided parseData :: ByteString -> Data, which I'd rather not).

Contributor

You're right, I agree.

Using elements in a list of flexible size has always created problems down the road for me. You always need to check the length and access the content by index, even if you use a data structure to parse.
I would consider making the third entry either the datum hash or a map whose keys name the content: the datum hash, the inline datum, a timelock script, a Plutus script, or whatever else might come in the future. This lets you include any data and gives you a clear way to implement a parser, since the key in the map would specify the parser.
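
A sketch of the map-keyed layout suggested here, assuming the output's extra fields arrive as an already-decoded dictionary. The key names and the `parse_extras` helper are hypothetical, and the parsers are placeholders.

```python
from typing import Any, Callable, Dict

# Each key names its content and selects the parser for it.
PARSERS: Dict[str, Callable[[Any], Any]] = {
    "datum_hash": bytes,
    "inline_datum": lambda value: value,  # placeholder: keep as decoded
    "timelock_script": bytes,
    "plutus_script": bytes,
}


def parse_extras(extras: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the optional parts of an output, dispatching on the map key."""
    parsed: Dict[str, Any] = {}
    for key, raw in extras.items():
        parser = PARSERS.get(key)
        if parser is None:
            raise ValueError(f"unknown output field: {key}")
        parsed[key] = parser(raw)
    return parsed


extras = parse_extras({"datum_hash": b"\x00" * 32, "plutus_script": b"\x4d\x01"})
```

With named keys there is no need to infer a field's meaning from its position or from the length of the list.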


## Rationale

The key idea of this proposal is to stop sending frequently-used scripts to the chain every time they are used, and instead make them available in a persistent way on-chain.

The implementation approach follows in the wake of CIP-31 (Reference inputs) and CIP-32 (Inline datums).
The former considers how to do data sharing on chain, and concludes that referencing UTXOs is a good solution.
The latter shows how we can safely store substantial data in UTXOs by taking advantage of existing mechanisms for size control.

It is therefore natural to use the same approach for scripts: put them in UTXOs, and reference them using reference inputs.

### Storing scripts in outputs

There are a few possible alternatives for where to store reference scripts in outputs.

#### 1: The address field

In principle, we could add an "inline scripts" extension that allowed scripts themselves to be used in the address field instead of script hashes.
We could then use such scripts as reference scripts.

However, this approach suffers from a major confusion about the functional role of the script.
You would only be able to provide a reference script that _also_ controlled the spending of the output.
This is clearly not what you want: the reference script could be anything, perhaps a script only designed for use in quite specific circumstances; whereas in many cases the user will likely want to retain control over the output with a simple public key.

#### 2: The datum field

With inline datums, we could put reference scripts in the datum field of outputs.

This approach has two problems.
First, there is a representation confusion: we would need some way to know that a particular datum contained a reference script.
We could do this implicitly, but it would be better to have an explicit marker.

Secondly, this prevents having an output which is locked by a script that needs a datum _and_ has a reference script in it.
While this is a more unusual situation, it's not out of the question.
For example, a group of users might want to use a Plutus-based multisig script to control the UTXO with a reference script in it.

#### 3: A new field

A new field is the simplest solution: it avoids these problems because the new field clearly has one specific purpose, and we do not overload the meanings of the other fields.

### UTXO set size

This proposal gives people a clear incentive to put large amounts (i.e. kilobytes) of data in outputs as reference scripts.

This is essentially the same problem which is faced in CIP-32, and we can take the same stance.
We don't want to bloat the UTXO set unnecessarily, but we already have mechanisms for limiting that (in the form of the min UTXO value), and these should work transparently for reference scripts as they will for inline datums.

### Changing the script context

We don't strictly need to change the script context.
We could simply omit any information about reference scripts carried by outputs.
This would mean that we don't need to change the interface.

We don't have obvious use cases for the information about reference scripts, but the community may come up with use cases, and our general policy is to try and include as much information about the transaction as we can, unless there is a good reason not to.