Package manifest discussion #2

Closed
tcoulter opened this issue Oct 1, 2016 · 10 comments

Comments

@tcoulter
Contributor

tcoulter commented Oct 1, 2016

Before deciding on how package management may work, we should first decide on what information should be stored in the package manifest (think, package.json). Afterward, we should decide which format this data should be stored in.

Possible data to be stored within the manifest

  • package name
  • author
  • version (preferably semver)
  • description
  • contract metadata
    • abi
    • unlinked binary
    • deployed address if applicable
    • name/address pairs for linked libraries
    • compiler version for bytecode verification
    • key/value pairs of the sha3 hash of events this contract can emit with the abi definition of that event (i.e., the contract metadata should contain information for all events that can be triggered by this contract, including those triggered by linked libraries)
  • external project URIs (homepage, repository uri, etc.)
  • dependencies (i.e., references to other packages and versions)
  • updated time (likely added by the packager)
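
As a rough illustration of the list above, a manifest carrying these fields might look like the following once serialized to JSON. Every key name here (`package_name`, `contracts`, `links`, and so on) is hypothetical; nothing in this thread has settled on a schema, and the addresses, versions, and URLs are placeholders.

```python
import json

# Hypothetical manifest; key names and values are illustrative assumptions only.
manifest = {
    "package_name": "example-token",
    "author": "Jane Doe <jane@example.com>",
    "version": "1.0.0",  # semver: MAJOR.MINOR.PATCH
    "description": "An example package.",
    "contracts": {
        "ExampleToken": {
            "abi": [],  # ABI entries as emitted by the compiler
            "unlinked_binary": "0x",
            "address": None,  # deployed address, if applicable
            "links": {"ExampleLib": "0x" + "00" * 20},  # library name -> address
            "compiler_version": "0.4.2",
        }
    },
    "uris": {
        "homepage": "https://example.com",
        "repository": "https://example.com/repo.git",
    },
    "dependencies": {"example-lib": "^1.2.0"},
    "updated_at": "2016-10-01T00:00:00Z",  # likely set by the packager
}


def is_semver(version: str) -> bool:
    """Check only the basic MAJOR.MINOR.PATCH shape (no pre-release or build tags)."""
    parts = version.split(".")
    return len(parts) == 3 and all(p.isdigit() for p in parts)


serialized = json.dumps(manifest, indent=2, sort_keys=True)
```

The event hash table from the list is left out of this sketch; whether it belongs in the manifest at all is discussed further down the thread.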

Manifest format

As far as package formats, I've made some assumptions:

  1. Package manifests are primarily for non-contract actors and will not be consumed by contracts themselves.
  2. Following from that, we will likely store the manifest on some distributed protocol like IPFS or Swarm (although this is jumping ahead in the discussion).
  3. Since our off chain actors are primarily software, the message format needs to be easily readable by most programming languages.
  4. That said, it's likely human actors will write the manifest, meaning it needs to be in a format humans can easily understand and comprehend.
  5. In contrast to the above, we need a format that takes up as little space as possible so that it costs the least to make available.

All that said, my preferred format is JSON. Something like MessagePack could be interesting, however; it's less consumable by humans but can easily be converted to and from JSON.

@j-h-scheufen

I would vote for JSON as the preferred format as well.
We have been maintaining a manifest for Eris smart contract packages (we call them contract bundles) that is a slightly modified version of an npm package.json. The metadata is used for documentation purposes only at the moment, as no dependency manager exists yet. I'll share some more details soon for discussion.
Some remarks regarding the above:

  • I would like to suggest including another identifier for the group/company as originator of the package, e.g. groupId. The combination of groupId and packageName (e.g. acmecorp.fooService) would be the unique id. This in my opinion would support a growing ecosystem nicely as reputation can be accumulated under a group name.
  • To reduce the size of the manifest itself, a possible solution would be to assume the existence of a management contract (optional, but with a standardized interface) that acts as a contract metadata repository. This way only one contract's address needs to be known to look up already deployed libraries, etc. by name and dynamically include the name:address pairs when deploying the contracts of the current package. Ideally, the manifest would not have to be edited to facilitate a particular deployment.
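
A minimal sketch of both remarks, assuming a dot-separated identifier and using a plain dictionary as a stand-in for the management contract. The function names, the registry contents, and the address are all hypothetical; a real implementation would query the contract on chain.

```python
def qualified_id(group_id: str, package_name: str) -> str:
    """Join group and package name into one identifier, e.g. 'acmecorp.fooService'."""
    if not group_id or not package_name:
        raise ValueError("both group_id and package_name are required")
    return f"{group_id}.{package_name}"


# Stand-in for the on-chain metadata repository contract: library name -> address.
# In practice this would be a contract call; the dict and address are hypothetical.
deployed_registry = {"acmecorp.mathLib": "0x" + "11" * 20}


def resolve_links(required_libraries, registry):
    """Look up each required library by name at deploy time, leaving the manifest untouched."""
    missing = [name for name in required_libraries if name not in registry]
    if missing:
        raise LookupError(f"libraries not deployed: {missing}")
    return {name: registry[name] for name in required_libraries}
```

With this shape the manifest only lists library names; the name:address pairs are filled in per deployment.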

@raineorshine

JSON seems like the obvious choice. MessagePack seems more useful for optimizing data sent over the wire at runtime than for manifests.

I like @j-h-scheufen's idea of a groupId (i.e. scope or namespace). We should examine how npm does scoped packages for reference.

A couple other considerations:

  1. The scripts section of package.json has proved enormously useful within the node community, providing minimal build systems, server hooks, and other easy integrations with server/testing environments. Is there an equivalent in the ethereum world? Test accounts? Deployment configurations? Transpilation? Obviously we don't want to shoehorn something completely different into the package file, but there may be an opportunity here. The scripts section in package.json files has had few downsides that I am aware of, other than some controversy around usage of postinstall.
  2. There was some complexity and backtracking in regard to the different types of dependencies in package.json: dependencies, devDependencies, and peerDependencies (specifically peerDependencies, which was deprecated). We should think carefully about the types of dependencies we may need, to avoid the issues npm has had with having to walk back peerDependencies.

@tcoulter
Contributor Author

tcoulter commented Oct 3, 2016

I see a couple questions brewing as a result of this discussion:

  • What type of package manager are we creating?
  • Which format should the user interact with when creating and consuming packages?
  • And which format should be the serialization format stored on IPFS/Swarm, if any?

Let me rephrase the first question a bit differently: Are we creating a package manager for full projects including frontend and library code, images, assets, etc? Or are we simply creating a package manager for contracts and contract metadata? For example, should the package management system contain files for all my dapp's contract code along with its associated JS frontend code, while at the same time managing all your dapp's contract code and its associated Python library interface?

I'm currently leaning toward us only creating a contract packager, for now, as I'm not necessarily sure our packager should cross language barriers (in trying to please everyone, we'll please no one). Instead, I imagine language or community-specific packagers will be built on top of our contract packaging system which best integrates with those communities' customs and paradigms. I see scripts as one of the features of these higher level packaging systems.

Regarding the next two questions, I think we should distinguish between the manifest format the user interacts with and the serialization format stored on IPFS/Swarm. I assume humans won't be interacting with manifest files directly stored on IPFS/Swarm; instead, they'll likely only interact with the file that exists on their filesystem (much as we don't interact with contracts via the on-chain binary, but instead via contract abstractions and language constructs). This means that we have the opportunity to choose a usage format that is different from the serialization format, and add potential cost savings to storing that package on the network. For a JS-style packager, JSON should likely be the interaction format; but it could just as easily be YAML for a Ruby-style packager. The packager could easily convert between the usage format and the serialization format (e.g., MessagePack or similar) when packages are published and consumed.
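
To make the usage-format versus serialization-format split concrete, here is a small sketch of the conversion a packager could perform on publish and on install. It uses minified JSON plus zlib purely as a stand-in for a compact binary encoding, because both ship with Python; it is not MessagePack, and the function names are made up for illustration.

```python
import json
import zlib


def to_usage_format(manifest: dict) -> str:
    """Human-facing form: indented JSON, as a JS-style packager might write to disk."""
    return json.dumps(manifest, indent=2, sort_keys=True)


def to_wire(manifest: dict) -> bytes:
    """Compact byte string for publishing (stand-in for MessagePack or similar)."""
    compact = json.dumps(manifest, separators=(",", ":"), sort_keys=True)
    return zlib.compress(compact.encode("utf-8"))


def from_wire(blob: bytes) -> dict:
    """Inverse of to_wire: recover the manifest from the published bytes."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

The round trip `from_wire(to_wire(m)) == m` is the property that matters: as long as it holds, each community's packager can pick whatever on-disk format suits it.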

All this said, I assume the idea of a contract vs. full packager to be pretty controversial. I'm trying to prevent having @pipermerriam fight for what's best for Python while being bombarded by a bunch of JS cheerleaders (which I am one of ;) ). Similarly, I'm not sure we know what's best for every community, and so should leave it up to that community to build packagers on top of a shared base. What were you guys thinking in this case?

@redsquirrel

Are we creating a package manager for full projects including frontend and library code, images, assets, etc?

I sure hope not!

Or are we simply creating a package manager for contracts and contract metadata?

I doubt it will be simple, but that sounds like what we should be shooting for.

I'm curious whether the package manager will be specific to Solidity, or agnostic to the EVM programming language.

@mhhf
Collaborator

mhhf commented Oct 4, 2016

I'm currently leaning toward us only creating a contract packager, for now, as I'm not necessarily sure our packager should cross language barriers (in trying to please everyone, we'll please no one). Instead, I imagine language or community-specific packagers will be built on top of our contract packaging system which best integrates with those communities' customs and paradigms. I see scripts as one of the features of these higher level packaging systems.

👍 Totally agree here. In my opinion we should build only the "contract packager" standard, but with extensibility in mind.

Which format should the user interact with when creating and consuming packages?

I'd argue that this should be left open to the tool/framework, but I'd also be in favor of JSON. If standardizing this is in scope of this discussion, we should reach out to other teams working on different stacks to gather opinions on the proposed file format and data, e.g. Chris from the C++ team, blockapps (Haskell), ether.camp (Java) and the geth people.
Also, @chriseth mentioned at devcon that he wants to generate metadata on Swarm inside the Solidity compiler, so maybe we should reach out to him, as this might overlap with our goal.

And which format should be the serialization format stored on IPFS/Swarm, if any?

If we settle on JSON, I'd highly suggest using something like http://json-schema.org/ as a specification tool.
Also, if I remember correctly, the Swarm guys mentioned during devcon2 that they will use the same serialization and hashing format as IPFS to make both compatible. Also IPFS stores a ProtoBuf serialization of a JSON. What about settling down with this?
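
For a sense of what a JSON Schema based specification could look like, the sketch below writes a tiny schema using JSON Schema keywords (`type`, `required`, `properties`) and checks a manifest against it. The property names are hypothetical, and the `validate` function is a hand-rolled toy covering only those three keywords; a real project would presumably use a full JSON Schema validator instead.

```python
# A schema written in JSON Schema vocabulary; the property names are hypothetical.
MANIFEST_SCHEMA = {
    "type": "object",
    "required": ["package_name", "version"],
    "properties": {
        "package_name": {"type": "string"},
        "version": {"type": "string"},
        "dependencies": {"type": "object"},
    },
}

# Maps JSON Schema type names to Python types (only the ones used above).
_TYPES = {"object": dict, "string": str}


def validate(instance, schema):
    """Return a list of error strings; an empty list means the instance passed."""
    if not isinstance(instance, _TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in instance and not isinstance(instance[key], _TYPES[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
    return errors
```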

Also here are some random thoughts about this topic:

Layers/ Levels

Another question we should answer is whether we want to have one manifest, or distribute the data into different layers, e.g. having one minimal package header with all the basic information and moving the rest into other linked data. Depending on the use case, there are several benefits to splitting, e.g., separating higher-level metadata such as the name from the contract metadata.
If we settle down with Swarm/IPFS, we could use http://ipld.io/ for this.
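
A toy version of the layered idea, assuming a content-addressed store: the heavy contract metadata is stored as its own object and the minimal header refers to it by hash. The in-memory `store`, the SHA-256 hex digest, and the key names are stand-ins chosen for illustration; real IPFS/Swarm hashes and IPLD objects are encoded differently. The link is written as a `{"/": hash}` object, in the style IPLD uses for links.

```python
import hashlib
import json

store = {}  # stand-in for a content-addressed network such as IPFS or Swarm


def put(obj) -> str:
    """Store an object under the hash of its canonical JSON and return that hash."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def get(key: str):
    """Fetch and decode an object by its hash."""
    return json.loads(store[key].decode("utf-8"))


# Heavy contract metadata lives in its own object...
contracts_link = put({"ExampleToken": {"abi": [], "unlinked_binary": "0x"}})
# ...and the minimal header only references it by hash.
header_link = put({
    "package_name": "example-token",
    "version": "1.0.0",
    "contracts": {"/": contracts_link},
})
```

A consumer that only needs the name and version fetches the header alone; the ABI and bytecode are retrieved only when the link is followed.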

GroupID/ names

Here we should think carefully whether

  1. we want to allow different packages with the same name and if not, how we will resolve it
  2. the name in the manifest should reflect the name on a name registry (e.g. ENS or @pipermerriam's suggestion) and how we are going to verify this.
    However, I think we shouldn't be too restrictive on names; we should let users choose whatever name they want and solve naming on a different level.

manifest data

As for the manifest data: I'd either argue about not including deployed address in a package at all, or thinking very carefully about how to do this: e.g. what if the address disappears because of suicide? What if this package is deployed to another address and the new address becomes the de-facto standard? What if a package is solely written for morden, or for a private chain?

key/value pairs of the sha3 hash of events this contract can emit with the abi definition of that event (i.e., the contract metadata should contain information for all events that can be triggered by this contract, including those triggered by linked libraries)

This is redundant information as this can be generated out of the abi.
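
A sketch of how the table could be derived from the ABI rather than stored. An event's topic is the Keccak-256 hash of its canonical signature string, so the code below builds that string and takes the hash function as a parameter; Python's `hashlib.sha3_256` is NIST SHA-3 and yields a different digest, so a Keccak-256 implementation has to be supplied by the caller. The function names are hypothetical, the sketch handles only elementary argument types, and covering events emitted by linked libraries would additionally require those libraries' ABIs.

```python
def event_signature(abi_entry: dict) -> str:
    """Build the canonical signature string, e.g. 'Transfer(address,address,uint256)'."""
    types = ",".join(inp["type"] for inp in abi_entry["inputs"])
    return f"{abi_entry['name']}({types})"


def event_table(abi: list, keccak256) -> dict:
    """Map each event's topic hash to its ABI definition.

    `keccak256` is a caller-supplied function (bytes -> hex string); it is an
    assumption of this sketch, not something provided by the standard library.
    """
    return {
        keccak256(event_signature(entry).encode("ascii")): entry
        for entry in abi
        if entry.get("type") == "event"
    }


abi = [
    {"type": "event", "name": "Transfer", "inputs": [
        {"name": "from", "type": "address", "indexed": True},
        {"name": "to", "type": "address", "indexed": True},
        {"name": "value", "type": "uint256", "indexed": False},
    ]},
    {"type": "function", "name": "transfer", "inputs": []},
]
```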

Reconstructable Object

In general, I view a package as a set of data decorated with metadata which contains all necessary information to reproduce a contract on chain, preferably dev tool and compiler agnostic. Therefore we should also think about how we will include things like solidity files.

@tcoulter
Contributor Author

tcoulter commented Oct 4, 2016

Also IPFS stores a ProtoBuf serialization of a JSON. What about settling down with this?

Could work. I'm mostly looking for something that has an efficient binary encoding to reduce costs; if ProtoBuf is more efficient than JSON, that would work.

Here we should think carefully whether [...] we want to allow different packages with the same name and if not, how we will resolve it

I generally don't like the idea of a group id; I lean toward something like npm. A single package registry could attract squatters; instead, in true decentralized fashion, package maintainers could release their own registry in the event of name conflicts, much like Ubuntu's packaging system. Package consumers could then tell their package manager to include that registry.
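
One way to picture the multi-registry idea: the consumer configures an ordered list of registries and the package manager takes the first match. This is only a sketch; the registry names, the precedence rule, and the manifest hashes are assumptions, not anything agreed on in this thread.

```python
def resolve(package_name, registries):
    """Return (registry_name, entry) from the first registry that knows the package.

    `registries` is an ordered list of (name, mapping) pairs; earlier entries win,
    much like the order of sources in a system package manager's configuration.
    """
    for registry_name, packages in registries:
        if package_name in packages:
            return registry_name, packages[package_name]
    raise LookupError(f"{package_name!r} not found in any configured registry")


# Hypothetical registries; names and manifest hashes are placeholders.
default_registry = {"token": "manifest-hash-A"}
maintainer_registry = {"token": "manifest-hash-B", "auction": "manifest-hash-C"}
```

Listing `maintainer_registry` first makes its `token` shadow the default one, which is how a name conflict would be settled on the consumer's side rather than by a global namespace.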

I'd either argue about not including deployed address in a package at all,

Then you can't have packages that refer to live code. i.e., you can't have the Ethereum equivalent of Stripe having a package that interacts with their service. It also prevents library reuse, and instead encourages the same code to be deployed to multiple locations on the same chain simply because we couldn't find a good way to manage it.

what if the address disappears because of suicide?

This is up to the package maintainer, and it's easily routed around if the contract suicides. For one, the community can choose not to rally around packages that suicide, as that's setting ourselves up for failure. Second, the code exists on chain whether or not the package suicides, so we can always upload a new version in the case it no longer exists. The data won't necessarily exist, but given we'll eventually have binary verification, people should be able to see the code and make decisions about whether or not they want to run that risk.

What if this package is deployed to another address and the new address becomes the de-facto standard?

This is the point of forking. People can fork a package whenever they like, and deploy a new version if they choose. This already happens in the real world: I can (for instance) copy Dapple's code and deploy a new version on GitHub. But Dapple's users won't suddenly move to my version, correct? Same for deployed addresses. If someone deploys a new version, it's just another package; if that package does better than the old one, it must be for reasons other than simply having been deployed twice.

As far as another version being deployed by the same maintainer: This is just versioning. If the contract isn't upgradable through some custom mechanism, then the deployed address will be replaced with each version. In which case the previous address will still work until the user upgrades their package version.

What if a package is solely written for morden, or for a private chain?

This shouldn't matter, correct? If it's solely written for morden, then package maintainers shouldn't publish that package on the live net.

Therefore we should also think about how we will include things like solidity files.

Ya, I decided not to touch these yet (though they do need to be discussed). Eventually we need to figure out contract verification, which in the end requires uncompiled code.

@chriseth

chriseth commented Oct 4, 2016

I'm pretty sure there is not a single package manager "to rule them all". I guess at least two different packages make sense:

  1. information about a library and how to use it (like npm plus address on the blockchain)
  2. information about an application itself including how to interact with contracts

The first does not even necessarily need an address on the blockchain; in the worst case, you can just recompile and redeploy. By the way, the metadata we are working on is described here: https://pad.riseup.net/p/7x3G896a3NLA
but please do not edit that pad. I will try to rewrite the description in the next week and create an actual issue in the solidity repository.

@nmushegian

  1. information about a library and how to use it (like npm plus address on the blockchain)
  2. information about an application itself including how to interact with contracts

@chriseth I think this distinction is almost right, but it should be "does this code have an address or is it just code" (whether it is a library or a regular contract with state is NOT the key difference). Another way to phrase this is "are there more than 0 objects (including "libraries") described in this package".

@nmushegian

To put it another way, there needs to be library-like "linking" for things that aren't solidity "libraries".

If there isn't, you would expect a lot of "libraries" to ship that are thin wrappers around a single address, and you'd need a new one per chain, or have to detect your chain in the library. Environments and deployment scripting are how we "link" things now, but a lot of the time the environment constants are only used to instantiate other contracts, which then store a fixed reference.
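
A sketch of what library-like linking for arbitrary deployed objects could look like: references are resolved by name against a per-chain table, regardless of whether the target is a Solidity library or an ordinary contract with state. The table layout, chain labels, contract name, and addresses are all hypothetical.

```python
# Hypothetical deployment table: chain identifier -> contract name -> address.
deployments = {
    "mainnet": {"PriceFeed": "0x" + "aa" * 20},
    "morden": {"PriceFeed": "0x" + "bb" * 20},
}


def link(references, chain, table):
    """Resolve named references (libraries or ordinary contracts) for one chain."""
    known = table.get(chain, {})
    unresolved = sorted(set(references) - set(known))
    if unresolved:
        raise LookupError(f"no address on {chain} for: {unresolved}")
    return {name: known[name] for name in references}
```

Because the chain is a lookup key rather than something baked into a wrapper library, one package can describe the same object on several chains.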

@nmushegian

see ethereum/solidity#242

@AFDudley AFDudley mentioned this issue Oct 12, 2016