Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stability of metadata.scale format #591

Closed
alistair-singh opened this issue Jun 29, 2022 · 7 comments
Closed

Stability of metadata.scale format #591

alistair-singh opened this issue Jun 29, 2022 · 7 comments
Assignees

Comments

@alistair-singh
Copy link

We have an issue where by the same build of our parachain on two different machines produced metadata.scale files with different hashes. This has caused our application built with subxt-codegen to fail due to Error: Metadata(IncompatibleMetadata). I understand this error is produced when the hash of the metadata that was used to generate the client is not the same as the metadata on the parachain node. Getting the latest metadata from the parachain indeed shows a difference between the hashes.

Originally we thought that the metadata was changed so we updated the metadata.scale file. This fixed the issue on our side, but on some machines the metadata was still out of date.

I have attached a diff of the metadata files and the generated rust sources.
https://drive.google.com/file/d/1-TtoQkEZCTCoNWqtz1rkVv3K3GEZd7K3/view?usp=sharing

The parachain code was identical in both builds.

  • ours.metadata.scale - built on Ubuntu 20.04 AWS instance
  • theirs.metadata.scale - built on Ubuntu 22.04 AWS instance

Code used:
Snowbridge -> a65d9118dd4b2277eb7a8513c6f9d7273f277fc2
Polkadot -> a7e188cd9665c735f4b9d5a58cdbc4dd1850eae0 - release-v0.9.23

Looking at the diff file diff.xxd.1col.patch it does not look like new metadata was added. It looks like it was scale encoded differently.

Taking a look at the first difference:

--- ours.1col.xxd       2022-06-23 13:38:08.405761839 +0000
+++ theirs.1col.xxd     2022-06-23 13:38:21.021826697 +0000
@@ -500,28 +500,14 @@
 00
 04
 01
-58
+20
 5b
 75
 38
 3b
 20
-2f
-2a
-c2
-ab
-2a
-2f
-20
 33
 32
-20
-2f
-2a
-c2
-bb
-2a
-2f
 5d
 00
 00

ours.xxd:

000001c0: 5400 0124 6d61 6e64 6174 6f72 7920 0104  T..$mandatory ..
000001d0: 5400 0020 0000 0506 0024 083c 7072 696d  T.. .....$.<prim
000001e0: 6974 6976 655f 7479 7065 7310 4832 3536  itive_types.H256
000001f0: 0000 0400 0401 585b 7538 3b20 2f2a c2ab  ......X[u8; /*..
00000200: 2a2f 2033 3220 2f2a c2bb 2a2f 5d00 0028  */ 32 /*..*/]..(
00000210: 0000 0208 002c 1028 7370 5f72 756e 7469  .....,.(sp_runti
00000220: 6d65 1c67 656e 6572 6963 1864 6967 6573  me.generic.diges
00000230: 7418 4469 6765 7374 0000 0401 106c 6f67  t.Digest.....log

theirs.xdd:

000001c0: 5400 0124 6d61 6e64 6174 6f72 7920 0104  T..$mandatory ..
000001d0: 5400 0020 0000 0506 0024 083c 7072 696d  T.. .....$.<prim
000001e0: 6974 6976 655f 7479 7065 7310 4832 3536  itive_types.H256
000001f0: 0000 0400 0401 205b 7538 3b20 3332 5d00  ...... [u8; 32].
00000200: 0028 0000 0208 002c 1028 7370 5f72 756e  .(.....,.(sp_run
00000210: 7469 6d65 1c67 656e 6572 6963 1864 6967  time.generic.dig
00000220: 6573 7418 4469 6765 7374 0000 0401 106c  est.Digest.....l
00000230: 6f67 7330 013c 5665 633c 4469 6765 7374  ogs0.<Vec<Digest

In ascii: ours X[u8; /*..*/ 32 /*..*/] vs theirs [u8; 32].

Both these files produce the same rust code during code-gen but with different hashes which match their respective files. The fact that the code is the same apart from hashes shows the metadata has not changed in a meaningful way.

subxt codegen -f scale-issue/ours.metadata.scale | rustfmt --edition=2018 --emit=stdout > ours.rs
subxt codegen -f scale-issue/theirs.metadata.scale | rustfmt --edition=2018 --emit=stdout > theirs.rs

We need assistance with the following:

  1. What would make this metadata.scale format unstable between builds? Are there other things that can change the metadata without being related to metadata? e.g. Genesis hash, code hash?
  2. Is there a way to disable the hash check? In a scenario where we upgrade our parachain by doing a forkless upgrade with new metadata, services built with the old metadata would fail on start up. This is not ideal as it would require coordination of deployments, potentially across products which use our parachain. For some services the new metadata might not even be relevant to its function .i.e. a new pallet was added that the service does not use.
@niklasad1
Copy link
Member

It could also be the constants hash check breaks things for you which was fixed quite recently #587.

I ran into it myself, worth a shot to test.

@alistair-singh
Copy link
Author

alistair-singh commented Jun 29, 2022

It could also be the constants hash check breaks things for you which was fixed quite recently #587.

I ran into it myself, worth a shot to test.

Thanks for the information. I will update our repo to use the latest subxt as this is definitely a welcome addition that makes the metadata more stable. In our case the constants were not changed. So I am not sure if that is the issue.

@lexnv
Copy link
Collaborator

lexnv commented Jun 29, 2022

I suspect the hash change is due to an attribute of a type from the metadata, which has changed without implying a change in functionality. I remember it was part of the design discussions, mostly if we want to add the type' path: moving a type alias (ie AccountID) to a different crate would change the hashing without its functionality.

I'll have a look after the Polkadot Decoded event and compare the metadata,

Thanks for reporting this!

@lexnv lexnv self-assigned this Jun 29, 2022
@lexnv
Copy link
Collaborator

lexnv commented Jun 29, 2022

  • inspecting the .json format of the provided ours/theirs metadata
  "id": 120,
  "type": {
    "path": [
      "sp_arithmetic",
      "per_things",
      "Perbill"
    ],
    "def": {
      "composite": {
        "fields": [
          {
            "type": 4,
            "typeName": "/*«*/ u32 /*»*/" <-> "typeName": "u32"
          }
        ]
      }
    }
  }

The ours-metadata blob contains unsupported bytes "/*«*/ u32 /*»*/", presumably coming from sp_arithmetic::per_things::Perbill of the system pallet fill_block call.

  • As this might be related to the scale-info crate, have tried to reproduce it with manually constructed Metadata. Unfortunately, I could not reproduce this behavior.

  • Looking at the Snowbridge repository, two interesting things are discovered:

    • There is a go project which fetches the metadata: ./relayer/chain/parachain/connection.go:54: meta, err := api.RPC.State.GetMetadataLatest()
    • There is a hardcoded metadata at snowbridge/parachain/tools/query-events/metadata.scale which contains the same "/*«*/ u32 /*»*/".

Are you parsing by any means the metadata and feeding back the parsed metadata from your nodes?

@alistair-singh
Copy link
Author

  • Looking at the Snowbridge repository, two interesting things are discovered:

    • There is a go project which fetches the metadata: ./relayer/chain/parachain/connection.go:54: meta, err := api.RPC.State.GetMetadataLatest()

This is unrelated to the issue we are having. In this service we have types which are manually created and kept updated.

  • There is a hardcoded metadata at snowbridge/parachain/tools/query-events/metadata.scale which contains the same "/*«*/ u32 /*»*/".

We use this hardcoded metadata to build a tool which queries for specific events on our parachain. This file is the ours.metadata.scale file from the archive I shared. This is the tool in question that is failing. Is committing (hardcoding) this file not recommended practice?

I will be updating our subxt version we are using in the repo, but as this is related to types with /*«*/ u32 /*»*/ as opposed to u32 this failure would still happen as subxt will hash type information.

Based on your findings this seems like a metadata or scale encoding bug as opposed to a subxt bug. Why does the metadata have these extra bytes in the type information sometimes?

Are you parsing by any means the metadata and feeding back the parsed metadata from your nodes?

I do not think so. If this was the case the diffs would show more changes and the hardcoded metadata file in the repo would never be stable. The hardcoded file currently is stable on most machines. Even though the query-events tool exists in the same set of projects as our parachain, it is a stand alone console app. And there are no dependencies from the parachain to the query-events tool or vice versa.

Suggestion:
I think subxt code-gen should have a feature which disables this hash check. Another potential check would be to use the spec_version of the runtime being greater than some version. A parachain maintainer would take care not to break backwards compatibility between versions. You would also not want to break downstream dependencies based on an upgrade which might not even be relevant to the downstream service.

@jsdw
Copy link
Collaborator

jsdw commented Jul 4, 2022

I'm in favour of being able to make an unchecked request in the case that subxt detects incompatibilities, but you are happy ignoring those and submitting the thing anyway. Offhand I've no idea where the /***/ bits are coming from in the metadata; it's not something I've come across offhand.

In get_field_hash we do hash the fields typeName as well as the field name and type. It makes sense to hash the field name (if the name changes, it could indicate a semantic difference in the types), but hashing the type name may be redundant since we hash the type information anyway (and if the type is a variant or composite or whatever, we'll be hashing the actual names of the structs and variants via that anyway). So I think we can probably just remove that hash and maybe it'll fix this.

@alistair-singh if you'd like to test this fix locally, just remove lines 75-77 from here:

if let Some(name) = field.type_name() {

I'd love to know whether that resuled in things working for you or not!

@alistair-singh
Copy link
Author

I'm in favour of being able to make an unchecked request in the case that subxt detects incompatibilities, but you are happy ignoring those and submitting the thing anyway.

In our case the events we are looking for have been fairly stable and are modified with extreme care so we would be happy to ignore mismatches. The tool is read only and covered by a good test suite so it lessens the need for strong checks such as the hash checks.

That being said this issue only happens on some machines and If I see it again I will try the modification you suggested above. The root cause seems to be with the metadata as opposed to subxt so I am happy to close this issue as this is not blocking us at the moment. Provided that it will be addressed in #595.

If I run into another case then I will re-open this issue and post the findings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants