CIP0006 Stake pool extended metadata #15

Merged
merged 11 commits into from
Mar 8, 2021

Conversation

papacarp
Contributor

Initial proposal for extended metadata that can be a basis for dialog

@cardanians

Happy to see our work as a baseline for this task!

The structure is well thought out, but I would like to disagree on a few points:

  • Is it really a good idea to have email contacts (abuse + support) there? I'm not sure it is a good approach (spam, viruses, ...); I see more negatives here than real benefits.

  • "telegram-admin-handle" (array) is missing. On the one hand, it can replace the previous problem; on the other, we can connect bots to this handle as a verified pool admin, send notifications only to pool-verified Telegram admins, and so on.

  • I don't quite understand "pools" in combination with saturated_recommend. [root][pools][...] is fine, but the saturated_recommend array should be at [root][saturated_recommend] (pool IDs of friends' or own pools that are recommended when the main pool is saturated).

Otherwise great, thanks.

@gufmar
Contributor

gufmar commented Aug 23, 2020

  • Is it really a good idea to have email contacts (abuse + support) there?

Abuse contact information is a long-established method for quick, urgent communication between decentralized and often previously unknown entities. So it also seems appropriate as optional but recommended contact information, based on a very common and widespread contact medium: email.

Examples
https://www.ripe.net/support/abuse
https://www.apnic.net/manage-ip/using-whois/abuse-and-spamming/
https://support.google.com/domains/answer/6022413?hl=en
https://en.wikipedia.org/wiki/SOA_record#Background

Also, email abuse reports already have a standardized format:
https://en.wikipedia.org/wiki/Abuse_Reporting_Format

@papacarp
Contributor Author

  • "telegram-admin-handle" (array) is missing. On the one hand, it can replace the previous problem; on the other, we can connect bots to this handle as a verified pool admin, send notifications only to pool-verified Telegram admins, and so on.

Sure, I didn't understand the purpose of that or how it was different from the marketing accounts. It makes sense to me to add that back in.

  • I don't quite understand "pools" in combination with saturated_recommend. [root][pools][...] is fine, but the saturated_recommend array should be at [root][saturated_recommend] (pool IDs of friends' or own pools that are recommended when the main pool is saturated).

The "recommend if full" idea is a good concept, but I'm unclear how best to implement this in practice and it brings up some strategic questions.

Is extended metadata meant to be different for each pool registered on chain? Or is the concept to have one extended metadata file for all the pools you have registered? It seems like your metadata concept was allowing multiple pools to share the same extended metadata, so I headed down that path. I think we all have to agree on this fundamental question first before we sort out the rest.

CIP6/CIP6.md Outdated

## Motivation

As the ecosystem around Cardano stake pools proliferates, so will the desire to slice, organize and search pool information dynamically. Currently the metadata referenced on chain provides 512 bytes that can be allocated across the four information categories ([delegation-design-specification Section 4.2](https://hydra.iohk.io/build/790053/download/1/delegation_design_spec.pdf)):
Member

Note that the 512-byte limit only exists in the software currently dealing with metadata (i.e. cardano-wallet and SMASH). Having a rather constrained maximum size prevents some DoS attacks on software consuming metadata. Still, it should be possible to increase that limit so that more information can be included while keeping a reasonable size.

Contributor

Note that the 512-byte limit only exists ... it should be possible to increase that limit so that more information can be included while keeping a reasonable size.

It is an interesting option indeed, and it should be considered as a possible way to go. In that case, any change requires a re-registration on chain with the new hash. Having the extended JSON file only linked from the main one makes it more flexible but also a little less trusted.

Contributor Author

The limit for on-chain data is great for those of us parsing it. Doubling it would be fine, but I do think keeping it constrained and allowing the extended metadata to be the place for flexible and fast changes is the right tradeoff.

Contributor

The limit for on-chain data is great for those of us parsing it. Doubling it would be fine, but I do think keeping it constrained and allowing the extended metadata to be the place for flexible and fast changes is the right tradeoff.

Agreed. It's way better to have a strictly defined set of mandatory metadata and a more flexible set of extended metadata. While mandatory metadata may be subject to an extensive CIP review process, extended metadata provides the ability to innovate, with after-the-fact formalization through the CIP process for greater interoperability.

@cardanians

  • "telegram-admin-handle" (array) is missing. On the one hand, it can replace the previous problem; on the other, we can connect bots to this handle as a verified pool admin, send notifications only to pool-verified Telegram admins, and so on.

Sure, I didn't understand the purpose of that or how it was different from the marketing accounts. It makes sense to me to add that back in.

  • I don't quite understand "pools" in combination with saturated_recommend. [root][pools][...] is fine, but the saturated_recommend array should be at [root][saturated_recommend] (pool IDs of friends' or own pools that are recommended when the main pool is saturated).

The "recommend if full" idea is a good concept, but I'm unclear how best to implement this in practice and it brings up some strategic questions.

Is extended metadata meant to be different for each pool registered on chain? Or is the concept to have one extended metadata file for all the pools you have registered? It seems like your metadata concept was allowing multiple pools to share the same extended metadata, so I headed down that path. I think we all have to agree on this fundamental question first before we sort out the rest.

Thank you. The extended file is ideally unique for every pool (that was the plan), like the basic meta.json. It is true that someone can use one extended file for more than one pool, but the main target is 1 = 1.

I can imagine that in time we will come up with something new here that will strictly require 1 = 1.


@gufmar yes, an abuse contact is a great use case for a lot of services (e.g. when you are sending spam, when there is an attacking script or backdoor on your web hosting, when your client does something he shouldn't, or when something illegal needs to be removed...), but in this case I just do not see a specific use. Do we have a specific case where it would really make more sense than, for example, an official IOHK newsletter?

@gufmar
Contributor

gufmar commented Aug 25, 2020

an abuse contact is a great use case for a lot of services (e.g. when you are sending spam, when there is an attacking script or backdoor on your web hosting, when your client does something he shouldn't, or when something illegal needs to be removed...), ... an official IOHK newsletter?

We should define the intention and use case for abuse contacts.
I would not like to see them used as general info addresses, nor for newsletters, marketing, promotion, campaigns...
In any case it is optional (especially a correctly working email address), and when provided it should allow urgent messaging when something unexpected or unwanted has happened from the sender's point of view.

@papacarp
Contributor Author

Thank you. The extended file is ideally unique for every pool (that was the plan), like the basic meta.json. It is true that someone can use one extended file for more than one pool, but the main target is 1 = 1.

I can imagine that in time we will come up with something new here that will strictly require 1 = 1.

Ok, I would recommend that we keep it 1 = 1 for now then. The specific pool attributes would make more sense then inside the info category in my opinion, but it seems like you think pool should be a root attribute so how about something like this:

{
    "serial": 2020072001,
    "itn": { ... },
    "info": { ... },
    "pool": {
        "id": "0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
        "country": "UK",
        "os": "LINUX",
        "infrastructure": "AWS",
        "status": "act",
        "my_other_pools": ["0a0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f"],
        "saturated_recommend": ["0a0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f"]
    }
}

I'd prefer not to do this as I think it's confusing, but we could also change

"pool":{ ... }

to

"pool":[{ ... }]

if we wanted to build in support for multiple pools from the start.
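
If consumers ever had to cope with both shapes, a reader could normalise them before use. This is only a rough sketch under that assumption: the key name "pool" is taken from the example above, while `normalize_pools` is a hypothetical helper and not part of the proposal.

```python
import json


def normalize_pools(extended: dict) -> list:
    """Return the "pool" entry as a list, whether it was an object or an array."""
    pool = extended.get("pool")
    if pool is None:
        return []
    if isinstance(pool, dict):
        # Single-pool form: "pool": { ... }
        return [pool]
    if isinstance(pool, list):
        # Multi-pool form: "pool": [{ ... }]; ignore anything that is not an object
        return [p for p in pool if isinstance(p, dict)]
    raise ValueError('"pool" must be an object or an array of objects')


single = json.loads('{"serial": 2020072001, "pool": {"id": "0f"}}')
multi = json.loads('{"serial": 2020072001, "pool": [{"id": "0f"}, {"id": "0a"}]}')

print([p["id"] for p in normalize_pools(single)])  # ['0f']
print([p["id"] for p in normalize_pools(multi)])   # ['0f', '0a']
```

Having to carry a shim like this in every consumer is arguably a reason to pick one shape up front.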

Contributor

@dcoutts left a comment

It is unclear to me what the security model is here.

With the on-chain referenced metadata, the checksum is on chain and signed by the pool operator and owners. So we know the metadata has not been tampered with and that they endorse the metadata content.

With the current proposed extended metadata, there's an additional level of indirection and no checksum. This has the benefit of not having to re-register the pool cert to change the metadata, but it also means we lose the tamper-resistance and lose assurance that the metadata content is endorsed by the operator or owners. We know they point to a URL but there are many points of weakness (like DNS hijacking, proxy or origin server hacking) so it's hard to trust the content.

What is the intended security model? What are we supposed to be able to rely on?

Is the extra level of indirection deliberate to avoid having to re-register the pool? Or is it because you think you cannot put more data into the on-chain referenced metadata?

If we want the same security model as the on-chain referenced metadata, but for performance or other reasons want to have more data in a separate file, we can just use the same trick of a URL + content hash.

So I'm not asking for any specific design: you propose what you want to propose. I'm asking for these security questions to be addressed in the proposal.
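
To make the "URL + content hash" trick concrete, here is a minimal sketch of the consumer side. It is an illustration, not a proposed design. It assumes the hash is Blake2b with a 32-byte digest over the raw file bytes (which, as far as I know, is what the on-chain pool metadata hash uses) and that the expected hash comes from data the operator has already signed. `content_hash` and `matches_signed_hash` are hypothetical names.

```python
import hashlib


def content_hash(raw: bytes) -> str:
    """Hex digest of the raw file bytes (assumed: Blake2b, 32-byte digest)."""
    return hashlib.blake2b(raw, digest_size=32).hexdigest()


def matches_signed_hash(raw: bytes, expected_hex: str) -> bool:
    """True if the fetched bytes hash to the value taken from the signed metadata."""
    return content_hash(raw) == expected_hex.strip().lower()


# Hypothetical flow: the operator publishes a file and signs its hash.
published = b'{"serial": 2020072001}'
signed_hash = content_hash(published)

print(matches_signed_hash(published, signed_hash))         # True
print(matches_signed_hash(published + b" ", signed_hash))  # False: one extra byte
```

The trade-off is that any change to the file, even whitespace, changes the hash, so whatever carries the hash has to be updated as well.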

@mark-stopka
Contributor

mark-stopka commented Aug 25, 2020

@papacarp it is true that since PoolTool no longer provides a link to the initial metadata JSON file, it is hard to verify the root of trust (chain -> metadata -> extended metadata). Previously, when you could click a link that pointed to the on-chain registered metadata file, you could verify e.g. that both metadata and extended metadata live on the same domain with the same TLS certificate. This is not possible on PoolTool, and I am not sure it was ever available on ADA Pools.

With that said, when using a site such as PoolTool or ADA Pools, you are explicitly trusting a centralized counterparty to safely retrieve, process and display the embedded metadata. However, there should probably be recommendations on secure retrieval and processing of extended metadata, including e.g. how often data consumers are expected to update such metadata. When on-chain registered metadata is updated, there is an on-chain event to trigger on; when extended metadata is updated, there is no such event.

@cardanians

cardanians commented Aug 26, 2020

Is the extra level of indirection deliberate to avoid having to re-register the pool? Or is it because you think you cannot put more data into the on-chain referenced metadata?

Yes, that is the main purpose of creating this: re-registering is not required.

I can also imagine that re-registering could cause many operators a lot of unexpected problems.

@gufmar
Contributor

gufmar commented Aug 27, 2020

It is unclear to me what the security model is here.

To be defined (or redefined).

With the on-chain referenced metadata, the checksum is on chain and signed by the pool operator and owners. So we know the metadata has not been tampered with and that they endorse the metadata content.

Another variant that keeps the whole metadata securely verifiable and trustworthy is to not have any checksum on chain, but to add a signed witness to the metadata file. Then the pool owner can alter the metadata file whenever they want without adding any load to the chain.
It might only require a serial field in the metadata file, similar to DNS zone files (YYYYMMDDxx), to make it easier for applications to detect updates.
If we can redefine the security model this way, I see no benefit in splitting the metadata into one signed file and one linked-only file.

With the current proposed extended metadata, ... we lose the tamper-resistance and lose assurance that the metadata content is endorsed by the operator or owners. We know they point to a URL but there are many points of weakness (like DNS hijacking, proxy or origin server hacking) so it's hard to trust the content.

The mentioned weaknesses seem to call into question more than half of the existing internet, I would say. As it is already a requirement to provide metadata.json over HTTPS only, the same should of course be obligatory for an extended file. As an additional requirement, it might even need to be on the same hostname as the main metadata file, but as the linked extended URL is signed by the pool owner it should be safe to trust. DNS- and proxy-based MITM attacks are then the challenge for an attacker who wants to modify extended data like the Twitter handle or the owner logo (nothing fund- or delegation-related).

If we want the same security model as the on-chain referenced metadata, but for performance or other reasons want to have more data in a separate file, we can just use the same trick of a URL + content hash.

Can you explain please?
From my understanding if I modify the extended file, I need to update the hash in the main metadata file, and then I also need to re-register with an updated metadata hash on chain.

So I'm not asking for any specific design: you propose what you want to propose. I'm asking for these security questions to be addressed in the proposal.

Definitely, as it's important.
So in order to keep the whole resulting constellation as simple as possible, I wonder first whether there is any KO reason not to have the signed witness inside the metadata file, verifiable with the public owner key stored on-chain.
If there is a good technical or security reason, we should think about the pros and cons of having partial data in a more trustworthy main metadata file and additional fields in a slightly less trustworthy file that does not require re-registration. In this case we can also define which additional fields should become part of the main metadata file (like the existing homepage URL) and which can stay in the extended file.
Also, keep in mind that any linked resource file (png, svg, ...) raises the same security model questions. So if our conclusion is one metadata file with a signed witness or an on-chain hash, then such additional resource files will not be possible.

@mark-stopka
Contributor

mark-stopka commented Aug 27, 2020

I wonder first if there is any KO reason to not have the signed witness inside the metadata file, verifiable with the public owner key stored on-chain?

Not a KO, but it is susceptible to version downgrade / rollback / replay for anyone bootstrapping their app from chain genesis and fetching metadata as pool certificates appear on the chain.

The replay can be partially mitigated by using timestamps instead of serials and a forced metadata update period, just like we do with KES, I suppose.
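
As a rough illustration of the serial idea discussed above, a consumer that already holds state could refuse any file whose serial does not move forward. This is a sketch only: it assumes the YYYYMMDDxx layout mentioned earlier in the thread, and `parse_serial` and `accept_update` are hypothetical names.

```python
from datetime import date


def parse_serial(serial: int):
    """Split a YYYYMMDDxx serial into (date, revision); raise if malformed."""
    s = str(serial)
    if len(s) != 10 or not s.isdigit():
        raise ValueError("serial must have the form YYYYMMDDxx")
    day = date(int(s[0:4]), int(s[4:6]), int(s[6:8]))  # rejects impossible dates
    return day, int(s[8:10])


def accept_update(last_seen: int, candidate: int) -> bool:
    """Accept a file only if its serial is well-formed and strictly newer."""
    parse_serial(candidate)
    return candidate > last_seen


print(parse_serial(2020072001))               # (datetime.date(2020, 7, 20), 1)
print(accept_update(2020072001, 2020072002))  # True
print(accept_update(2020072002, 2020072001))  # False: older file replayed
```

Note that this does nothing for a client with no prior state, which is exactly the bootstrapping-from-genesis case described above.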

@papacarp
Contributor Author

papacarp commented Sep 8, 2020

In preparation for our meeting in a few hours I've updated the proposal. There are a few conversations I'm hoping we can close out during our meeting:

  1. Security Model: As per @dcoutts' request to clarify, we are proposing a model that does NOT require an update to the on-chain registration currently. It seems like we should have a conversation together about whether this is the right strategy. If security is required then we could expand the max size of the on-chain data, or use the witness strategy to validate the extended JSON. If security is not required we can simply include a link to extended metadata from the on-chain registration.

  2. We still have an open dialog about whether abuse email contacts in the metadata are useful.

  3. The structure of the JSON is a little arbitrary as it started with the adapools.org implementation. I tried to incorporate existing keys and hierarchy as much as possible although from a green field perspective I'm not sure I would have implemented that way. However, in the interest of bringing this to closure I think the proposed hierarchy is adequate.

CIP6/CIP6.md Outdated
},
"operator": {
"country": "UK",
"sex": "2"
Contributor

This should probably be "gender" instead.

@shawnim
Contributor

shawnim commented Sep 8, 2020

Why is gender (sex) of the operator included at all?
Seems like personally identifying information like that should go in the about me section and be completely optional.

@shawnim
Contributor

shawnim commented Sep 9, 2020

I just read this proposal in detail and have a few comments, questions and suggestions.

  1. Restructure the content to more clearly show what attributes belong to what objects. Main objects being pool, operator and owner. See example.
  2. Drop "itn" section. Seems like irrelevant history.
  3. Change CI to "media_assets".
  4. Rename media assets to have consistent naming scheme.
  5. Replace "color_main" with "color_fg" and "color_bg".
  6. Remove redundant "_handle" from all social attributes.
  7. For "social" how will new platforms be added?
  8. "contact" mixes types of comms ("abuse" and "support") with channels of comms ("telegram_admin").
  9. There seems to be an arbitrary distinction between "social" and "contact".
  10. If the idea of having the address for a company is to have a mailing address or locate it on a map, it seems you might need "state_or_region" and "postal_code".
  11. Are "company_id" and "vat_id" something potentially used for tax purposes or government verification?
  12. "about" is an unnecessary layer where the attributes should be placed under the appropriate top level objects instead.
  13. How do you add to the list of "os" and "infrastructure" values?
  14. "infrastructure" mixes types of infrastructure with vendors.
  15. Why are status values only 3 letters? Why not "active" instead of "act"?
  16. Are all elements optional except "serial"?
  17. I don't understand the difference between "telegram_handle" and "telegram_admin". What about adding "telegram_channel"?
  18. I think it would be good to add "primary" to contact so you know which channel is the desired primary contact method.
  19. How about adding "phone_call" and "phone_text" to "contact"?
  20. Spell out "address" instead of "addr".
  21. I think we need to decide whether including personally identifying traits like "gender" is desirable. They can be used to promote or discriminate. If we want them then why not add others like "ethnicity"?
  22. "gender" should be spelled out as "female", "male" or "other".
  23. I would change "team_affiliation" to "affiliations" since they may not be teams.
  24. If we have "affiliations" why not add "supporting" since that is currently a distinguishing factor for many pools.
  25. Maybe "owner" should be an array of "owners".
  26. Add multiple countries for node locations.
  27. Change "saturated_recommend" to "saturated_recommend_id" because we may want to add "saturated_recommend_ticker" or "saturated_recommend_portfolio" later.

This turned out to be a lot but I hope it is helpful.
If people like this overall, I can help update the schema if desired.

Example


{
    "serial": 2020072001,
    "pool": {
        "id": "0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
        "status": "active",
        "saturated_recommend_id":"0a0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
        "contact": {
            "primary": "email",
            "discord": "coolpool",
            "email": "help@pooldomain.org",
            "facebook": "coolpool",
            "github": "coolpool",
            "phone_call": "+44 123456789",
            "phone_text": "+44 123456789",
            "rss": "https://mycoolpool.com/xml/poolrss.xml",
            "telegram": "coolpool",
            "telegram_channel":"https://t.me/coolchannel",
            "twitch": "coolpool",
            "twitter": "coolpool",
            "youtube": "coolpool"
        },
        "technology": {
            "description": "We have a high availability setup with 2 block producers and 4 relays in 2 different data centers.",
            "os": "Linux",
            "infrastructure": "cloud",
            "node_countries": [
                "DE",
                "UK"
            ]
        },
        "media_assets": {
            "icon_png_64x64_url": "https://mycoolpool.com/icon.png",
            "logo_png_url": "https://mycoolpool.com/logo.png",
            "logo_svg_url": "https://mycoolpool.com/logo.svg",
            "color_fg": "#RRGGBB",
            "color_bg": "#RRGGBB"
        },
        "affiliations": [
            "ISPPA",
            "Cardano Ambassador"
        ],
        "supporting": [
            "10% of fees donated to Save the Frogs.",
            "Some proceeds used to sponsor local Cardano meetup."
        ]
    },
    "operator": {
        "description": "Cool Ops operates pools for people.",
        "name": "Juanita Lopez",
        "company_name": "Cool Ops LLC",
        "company_id": "123456789",
        "vat_id": "GB123456789",
        "address": "101 Main St., Apt. 3",
        "city": "London",
        "state_or_region": "",
        "country": "UK",
        "postal_code": "123456",
        "gender": "",
        "ethnicity": ""
    },
    "owner": {
        "description": "I am a podcaster and believer in Cardano.",
        "name": "Ramesh Patel",
        "company_name": "",
        "company_id": "",
        "vat_id": "",
        "address": "",
        "city": "Vancouver",
        "state_or_region": "BC",
        "country": "CA",
        "postal_code": "12345",
        "gender": "male",
        "ethnicity": "Indian"
    }
}


@papacarp
Contributor Author

Wow, thank you for your thoughtful feedback. I generally have no problem with the structure you propose. As mentioned earlier, the structure I put in evolved from the adapools implementation in an attempt to make this all as easy as possible. I'd be fine switching to your structure if we can all agree.

  2. Drop "itn" section. Seems like irrelevant history.

It does not seem irrelevant yet. We are still at 50% staked in Cardano, so that's a lot of stake still looking for homes.

  7. For "social" how will new platforms be added?

I assume another CIP?

  10. If the idea of having the address for a company is to have a mailing address or locate it on a map, it seems you might need "state_or_region" and "postal_code".

My goal was simply a region defined by an internationally recognized country code. The address details came from adapools.

  11. Are "company_id" and "vat_id" something potentially used for tax purposes or government verification?

Adapools added that. I'm not sure where they were going with it.

  13. How do you add to the list of "os" and "infrastructure" values?

I assume another CIP. I'm not sure how the process would work for lookup details like that.

  14. "infrastructure" mixes types of infrastructure with vendors.

Feel free to create a better list. These are the lists we used in pooltool.

  15. Why are status values only 3 letters? Why not "active" instead of "act"?

It's arbitrary. If you think it makes more sense as active then go for it. I envision these will end up translated, so I wanted a code to abstract a bit from the word.

  16. Are all elements optional except "serial"?

Seems like it. If you add a pool, then the pool ID is required, but I could envision someone wanting to "erase" all metadata and they could do so by submitting an empty extended json with a new serial.

  17. I don't understand the difference between "telegram_handle" and "telegram_admin". What about adding "telegram_channel"?

Again, adapools. I'm fine with those changes. I know telegram_admin refers to a private, direct contact point, whereas telegram_handle is likely a public one, or more specifically a channel, as you point out.

  21. I think we need to decide whether including personally identifying traits like "gender" is desirable. They can be used to promote or discriminate. If we want them then why not add others like "ethnicity"?

Well, I think it's an important distinguishing strategy for pools that we should capture IF the pool wants to market itself that way. I also know CF has placed an emphasis on promoting gender in blockchain. Ethnicity would be the same, and leaving it out was not so much an oversight as just a decision to keep this incremental. Feel free to add ethnicity if you can find a standard designator set for it (IEC? ISO?).

  22. "gender" should be spelled out as "female", "male" or "other".

We used the ISO standard for this. Since gender identity has evolved considerably over the last 20 years, I think it's best to just let the standards bodies create the identifiers and follow them. Look up ISO/IEC 5218 (sex).
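
For anyone displaying this field, the ISO/IEC 5218 codes are 0 (not known), 1 (male), 2 (female) and 9 (not applicable). A small sketch of decoding the value for display, assuming it is stored as a numeric string as in the examples in this thread; `describe_sex_code` is a hypothetical helper.

```python
# ISO/IEC 5218 codes for the representation of human sexes
ISO_5218 = {0: "not known", 1: "male", 2: "female", 9: "not applicable"}


def describe_sex_code(value) -> str:
    """Map an ISO/IEC 5218 code (int or numeric string) to its label."""
    code = int(value)
    if code not in ISO_5218:
        raise ValueError(f"{value!r} is not an ISO/IEC 5218 code")
    return ISO_5218[code]


print(describe_sex_code("2"))  # female
print(describe_sex_code(9))    # not applicable
```

Keeping the code in the file and translating only at display time also fits the point above about wanting codes rather than words.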

  26. Add multiple countries for node locations.

In pooltool we gave you two options: owner location and server location. Many of us have nodes spread all over the world, and the public-facing ones are often registered and easily traceable. Again, the goal here is to give operators a country to affiliate with, not necessarily to document or capture reality. So I'm fine if you want to make the nodes an array, but it wasn't really my intention in capturing the data.

This turned out to be a lot but I hope it is helpful.

It's very helpful. I really appreciate your thoughtful ideas and practical recommendations.

If people like this overall, I can help update the schema if desired.

That would be great! I commented on the changes I had feedback on above. If I didn't comment on it, then I'm fine with your recommendation. My goal is not so much to control this standard, but to have something in place so we can start sharing data.

@gufmar
Contributor

gufmar commented Sep 10, 2020

2. Drop "itn" section. Seems like irrelevant history.

First, thank you for this feedback. I believe this also makes evident how and why CIP efforts can help to end up with better, combined results.

The ITN section might seem irrelevant, but often the desired effect is invisible. In this case the ITN ticker proof has prevented (and still prevents) imitating duplicates. At least I'm not aware of any duplicate tickers among all those who published their ITN proof.

@shawnim
Contributor

shawnim commented Sep 11, 2020

Thanks for clarifying those points.

Regarding ethnicity, I did not find any international standard.
I did find a list on Wikipedia, but it is dynamic and would be a lot to encode.
I think we can leave ethnicity as something that people can put in their description if they want.

I think both "operator" and "owner" could have the info separated into "person" and "organization". You could have either or both.
"person" is the human point of contact and "organization" is the company or group.

Here is an updated example:


{
    "serial": 2020072001,
    "pool": {
        "id": "0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
        "country": "DE",
        "status": "act",
        "saturated_recommend_id":"0a0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
        "contact": {
            "primary": "email",
            "discord": "coolpool",
            "email": "help@pooldomain.org",
            "facebook": "coolpool",
            "github": "coolpool",
            "phone_call": "+44 123456789",
            "phone_text": "+44 123456789",
            "rss": "https://mycoolpool.com/xml/poolrss.xml",
            "telegram": "coolpool",
            "telegram_channel":"https://t.me/coolchannel",
            "twitch": "coolpool",
            "twitter": "coolpool",
            "youtube": "coolpool"
        },
        "technology": {
            "description": "We have a high availability setup with 2 block producers and 4 relays in 2 different data centers.",
            "os": "Linux",
            "infrastructure": "cloud"
        },
        "media_assets": {
            "icon_png_64x64": "https://mycoolpool.com/icon.png",
            "logo_png": "https://mycoolpool.com/logo.png",
            "logo_svg": "https://mycoolpool.com/logo.svg",
            "color_fg": "#RRGGBB",
            "color_bg": "#RRGGBB"
        },
        "affiliations": [
            "ISPPA",
            "Cardano Ambassador"
        ],
        "supporting": [
            "10% of fees donated to Save the Frogs.",
            "Some proceeds used to sponsor local Cardano meetup."
        ],
        "itn": {
            "owner": "ed25519_pk1...",
            "witness": "ed25519_sig1..."
        }
    },
    "operator": {
        "description": "Cool Ops operates pools for people.",
        "person": {
            "name": "Juanita Lopez",
            "address": "101 Main St., Suite 3",
            "city": "London",
            "state_or_region": "",
            "postal_code": "123456",
            "country": "UK",
            "gender": "2"
        },
        "organization": {
            "name": "Cool Ops LLC",
            "government_id": "123456789",
            "vat_id": "GB123456789",
            "address": "101 Main St.",
            "city": "London",
            "state_or_region": "",
            "postal_code": "123456",
            "country": "UK"
        }
    },
    "owner": {
        "description": "I am a podcaster and believer in Cardano.",
        "person": {
            "name": "Ramesh Patel",
            "address": "",
            "city": "Vancouver",
            "state_or_region": "BC",
            "postal_code": "",
            "country": "CA",
            "gender": "1"
        }
    }
}

@shawnim
Contributor

shawnim commented Sep 11, 2020

A couple questions about the lists like "os" and "infrastructure".

  1. It looks like the schema has the values as examples. Does that mean people can put whatever they want? For example, can I put "Ubuntu", "OpenVMS", or "secret message here"? Does the "$id" value of "...anyOf..." limit the choice?
  2. Are the values all caps for a reason?
    Sorry for the naive questions; I have only written schemas for XML and databases, and JSON Schema is new to me.

For the "os" and "infrastructure" lists I would suggest:


"os": {
    "$id": "#/properties/pools/items/anyOf/0/properties/os",
    "type": "string",
    "title": "Pool Operating System",
    "description": "Pool Operating System",
    "default": "",
    "examples": [
        "Linux",
        "macOS",
        "Windows",
        "BSD",
        "Other",
        "Undisclosed"
    ]
},

"infrastructure": {
    "$id": "#/properties/pools/items/anyOf/0/properties/infrastructure",
    "type": "string",
    "title": "Pool Infrastructure",
    "description": "Pool infrastructure Platform",
    "default": "",
    "examples": [
        "cloud",
        "hosted bare metal",
        "local bare metal",
        "other",
        "undisclosed"
    ]
},

@crptmppt

crptmppt commented Oct 8, 2020

@papacarp - post-Editors meeting we hope you get to add changes as desired in the PR (in the next two weeks would be great!), we'd like to formally move into draft once that happens.
@gufmar - any outcome re: outreach to stakepools?

@cardanians

I agree with all changes, thanks to all.

From my view only:

| extended | A url for extended metadata| Optional, 64 Characters Maximum, must be a valid URL |

Please extend this to 128 characters max. I believe we really don't want another redirect mania there; 128 should be enough for raw GitHub URLs and the like.

@mark-stopka

@papacarp can you please rename the PR to reflect what the CIP is about?

CIP6/CIP6.md Outdated
"server": "long description of server details",
"company": "long description of company details"
},
"rss": "https://mycoolpool.com/xml/poolrss.xml"

RSS and Atom are very painful standards that are becoming obsolete; most browsers have already discontinued support for them. We might want to investigate alternatives, such as JSON Feed.

Interesting.

Indeed, RSS seems to have slowly died out over the last 15 years (https://trends.google.com/trends/explore?date=all&q=rss).

I had to look up what https://www.jsonfeed.org/ is.
It has existed since 2017 and has surprisingly many supporting plugins and libraries: https://jsonfeed.org/code/

Here is the GitHub home: https://github.com/manton/JSONFeed

https://en.wikipedia.org/wiki/JSON_Feed
https://en.wikipedia.org/wiki/Comparison_of_feed_aggregators

We might first need to understand how SPOs want to use news feeds (besides the social networks and chat services).

@gufmar

gufmar commented Nov 6, 2020

Please extend this to 128 characters max. I believe we really don't want another redirect mania there; 128 should be enough for raw GitHub URLs and the like.

we need two URLs (data and hash file)
it shouldn't be a problem to allow up to 128 bytes
but then it makes sense to increase the max size of the main metadata file to 1024 bytes

@gufmar

gufmar commented Nov 6, 2020

@cardanians @papacarp @SebastienGllmt @ashisherc @dmitrystas

based on the CIP-Editor call conversations, I'm going to refine the proposal.

One question towards known metadata consumers (portal operators) is about the absolutely required fields.
This CIP's initial extended metadata should include what we know is required and will actually be used by SPOs, not more.

The current state of the proposed JSON structure is

{
	"serial": 2020072001,
	"pool": {
		"id": "0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
		"country": "DE",
		"status": "act",
		"saturated_recommend_id": "0a0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
		"contact": {
			"primary": "email",
			"discord": "coolpool",
			"email": "help@pooldomain.org",
			"facebook": "coolpool",
			"github": "coolpool",
			"phone_call": "+44 123456789",
			"phone_text": "+44 123456789",
			"feed": "https://mycoolpool.com/xml/poolrss.xml",
			"telegram": "coolpool",
			"telegram_channel": "https://t.me/coolchannel",
			"twitch": "coolpool",
			"twitter": "coolpool",
			"youtube": "coolpool"
		},
		"technology": {
			"description": "We have a high availability setup with 2 block producers and 4 relays in 2 different data centers.",
			"os": {
				"$id": "#/properties/pools/items/anyOf/0/properties/os",
				"type": "string",
				"title": "Pool Operating System",
				"description": "Pool Operating System",
				"default": "",
				"examples": [
					"Linux",
					"macOS",
					"Windows",
					"BSD",
					"Other",
					"Undisclosed"
				]
			},
			"infrastructure": {
				"$id": "#/properties/pools/items/anyOf/0/properties/infrastructure",
				"type": "string",
				"title": "Pool Infrastructure",
				"description": "Pool infrastructure Platform",
				"default": "",
				"examples": [
					"cloud",
					"hosted bare metal",
					"local bare metal",
					"other",
					"undisclosed"
				]
			}
		},
		"media_assets": {
			"icon_png_64x64": "https://mycoolpool.com/icon.png",
			"logo_png": "https://mycoolpool.com/logo.png",
			"logo_svg": "https://mycoolpool.com/logo.svg",
			"color_fg": "#RRGGBB",
			"color_bg": "#RRGGBB"
		},
		"affiliations": [
			"ISPPA",
			"Cardano Ambassador"
		],
		"supporting": [
			"10% of fees donated to Save the Frogs.",
			"Some proceeds used to sponsor local Cardano meetup."
		],
		"itn": {
			"owner": "ed25519_pk1...",
			"witness": "ed25519_sig1..."
		}
	},
	"operator": {
		"description": "Cool Ops operates pools for people.",
		"person": {
			"name": "Juanita Lopez",
			"address": "101 Main St., Suite 3",
			"city": "London",
			"state_or_region": "",
			"postal_code": "123456",
			"country": "UK",
			"gender": "2"
		},
		"organization": {
			"name": "Cool Ops LLC",
			"government_id": "123456789",
			"vat_id": "GB123456789",
			"address": "101 Main St.",
			"city": "London",
			"state_or_region": "",
			"postal_code": "123456",
			"country": "UK"
		}
	},
	"owner": {
		"description": "I am a podcaster and believer in Cardano.",
		"person": {
			"name": "Ramesh Patel",
			"address": "",
			"city": "Vancouver",
			"state_or_region": "BC",
			"postal_code": "",
			"country": "CA",
			"gender": "1"
		}
	}
}

A second question is related to the RSS feed technology.
@mmahut asked whether RSS feeds, as proposed by @dcoutts, are still commonly used, or whether we can look into something else (see #15 (comment))

some thoughts on required fields and potential risks
I wonder if we really want to have complete postal addresses of operators and owners. In my opinion this is even a security risk. (where are the keys?)

Do we need a pool.country field in addition to owner.person and operator.organization? In other words, do we want to see claims where an owner from London, UK and an operator from Amsterdam, NL declare that their pool is located in Ghana, Africa?

Generally I see three categories of "claims" an SPO can make here:

  1. pool.id, pool.contact.telegram or pool.media_assets.logo_png: here it is fully in the SPO's interest to set the right values.
  2. pool.technology.description is not verifiable, as it can be literally any text. There is no way around that, but the consumer of this information also understands that it is a subjective claim of the SPO.
  3. I believe we should think twice about what can go wrong with the remaining fields such as pool.technology.os, pool.supporting or pool.country, because the consumer might see these as reliable and verified information, while the SPO can claim literally anything here.

@dcoutts

dcoutts commented Nov 17, 2020

Folks, can we please concentrate on the mechanism for the extended metadata, and postpone all the bikeshedding about the content of the extended metadata for later?

The mechanism is what needs to be carefully described and agreed for us to be able to implement anything. Once we have that in place we can discuss additions to the schema with relatively little technical risk. Any time we spend now on the schema delays getting the mechanism in place.

So I'd again recommend that we go with an absolutely minimal metadata schema (e.g. with one single uncontroversial example entry), and firm up the description of the mechanism for the extended metadata. Once that's agreed we can open the door to bikeshedding about what new metadata we want to add, and that can be done bit by bit, one thing at a time, so that uncontroversial items don't have to block on controversial ones.

@gufmar

gufmar commented Dec 7, 2020

Folks, can we please concentrate on the mechanism for the extended metadata, and postpone all the bikeshedding about the content of the extended metadata for later?

My recent commit b53da37 describes one possible mechanism. It seems to need a small extension of the CLI tool to calculate the signature of any JSON schema. Or we have to build the schema in a similar way as the current signature commands already expect.

There would be an alternative way to handle the signature by using TX metadata. The drawback is that a consumer would need a fully synced chain and probably a proper db-sync instance to have access to this validation data. On the other hand, it would be an elegant technique using the chain itself. We could even think about designing the whole thing according to the current DID design (https://www.w3.org/TR/did-core/).

Based on the feedback I have received from those already using the existing, non-standardised extended metadata, I see the need to include the most frequently used fields (e.g. logo, contact handles) from the start in order to encourage rapid and significant adoption.

@dcoutts dcoutts left a comment

This is getting much better.

I'd still like to see more explicit and precise details: say what exactly the new fields in the main metadata are. What is their format exactly. There's some confusion of hashes vs signatures to clear up.

You've sort-of described how an operator can create the files, but not exactly and not clearly.

What steps do tools need to do to validate the extended metadata? I.e. explain what has to be downloaded, what signatures have to be checked with what keys, don't just assume everyone understands the scheme already.

Then a new (not available yet) `cardano-cli` command generate the signed hash (`extData.sign`) .

```shell
cardano-cli shelley stake-pool rawdata-hash
```

Suggested change:
- cardano-cli shelley stake-pool rawdata-hash
+ cardano-cli shelley stake-pool rawdata-sign

| `homepage` | A website URL for the pool| 64 Characters Maximum, must be a valid URL |
| `name` | A name for the pool | 50 Characters Maximum |
| `extDataUrl` | A URL for extended metadata | optional, 128 Characters Maximum, must be a valid URL |
| `extHashUrl` | A URL with the extended metadata hash | optional, 128 Characters Maximum, must be a valid URL |

I think you don't mean extHashUrl but rather extSigUrl. This file contains the signature of the data file, which can be verified with the extVkey.

Comment on lines 71 to 77
The operator now:

- has the `extData.json` and `extData.sign` files
- will publish them at some https:// URL (probably same host as the main metadata)
- use the `extData.vkey` string and the two extend file URLs to re-register the main metadata

This re-registration of the main metadata file with the `extData.vkey` and the two URLs is only necessary once. Afterwards, the operator can update his extended metadata at any time, generate the new signature and put both files online.

I think I understand, but the text is not clear.

Make clear which fields in the main metadata file these things correspond to. Say that you need the URL of the extended metadata json file, and signature file, that these are to fill in the extDataUrl and extSigUrl fields.

And what format exactly is the extVkey? Bech32 I presume? What prefix?

@gufmar - can we get this addressed to have it merged next meeting?

How about poolmd_vk for the bech32 prefix?

@crptmppt crptmppt left a comment

Please pick a title to better reflect what this CIP is about (SPOs...) - Currently too vague

@dcoutts dcoutts changed the title CIP6 - WIP CIP0006 Stake pool extended metadata Feb 9, 2021
@gufmar

gufmar commented Feb 9, 2021

Todo:
add a step-by-step description on how a 3rd party can implement the pk/signature verification

@dcoutts dcoutts left a comment

This looks good now.

Just would like a simple clear section that says explicitly what software implementing this spec needs to do to verify the extended metadata.

So it should say that the extVkey is an ordinary 32-byte ed25519 verification key, and that the extSigUrl resolves to a raw 64-byte ed25519 signature. What is the signature of? Is it the raw data we find at the end of extDataUrl, or is it the hash of that data? Everywhere else in Cardano that we use signatures we sign hashes only, not variable-sized raw data. We should do the same here: sign the Blake2b 256-bit hash of the raw data we find at extDataUrl.

Thus the verification of the extended metadata is simply to do an ed25519 verification of the signature found at extSigUrl using the vkey from the main metadata, over the hash of the data found at extDataUrl.
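
The verification step described in this comment can be sketched roughly as follows. This is only an illustration under assumptions, not the CIP's reference implementation: the function name `verify_extended_metadata` is hypothetical, the hash is assumed to be Blake2b with a 32-byte (256-bit) digest, and the actual ed25519 check is left to a caller-supplied `ed25519_verify` callable (for example a thin wrapper around a library such as `cryptography` or PyNaCl), because Python's standard library has no ed25519 support.

```python
import hashlib
from typing import Callable

VKEY_LEN = 32  # ordinary ed25519 verification key (extVkey)
SIG_LEN = 64   # raw ed25519 signature served at extSigUrl


def verify_extended_metadata(
    ext_data: bytes,
    ext_sig: bytes,
    ext_vkey: bytes,
    ed25519_verify: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Check the raw bytes from extDataUrl against the signature from extSigUrl.

    ed25519_verify(vkey, message, signature) is a hypothetical hook that must
    return True only when the signature is valid for the message.
    """
    # Reject anything that is not a raw key / raw signature of the expected size.
    if len(ext_vkey) != VKEY_LEN or len(ext_sig) != SIG_LEN:
        return False
    # The signature covers the hash of the data, not the variable-sized data itself.
    digest = hashlib.blake2b(ext_data, digest_size=32).digest()
    return ed25519_verify(ext_vkey, digest, ext_sig)
```

A consumer would download the main metadata, read the vkey and the two URLs from it, fetch both files as raw bytes, and pass them to this function without re-serialising the JSON, since any change in whitespace or key order would change the hash.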

Comment on lines +25 to +26
| `description` | Pool Description. Text that describes the pool | 50 Characters Maximum |
| `homepage` | A website URL for the pool | 64 Characters Maximum, must be a valid URL |

@pintaric pintaric Feb 15, 2021

Is this really accurate? As far as I can tell, the Design Specification (Section 4.2) doesn't impose an explicit character limit on the "description" and "homepage" metadata fields. All it requires is a total size of the on-chain metadata of 512 bytes or less.

Edit: To be clear, I am talking about the current metadata specification here. CIP0006 is proposing to shorten the "description" character limit to 50 and the "homepage" character limit to 64, correct?

Good point. I'm not sure what actually holds, but I'm aware of multiple different origins and specifications.

For example, the incentivised testnet metadata registry:
https://github.com/cardano-foundation/incentivized-testnet-stakepool-registry#submission-well-formedness-rules
As far as I know, this was used as a template for the mainnet metadata registration file.

I also remember another proposal/definition/example that I can't find now.
And there is what the SMASH server currently implements as input validation:
https://github.com/input-output-hk/smash/blob/479fc6d8fa62537cd6c8560176d6a8a7ffda6e9e/smash-servant-types/src/Cardano/SMASH/Types.hs#L212-L252

With the additional fields, we need to increase the current max size from 512 to (proposed) 1024 bytes.
Not specifying additional max sizes for individual fields would make the result very unpredictable (for example for UI design).
One could deliberately use the shortest possible URLs and fill an unlimited description or name field with more than 900 characters.

So I propose to align with what the SMASH server currently implements and also uses in its SQL DB schema.
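
To make the combination of a total size cap and per-field caps concrete, here is a minimal sketch of such a check. It is an assumption-laden illustration: the function name `check_main_metadata` is hypothetical, the limits are simply the numbers quoted in this thread (1024 bytes total as proposed, 50/50/64 characters for `name`/`description`/`homepage`, 128 for the two extended-metadata URLs), and `extSigUrl` follows the renaming suggested in review. It is not the SMASH validation code.

```python
import json
from typing import List

MAX_TOTAL_BYTES = 1024  # proposed in this thread; the current limit is 512

# Per-field character limits as quoted in this thread (still under discussion).
FIELD_LIMITS = {
    "name": 50,
    "description": 50,
    "homepage": 64,
    "extDataUrl": 128,
    "extSigUrl": 128,
}


def check_main_metadata(raw: bytes) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file passes."""
    problems = []
    if len(raw) > MAX_TOTAL_BYTES:
        problems.append(f"file is {len(raw)} bytes, limit is {MAX_TOTAL_BYTES}")
    try:
        doc = json.loads(raw)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return problems + ["not valid JSON"]
    if not isinstance(doc, dict):
        return problems + ["top level must be a JSON object"]
    for field, limit in FIELD_LIMITS.items():
        value = doc.get(field)
        if value is None:
            continue  # required-versus-optional rules are out of scope for this sketch
        if not isinstance(value, str):
            problems.append(f"{field}: must be a string")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} characters, limit is {limit}")
    return problems
```

With only the total cap, the 900-character description mentioned above would pass; with the per-field limits it is reported regardless of how short the URLs are.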

@pintaric

In my opinion, what is missing from the proposed extended metadata specification is a way to specify the fingerprint of a public PGP key, or alternatively a download URL for the public key. Something like this:

{
    "email_pgp_key_fingerprint": "EDBB2C22DA2BC47A4B5D94EDD8EB88D3167C82B8",
    "email_pgp_key_url": "https://keys.openpgp.org/vks/v1/by-fingerprint/EDBB2C22DA2BC47A4B5D94EDD8EB88D3167C82B8"
}

@crptmppt crptmppt merged commit a0eab92 into cardano-foundation:master Mar 8, 2021
KtorZ added a commit that referenced this pull request Nov 23, 2021
  This overrides the previous schema and gives priority to the newest version that is in the README. It corresponds to the schema of the latest commit from the PR #15:

  4006d06

  Fixes #142.
crptmppt pushed a commit that referenced this pull request Dec 7, 2021
* Fix CI build for authors without email.

* CIP-0006: Move README's schema as separate schema.json

  This overrides the previous schema and gives priority to the newest version that is in the README. It corresponds to the schema of the latest commit from the PR #15:

  4006d06

  Fixes #142.