metadata format #140

Closed
moedn opened this issue Jun 8, 2020 · 6 comments

@moedn

moedn commented Jun 8, 2020

The issue

How is metadata actually delivered? OKH suggests a 'manifest file' (see 4.3 here) that is

  1. structured in compliance with the template in the standard
  2. located in the root folder of the repo (as required by the standard)
  3. named according to the method stated in the standard (okh-thingname.yml)
  4. stored in a certain file format (YAML in that case)

I generally like this approach for its simple & effective nature; it appears stable and decentralised to me. However, some details:

  • stating the working title in the file name is
    • redundant
    • inconvenient for long working titles
    • → I would cross out this requirement and keep the file name as short & intuitive as possible …e.g. META-MOD for OSH modules or META-COM for components
  • using YAML as file format triggered a huge discussion in the group, which was never resolved (find a summary here)
    • @hoijui any suggestions? Creating standard metadata shouldn't require any extra tool or knowledge and the content should be easily convertible into RDF/TTL by e.g. a crawler; people mentioned JSON and TOML

The background

Just created a new task within the new research project Open.Choice (still just a proposal ← the project, not the task). The project aims to enable decentralised production, maintenance and modification of COVID-19-related hardware. This metadata standard and the OSHI appear as important building blocks for that.

@hoijui
Collaborator

hoijui commented Jun 8, 2020

* stating the working title in the file name is
  
  * redundant
  * inconvenient for long working titles
  * **→ I would cross out this requirement and keep the file name as short & intuitive as possible** …e.g. META-MOD for OSH modules or META-COM for components

Very much agree!
I never put the working title there. ;-)
(all the ones I created are just okh.yml)
This also helps in crawling and other automated processes.

I would keep the file-extension though.
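
To illustrate why a fixed file name helps automated processes, here is a minimal sketch of how a crawler might look for a manifest in a repository root. The function name `find_manifest`, the base name `okh`, and the extension list and its order are assumptions for illustration only; none of them is taken from the standard.

```python
from pathlib import Path
from typing import Optional

# Hypothetical: a fixed base name plus the extensions mentioned in this thread.
MANIFEST_BASENAME = "okh"
MANIFEST_EXTENSIONS = (".ttl", ".toml", ".yml", ".yaml", ".json")


def find_manifest(repo_root: str) -> Optional[Path]:
    """Return the first manifest found in the repo root, or None.

    Extensions are tried in the order listed above, so that order
    doubles as a (purely illustrative) format preference.
    """
    root = Path(repo_root)
    for ext in MANIFEST_EXTENSIONS:
        candidate = root / (MANIFEST_BASENAME + ext)
        if candidate.is_file():
            return candidate
    return None
```

With the working title in the file name, the crawler would instead have to match a pattern such as `okh-*.yml` and decide what to do when several files match; keeping the extension still lets it pick the right parser.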

* using YAML as file format triggered a huge discussion in the group, which was never resolved (find a summary [here](https://app.standardsrepo.com/MakerNetAlliance/OpenKnowHow/issues/269))
  
  * @hoijui any suggestions? Creating standard metadata shouldn't require any extra tool or knowledge and the content should be easily convertible into RDF/TTL by e.g. a crawler; people mentioned JSON and TOML

From a purely technical point of view, it should be TTL only.
It is by far the cleanest and most powerful, and in my eyes also the most comfortable to use, as it provides a built-into-the-format way to link to the documentation, plus auto-verification, plus built-in versioning.

Apart from that, there are different alternatives:

  • supporting multiple formats at the same time (e.g. TTL, TOML, YAML & JSON)
  • supporting only one primary format
  • supporting only one primary format, but providing auto-converters back and forth into the others
  • we could focus on having one or more tools to generate the files (GUIs, TUIs, CLIs), and then we could simply use the technically best format
  • ... probably more?

It is a hard decision. From my experience, people never end up choosing TTL, out of fear of non-acceptance. I think that is bad, but ... does it make sense for me to try to push it against the majority's will? No.
It is partly a philosophical question, but it can be decided upon rationally, by looking at evidence in history (roughly: cases of competing technologies, what happened, which ones were chosen, ...).
As said, usually the technically inferior ones get chosen, and somehow... the world continues spinning. What if, for once, we try something else? ;-)

@moedn
Author

moedn commented Jun 8, 2020

Great reply, thanks!!

Apart from that, there are different alternatives:

  • supporting multiple formats at the same time (e.g. TTL, TOML, YAML & JSON)
  • supporting only one primary format
  • supporting only one primary format, but providing auto-converters back and forth into the others
  • we could focus on having one or more tools to generate the files (GUIs, TUIs, CLIs), and then we could simply use the technically best format
  • ... probably more?

The current metadata approach relies on linking to other metadata files. From my naive perspective of very limited coding experience, I assume that this is possible independently of the file format (as I could just place a link in a YAML file).

My suggestion based on your input:

  1. support a list of formats (e.g. TTL, TOML, YAML & JSON)
  2. convert all crawled files into TTL; for wikibase we'll use TTL only
  3. (optional, but recommended): provide a tool to generate the files (GUI/TUI/CLI); this tool will output TTLs, of course

→ for me that's (almost) enough to take a decision in our next meeting.

@hoijui any suggestions for widely accepted file formats that wouldn't be too much effort to convert to TTL when following a standard structure/template? (TOML, YAML, JSON,… ? :) )
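
As a rough idea of the conversion effort, here is a minimal sketch of step 2: turning a flat key/value manifest into Turtle once it has been parsed into a dictionary. Everything named here is an assumption for illustration: the namespace `http://example.org/okh#`, the functions `escape_literal` and `manifest_to_turtle`, and the field names. It also assumes that keys are simple identifiers usable as Turtle local names and that URL values contain no characters that are illegal in an IRI.

```python
import json

# Hypothetical namespace; the real ontology IRI is not given in this thread.
OKH_NS = "http://example.org/okh#"


def escape_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def manifest_to_turtle(subject_iri: str, manifest: dict) -> str:
    """Serialise a flat key/value manifest as Turtle triples.

    Values that look like http(s) URLs become IRIs, so links to other
    manifests survive the conversion; everything else becomes a string.
    """
    if not manifest:
        raise ValueError("manifest is empty")
    statements = []
    for key, value in manifest.items():
        text = str(value)
        if text.startswith(("http://", "https://")):
            obj = f"<{text}>"
        else:
            obj = f'"{escape_literal(text)}"'
        statements.append(f"    okh:{key} {obj}")
    lines = [f"@prefix okh: <{OKH_NS}> .", "", f"<{subject_iri}>"]
    lines.append(" ;\n".join(statements) + " .")
    return "\n".join(lines)


# JSON is used here only because its parser ships with Python; any
# format that parses into a dict could feed the same function.
example = json.loads(
    '{"name": "Example pump", "partOf": "https://example.org/ventilator"}'
)
print(manifest_to_turtle("https://example.org/pump", example))
```

Under these assumptions the choice of distributed format only affects the parsing step; the mapping to TTL stays the same as long as all formats follow the same flat structure.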

@penyuan

penyuan commented Jun 9, 2020

Sorry for my ignorance, but is "TTL" this format?

As for file format (TOML, YAML, JSON, etc.), I am aware of the ongoing discussions. Sorry I didn't follow all of it so it might have already been discussed, but:

From my experience it is useful to ease entry and onboarding, especially if one of our aims is to promote open source hardware. To that end, I think it would be good to use a file format that is more human-readable and easy to understand at-a-glance. An extreme example is that Markdown is much easier for a human to parse than dense HTML or even LaTeX.

I certainly don't think Markdown is the format to use in this case, but of the options already presented, is there one that is more human-readable? Is this factor worth considering? (honest question)

@moedn
Author

moedn commented Jun 12, 2020

@penyuan thanks for reaching out!

  1. Please don't apologise for questions, suggestions or critique; it's an open source process and every bit (especially from skilled/knowledgeable people) is valuable. + I don't expect people to be always totally up to date with all ongoing discussions → it's simply too much sometimes, so let's just ask and support each other
  2. yes, we/I meant the turtle format
  3. yes, I totally agree about relying on a simple, intuitive way to provide this data. I wouldn't expect any OSH project to read up on TTL just so they can 'become part of our network'. No one will do that. However, we may want to use TTL in our database. Input from other file formats (hereinafter called 'distributed format') may get converted by our crawler into the actual input format so we can effectively use it in the Wikibase instance
  4. MD could be indeed another option for a distributed format, if we can provide a reasonable template
  5. the distributed format should be human-readable; the input format doesn't need to be, as its content a) is identical to that of the distributed format and b) can be queried by the Wikibase instance

@penyuan

penyuan commented Jun 15, 2020

Thanks @moedn for your encouragement! 😃 Understood, makes sense.

I actually don't think Markdown is good as a data/metadata format, it was just meant as an extreme example of high human-readability. But yeah, maybe it'll be good to have a Markdown-formatted "for-humans" descriptive/summary document that accompanies the formal manifest file? If there is such a summary (filled out from a template, of course, as you suggested), then the manifest file itself doesn't have to be very human-readable.

That said, if the specification requires both a manifest file and a Markdown summary based on a template, it's more work for projects that want to use it...

moedn referenced this issue in OPEN-NEXT/OKH-LOSH Jun 17, 2020
@moedn
Author

moedn commented Jul 9, 2020

In today's meeting we agreed on:
use TOML only for starters & keep it simple

@moedn moedn closed this as completed Jul 9, 2020
@hoijui hoijui transferred this issue from OPEN-NEXT/OKH-LOSH May 30, 2024