add a metalad description #930

christian-monch · 2023-01-30T22:11:40Z

This PR contains a first draft of metalad documentation. It covers meta-add and meta-dump and the
transport of metadata between repositories.


adswa

Thanks a lot for this PR, @christian-monch. I believe a section on metalad is hugely important. It is highly sought after by handbook readers, and I don't think there is any accessible documentation of it anywhere in the datalad ecosystem.
I've added a first round of rewriting and restructuring on top of this draft. I have also added a couple of TODOs that would make the section even better, if implemented.

I have a number of proposals I'd like to get your opinion on:

- I believe it would be helpful to see actual output from the meta-* commands. For this, we could turn what are currently code blocks into runrecords.
- Related to the suggestion above, I saw that you already created a test repo with metadata. Maybe it is possible to shrink that down so that the output of a meta-dump would be manageable?
- I think it would be cool to see two small extractor-related examples in the section:
  - how to use any of the built-in extractors
  - how to build your own extractor (maybe @jsheunis has built one once that we could showcase?)
- I'd like to include some technical notes on how metadata is stored internally, and why it is stored this way. If you have suggestions on how to write that up or which aspects would be the most important to add, I'd be all ears. I also haven't fully understood the meta-dump command's -r parameter and how the internal representation of metadata interacts with it.
- I'd like to include some example queries, e.g., with a simple pipe into jq, so that there is some notion of what "querying" even means.

adswa · 2023-02-09T08:18:44Z

I'd like to link relevant cross-references to existing documentation in this PR, too, as the office hour this week showed how hard a time users have finding the relevant pieces. I'm thinking of including @mslw's upcoming guide to writing your own extractors, and also of mentioning @jsheunis's catalog workflow pipeline (can I get a pointer to where I can find out more about it?). What else could I link, @christian-monch, @mslw, @jsheunis?


mslw · 2023-02-09T13:22:07Z

Regarding external pieces, these are the things I had "saved":

For the datalad catalog workflow subcommand (FTR: it currently combines extraction with the "metalad_core", "metalad_studyminimeta", "bids_dataset", and "datacite_gin" extractors if possible, translation of outputs, catalog create, and catalog add), I don't think there is any dedicated documentation other than the examples in the catalog README, the catalog manpage, and the workflows source with docstrings?

For Metalad, I think this gist by @christian-monch used to be the primary overview of the functionality (but this PR probably means to supersede it?)

For a manual breakdown of extract / translate / update catalog, I have @jsheunis's draft of a fairly big catalog workflow and my digest of it. Keep in mind, though, that the metalad->catalog translation is being formalized in the catalog: datalad/datalad-catalog#246

And there is quite a bit of catalog docs

jsheunis · 2023-02-09T13:35:05Z

@mslw already added all the sources I wanted to list. There are also this primer and these tutorials: https://github.com/datalad/tutorials/blob/master/notebooks/catalog_tutorials/datalad_catalog_primer.ipynb

I will add that the "fairly big catalog workflow" might have some useful text descriptions but is itself outdated, and kind of superseded by datalad-catalog's workflow functionality.

Another point: what we want to describe in the handbook in terms of this type of workflow is pretty much the same as what we are working on now with the "non-portal" effort, which I think will deserve its own chapter (if not more). So I would keep whatever we add to the metalad or catalog chapters minimal and eventually refer to the to-be-added workflow chapters.

lastly:

> I believe it would be helpful to see actual output from the meta-* commands. For this, we could turn what's currently code blocks into runrecords.

I agree!


jsheunis · 2023-02-09T15:12:56Z

BTW @christian-monch @adswa I think the way that metadata and its relation to primary data are described here is clear and intuitive, and I think there are some snippets that would be useful to port into the "non-portal" paper.


adswa · 2023-02-13T08:06:58Z

docs/beyond_basics/101-181-metalad.rst

Does anyone have complex dumps from the real world? The docstring of meta-add contains the spec for this notation, but it has a complex-ish feel to it, so the more examples, the better :)


adswa · 2023-02-13T08:09:58Z

docs/beyond_basics/101-181-metalad.rst

+^^^^^^^^^^^^^^^^^
+
+TODO: something more about meta-dump and concrete usage example with, e.g., ``jq``


Does anyone have an example in their shell history, by any chance? :)

This would create a list of all familyName values in the person lists that were extracted from .studyminimeta.yaml-files by metalad_studyminimeta-extractor:

> datalad meta-dump -r|jq '.extracted_metadata["@graph"][3]["@list"][].familyName'|sort|uniq

Something like this? (also studyforrest dataset)

Output array of all subdatasets:

$ datalad meta-extract -d . metalad_core > metadata_record_metalad_core.json
$ datalad meta-add -d . metadata_record_metalad_core.json
$ datalad meta-dump . | jq '.extracted_metadata["@graph"] | .[] | select(.["@type"] == "Dataset") | [.hasPart[]? | {"dataset_id": (.identifier | sub("^datalad:"; "")), "dataset_version": (.["@id"] | sub("^datalad:"; "")), "dataset_path": .name}]'
[
  { "dataset_id": "d5dd3da0-a631-4c0c-a4a9-de55dfc4620f", "dataset_version": "276ceffd8c11db1b99b30b4ae6965aac296fba02", "dataset_path": "artifact/3T_movie_eyetracking" },
  { "dataset_id": "ad9b6c66-4413-4b4f-b6da-b7f25d0d6397", "dataset_version": "787741801ad6ff9e4d0aea7a1403bc483f9e1a74", "dataset_path": "artifact/3T_structural_mri" },
  { "dataset_id": "6075c0fa-ab72-4bab-9888-3b597f0e63b1", "dataset_version": "64cf42ad00c707bb6533b5944041582c6bfdd384", "dataset_path": "artifact/3T_visuallocalizer" },
  { "dataset_id": "d4759300-5563-467d-be5f-e5b164fb3060", "dataset_version": "7b6ec16910eb6a77cd0cc3d2945e7fb292fb9672", "dataset_path": "artifact/7T_audiomovie" },
  { "dataset_id": "c08af312-e05b-43b3-b499-db0d2ad46bf6", "dataset_version": "7b58b38af7167d576489cc58e9ec4b3eee0822fc", "dataset_path": "artifact/7T_musicperception" },
  { "dataset_id": "da15d84c-9c8b-11e9-a3fb-f0d5bf7b5561", "dataset_version": "f168e8e2b8bef21f373e8a5385e528130ca6339d", "dataset_path": "artifact/media" },
  { "dataset_id": "126cd950-377c-4600-a921-045cf408bd9f", "dataset_version": "1b7cb46a97e75aeccab28ef7b61693f4f02a0f8d", "dataset_path": "artifact/movie_eyetracking" },
  { "dataset_id": "0f66b1ba-e9a9-46fd-b9d9-2e64fe94d307", "dataset_version": "c74b66cf37c0d4ed8914296c6d7792b2d25696aa", "dataset_path": "code/conversion_qa" },
  { "dataset_id": "7fcd8812-d0fe-11e7-8db2-a0369f7c647e", "dataset_version": "2ccaa115543c21e6658950d1cb8cc3038f14272f", "dataset_path": "derivative/aggregate_fmri_timeseries" },
  { "dataset_id": "c8ec2919-493b-4af5-9271-cbe9ebd08c43", "dataset_version": "74cd7ec0538448b05fb4d5f91119b279c5e9ab04", "dataset_path": "derivative/aligned_mri" },
  { "dataset_id": "3304e775-5f5f-435a-b68e-d98c9f5fb72a", "dataset_version": "aaac44e047d375cd8f791b1b6fe2b739f02c83b2", "dataset_path": "derivative/cortical_surfaces_freesurfer" },
  { "dataset_id": "ceb007ac-ef05-4392-98d2-35c02a774a21", "dataset_version": "688d8d8558fe847f4c1b19aed579745bcd6c7744", "dataset_path": "derivative/image_space_transformations" },
  { "dataset_id": "2d05f277-94b0-470b-8e11-4e56691d5b89", "dataset_version": "78e04a00fedbe3e055f86c2f9127aa48e1133d55", "dataset_path": "derivative/retinotopic_maps" },
  { "dataset_id": "92e65958-4a5a-4c34-a4f4-ee070f7a123b", "dataset_version": "203aa983534fc2b823ff4777a85f4f80d7a68656", "dataset_path": "derivative/visual_areas" },
  { "dataset_id": "5b1081d6-84d7-11e8-b00a-a0369fb55db0", "dataset_version": "b623351b43eb2715331ac59ad4cf41682e84ff7d", "dataset_path": "original/3T_multiresolution_fmri" },
  { "dataset_id": "1882e2e6-fbbf-4ade-a65f-3a1615235f51", "dataset_version": "e5d2f8368fc5f6717d8ef131041c6d943298d0c7", "dataset_path": "original/3T_structural_mri" },
  { "dataset_id": "3a8648b3-7df8-413f-8efb-4d39040ac174", "dataset_version": "10e23aafa8271f742e9022a3522d9a88d7fe30cf", "dataset_path": "original/7T_multiresolution_fmri" },
  { "dataset_id": "5eaff716-54eb-11e8-803d-a0369f7c647e", "dataset_version": "72f835ada046bd0479009ea0ff933b30a95b0076", "dataset_path": "original/phase2" },
  { "dataset_id": "4c536c4a-ec61-11e6-9440-00b56d060aa7", "dataset_version": "298e2a884d0598dd89753e9b5576bd93d19f335a", "dataset_path": "stimulus/computational_annotations" },
  { "dataset_id": "45b9ab26-07fc-11e8-8c71-f0d5bf7b5561", "dataset_version": "29dcce2b9477537b433996ebc342c531139e1d87", "dataset_path": "stimulus/curated_annotations" }
]
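
For readers less familiar with jq, the same selection can be sketched in plain Python. This is only an illustration, not part of metalad: the function names and the sample record are hypothetical, and the record layout (a `@graph` list whose `Dataset` node carries `hasPart` entries with `identifier`, `@id`, and `name`) is inferred from the filter and output above rather than taken from a specification.

```python
import json

PREFIX = "datalad:"


def strip_prefix(value):
    """Remove a leading 'datalad:' marker, like jq's sub("^datalad:"; "")."""
    return value[len(PREFIX):] if value.startswith(PREFIX) else value


def list_subdatasets(record):
    """Collect id, version, and path of every hasPart entry of Dataset nodes."""
    subdatasets = []
    for node in record["extracted_metadata"]["@graph"]:
        if node.get("@type") != "Dataset":
            continue
        # Like `.hasPart[]?` in jq: a Dataset node without hasPart yields nothing
        for part in node.get("hasPart", []):
            subdatasets.append({
                "dataset_id": strip_prefix(part["identifier"]),
                "dataset_version": strip_prefix(part["@id"]),
                "dataset_path": part["name"],
            })
    return subdatasets


# Hypothetical record, reduced to the fields the filter touches
sample_record = {
    "extracted_metadata": {
        "@graph": [
            {"@type": "agent", "name": "Jane Doe"},
            {
                "@type": "Dataset",
                "hasPart": [
                    {
                        "@id": "datalad:276ceffd8c11db1b99b30b4ae6965aac296fba02",
                        "identifier": "datalad:d5dd3da0-a631-4c0c-a4a9-de55dfc4620f",
                        "name": "artifact/3T_movie_eyetracking",
                    }
                ],
            },
        ]
    }
}

print(json.dumps(list_subdatasets(sample_record), indent=2))
```

One difference from the jq filter: jq emits one array per `Dataset` node, whereas this sketch collects all entries into a single flat list.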

Ooh I only saw your reply now @christian-monch. I think yours is a bit more user friendly :)

I think it is a good example. Just wanted to mention that the first two lines can be combined into:

> datalad meta-extract -d . metalad_core | datalad meta-add -d . --json-lines -

And since -d . is equivalent to the default dataset location, the following would be identical:

> datalad meta-extract metalad_core | datalad meta-add --json-lines -

Thanks a lot! I think the family name example would work really well. I get a few errors/warnings by jq in @christian-monch's example - any idea what might be wrong?

datalad meta-dump -r | jq '.extracted_metadata["@graph"][3]["@list"][].familyName' | sort | uniq
jq: error (at <stdin>:16): Cannot iterate over null (null)
jq: error (at <stdin>:22): Cannot iterate over null (null)
jq: error (at <stdin>:23): Cannot iterate over null (null)
jq: error (at <stdin>:24): Cannot iterate over null (null)

This is still unresolved; I'm merely hiding it in the output. If someone has advice, let me know.
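
One possible reading of the `Cannot iterate over null` messages, not confirmed in this thread: if the recursive dump emits one JSON record per line, then any record whose `extracted_metadata` has no `["@graph"][3]["@list"]` entry (for example, a record produced by a different extractor) makes jq iterate over null. In jq, the `[]?` form already used for `hasPart` in the subdataset filter skips such cases. The Python sketch below shows the same tolerant lookup; the function name and the sample lines are hypothetical and only mimic the fields the query touches.

```python
import json


def family_names(lines):
    """Return the sorted, unique familyName values found in JSON-lines records."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        metadata = record.get("extracted_metadata")
        graph = metadata.get("@graph") if isinstance(metadata, dict) else None
        # Follow the jq path ["@graph"][3]["@list"], skipping records without it
        if not isinstance(graph, list) or len(graph) <= 3:
            continue
        entry = graph[3]
        persons = entry.get("@list") if isinstance(entry, dict) else None
        if not isinstance(persons, list):
            continue
        for person in persons:
            if isinstance(person, dict) and "familyName" in person:
                names.add(person["familyName"])
    return sorted(names)


# Hypothetical sample: one record with a person list, one without
sample_lines = [
    json.dumps({
        "extracted_metadata": {
            "@graph": [
                {}, {}, {},
                {"@list": [
                    {"familyName": "Doe"},
                    {"familyName": "Roe"},
                    {"familyName": "Doe"},
                ]},
            ]
        }
    }),
    json.dumps({"extracted_metadata": {"@graph": [{"@type": "Dataset"}]}}),
]

print(family_names(sample_lines))
```

The second sample record corresponds to the case that would trigger the jq error; here it is skipped silently instead.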


christian-monch · 2023-02-13T10:35:40Z

@adswa : I think it might be better to either include core metadata in the example or to use exclusively core metadata in the examples. WDYT?

adswa · 2023-02-14T14:08:19Z

Not everything in this PR is fully finalized, but it's a very good start. I'll merge it to have metadata documentation available.

adswa · 2023-02-14T14:56:29Z

hey @all-contributors please add @christian-monch for content

allcontributors · 2023-02-14T14:56:32Z

@adswa

@christian-monch already contributed before to content

adswa · 2023-02-14T14:56:48Z

hey @all-contributors please add @jsheunis for review

allcontributors · 2023-02-14T14:56:57Z

@adswa

I've put up a pull request to add @jsheunis! 🎉

adswa · 2023-02-14T14:57:03Z

hey https://github.com/all-contributors please add @mslw for review

adswa · 2023-02-14T14:57:56Z

hey @all-contributors please add @mslw for review

allcontributors · 2023-02-14T14:57:59Z

@adswa

@mslw already contributed before to review

christian-monch and others added 7 commits January 30, 2023 23:09

add a metalad description

5ea0766

This is a first draft of metalad documentation. It covers meta-add and meta-dump and the transport of metadata between repositories.

add description of remote-dump

0a8801a

Tweak MetaLad intro: Fewer subheadings, typos, and language edits

ad0a3fb

More rewriting of the Metalad section, slight restructuring

5133434

Further restructuring, information on required fields and the concept of an extractor

693e818

GLOSS: add ref to glossary

355e8b9

Another round of restructuring and rewriting, include next TODOs

c9863ba

adswa reviewed Feb 3, 2023


jsheunis suggested changes Feb 9, 2023


adswa and others added 3 commits February 9, 2023 15:31

better wording

8be9e3e

Co-authored-by: Stephan Heunis <jsheunis@gmail.com>

typo

31f940b

Co-authored-by: Stephan Heunis <jsheunis@gmail.com>

typo

8332b28

Co-authored-by: Stephan Heunis <jsheunis@gmail.com>

adswa and others added 7 commits February 13, 2023 07:43

link to code instead of docs

9600978

Co-authored-by: Stephan Heunis <jsheunis@gmail.com>

add a link for citation.cff

c9574d3

Actually execute the image metadata example

9b35531

Add link to user guide to writing own extensions

6b58b51

Add Stephan's proposed 'Using Metadata' paragraph

fb48264

Make other exampels runrecords as well

d8c8f46

add code-snippets for metalad section

49d244b

adswa reviewed Feb 13, 2023


adswa reviewed Feb 13, 2023


adswa reviewed Feb 13, 2023


Add a findoutmore on the metadata model

9124210

adswa added 6 commits February 13, 2023 16:12

add meta-extract example

507988a

add small note explaining -r a bit more

9c2c56d

add preliminary jq-based query example

1bfc6af

Update code snippet

f63bd9f

GLOSS: add pipe

53141d5

move section on dumping metadata across datasets

86b5a9f

adswa marked this pull request as ready for review February 13, 2023 15:20

adswa added 4 commits February 13, 2023 16:25

add left-over code snippet

f5131c3

add public link to studyminimeta schema

c10cea2

Hide jq warning in output

a85e1b5

Add uidmap to appveyor build environment

d98c5f0

adswa merged commit 703247d into datalad-handbook:main Feb 14, 2023

allcontributors bot mentioned this pull request Feb 14, 2023

docs: add jsheunis as a contributor for review #932

Merged
