Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a draft of a user guide on writing custom extractors #282

Merged
merged 11 commits into from
Feb 14, 2023

Conversation

mslw
Copy link
Contributor

@mslw mslw commented Sep 2, 2022

This PR introduces a "User guide" section to the docs, and adds a draft of a user guide on writing custom extractors, addressing #281

This is a rather rough draft, based on what I was able to piece together from reading or experimentation, with some statements being just my educated guesses. I also left a couple of TODOs in the text. Since I'll be away for a while, please feel free to hack away on the descriptions. I've allowed edits by maintainers, so you should be able to push changes.

As a side note, I formatted the text with 1 sentence per line (with exceptions) - for me it works well, because then whole sentences can be commented/edited.

Also introduces a "User guide" section to the docs.
Copy link
Contributor Author

@mslw mslw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's some more of my doubts / questions, which I didn't include in the text.

docs/source/user_guide/writing-extractors.rst Outdated Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Sep 2, 2022

Codecov Report

Base: 87.33% // Head: 86.56% // Decreases project coverage by -0.77% ⚠️

Coverage data is based on head (1ead8ba) compared to base (9335f02).
Patch has no changes to coverable lines.

❗ Current head 1ead8ba differs from pull request most recent head 30e1c86. Consider uploading reports for the commit 30e1c86 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #282      +/-   ##
==========================================
- Coverage   87.33%   86.56%   -0.78%     
==========================================
  Files          67       88      +21     
  Lines        4288     4831     +543     
==========================================
+ Hits         3745     4182     +437     
- Misses        543      649     +106     
Impacted Files Coverage Δ
datalad_metalad/__init__.py 100.00% <0.00%> (ø)
datalad_metalad/extractors/tests/test_custom.py 100.00% <0.00%> (ø)
...talad/extractors/legacy/tests/test_datacite_xml.py 100.00% <0.00%> (ø)
...atalad_metalad/extractors/legacy/datalad_rfc822.py 96.36% <0.00%> (ø)
...lad_metalad/extractors/legacy/tests/test_rfc822.py 100.00% <0.00%> (ø)
...alad/extractors/legacy/frictionless_datapackage.py 91.66% <0.00%> (ø)
datalad_metalad/extractors/legacy/xmp.py 12.96% <0.00%> (ø)
...talad_metalad/extractors/legacy/tests/test_base.py 100.00% <0.00%> (ø)
datalad_metalad/extractors/legacy/__init__.py 100.00% <0.00%> (ø)
datalad_metalad/extractors/legacy/datalad_core.py 77.55% <0.00%> (ø)
... and 15 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jsheunis jsheunis self-requested a review September 20, 2022 12:16
@jsheunis
Copy link
Member

Quick comment: I think it would be useful to also include a short description of the external_extractor (wrong name?) such that people have a good understanding of dataset-level vs file file-level vs alternative extraction processes, and how they can include their pre-existing "extractors" into the metalad ecosystem.

@christian-monch
Copy link
Collaborator

@mslw Thanks a lot for the description. It is great.

I just pushed some updates (more to come), please have a look.

@mslw
Copy link
Contributor Author

mslw commented Jan 16, 2023

It felt like I abandoned this PR, so I gave it another read and pushed a few small additions. From my POV, this is pretty much ready.

There are two things which I think would improve the article, but I need your help:

  • Explain how extractors can use extraction parameters passed to a calling command (I assume an extractor can have some kind of input arguments, but I wasn't able to quickly figure out their handling from reading the source for meta_extract)
  • Include a short description of how metalad_external_[dataset|file] extractors can be an alternative to writing custom extractors if an external process for generating metadata already exists (as suggested above by @jsheunis; I also think it's an important feature to address but I haven't used it myself).

I placed these two tings in a todo box at the bottom of the page. Currently don't feel competent to answer these questions, so if you don't want to add them now but like the overall content of this page, I would propose to merge the PR as is, leaving the todo box in.

@mslw mslw requested a review from christian-monch January 16, 2023 17:53
Copy link
Member

@jsheunis jsheunis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mslw this a great write-up, thanks for doing it! I have nothing to add.

@jsheunis
Copy link
Member

I agree, having the TODOs listed is a good first step, they can be added after this is merged.

@adswa
Copy link
Member

adswa commented Feb 13, 2023

I have linked this PR in an upcoming metalad section in the handbook 🚀

Copy link
Collaborator

@christian-monch christian-monch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks a lot for the work.

Sorry for taking so long to review it.

docs/source/user_guide/writing-extractors.rst Outdated Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Outdated Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Outdated Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Outdated Show resolved Hide resolved
docs/source/user_guide/writing-extractors.rst Outdated Show resolved Hide resolved
Adds more detailed explanations

Co-authored-by: Christian Mönch <christian.moench@web.de>
@mslw
Copy link
Contributor Author

mslw commented Feb 13, 2023

Thanks @christian-monch - I approved all your suggestions and left two minor formatting ones for the changed fragments.

@christian-monch
Copy link
Collaborator

  • Explain how extractors can use extraction parameters passed to a calling command (I assume an extractor can have some kind of input arguments, but I wasn't able to quickly figure out their handling from reading the source for meta_extract)

Done in 1825f8a

@christian-monch
Copy link
Collaborator

  • Include a short description of how metalad_external_[dataset|file] extractors can be an alternative to writing custom extractors if an external process for generating metadata already exists (as suggested above by @jsheunis; I also think it's an important feature to address but I haven't used it myself).

Done in c666447

@christian-monch christian-monch merged commit d83bb72 into datalad:master Feb 14, 2023
@christian-monch
Copy link
Collaborator

Thanks everyone for the great work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants