Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataLad as DVC for ML analysis #569

Merged
merged 24 commits into from
Sep 18, 2020
Merged

DataLad as DVC for ML analysis #569

merged 24 commits into from
Sep 18, 2020

Conversation

adswa
Copy link
Contributor

@adswa adswa commented Sep 4, 2020

This is a section that takes a quite elaborate DVC tutorial and recreates it with DataLad.
I've split the original tutorial into three "parts" (Versioning data, Sharing data, Analysis), and in each part, I execute the DVC workflow and a corresponding DataLad workflow.
Instead of adjusting the workflow into a YODA-principle or similar DataLad-centric workflow, I'm recreating the workflow as closely as I can with DataLad as for a tool comparison in the "domain" of DVC.

Both DVC and DataLad pipelines are executed and included with output. Apart from a slightly different order and skipping the tutorial section in which only scripts are executed without pipelining ("Version Datasets and Models"), I've recreated the tutorial very closely, and didn't make any code changes.

I would appreciate feedback on the following:

  • I don't want to throw shade at DVC, I just want to provide a comparison. I think it helps that it is a tutorial by them, but feedback on whether I am unfair or appear (overly) biased in the text around the code execution or in other ways would be appreciated.
  • Are there more elegant ways to recreate the DVC workflow? Something that is a bit ugly is the comparison of metrics across analysis since we have no ML specific reporting. This tutorial is easy, it has only two models to compare, and I can make a diff between the two refs. Is there a way to compare the state of a file (metrics/accuracy.json) across more than two refs?

CC @mih @yarikoptic

This adds a separate clean rule for non-narrative sections.
At the moment, it only contains DVC. This section does not have any
relation to the DataLad-101 narrative, and hence doesn't need
to be redone with a complete rebuild of the book.
Other such unrelated sections can be added to this rule (which could also
be renamed), but so far, this section is the only one in the book
(apart from usecases) that does not depend on the narrative.

Rebuilding DVC is also tedious, because this section also pushes back to
GitHub-based repos
@adswa
Copy link
Contributor Author

adswa commented Sep 4, 2020

Also, ideas on where to put such a comparison would be appreciated :) currently I just put it into the "clutter" chapter in "Beyond basics"

@adswa
Copy link
Contributor Author

adswa commented Sep 4, 2020

Preview link to the section: https://datalad-handbook--569.org.readthedocs.build/en/569/beyond_basics/101-168-dvc.html

Copy link
Contributor

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some review on initial portion... yet to digest in full -- but looks great overall.

docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
@adswa adswa changed the title DVC versus DataLad ML analysis comparison DataLad as DVC for ML analysis Sep 5, 2020
@adswa
Copy link
Contributor Author

adswa commented Sep 9, 2020

Took about a week longer than I wanted it to take, but this section is ready now ;-)

Copy link
Contributor

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor recommended suggestions in the opening... yet to read more. So far so good ;)

docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reached the end

docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
docs/beyond_basics/101-168-dvc.rst Outdated Show resolved Hide resolved
@adswa
Copy link
Contributor Author

adswa commented Sep 9, 2020

Thanks much @yarikoptic. I'll add the remainder of your comments locally, and then I'll squash the past 20 commits.

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Fix version specification

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: tweak language

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Publish -> push

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: Typo

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

DVC: link more footnotes
@adswa
Copy link
Contributor Author

adswa commented Sep 18, 2020

@yarikoptic, it doesn't seem as if DVC people are giving feedback on this. Do you think this is ready to be merged? My general plan would be to use some of the content in this tutorial also for a stand-alone, DataLad-centric usecase (for the "DataLad for machine-learning" workshop), but if this is ready from your POV, I would be happy to reduce the number of open PRs by one. :)

@yarikoptic
Copy link
Contributor

yes, let's merge!!! Thank you for this section - I think it is great and will be a valuable addition to the handbook.

@adswa adswa merged commit 85135bb into master Sep 18, 2020
@adswa adswa deleted the dvc branch September 18, 2020 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants