Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template for packaging spinalcordtoolbox datasets in pip. #1

Open
wants to merge 48 commits into
base: trunk
Choose a base branch
from

Conversation

kousu
Copy link
Contributor

@kousu kousu commented Apr 7, 2021

No description provided.

@kousu
Copy link
Contributor Author

kousu commented Apr 7, 2021

@joshuacwnewton

Copy link
Member

@joshuacwnewton joshuacwnewton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should address all of the existing TODOs/comments before making any new repos with this. 🤔

.github/workflows/build.yml Outdated Show resolved Hide resolved
.github/workflows/release.yml Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@taowa
Copy link

taowa commented Jul 30, 2021

yay!

@taowa
Copy link

taowa commented Jul 30, 2021

So that last commit is definitely kind of sinful. But I think that we should be doing /something/ to validate changes to the template, lest we change it in ways that break other things.

@taowa
Copy link

taowa commented Jul 30, 2021 via email

README.md Outdated Show resolved Hide resolved
@joshuacwnewton

This comment was marked as resolved.

Answer to TODO: No, steps cannot be parallelized.

However, *jobs* can run in parallel. But, to do so would require uploading
 files as artifacts, then spinning up entirely new VM instances, then downloading the artifacts.

 And, at that point, the setup of these VMs would probably take longer
 and use more resources than just running the steps sequentially.
The existing values were copied over from ivadomed. Here, in SCT,
I've added a temporary new token (until we can sort out NeuroPoly's
PyPI accounts).
I think this is meant to specify the target branch (i.e. `trunk`,
not the branch used for the PR.)
Comment on lines 30 to 38
- name: Build
run: |
python -m build --wheel --sdist
- name: Publish to Github
uses: softprops/action-gh-release@v1
with:
files: 'dist/*'
fail_on_unmatched_files: true
tag_name: 2.0.3 # ${{ github.event.inputs.version }} # in the workflow_dispatch case, make a new tag from the given input; in the published release case, this will be empty and will fall back to updating that release.
Copy link
Member

@joshuacwnewton joshuacwnewton Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, in our setup.py we've set the option use_scm_version=True, which will use the most recent git tag as the package version number.

There's a subtle bug here, though: For the workflow_dispatch case, we create a tag in the Publish step. But, the publish step happens after the build step, so the chosen tag won't be present for python -m build to pick up.

I think this means we have to try creating a tag prior to the build step?

Copy link
Member

@joshuacwnewton joshuacwnewton Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, this manifests in this PR as:

ERROR    HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
         '0.1.dev1+g169fe48.d20220603' is an invalid value for Version. Error:  
         Can't use PEP 440 local versions. See                                  
        [ https://packaging.python.org/specifications/core](https://packaging.python.org/specifications/core-metadata)-metadata for m
         information.

The package is being built with 0.1.dev1, when we want it to be built using the dummy example tag name I chose (2.0.3).

Copy link
Member

@joshuacwnewton joshuacwnewton Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh! Fascinating! use_scm_version=True does not do what I thought it did (CI run):

ERROR    HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
         '2.0.4.dev0+g5929d76.d20220603' is an invalid value for Version. Error:
         Can't use PEP 440 local versions. See                                  
        [ https://packaging.python.org/specifications/core](https://packaging.python.org/specifications/core-metadata)-metadata for m
         information.

Setting the tag 2.0.3 prior to release will actually auto-generate a package with version number 2.0.4.dev0!

My best guess is that use_scm_version=True is intended for the workflow described in the README.md of the pypi-publish action:

For example, you could implement a parallel workflow that pushes every commit to TestPyPI or your own index server, like devpi. For this, you'd need to (1) specify a custom repository_url value and (2) generate a unique version number for each upload so that they'd not create a conflict. The latter is possible if you use setuptools_scm package but you could also invent your own solution based on the distance to the latest tagged commit.

In other words, setuptools_scm is meant for rapidly generating unique version numbers during development (e.g. dev0 -> dev1 -> dev2), rather than generating the version number for stable releases.

This means that we should probably stop using setuptools_scm for building stable releases! (Though, we could optionally continue to use it to automatically build test/dev releases, but that's not really what workflow_dispatch is for.)

Copy link
Member

@joshuacwnewton joshuacwnewton Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! So, performing a find and replace on the version="dev' string (7394563) gets us our desired result:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't make sense to me. I've used it lots of times and its always behaved. Take a look at https://github.com/ufoym/imbalanced-dataset-sampler/actions/runs/2371746877 to see for yourself.

If you want to do a test deploy you should use just tag it 'v1.0.3rc0' or 'v1.0.3rc1' -- that'll get tagged as a test build too -- particularly we should include this line

https://github.com/ufoym/imbalanced-dataset-sampler/actions/runs/2371746877

I'm on my phone atm. I'll need some time to look deeper about why setuptools-scm isn't working. But I really think we should use it, it really improves the workflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to ask people need to manually fill in the name.

Maybe we can write some awesome actions script that edits setup.py to sync it with the repo's name on every commit. That would be cool, and in line with using git as the single source of truth, but maybe a bit Extra, y'know? I think doing step manually is good enough because how many times are we really going to be making a new dataset (vs making a new release of that dataset)?

Copy link
Member

@joshuacwnewton joshuacwnewton Jun 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh of course. I missed the fact that editing setup.py means that the git repo is no longer in a clean state.

Thanks for catching that. 🤦

Copy link
Member

@joshuacwnewton joshuacwnewton Jun 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to ask people need to manually fill in the name.

We already do! The perl edit step is only relevant for the data-template repo, and I only added it because otherwise the build on data-template (i.e. this PR) would fail due to (I think) an invalid repo URL in the metadata, hehe.

We can just straight-up remove the perl step, and allow the builds to fail on data-template? Since the problem goes away the moment people actually use the template then follow the README.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From @kousu:

Ah good. I was pretty sure we already did...

Copy link
Member

@joshuacwnewton joshuacwnewton Jun 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried out the template using PAM50, and all the steps went well: https://github.com/joshuacwnewton/data-PAM50/releases/tag/2022.6.6

I then tried installing the wheel and it behaved as expected.

So, as soon as I add back setuptools_scm, this (maybe possibly) is almost ready for merging?

* Stop repeating version number
* Use || to make fallback case more explicit
It's more straightforward to just have a single "correct" way of doing things, and the release UI has a lot of other nice usage benefits, too (e.g. checking if you're duplicating a tag name).
README.md Outdated Show resolved Hide resolved
@spinalcordtoolbox spinalcordtoolbox deleted a comment from kousu Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants