Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template for packaging spinalcordtoolbox datasets in pip. #1

Open
wants to merge 48 commits into
base: trunk
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
ec0f257
Template for packaging spinalcordtoolbox datasets in pip.
kousu Apr 6, 2021
cbbac73
Update README.md
kousu Jul 13, 2021
dbcd1e3
Add files via upload
kousu Jul 13, 2021
7173954
Yep
Jul 13, 2021
24ba7f2
Update README.md
kousu Jul 13, 2021
68fb7ff
Update README.md
kousu Jul 13, 2021
305fb5e
Update README.md
kousu Jul 13, 2021
2b0291a
Update README.md
kousu Jul 13, 2021
f4d8f1e
Update setup.py
kousu Jul 13, 2021
b828e86
`setup.py`: Add link to front page for 'documentation'
joshuacwnewton Jul 13, 2021
2521744
Update setup.py
kousu Jul 13, 2021
f4beed7
Create LICENSE.txt
kousu Jul 13, 2021
97683bc
Update setup.py
kousu Jul 13, 2021
cc38cc6
add a missing comma to setup.py
taowa Jul 19, 2021
f45045b
replace the name string during builds for testing
taowa Jul 30, 2021
8dde7fd
replace google group email with a forwarder @neuropoly.org
taowa Sep 1, 2021
4410d0e
`README.md`: Fix numbering
joshuacwnewton May 24, 2022
4fe54cc
`README.md`: Improve readability of CLI steps
joshuacwnewton May 24, 2022
7bbc02d
`README.md`: Split "Remove this section" step into its own last step
joshuacwnewton May 24, 2022
3b2c4ef
`README.md`: Add clearer git commit messages
joshuacwnewton May 24, 2022
b9e31cb
`README.md`: Add clarifying indentation
joshuacwnewton May 24, 2022
0f1e9fa
`README.md`: Use date-based release versioning
joshuacwnewton May 24, 2022
c521774
`README.md`: Fixup `data-${dataset_name}` instances
joshuacwnewton May 24, 2022
19f2765
`README.md`: Use `perl -pi -e` in README.md, too
joshuacwnewton May 24, 2022
e74cf6a
`README.md`: Swap steps 1 and 2 in metadata
joshuacwnewton May 24, 2022
4a09c4d
`README.md`: Fix typos with `$dataset-name`/`$dataset_name`
joshuacwnewton May 24, 2022
2afa7c3
`README.md`: Improve readability of the `Troubleshooting` section
joshuacwnewton May 24, 2022
eaea4c2
`README.md`: Add missing `.git`
joshuacwnewton May 24, 2022
525ea88
`release.yml`: Remove "parallelized" TODO
joshuacwnewton Jun 3, 2022
fe246fc
`release.yml`: Update PYPI_API_TOKEN
joshuacwnewton Jun 3, 2022
65021c2
`release.yml`: Test publishing on PRs for `ng/template`
joshuacwnewton Jun 3, 2022
dfb1e5c
`release.yml`: Remove `branches: ng/template` from test
joshuacwnewton Jun 3, 2022
0053d78
`release.yml`: Add missing perl -pi -e
joshuacwnewton Jun 3, 2022
73f9b33
`release.yml`: Try tagging release prior to build
joshuacwnewton Jun 3, 2022
9d6f082
Revert "`release.yml`: Try tagging release prior to build"
joshuacwnewton Jun 3, 2022
7394563
`release.yml`: Explicitly set `version=` instead of using setuptools_scm
joshuacwnewton Jun 3, 2022
804b510
`release.yml`: Make `VERSION_NUMBER` more explicit
joshuacwnewton Jun 3, 2022
7ed3fe3
`release.yml`: Drop `workflow_dispatch`
joshuacwnewton Jun 3, 2022
3f67509
`release.yml`: Drop test build workflow, and use only 1 workflow
joshuacwnewton Jun 3, 2022
2fb6da0
README.md: Fix typo (missing `)
joshuacwnewton Jun 4, 2022
12a97d2
`release.yml`: Test version tag `r20220603`
joshuacwnewton Jun 4, 2022
ffeb997
`release.yml`: Test version tag `2022.06.03`
joshuacwnewton Jun 4, 2022
07ca856
`release.yml`: Use quotes for version string
joshuacwnewton Jun 6, 2022
eb91ae0
`release.yml`: Temporarily stop using `||`
joshuacwnewton Jun 6, 2022
a383f68
`README.md`: Fix typo (missing `)
joshuacwnewton Jun 6, 2022
1322a88
`release.yml`: Revert back to using release tag
joshuacwnewton Jun 6, 2022
6aae1ca
`release.yml`: Re-enable 'publish' if statements
joshuacwnewton Jun 6, 2022
3746bea
`README.md`: Clarify date-based version numbering
joshuacwnewton Jun 6, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
name: Test the Build

on:
# test on PRs and double-test on merges to master, or manually.
pull_request:
push:
branches:
- master
workflow_dispatch:

jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v1
- name: Build tools
run: |
python -m pip install --upgrade pip
pip install build
- name: Build
run: |
python -m build --wheel --sdist
kousu marked this conversation as resolved.
Show resolved Hide resolved
44 changes: 44 additions & 0 deletions .github/workflows/release.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
name: Publish

on:
# publish from the Releases page:
release:
types: [published]
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved
# publish from the Actions page:
workflow_dispatch:
inputs:
version:
description: 'Version (e.g. 2.0.3)'
kousu marked this conversation as resolved.
Show resolved Hide resolved
required: true

jobs:
publish:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v1
- name: Build tools
run: |
python -m pip install --upgrade pip
pip install build
- name: Build
run: |
python -m build --wheel --sdist
### TODO: can the uploads be parallelized?
kousu marked this conversation as resolved.
Show resolved Hide resolved
- name: Publish to Github
uses: softprops/action-gh-release@v1
with:
files: 'dist/*'
fail_on_unmatched_files: true
tag_name: ${{ github.event.inputs.version }} # in the workflow_dispatch case, make a new tag from the given input; in the published release case, this will be empty and will fall back to updating that release.
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
user: __token__
#password: ${{ secrets.PYPI_PASSWORD }}
# DEBUG:
password: ${{ secrets.TEST_PYPI_API_TOKEN }}
repository_url: https://test.pypi.org/legacy/
kousu marked this conversation as resolved.
Show resolved Hide resolved
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
dist/
build/

*.whl
*.egg-info
__pycache__
41 changes: 41 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# <dataset> template

Part of [`spinalcordtoolbox`](https://github.com/neuropoly/spinalcordtoolbox).
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved


## Usage

1. Clone this repo.

```
git mv src/spinalcordtoolbox/data/dataset src/spinalcordtoolbox/data/${dataset-name}
cp ${data_files} src/spinalcordtoolbox/data/${dataset_name}
vi setup.py # to edit name= and url=
vi README.md # to edit the title and *remove* this usage section
git add .
git commit
git remote add origin git@github.com:neuropoly/spinalcordtoolbox-data-${dataset_name}
git push
```

2. Go to https://github.com/neuropoly/spinalcordtoolbox-data-${dataset_name}/releases

1. Click "Draft Release"
2. Fill in a version tag. We use [this versioning policy](TODO)
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved
3. Click "Publish Release"
4. Wait a few minutes;
5. Monitor the progress at https://github.com/neuropoly/spinalcordtoolbox-data-${dataset_name}/actions/workflows/release.yml
6. The release should appear on https://github.com/neuropoly/spinalcordtoolbox-data-${dataset_name}/releases
kousu marked this conversation as resolved.
Show resolved Hide resolved
kousu marked this conversation as resolved.
Show resolved Hide resolved
with the .tar.gz (sdist) and .whl (wheel) formats attached momentarily.

### Troubleshooting

You can test the repo locally with

```
pip install build &&
python -m build --wheel --sdist &&
pip install dist/*.whl
```

This should give you enough clues hopefully to track down any problems.
kousu marked this conversation as resolved.
Show resolved Hide resolved
3 changes: 3 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# this ensures builds are done reliably
[build-system]
requires = ["setuptools>=40.8.0", "setuptools_scm[toml]", "wheel"]
44 changes: 44 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
from setuptools import setup, find_packages, find_namespace_packages
kousu marked this conversation as resolved.
Show resolved Hide resolved
import pathlib

here = pathlib.Path(__file__).parent.resolve()

# workaround a bug introduced by pyproject.toml
# https://github.com/pypa/pip/issues/7953#issuecomment-645133255
import site, sys; site.ENABLE_USER_SITE = True
kousu marked this conversation as resolved.
Show resolved Hide resolved

setup(
name='spinalcordtoolbox-data-<dataset>',
description='Part of https://github.com/neuropoly/spinalcordtoolbox',
long_description=(here / 'README.md').read_text(encoding='utf-8'),
long_description_content_type='text/markdown',
author='Neuropoly',
author_email='neuropoly@googlegroups.com',
url='https://spinalcordtoolbox.com/',
project_urls={
'Source': 'https://github.com/sct-data/<dataset>',
#'Documentation': '',
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved
},
#license='CC-BY-NC', ??
#license_files=[ ... ] # TODO?
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved

packages=find_namespace_packages('src/'),
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved
package_dir={"":"src/"},
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved

# with setuptools_scm, means it includes non-python files if they're under git
include_package_data=True,

# with setuptools_scm, get the version out of the most recent git tag.
# the tags must be formatted as semver.
use_scm_version=True,
joshuacwnewton marked this conversation as resolved.
Show resolved Hide resolved

# pyproject.toml::build-system.requires is supposed to supersede this, but it's still very new so we duplicate it.
setup_requires=[
'setuptools',
'setuptools_scm[toml]',
'wheel',
],

zip_safe=False, # guarantees that importlib.resources.path() is safe
)

2 changes: 2 additions & 0 deletions src/spinalcordtoolbox/data/dataset/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# empty __init__.py to enable importlib.resources
# see < TODO >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which TODO is this referring to?

Copy link
Contributor Author

@kousu kousu Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another terrible python wart I ran into when setting this up that I don't know how or where to explain.

The basic issue is, sys.path behaves like $PATH or like overlayfs: it implies a virtual filesystem made of the union of all the folders listed in it. But python only pretends to support this and needs lots of workarounds for it. Recently, https://www.python.org/dev/peps/pep-0420/ dropped the necessity of __init__.py to create a package, in order to make it easier for this virtual union filesystem thing to work out; otherwise, every package that did what I did here, share a folder (spinalcordtoolbox/data), would be have to ship a __init__.py and would therefore be fighting with each other. So, A+ to that; but for some reason in the process they never applied the same thinking to non-python files. There's even this comment: https://github.com/python/cpython/blob/30e507dff465a31901d87df791a2bac40dc88530/Lib/importlib/resources.py#L243-L247

By definition, namespace packages cannot have resources

So what this means is this terrible confusing situation where:

  • if you pip install everything, everything ends up in the same physical directories and things make sense
    • and you could distribute an __init__.py everywhere like you used to before PEP420, and just be careful not to create any conflicts, and then everything would work fine
  • if you pip install -e one or more of the packages, then you're in this virtual situation where the filesystem you see via import and importlib is not the same as on disk but you can access python code but you can't access the others.
    • and if you try to distribute __init__.pys in this situation, you can't access files from the other apps
      • specifically, spinalcordtoolbox/__init__.py shadows the entire spinalcordtoolbox/data/ subdirectory, so none of these packages will work until we remove that, but only if you've pip install -e spinalcordtoolbox'd. It will work if you pip install spinalcordtoolbox.

BUT they literally just fixed this a month ago: python/cpython#24670. Praise the python maintainers. So maybe that link is what can go in the TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tiny TODO took me like 4 hours to figure out. It definitely is where the bulk of my time working on this went and I don't know how to explain what I was doing because it's such a corner case.

And it has implications for the rest of sct. It means we need to delete all the init.pys from it. I checked and mostly they're empty anyway, but

aren't totally.

But (until python3.10 I guess, when we finally get python/cpython#24670) we'll have to keep these empty __init__.pys in the data folders but not in the code folders. Seems pretty fragile.

Or it looks like https://github.com/python/importlib_resources/ is still being maintained as a backport; I thought it was just for python3.6 but maybe it even covers newer versions?

Copy link
Member

@joshuacwnewton joshuacwnewton Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it has implications for the rest of sct. It means we need to delete all the init.pys from it. I checked and mostly they're empty anyway, but

Is this true? PEP420 only applies to namespace packages.

Regular packages will continue to have an __init__.py and will reside in a single directory.

Namespace packages cannot contain an __init__.py.

Namespace packages are defined as packages that are split up into multiple distributions. I don't think this describes the subpackages inside spinalcordtoolbox, does it? (Since we distribute everything as a single package, we're more like the first directory tree from that link, rather than the second.)

So, I think we should be able to leave the __init__.py files alone.

Copy link
Contributor Author

@kousu kousu Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to throw out importlib.resources entirely and just go back to using __file__. If we put:

# __init__.py

def contents():
    dir = os.path.dirname(__file__)
    return [os.path.join(dir, file) for file in os.listdir(dir)]

in every __init__.py we would have the same results for both

import spinalcordtoolbox.data.something

importlib.resources.contents(spinalcordtoolbox.data.something)
spinalcordtoolbox.data.something.contents()

. It would mean we wouldn't be able to use virtual namespaces, so we couldn't make a multi-part spinalcordtoolbox-data-something package like python3.10 apparently can, but why would we want to?

...maybe we would want to do that to deal with splitting up data to fit online? Like a multi-part .rar or a replacement for ghsplit?

Copy link
Member

@joshuacwnewton joshuacwnewton Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import spinalcordtoolbox.data.something

importlib.resources.contents(spinalcordtoolbox.data.something)
spinalcordtoolbox.data.something.contents()

This does feel a lot more readable. I was worried about importlib not being very self-evident. I also like how this can be handled on the template level.

Looking at the current usage, though (e.g. sct_deepseg with its use of folder) I feel like exposing the base directory might line up more with how it parses for files.

Copy link
Contributor Author

@kousu kousu Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay actually I think this provides a solution:

So we can just:

install_requires=[
  ...
  importlib_resources>=4
  ...
],

and

import importlib_resources as resources
# from importlib import resources # uncomment when https://github.com/python/cpython/pull/24670 is available in python3.10

and delete all __init__.pys and tell people to stop making new ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kind of like just using __file__. But resources is the Way Of The Future so maybe we should go with that.