Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add recurse-submodules option #36

Merged
merged 5 commits into from
Apr 20, 2024
Merged

Conversation

gmgunter
Copy link
Contributor

Closes #34

Adds a command line option --recurse-submodules that passes the corresponding flag to the git ls-files command. The option is disabled by default.

I found it surprisingly tricky to find an example repo to use in a unit test that includes submodules and has no inconsistencies between the sdist contents vs files tracked by Git. (I gave it a try with nanobind and a few of the scikit-hep repos, among others.) Eventually, I gave up and used my own repo, but am very happy to replace this with a more prominent exemplar, if we can find one.

@henryiii
Copy link
Owner

Boost-histogram should do this. But I’m fine with any test. Is there a reason we can’t always pass the flag? (Maybe if git is new enough)?

@gmgunter
Copy link
Contributor Author

Is there a reason we can’t always pass the flag? (Maybe if git is new enough)?

I made it an option in order to avoid breaking compatibility with older Git versions. But maybe it's not worth complicating the interface?

Checking git version is an option, but might be too janky/complicated if we need to worry about pre-releases or local versions. I could add something simple like this if you want:

import re
import subprocess
from dataclasses import dataclass
from pathlib import Path

@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int

def git_version() -> Version:
    res = subprocess.run(
        ["git", "version"],
        text=True,
        capture_output=True,
        check=True,
    )
    regex = re.compile(r"^git version (\d+)\.(\d+)\.(\d+)")
    if (match := regex.match(res.stdout)) is None:
        raise RuntimeError(...)
    return Version(*map(int, match.groups()))

def git_files(source_dir: Path) -> frozenset[str]:
    cmd = ["git", "ls-files", "--cached"]
    if git_version() >= Version(2, 11, 0):
        cmd.append("--recurse-submodules")
    ...

Alternatively, we could just drop support for older Git versions (Git has supported this option since Q4 2016).

What's your preference?

Boost-histogram should do this

Here's the (abridged) result I'm getting with the latest boost-histogram tag (v1.4.0).

$ git clone https://github.com/scikit-hep/boost-histogram.git --branch v1.4.0 --recurse-submodules
$ check-sdist --source-dir boost-histogram --recurse-submodules
SDist does not match git

SDist only:


Git only:
  .gitattributes
  .gitignore
  .gitmodules
  docs/_src/images/.gitignore
  extern/assert/.drone.jsonnet
  extern/assert/.drone/drone.bat
  extern/assert/.drone/drone.sh
  extern/assert/.gitattributes
  extern/assert/.travis.yml
  extern/assert/CMakeLists.txt
  extern/assert/README.md
  extern/assert/appveyor.yml
  extern/assert/doc/.gitignore
  extern/assert/doc/Jamfile
[...]

Looking at MANIFEST.in, it looks like only a subset of the contents from each submodule are included in the sdist.

@henryiii
Copy link
Owner

Ah, sorry, forgot this was waiting on me. I'm fine to drop support for older versions, I think, 2016 sounds fine. I'd be very much fine with an opt-out flag instead.

Yes, boost-histogram only includes the files that are related to the parts it needs.

@gmgunter
Copy link
Contributor Author

No worries. Changing the behavior from opt-in to opt-out sounds good to me.

In the latest update, we recursively check submodule contents by default and the --recurse-submodules command-line flag was replaced with a --no-recurse-submodules flag that disables this behavior.

@henryiii
Copy link
Owner

henryiii commented Apr 20, 2024

What about making it configurable in pyproject.toml instead of or in addition to a command line option? You'd know if you need it for your project beforehand, I'd think. That might make the default-on a bit nicer, if a project can opt out always.

(Sorry about flopping around, I volunteer to add it if you need me to, and it can be a follow up)

@gmgunter
Copy link
Contributor Author

Ah yeah, good idea. Opting out via pyproject.toml feels a lot nicer.

Since the other config options in pyproject.toml have no command-line equivalent, I decided to remove the CLI flag now (but can add it back if you prefer).

@henryiii henryiii merged commit 10bcc37 into henryiii:main Apr 20, 2024
10 checks passed
@henryiii
Copy link
Owner

henryiii commented Apr 20, 2024

Thanks!!! Ping me if I forget to release this by next week!

@gmgunter gmgunter deleted the git-submodules branch April 20, 2024 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Git submodules
2 participants