Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider git metadata on 'create' #210

Closed
refack opened this issue Apr 8, 2019 · 11 comments
Closed

consider git metadata on 'create' #210

refack opened this issue Apr 8, 2019 · 11 comments

Comments

@refack
Copy link

refack commented Apr 8, 2019

Refs: https://mobile.twitter.com/izs/status/1115332210381299712

When checking out a git repository in Windows, some FS metadata is lost due to mismatch between the WIN32 and POSIX APIs. Specifically the file mode (but also uid, gid, and some of the time stamps).

refael@refaelux MINGW64 /d/code/node (mode-test)
$ ls -la r.sh
-rw-r--r-- 1 refael 197609 0 Apr  8 15:57 r.sh
^^^^^^^^^^

But git still stores this info

refael@refaelux MINGW64 /d/code/node (mode-test)
$ git ls-files r.sh --stage
100755 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       r.sh
^^^^^^

The code at https://github.com/npm/node-tar/blob/d7268f65123780031e58120f55f28b87ae0c9608/lib/write-entry.js#L331-L334
Could consider this info to "enrich" the created tarball.

@refack
Copy link
Author

refack commented Apr 8, 2019

P.S. the most interesting use case is losing the +x from script files when doing npm publish from a worktree checked outed on Windows.

@boneskull
Copy link

boneskull commented Apr 8, 2019

This bit Mocha yesterday when we published from Windows for the first time. Someone using npm install mocha --no-bin-links found that the executable files no longer had the executable flag. My workaround was to re-publish on macOS.

@isaacs
Copy link
Owner

isaacs commented Apr 8, 2019

If anyone wants to help move this along, the next step is to research and document how git stores this. Is there a specific file in the .git folder we could check for this info?

(Calling git cli in a child process is completely out of the question.)

@refack
Copy link
Author

refack commented Apr 8, 2019

Big 🔫s - https://github.com/isomorphic-git/isomorphic-git
heat-seeking-🚀 - https://github.com/sbp/gin/blob/master/gin

Seems like .git/index is a fairly simple binary stream. Pulling a snippet out of gin

    with open(.git/index, "rb") as f:

        def read(format):
            # "All binary numbers are in network byte order."
            # Hence "!" = network order, big endian
            format = "! " + format
            bytes = f.read(struct.calcsize(format))
            return struct.unpack(format, bytes)[0]

        index = collections.OrderedDict()

        # 4-byte signature, b"DIRC"
        index["signature"] = f.read(4).decode("ascii")
        check(index["signature"] == "DIRC", "Not a Git index file")

        # 4-byte version number
        index["version"] = read("I")
        check(index["version"] in {2, 3},
            "Unsupported version: %s" % index["version"])

        # 32-bit number of index entries, i.e. 4-byte
        index["entries"] = read("I")

        yield index

        for n in range(index["entries"]):
            entry = collections.OrderedDict()
            entry["entry"] = n + 1
            entry["ctime_seconds"] = read("I")
            entry["ctime_nanoseconds"] = read("I")
            entry["mtime_seconds"] = read("I")
            entry["mtime_nanoseconds"] = read("I")
            entry["dev"] = read("I")
            entry["ino"] = read("I")
            # 4-bit object type, 3-bit unused, 9-bit unix permission
            entry["mode"] = read("I")
            if pretty:
                entry["mode"] = "%06o" % entry["mode"]
            entry["uid"] = read("I")
            entry["gid"] = read("I")
            entry["size"] = read("I")
            entry["sha1"] = binascii.hexlify(f.read(20)).decode("ascii")
            entry["flags"] = read("H")
...

then just need to seek till the end of the entry.
The magic sauce is python's struct

@refack
Copy link
Author

refack commented Apr 8, 2019

read("I") -> buf.readInt32LE()
read("H") -> buf.readInt16LE()

@isaacs
Copy link
Owner

isaacs commented Oct 4, 2019

Working JS hacky example, could definitely do with a cleanup, but this seems like it works: https://gist.github.com/isaacs/55ad9e3338912c008804f69dbe855ed1

@boneskull
Copy link

@isaacs I know you won't be interested in pulling it in to this project, but Kaitai provides a declarative format for parsing binary blobs which can then generate code. I've used it before, and it seems suitable for something like this (in fact, a definition file might already exist for the git index format). the docs are a bit incomplete, but still might be fun to play with if you have nothing better to do.

advice, if you do look at it: stay away from the js implementation of the tool; it's incomplete, just use the reference (scala?) implementation

@boneskull
Copy link

imo the project would be better if it just output some sort of object code format that could then be piped into custom code generators, b/c the generated JS looks like what you'd get if you asked an career java dev to sit down and write JS

@boneskull
Copy link

here's the description file I created to parse multimeter output over serial

@isaacs
Copy link
Owner

isaacs commented Oct 5, 2019

@boneskull yeah... I'm probably not gonna use that.

Re: #210 (comment), I think npm probably ought to chmod the bins on install (or better yet, just update their mode at unpack time before they're written to disk), even if --no-bin-links is set. That might be hairy to get into v6, but I can make that change in Pacote v10 for npm 7 pretty easily.

isaacs added a commit that referenced this issue Oct 5, 2019
Fix #235

The `mode` value in `portable` mode will be set to a reasonable default
across Unix systems.

First, the user must always have read and write permissions.  (Ie, a
mode of 0o044 becomes 0o644.)

Next, the mode is masked against 0o022.  (Ie, a file of 0o777 becomes
0o755.  0o666 becomes 0o644.)

In practice, this means that all entries will usually have a mode of
either 0644 or 0o755, depending on whether the executable bit was set.

This avoids a situation where an archive created on a system with a
umask of 0o2 creates an archive with files having modes 0o775 and 0o664,
whereas an archive created on a system with a umask of 0o22 creates an
archive with files having modes 0o755 and 0o644, for the same content.
(Especially, for a check-out of the same git repository.)  Without this
consistency, it's impossible to honor the contract that the same content
will produce the same archive, which is necessary for npm to create
reprodicible build artifacts that match both in CI and development.

This unfortunately still does not address windows, where the executable
bit is never set on files.  (This bit the mocha project when a publish
from a Windows machine did not archive binaries with executable flags
set: #210 (comment))
isaacs added a commit to npm/pacote that referenced this issue Oct 6, 2019
This addresses the problem brought up by @boneskull on
isaacs/node-tar#210 (comment)

If a package is created on a Windows system, the file will not have the
executable bit set, as Windows determines executable status by the
filename extension.

Thus, installing with '--no-bin-links' produces a package that can't be
used as expected.

This change does the following:

- While extracting, if the manifest has been loaded, then the mode of
  any bin targets is made executable.
- If the manifest is not loaded, then AFTER extraction, the mode of all
  bin targets is set appropriately.  (Only relevant for the FileFetcher
  class, as the others will all have access to the manifest earlier in
  the process.)
- When packing, all bin targets are given an executable mode in the
  archive.

Thus, newer npm will properly handle archives created without proper
modes, and will always produce proper modes for the benefit of
other/older clients regardless of what fs.stat says.

Strict in what we publish, liberal in what we install.
isaacs added a commit to npm/pacote that referenced this issue Oct 6, 2019
This addresses the problem brought up by @boneskull on
isaacs/node-tar#210 (comment)

If a package is created on a Windows system, the file will not have the
executable bit set, as Windows determines executable status by the
filename extension.

Thus, installing with '--no-bin-links' produces a package that can't be
used as expected.

This change does the following:

- While extracting, if the manifest has been loaded, then the mode of
  any bin targets is made executable.
- If the manifest is not loaded, then AFTER extraction, the mode of all
  bin targets is set appropriately.  (Only relevant for the FileFetcher
  class, as the others will all have access to the manifest earlier in
  the process.)
- When packing, all bin targets are given an executable mode in the
  archive.

Thus, newer npm will properly handle archives created without proper
modes, and will always produce proper modes for the benefit of
other/older clients regardless of what fs.stat says.

Strict in what we publish, liberal in what we install.
isaacs added a commit to npm/pacote that referenced this issue Oct 6, 2019
This addresses the problem brought up by @boneskull on
isaacs/node-tar#210 (comment)

If a package is created on a Windows system, the file will not have the
executable bit set, as Windows determines executable status by the
filename extension.

Thus, installing with '--no-bin-links' produces a package that can't be
used as expected.

This change does the following:

- While extracting, if the manifest has been loaded, then the mode of
  any bin targets is made executable.
- If the manifest is not loaded, then AFTER extraction, the mode of all
  bin targets is set appropriately.  (Only relevant for the FileFetcher
  class, as the others will all have access to the manifest earlier in
  the process.)
- When packing, all bin targets are given an executable mode in the
  archive.

Thus, newer npm will properly handle archives created without proper
modes, and will always produce proper modes for the benefit of
other/older clients regardless of what fs.stat says.

Strict in what we publish, liberal in what we install.
@isaacs
Copy link
Owner

isaacs commented Oct 6, 2019

Ok, so, npm v7 will make the Mocha-published-from-windows-installed-without-binlinks a total non-issue.

Having looked into this git stuff, I'm kind of uncomfortable with the added complexity and fs ops required to treat the git dir cache as a source of truth. It was a fun little hack to learn how to parse it, but it feels like a wart in node-tar.

  • git-archive already exists if you want to put git info into a tarball.
  • the facts "on the ground" of the file system might not match what's in the git index and that may be what you want a lot of the time!
  • the solution to the original npm footgun is now solved in an npm library (albeit behind a breaking change, but this would've likely been, anyway).
  • .git folders can appear anywhere within the tree, the path parsing is relative to the .git folder rather than the root of the archive, etc, etc. It's all solveable, but it adds a lot of stuff.

So I'm gonna close this issue for now.

If anyone is really desperate for this kind of functionality (informing a tarball's header info based on a git index), you can get at the entry stats on the way into the archive by adding a filter option to tar.c. (That's how pacote is setting the executable bit on bins now.)

@isaacs isaacs closed this as completed Oct 6, 2019
isaacs added a commit to npm/pacote that referenced this issue Oct 6, 2019
This addresses the problem brought up by @boneskull on
isaacs/node-tar#210 (comment)

If a package is created on a Windows system, the file will not have the
executable bit set, as Windows determines executable status by the
filename extension.

Thus, installing with '--no-bin-links' produces a package that can't be
used as expected.

This change does the following:

- While extracting, if the manifest has been loaded, then the mode of
  any bin targets is made executable.
- If the manifest is not loaded, then AFTER extraction, the mode of all
  bin targets is set appropriately.  (Only relevant for the FileFetcher
  class, as the others will all have access to the manifest earlier in
  the process.)
- When packing, all bin targets are given an executable mode in the
  archive.

Thus, newer npm will properly handle archives created without proper
modes, and will always produce proper modes for the benefit of
other/older clients regardless of what fs.stat says.

Strict in what we publish, liberal in what we install.
isaacs added a commit to npm/pacote that referenced this issue Oct 29, 2019
This addresses the problem brought up by @boneskull on
isaacs/node-tar#210 (comment)

If a package is created on a Windows system, the file will not have the
executable bit set, as Windows determines executable status by the
filename extension.

Thus, installing with '--no-bin-links' produces a package that can't be
used as expected.

This change does the following:

- While extracting, if the manifest has been loaded, then the mode of
  any bin targets is made executable.
- If the manifest is not loaded, then AFTER extraction, the mode of all
  bin targets is set appropriately.  (Only relevant for the FileFetcher
  class, as the others will all have access to the manifest earlier in
  the process.)
- When packing, all bin targets are given an executable mode in the
  archive.

Thus, newer npm will properly handle archives created without proper
modes, and will always produce proper modes for the benefit of
other/older clients regardless of what fs.stat says.

Strict in what we publish, liberal in what we install.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants