Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make packages fetched by zig set the executable bit based on ELF header and shebang line #17463

Closed
andrewrk opened this issue Oct 9, 2023 · 16 comments · Fixed by #19554
Closed
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. os-windows standard library This issue involves writing Zig code for the standard library. zig build system std.Build, the build runner, `zig build` subcommand, package management
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Oct 9, 2023

Extracted from #17392.

Depends on #17462.

This issue is to eliminate this TODO comment:

zig/src/Package/Fetch.zig

Lines 1058 to 1063 in f7bc55c

// TODO: we would like to set this to executable_bit_only, but two
// things need to happen before that:
// 1. the tar implementation needs to support it
// 2. the hashing algorithm here needs to support detecting the is_executable
// bit on Windows from the ACLs (see the isExecutable function).
.mode_mode = .ignore,

Currently, all the mode bits of all files in a package are ignored and a default mode is used instead. This is desired, except the executable bit should be set based on whether the file is an ELF file or has a shebang line, when unpacking on a posix operating system.

@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior enhancement Solving this issue will likely involve adding new logic or components to the codebase. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. standard library This issue involves writing Zig code for the standard library. os-windows zig build system std.Build, the build runner, `zig build` subcommand, package management labels Oct 9, 2023
@andrewrk andrewrk added this to the 0.13.0 milestone Oct 9, 2023
@andrewrk andrewrk mentioned this issue Oct 9, 2023
11 tasks
@ianprime0509
Copy link
Sponsor Contributor

For any potential contributors, the change to respect executable status when fetching Git packages can be made here:

.file => {
var file = try dir.createFile(entry.name, .{});
defer file.close();
try repository.odb.seekOid(entry.oid);
var file_object = try repository.odb.readObject();
if (file_object.type != .blob) return error.InvalidFile;
try file.writeAll(file_object.data);
try file.sync();
},

The flag entry.executable is already being parsed from the tree, but currently it isn't being used in this code for compatibility with tar fetching behavior.

@travisstaloch
Copy link
Sponsor Contributor

travisstaloch commented Oct 9, 2023

i attempted this in #15382

this is the relevant code from tar.zig#L1028 in that pr. i tried linking to it but the file was too big.

if (options.mode_mode == .executable_bit_only) {
    if (std.fs.has_executable_bit) {
        // TODO - not sure using file.mode() is correct but it seems to
        //        match gnu tar behavior on linux while using
        //        header.mode does not
        const mode = try file.mode();
        var modebits = std.StaticBitSet(32){ .mask = @intCast(mode) };
        // copy the user exe bit to the group and other exe bits
        // these bit indices count from the right:
        //   u   g   o
        //   rwx rwx rwx
        //   876_543_210
        // 0b000_000_000
        const has_owner_exe_bit = modebits.isSet(6);
        modebits.setValue(3, has_owner_exe_bit);
        modebits.setValue(0, has_owner_exe_bit);
        log.debug("mode old={o} new={o}", .{ mode, modebits.mask });
        try file.chmod(modebits.mask);
    }
}

@andrewrk
Copy link
Member Author

andrewrk commented Oct 9, 2023

Note that std.fs.has_executable_bit is false on Windows.

@ianic
Copy link
Sponsor Contributor

ianic commented Mar 4, 2024

Please help me understand why we need to read executable bit from the file system?

We have information whether the file is executable while unpacking archive (both tar and git) we can pass list of executables or list of all files with modes to the hasher. That way hasher doesn't need to walk (if it has list of all files) and extract attributes from disk. We can set executable on file systems which support that, and skip on others while having same hash on both.

@andrewrk
Copy link
Member Author

andrewrk commented Mar 5, 2024

Zig also needs the ability to compute the hash from a directory of files on disk.

  • zig fetch [path] feature
  • A future feature for verifying hash integrity of cached packages
  • Generally, avoiding unnecessary design limitations

Windows has had executable bit support for decades:

tsmanner: My understanding is that the NTFS extended attributes was added for Windows to be able to mount the WSL volumes and understand the rwx bits. I should have said NTFS extended attributes, the implication is that only Windows editions that support WSL would be able to unpack things and set an executable bit.
tsmanner: In thier Windows NTFS file systems, not in the WSL vm
andrewrk: NTFS extended attributes sounds like it's worth exploring. let's have a peek at which windows version introduced that
andrewrk: https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants
andrewrk: FILE_EXECUTE For a native code file, the right to execute the file. This access right given to scripts may cause the script to be executable, depending on the script interpreter.
andrewrk: this looks old as fuck
andrewrk: yeah this API has been available since Windows XP
andrewrk: lib/std/os/windows.zig:pub const FILE_EXECUTE = 0x00000020;
andrewrk: oh look at that we already have a zig constant for it
tsmanner: Yeah, says since XP there
andrewrk: there you have it. set that bit when unpacking; read that bit when calculating the hash. that's the task

@ianic
Copy link
Sponsor Contributor

ianic commented Mar 6, 2024

I tried to explain to myself how could we implement execute bit on windows.
I made small example to experiment.

NTFS has ACL (access control list) for each file. Entries in ACL can grant or deny access to the file. They are usually inherited from parent folders but can be also set on individual files. Each ACL entry (ACE) is linked to some trustee (user or group) and has access mask. Many times that access mask is set to FullControl which toggles most of the bits on in that mask. Including file_execute bit.

Here is example of reading file ACL. Initial ACL on my machine was:

tamp/foo
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: /Everyone
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: NT AUTHORITY/SYSTEM
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: BUILTIN/Administrators
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: IGORANIC5700/ianic

Access mask bits, zig constants

ACL has 4 entries each of them with full control access mask. In this case all entries are inherited from parent folder. To somehow store execute bit on file system I can add one more entry to the file ACL. Second part of the example does that:

tmp/foo with one more entry
 ==>     inherited: false, access_mask: 100020 100000000000000100000, file_execute: true, sid: /Everyone
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: /Everyone
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: NT AUTHORITY/SYSTEM
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: BUILTIN/Administrators
         inherited: true, access_mask: 1f01ff 111110000000111111111, file_execute: true, sid: IGORANIC5700/ianic

Code is adding entry for Everyone with just file_execute bit set.

When I need to read executable bit I should read file ACL iterate over entries, find entry with some magic trustee (Everyone or current user) and access mask which has only file execute bit set and others unset. For files which has executable bit set this way that entry will probably be first in the list, for all others we need to iterate through list.

We can hack NTFS this way to store that bit but it seams to me soooo hacky, fragile and overly complicated.
Can we agree that something like this is not a way to go?

@binarycraft007
Copy link
Contributor

Note that cygwin supports fchmod with acl:
https://github.com/cygwin/cygwin/blob/main/winsup/cygwin/fhandler/disk_file.cc#L712

https://cygwin.com/cygwin-ug-net/ntsec.html

@ianic
Copy link
Sponsor Contributor

ianic commented Mar 7, 2024

Thanks @binarycraft007 for pointing to Cygwin.

Cygwin tries to map posix file permissions to Windows ACL. It uses combination of allow and deny ACL entries in specific order. It is also little hacky because Windows prefer to have all deny ACL entries before all allow, and file explorer permissions interface will complain if it finds different order and offer to reorder, which will break Cygwin logic.

In my example when I create file with Zig on my Windows system it initially inherits this ACL:

  ALLOW inherited: true, access_mask: 1f01ff 000111110000000111111111, file_execute: true , sid: /Everyone
  ALLOW inherited: true, access_mask: 1f01ff 000111110000000111111111, file_execute: true , sid: NT AUTHORITY/SYSTEM
  ALLOW inherited: true, access_mask: 1f01ff 000111110000000111111111, file_execute: true , sid: BUILTIN/Administrators
  ALLOW inherited: true, access_mask: 1f01ff 000111110000000111111111, file_execute: true , sid: IGORANIC5700/ianic

Then in Cygwin I inspect that file and remove executable flag:

$ ls -al foo
-rwx---rwx+ 1 ianic None 0 Mar  7 15:30 foo

$ whoami
ianic

$ id ianic
uid=197608(ianic) gid=197121(None) groups=197121(None),114(Local account and member of Administrators group),544(Administrators),545(Users),2(NETWORK),11(Authenticated Users),15(This Organization),113(Local account),4094(OtherSession),262154(NTLM Authentication),405504(High Mandatory Level)

$ chmod -x foo
$ ls -al foo
-rw----rw-+ 1 ianic None 0 Mar  7 15:30 foo

That results in ACL:

  DENY  inherited: false, access_mask: 020040 000000100000000001000000, file_execute: false, sid: /NULL SID
  ALLOW inherited: false, access_mask: 1f019f 000111110000000110011111, file_execute: false, sid: IGORANIC5700/ianic
  DENY  inherited: false, access_mask: 000020 000000000000000000100000, file_execute: true , sid: NT AUTHORITY/SYSTEM
  DENY  inherited: false, access_mask: 000020 000000000000000000100000, file_execute: true , sid: BUILTIN/Administrators
  ALLOW inherited: false, access_mask: 120080 000100100000000010000000, file_execute: false, sid: IGORANIC5700/None
  ALLOW inherited: false, access_mask: 1201bf 000100100000000110111111, file_execute: true , sid: NT AUTHORITY/SYSTEM
  ALLOW inherited: false, access_mask: 1201bf 000100100000000110111111, file_execute: true , sid: BUILTIN/Administrators
  DENY  inherited: false, access_mask: 00015f 000000000000000101011111, file_execute: false, sid: IGORANIC5700/None
  ALLOW inherited: false, access_mask: 12019f 000100100000000110011111, file_execute: false, sid: /Everyone

Cygwin has moved all entries from inherited to file specific. It relies on order and logic that entries are processed top down until explicit deny or allow for user or group in which is user is found. If I open and save this file permission in file explorer it will move all deny to the top.

@andrewrk
Copy link
Member Author

andrewrk commented Mar 7, 2024

Can we agree that something like this is not a way to go?

The requirement is that a fetched package can only fall into one of two categories:

  1. it unpacks successfully, and calculating the package hash from the file system directory is always correct, or
  2. unpacking fails because the OS cannot represent it faithfully.

here's the reasoning:

  1. any observable state must be captured in the package hash
  2. executable bit is observable state on posix systems; therefore it must either be captured in the package hash, or unpacking must always set executable bit to 0
  3. executable bit being always set to 0 would be very annoying on posix, so we want to preserve it when unpacking
  4. a fetched package must only fall into one of two categories: it either fails to unpack, or it unpacks successfully, and calculating the package hash from the file system directory is always correct
  5. therefore, unpacking on windows must either fail to unpack if any executable bit is set; or it must set some file system state so that the hash can be computed correctly

Given this, consider consider that not supporting executable bit on Windows file systems would mean that unpacking zig packages on posix systems would either mark all files as executable, or none of them as executable. With that in mind, do you still make the same recommendation (honest question)?

@ianic
Copy link
Sponsor Contributor

ianic commented Mar 7, 2024

Thanks for your time. Sorry for wasting it in persuading me.

My first pragmatic recommendation is to relax 2. and remove the executable bit from the hash calculation. We will set it on posix and ignore on non-posix systems. The only downside I can imagine is that changing just the executable on some files will not make a different package and the package manager will skip replacing it with the new version.

The second idea, hopefully, ticks all five points. We could save all extra information along with unpacked files. For example, if package files are stored in <hash> folder we could make one more <hash>.db (for example) and save a manifest file there. That manifest file will have a path|mode entry for each file (folder, symlink) from the package folder. That way all state is preserved in the file system.

Besides solving this problem it will lead to some interesting optimization. I'm thinking about collecting file content hash during unpacking. There we already have all bytes piping to the file system. Unpacker (tar, git, file, ...) will produce a list with path|kind|mode|hash for each unpacked file. Later in the process filter will remove unwanted entries from the list. Hasher will use just a list, without walking the file system to calculate the resulting package hash. When we move files to the <hash> folder we will also save the manifest to the <hash>.db folder.
I'm aware that we currently have tar, git, and folder as sources, and possibly others in the future. We will need to change the hashing algorithm a little for this to work.
This way we will program in Zig instead of wrestling with Windows API.

@andrewrk
Copy link
Member Author

andrewrk commented Mar 7, 2024

Neither of these tradeoffs are acceptable to me.

any observable state must be captured in the package hash

This is an ironclad rule which must not be broken.

We could save all extra information along with unpacked files

This does not work because it is required to compute a hash based on a user's directory that does not have extra information stored. Also, I am vetoing this extra layer of complexity. I've seen this pattern many times; it creates more problems than it solves.

This leaves only the set of following options to solve the problem:

  • Reflect the executable bit in the Windows file system.
  • Set the executable bit to 0 on posix for all files when unpacking.
  • Set the executable bit to 1 on posix for all files when unpacking.
  • Fail package extraction on Windows if any files have executable bit set.

Thank you for making the example code to explore the API. I agree it is unwieldy.

I also have one more bit of data: using explorer.exe, I see that the "Deny Read & Execute" checkbox, when clicked, also checks the "Deny Read" checkbox which seems to imply that you cannot solely deny execution permissions.

However, it does not look like this is how it works in the API, so I'm a bit confused on this point.

Anyway, if we can't wrangle the Windows API, then I think the path forward is the second one: set the executable bit to 0 on posix for all files when unpacking. Nobody gets executable files in zig packages.

The only use case I can think of this harming is shipping executables in zig packages. It would mean they could not be executed directly from the global package cache.

Now that I'm thinking about it, I can think of a 5th option to add to the list: set the executable bit to 0 on posix for all files when unpacking, however, set the executable bit to 1 for all files that have the ELF magic header. The executable bit does not go into the hash, and is ignored when unpacking. I actually can't think of any use cases that would be harmed by this strategy - can anyone else?

@slimsag
Copy link
Sponsor Contributor

slimsag commented Mar 7, 2024

Now that I'm thinking about it, I can think of a 5th option to add to the list: set the executable bit to 0 on posix for all files when unpacking, however, set the executable bit to 1 for all files that have the ELF magic header. The executable bit does not go into the hash, and is ignored when unpacking. I actually can't think of any use cases that would be harmed by this strategy - can anyone else?

This sounds perfect to me.

Anyway, if we can't wrangle the Windows API, then I think the path forward is the second one: set the executable bit to 0 on posix for all files when unpacking. Nobody gets executable files in zig packages.

The only use case I can think of this harming is shipping executables in zig packages. It would mean they could not be executed directly from the global package cache.

The only real use case I can think of would be e.g. a package that wants to run a bash script as part of build logic, but that is non-portable anyway. Presumably the package's build.zig could mark the file as executable before running it, if needed, anyway.

I can't think of a good reason to have executable files in packages at all, to be honest with you. My only care is to ensure the package hash is cross-platform, so any behavior that works everywhere I think is great.

@andrewrk
Copy link
Member Author

andrewrk commented Mar 7, 2024

I can't think of a good reason to have executable files in packages at all, to be honest with you. My only care is to ensure the package hash is cross-platform, so any behavior that works everywhere I think is great.

Perhaps for example a binary clang package, in the future so that the Zig compiler does not need to link LLVM.


The more I think about it, the more I am convinced this is the path forward:

  • Continue to ignore the executable bit when unpacking tarballs or from git. Don't bother implementing executable bit on Windows.
  • Ignore the executable bit when computing the package hash.
    • Keep hashing false in the same position to avoid breaking existing hashes.
  • When unpacking on posix, detect ELF header or shebang line, and make those files executable. This makes executable bit deterministic based on file contents rather than metadata.

I think this solves all the problems. Just like you would not expect to have a directory missing the executable bit, you would not expect to see an ELF file missing the executable bit.

@andrewrk andrewrk changed the title make packages fetched by zig retain the executable bit make packages fetched by zig set the executable bit based on ELF header and shebang line Mar 7, 2024
@ianic
Copy link
Sponsor Contributor

ianic commented Mar 9, 2024

I explored Windows ACL a little bit more.

Until now when creating files on Windows we didn't set ACL, file is inheriting it from parent. Parent usually has FullControl with all bits in access mask set.

I modified createFile to accept posix like mode and create ACL rights from mode:

  • read bit results in GENERIC_READ right
  • write bit in GENERIC_WRITE
  • execute bit in GENERIC_EXECUTE

I create ACL entry by this mapping for current user, user default group and everyone. Resulting in three ACL entries. Those entries are set in createFileW.

Now files has nice posix like permissions.

createFile with mode 0o644 results in this permissions:

  ALLOW inherited: false, access_mask: 12019f 000100100000000110011111, file_execute: false, sid: IGORANIC5700/ianic
  ALLOW inherited: false, access_mask: 120089 000100100000000010001001, file_execute: false, sid: IGORANIC5700/None
  ALLOW inherited: false, access_mask: 120089 000100100000000010001001, file_execute: false, sid: /Everyone
foo effective access mask: 12019f

owner (current user) has GENERIC_READ | GENERIC_WRITE set
user default group and Everyone Windows group have GENERIC_READ right set

This looks nice in Cygwin:

$ ls -al foo
-rw-r--r-- 1 ianic None 0 Mar  9 20:05 foo

createFile with mode 0o751 results in this permissions:

  ALLOW inherited: false, access_mask: 1201bf 000100100000000110111111, file_execute: true , sid: IGORANIC5700/ianic
  ALLOW inherited: false, access_mask: 1200a9 000100100000000010101001, file_execute: true , sid: IGORANIC5700/None
  ALLOW inherited: false, access_mask: 1200a0 000100100000000010100000, file_execute: true , sid: /Everyone
bar effective access mask: 1201bf

user: GENERIC_READ | GENERIC_WRITE | GENERIC_EXECUTE
group: GENERIC_READ | GENERIC_EXECUTE
everyone: GENERIC_EXECUTE

In Cygwin:

$ ls -al bar
-rwxr-x--x 1 ianic None 0 Mar  9 20:05 bar

For the package manager we can write files with mode 644 as on other systems, and executable with 755 and we can then easily distinguish between the two and read executable bit when needed.

Note:
Logic is little different from posix where user is getting one set of flags either owner/group or others. On Windows user is effective (compilation) rights from all his entries. Everyone is, for examples, always applied.

@silversquirl
Copy link
Contributor

Keep hashing false in the same position to avoid breaking existing hashes.

Would be nice to come back to this at some point and remove that legacy. The purist in me doesn't like the idea of that making it into 1.0! :)

@andrewrk
Copy link
Member Author

andrewrk commented Apr 2, 2024

Moving to 0.12 since it's needed to solve #16272.

@andrewrk andrewrk removed this from the 0.13.0 milestone Apr 2, 2024
@andrewrk andrewrk added this to the 0.12.0 milestone Apr 2, 2024
ianic added a commit to ianic/zig that referenced this issue Apr 5, 2024
Based on file content. Detects elf magic header or shebang line.

Executable bit is ignored in hash calculation, as it was before this. So
packages hashes are not changed.

Reference:
ziglang#17463 (comment)

Fixes: 17463

Test is here:
https://github.com/ianic/zig-fetch-test/blob/7c4600d7bb263f9b72fe3d0b70071f42be89e25c/src/main.zig#L307
(if ziglang#19500 got accepted I'll move this test to the Fetch.zig)
andrewrk pushed a commit that referenced this issue Apr 6, 2024
Based on file content. Detects elf magic header or shebang line.

Executable bit is ignored in hash calculation, as it was before this. So
packages hashes are not changed.

Reference:
#17463 (comment)

Fixes: 17463

Test is here:
https://github.com/ianic/zig-fetch-test/blob/7c4600d7bb263f9b72fe3d0b70071f42be89e25c/src/main.zig#L307
(if #19500 got accepted I'll move this test to the Fetch.zig)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. os-windows standard library This issue involves writing Zig code for the standard library. zig build system std.Build, the build runner, `zig build` subcommand, package management
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants