Skip to content

Commit

Permalink
Tar.create: include entries for non-empty directories (#106)
Browse files Browse the repository at this point in the history
Issue #103 points out that omitting directory entries for non-empty
directories can confuse some tools that consume tarballs, including
docker, which applies overly restrictive permissions to directories
which are not explicitly included in a tarball.

This commit changes `Tar.create` and `Tar.rewrite` to produce tarballs
which include explicit directory entries for all (non-root) directories.
This changes Tar.jl's "canonical format", which is, by design one-to-one
with git tree hashes. However, it does not seem like anyone currently
depends on that exact reproducibility and it seems worth making this
change in order to avoid confusing external consumers of our tarballs.

Closes #103.
  • Loading branch information
StefanKarpinski committed Apr 22, 2021
1 parent 9316af8 commit ada4f12
Show file tree
Hide file tree
Showing 4 changed files with 21 additions and 17 deletions.
17 changes: 9 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -293,14 +293,6 @@ and it still records user and group information, modification times and exact
permissions. Coaxing command-line `tar` programs to omit this non-portable
information and use a portable (and `git`-compatible sort order) is non-trivial.

Another difference from command-line `tar`: non-empty directories are also
omitted from the tarballs that `Tar` creates since no metadata is recorded about
directories aside from the fact that they exist and the existence of non-empty
directories is already implied by the fact that they contain something else. If,
in the future, the ability to record metadata about directories is added,
tarballs will record entries for non-empty directories with non-default
metadata.

On the extraction side of things, doing `Tar.extract(tarball, dir)` is roughly
equivalent to the following commands:
```sh
Expand Down Expand Up @@ -472,6 +464,15 @@ have the same `git` tree hash, but produce different tarballs. Two _identical_
file trees will always produce identical tarballs, however, and that tarball
should remain stable in future versions of the `Tar` package.

**Note:** the canonical tarball format was [changed] slightly in the 1.10
release of the package. Since that release, the canonical format _includes_ all
directories in the canonical tarball format, whereas previously non-empty
directories were omitted since their existence is implied by their contents. The
absence of explicit directory entries in tarballs confused some external
tooling, so it was deemed worth a small format change to avoid such problems.

[changed]: https://github.com/JuliaIO/Tar.jl/pull/106

The `tree_hash` function can be used to compute a git-style tree hash of the
contents of a tarball (without needing to extract it). Moreover, two tarballs
created by the `Tar` package will have the same hash if and only if they contain
Expand Down
18 changes: 10 additions & 8 deletions src/create.jl
Original file line number Diff line number Diff line change
Expand Up @@ -80,24 +80,26 @@ function write_tarball(
tar_path::String = ".";
buf::Vector{UInt8} = Vector{UInt8}(undef, DEFAULT_BUFFER_SIZE),
)
w = 0
hdr, data = callback(sys_path, tar_path)
if hdr.type == :directory
data isa Union{Nothing, AbstractDict{<:AbstractString}} ||
error("callback must return a dict of strings, got: $(repr(data))")
data !== nothing && for name in sort!(collect(keys(data)))
sys_path′ = data[name]
tar_path′ = tar_path == "." ? name : "$tar_path/$name"
w += write_tarball(callback, tar, sys_path′, tar_path′, buf=buf)
end
else
data isa Union{Nothing, AbstractString, IO, Tuple{IO,Integer}} ||
error("callback must return nothing, string or IO, got: $(repr(data))")
end
if hdr.type != :directory || w == 0
w = 0
if tar_path != "."
w += write_tarball(tar, hdr, data, buf=buf)
end
data isa AbstractDict && for name in sort!(collect(keys(data)))
sys_path′ = data[name]
tar_path′ = tar_path == "." ? name : "$tar_path/$name"
w += write_tarball(callback, tar, sys_path′, tar_path′, buf=buf)
end
if tar_path == "." && w == 0
w += write_tarball(tar, hdr, data, buf=buf)
end
@assert w > 0
return w
end

Expand Down
1 change: 1 addition & 0 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,7 @@ end
# check rewrite
tarball′ = Tar.rewrite(tarball)
@test Tar.list(tarball′) == [
Tar.Header("dir", :directory, 0o755, 0, "")
Tar.Header("dir/file", :file, 0o755, 0, "")
Tar.Header("file", :file, 0o755, 0, "")
]
Expand Down
2 changes: 1 addition & 1 deletion test/setup.jl
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ function make_test_dir(gen_skip::Bool=false)
return dir
end

const test_dir_paths = ["dir/file", "empty", "file", "link"]
const test_dir_paths = ["dir", "dir/file", "empty", "file", "link"]
Sys.iswindows() && pop!(test_dir_paths)

# uses Tar.list(callback, tarball) API
Expand Down

0 comments on commit ada4f12

Please sign in to comment.