Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensible snapshots: snapshot packages not shared between snapshots, nor fully between packages #3431

Closed
dbaynard opened this issue Sep 13, 2017 · 17 comments

Comments

@dbaynard
Copy link
Contributor

dbaynard commented Sep 13, 2017

Stack no longer shares packages built using different snapshots.

Even with the same resolver, git packages are re-downloaded, too — though not rebuilt! — and in the local package directory, not the global snapshot directory. The snapshot packages are shared.

(Incidentally, I wonder if there's an issue I should follow concerning extensible snapshots. It's a feature I've been anticipating for a while, and I very much liked some of the previous behaviour — in particular the package sharing).

Steps to reproduce

  1. First, try to install pandoc using the global-project.

    Use resolver: https://gist.github.com/dbaynard/9693107a883a6282f52f22785d0072e2/raw/cbbfa7251704add22d145dc14da9b278fdac2c77 in stack.yaml

    stack install pandoc (though it probably works for any package; the point is, this is a downstream package)

  2. Change to resolver: https://gist.github.com/dbaynard/9693107a883a6282f52f22785d0072e2/raw/eeb1f7a905e06bb22c641f30923a91f96bc4ead6 which has one change (that of pandoc).

    stack install pandoc

  3. Then, enter a directory containing a stack.yaml pointing to the latter resolver, and no .stack-work

    stack build --dependencies-only

Expected

  1. Stack builds whatever dependencies it needs, then builds pandoc.
  2. Stack recognises it has already built all the dependencies and then begins downloading and building pandoc.
  3. Stack recognises it has already built some of the dependencies and then gives messages about reusing existing packages. It builds the remaining dependencies.

Actual

  1. Stack builds all the dependencies (then fails to build pandoc; this is not the issue here)
  2. Stack begins rebuilding all the same dependencies it built the first time.
  3. Stack downloads the snapshot packages provided as git commits, into the local .stack-work directory, but does not rebuild them. It then builds the remaining dependencies.

Stack version

$ stack --version
Version 1.5.1, Git revision 2b0392468a03174a7c5d317291c62452d771b055 (5182 commits) x86_64 hpack-0.18.1

Configuration just sets up some personal information for new projects, and allows newer packages and system ghc.

Method of installation

Bootstrapped the git repo with stack install stack from the most recent stable version.

@dbaynard
Copy link
Contributor Author

Workaround for downloading the git repos: manually rsync the .stack-work/downloaded directory.

@decentral1se
Copy link
Member

I suppose support here requires a \cc to @snoyberg who brought related changes in #3249.

@dbaynard
Copy link
Contributor Author

dbaynard commented Sep 17, 2017

I've since discovered a further issue with git repos in snapshots.

Stack downloads the repo, identifies the correct package version from the cabal file within, but then complains the package is not present in the index and fails to install it.

@mgsloan
Copy link
Contributor

mgsloan commented Sep 18, 2017

@dbaynard The first snapshot does not work for me. I get

WARNING: Ignoring out of range dependency (trusting snapshot over Hackage revisions): time-1.8.0.2. directory requires: >=1.4 && <1.8
WARNING: Ignoring out of range dependency (trusting snapshot over Hackage revisions): time-1.8.0.2. unix requires: >=1.2 && <1.8
...
    [1 of 2] Compiling Main             ( /tmp/stack31887/unix-2.7.2.1/Setup.hs, /tmp/stack31887/unix-2.7.2.1/.stack-work/dist/x86_64-linux/Cabal-1.24.2.0/setup/Main.o )
    [2 of 2] Compiling StackSetupShim   ( /home/mgsloan/.stack/setup-exe-src/setup-shim-mPHDZzAJ.hs, /tmp/stack31887/unix-2.7.2.1/.stack-work/dist/x86_64-linux/Cabal-1.24.2.0/setup/StackSetupShim.o )
    Linking /tmp/stack31887/unix-2.7.2.1/.stack-work/dist/x86_64-linux/Cabal-1.24.2.0/setup/setup ...
    Configuring unix-2.7.2.1...
    setup: Encountered missing dependencies:
    time >=1.2 && <1.8 && ==1.8.0.2

Even with the same resolver, git packages are re-downloaded, too — though not rebuilt! — and in the local package directory, not the global snapshot directory. The snapshot packages are shared.

That seems surprising. Note that for projects in different directories, downloaded git repos are never shared. This restriction could possibly be lifted, perhaps if we ascertained that it is indeed using a stable SHA and not some other variety of ref.

@snoyberg
Copy link
Contributor

snoyberg commented Sep 19, 2017 via email

@dbaynard
Copy link
Contributor Author

@mgsloan did you enable AllowNewer? Please excuse my non-minimal example. I suspect a custom snapshot with no changes moving to a snapshot with one change may share packages correctly, though I haven't had a chance to test this (and won't get one for the next fortnight). Likewise, install a downstream package from a snapshot changing a single upstream package from the stackage resolver, then add an extra downstream package to the snapshot, and I would expect with the current behavior all the packages compiled the first time would be recompiled.

Regarding rebuilding git packages, this behavior is not taking advantage of the immutable snapshot.

@snoyberg if I install something into global-project it stores the git repos in ~/.stack/global-project/.stack-work; it seems this behavior hasn't been changed for repos provided in the snapshot because that matches what previously happenes when a repo is defined in the global stack.yaml as an extra-dep. It would make sense to store the repo in the snaphsot directory.

Furthermore, it's a shame the full repo has to be downloaded. It would be nice if one could somehow alias a git package so if the repo is already on my machine, it checks out a new worktree. But that's a substantial addition.

I'll add any further idiosyncrasies I notice to this issue. Thanks.

@mgsloan
Copy link
Contributor

mgsloan commented Sep 19, 2017

I'm in favor of having a dir in STACK_ROOT for git repos, that makes sense to me. Question is, should package sharing also work for git repos? Could be feasible to do this based on the full SHA.

Heck, it might make sense to use the full SHA to determine the hash used in the git repo location. Unless there's a good way to do this remotely, this seems to imply cloning the repo, checking out the commit, querying the SHA for HEAD, and then moving the repo.

@dbaynard
Copy link
Contributor Author

If a git dependency is in a snapshot it absolutely must be shared. Otherwise what's the point of the snapshot - it might as well go in a stack.yaml.

Supposedly some servers enable uploadArchive.allowUnreachable for grabbing an individual commit but I don't know how that works.

@snoyberg why not host the cabal file in the snapshot.yaml? It should not change for a given fully specified dependency e.g. git commit or hashed hardball. Then add, for example, a stack snapshot command to allow a snapshot creator to update them automatically when that snapshot is created.

@snoyberg
Copy link
Contributor

snoyberg commented Oct 3, 2017

I still don't have much time to get to implementation, but I'm getting confused about what's going on here. As far as I understand it, we have two issues:

  1. A bandwidth optimization would be to clone repos to ~/.stack instead of .stack-work, so that they aren't redownloaded in separate projects. That overall seems fine.
  2. A bug in the caching of Git-provided packages between snapshots, which @mgsloan and I have both had trouble reproing.

If this is correct, can we:

  • Separate out the bandwidth optimization to another issue
  • Focus on getting a repro going in this issue

@dbaynard
Copy link
Contributor Author

dbaynard commented Oct 4, 2017

@snoyberg I've just submitted my PhD thesis so I now have time to investigate.

I'd initially thought the git behavior was linked to the change to extensible snapshots. Would you open a separate issue, then.

Meanwhile, I wasn't aware the repro issue concerned the bug report rather than using my custom snapshot. I'll investigate further.

@dbaynard
Copy link
Contributor Author

dbaynard commented Oct 4, 2017

Actually, my mistake, there are more issues than that. I'll summarize them once I've tried again to reproduce.

@dbaynard
Copy link
Contributor Author

The main issue is package sharing. I updated a single package in the snapshot and it downloaded and built all the needed packages in the snapshot, again. It should say 'using precompiled package' for at least the unchanged packages. Weirdly, it didn't happen every time.

Manually linking the ~/.stack/global-project/.stack-work/downloaded directory fixed the git issue.

@snoyberg
Copy link
Contributor

I've been able to reproduce this with a more minimal case. I've provided it at: https://github.com/snoyberg/stack-issue-3431.

snoyberg added a commit that referenced this issue Oct 17, 2017
The commit ed9ccc0 changed the behavior
of the precompiled cache to store relative instead of absolute paths.
That seems to have broken the ability to properly check if a library
exists, thereby defeating precompiled caching.

The issue that discovered this is #3431, and ascribed this to extensible
snapshots. I _think_, though I'm not 100% certain, that this bug reaches
much farther than extensible snapshots, and in fact breaks all
precompiled caches.
@snoyberg
Copy link
Contributor

Alright, I've opened up a PR (#3494) which I believe addresses this issue. The only concern I have is that this problem would not be limited to extensible snapshots, but would have affected all snapshots, and for over a year (since v1.2.0 of Stack). Since there haven't been other reports, I'm worried that I may be mistaken.

Nonetheless, I can no longer reproduce the bug reported here with that PR. @dbaynard if you could test that branch and confirm that it fixes your problem, that would be great.

@dbaynard
Copy link
Contributor Author

@snoyberg I didn't notice the issue in other stable versions of stack. I have been a keen user of the custom snapshots feature in part due to precompilation (my laptop has a 64GB hard drive of which I can only use a small fraction so I'm keen to minimize installed library versions) so I suspect I may have been both a) likely to see any issues if they'd occurred previously, and b) acutely sensitive to such a change.

I shall test the PR branch with the minimal example, and if (as I suspect) it fixes the issue I'll try it with the more complicated cases.

Incidentally, the rest of the extensible snapshots framework seems to work well (minus the git issue previously mentioned). I like the resulting changes to the stack.yaml syntax. It would be helpful to have an automated way of updating old style to new style stack.yaml files. I've stored my own extended snapshots as gists; it might be nice to have a stackage based url shortener to make this a little prettier.

snoyberg added a commit that referenced this issue Oct 18, 2017
snoyberg added a commit that referenced this issue Oct 18, 2017
The commit ed9ccc0 changed the behavior
of the precompiled cache to store relative instead of absolute paths.
That seems to have broken the ability to properly check if a library
exists, thereby defeating precompiled caching.

The issue that discovered this is #3431, and ascribed this to extensible
snapshots. I _think_, though I'm not 100% certain, that this bug reaches
much farther than extensible snapshots, and in fact breaks all
precompiled caches.
@dbaynard
Copy link
Contributor Author

I can confirm #3494 fixes the issue of not sharing binary packages.

As for sharing git sources, is that still being tracked here, is there a new issue, or is it being dropped?
To clarify: if a git repo is part of a snapshot I would expect it to be managed in ~/.stack and shared in the same manner as a binary package.

@mgsloan
Copy link
Contributor

mgsloan commented Oct 31, 2017

@dbaynard Great! I think git packages are shared by extensible snapshots, but I haven't tested it. I've opened #3535 about making this more reliable.

It is also implicitly tracked by #3330 . But if I'm wrong and git stuff isn't shared, then that probably should belong in its own issue, as it's a step along the path to #3330

@mgsloan mgsloan closed this as completed Oct 31, 2017
tswelsh pushed a commit to tswelsh/stack that referenced this issue Nov 7, 2017
tswelsh pushed a commit to tswelsh/stack that referenced this issue Nov 7, 2017
The commit ed9ccc0 changed the behavior
of the precompiled cache to store relative instead of absolute paths.
That seems to have broken the ability to properly check if a library
exists, thereby defeating precompiled caching.

The issue that discovered this is commercialhaskell#3431, and ascribed this to extensible
snapshots. I _think_, though I'm not 100% certain, that this bug reaches
much farther than extensible snapshots, and in fact breaks all
precompiled caches.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants