-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standalone manifests and storage and internal_location
- what do we recommend?
#3008
Comments
Hmm, do we want standalone manifest to be a thing? Or should they be paired with a I like the so in this case you would call or, maybe... a standalone manifest is a collection? This way it ties manifest + storage, and the command would be (originally posted in #3009, moved comment here) |
In |
some hot-take thoughts (so, may be quite dumb, but I've got to get them out of my head ;) - standalone manifests are already a thing, of course, so 🤷 I'm thinking of tackling the MultiIndex/pathlists-load-everything-all-together problem by using @bluegenes genius solution in the branchwater plugin, in which we load the pathlist twice - once to generate a manifest, and then again when we actually need the sketches. This will also have the beneficial effect of subtly deprecating pathlists in favor of standalone manifests, for performance and flexibility reasons, without actually breaking pathlists. It feels like we should implement a focused could a simple hack that addresses the storage idea be to support a generic Or something that fits better with the way we're doing save/load and save/load plugins - what if we introduce a new URI-focused scheme (beta-tested via a plugin) per #2258, and then use that to interpret manifest paths somehow? There are some interesting thoughts on It feels like there is a possible convergence now that we have the ability to really use the Rust |
Restating what I think @luizirber said, but having thought about it some more, esp in terms of implementation and CLI - What if we add a It will default to But you could also specify:
then we could have storage plugins separate from save/load plugins, too. A key thing each storage would have is a I think this pretty closely matches what luiz said above :) |
the branchwater search index (server) is initialized like this:
that
the sourmash/src/core/src/storage.rs Lines 141 to 154 in 874de99
Both sourmash/src/core/src/storage.rs Lines 101 to 123 in 874de99
Yup. =] |
while debugging #3053 I realized that we have some inconsistencies in the code -
I am slowly coming around to the idea that loading things relative to the manifest path is correct:
So in PR #3054 I'm trying out the following:
|
internal_location
- what do we recommend?
…` and `sig collect` (#3054) This PR updates `sig collect` and `sig check` so that they can produce standalone manifests that work properly with default sourmash loading behavior. The default behavior produces broken manifests in some situations and is not changed, but will be deprecated in v5. ## Details Currently, `sig collect` and `sig check` default to producing standalone manifests with internal path locations relative to the current working directory. This conflicts with the default `StandaloneManifest` behavior implemented in `save_load.py` that loads path locations relative to the manifest location. As a result, whenever the manifest was in a subdirectory, the standalone manifests output by `sig check` and `sig collect` were broken. The only way to make good manifests in this situation was to use `sig collect --abspath`, but `sig check` didn't support `--abspath`, and using absolute paths is brittle in situations where you want to distribute manifests. This PR adds `--relpath` to both `sig check` and `sig collect`, and adds `--abspath` to `sig check`. It also demonstrates the bad behavior in tests and annotates the tests appropriately. See #3008 (comment) for more detailed discussion of why I think `--relpath` is the right behavior for the future. - [x] adds `--abspath` and `--relpath` to `sig check`, to properly support relative paths; - [x] adds `--relpath` to `sig collect`, to properly support relative paths; - [x] documents this behavior properly for creating standalone manifests; - [ ] create issue to change default `sig check` and `sig collect` behavior for v4, and disable cwd behavior. Techie TODO: - [x] explicitly test `relpath` and `abspath` behavior in `sig check`; - [x] explicitly test `relpath` behavior in `sig collect` - [x] write some tests for `sig check` and `sig collect` to explore the relative path loading issue, with all three combinations of relpath: mf in cwd, sigs in subdir; mf in subdir, sigs in cwd; mf in subdir, sigs in subdir. Related issues: * Addresses #3008 * Addresses issues in #3048 by updating `sig check` to support `--relpath`; * Fixes #3053 - `--relpath` again --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
over in In sourmash-bio/sourmash_plugin_branchwater#213, I tried to describe standalone manifests and their benefits for lazy loading. I wrote:
And when I actually had to create a standalone manifest for my own use (for benchmarking) I was careful to use absolute paths.
Which made me think, ok, I just made a manifest, and it's got absolute paths to a bunch of sig files in it, and that's nice and robust. But it means if I move the directory full of sig files, I have to regenerate the manifest from scratch. Now, I could use my own special knowledge of manifest files and go in and change all the path names manually. But maybe it would be better to better support relative paths? Not sure.
A few things I could imagine:
Related issues:
-o
output. #1902internal_location
in manifests - should it be a URI? #2258The text was updated successfully, but these errors were encountered: