Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multisearch continues past "no query signatures loaded, exiting" message #280

Closed
ctb opened this issue Mar 19, 2024 · 6 comments · Fixed by #430
Closed

multisearch continues past "no query signatures loaded, exiting" message #280

ctb opened this issue Mar 19, 2024 · 6 comments · Fixed by #430

Comments

@ctb
Copy link
Collaborator

ctb commented Mar 19, 2024

when I run:

sourmash scripts multisearch zip-without-manifest.zip zip-with-manifest.zip -o zzz.csv

I get:

=> sourmash_plugin_branchwater 0.9.3.dev0; cite Irber et al., doi: 10.1101/2022.11.02.514947

ksize: 31 / scaled: 1000 / moltype: DNA / threshold: 0.01
searching all sketches in 'zip-without-manifest.zip' against 'zip-with-manifest.zip' using 8 threads
Reading query(s) from: 'zip-without-manifest.zip'
Sketch loading error: No such file or directory (os error 2)
WARNING: could not load sketches from path 'PK'
No valid signatures found in query pathlist 'zip-without-manifest.zip'
WARNING: 1 query paths failed to load. See error messages above.
No query signatures loaded, exiting.
Reading search(s) from: 'zip-with-manifest.zip'
Loaded 1 search signature(s)
DONE. Processed 0 comparisons
...multisearch is done! results in 'zzz.csv'

it seems a little weird it tells me that "No query signatures loaded, exiting." and then ...keeps on going.

What's even weirder is that I can't figure out where the message is coming from - the strong "No query signatures loaded" doesn't seem to show up in either the branchwater plugin repo, or the sourmash repo!?

@ctb
Copy link
Collaborator Author

ctb commented Mar 19, 2024

ahh, found it in utils.rs:

    // Validate sketches                                                        
    if collection.is_empty() {
        eprintln!("No {} signatures loaded, exiting.", report_type);
        return Ok(());
    }
    eprintln!("Loaded {} {} signature(s)", collection.len(), report_type);
    Ok(())

@bluegenes
Copy link
Contributor

Right, this is related to #207.

I can make a new issue to better reflect the new goal, which is to actually err from here, rather than returning Ok.

@ctb
Copy link
Collaborator Author

ctb commented Mar 19, 2024

I was thinking of linking that issue in, but I feel like this is slightly different, because this is a specific error that shouldn't be bypassed. I'll muse more.

@bluegenes
Copy link
Contributor

ok, sure - I see it. Here you're specifically thinking about zip errors.

So the reason errors are not automatically propagated up is because of way I changed sequential loading to allow manifests (in load_collection). Previously, if the extension was 'zip', we tried loading as a zipfile. If not, we tried as sig first and then fell back to pathlists. Since manifests and pathlists are so similar, I was having trouble integrating manifests into the same strategy.

Now, we try each loading function (zip > manifest > signature > pathlist). If the file can be loaded, we will load, even if that collection is empty. If we encounter an error, we track it and report the final error. I thought this was working well for reporting the right errors, but apparently not when the zip fails?

I'm definitely open to better sequential loading / error propagation strategies.

A simple zipfile fix would be to only use the zip loading for '.zip' and report errors directly. For the rest, I'm not sure how to better manage loading functions without enforcing file extensions or otherwise specifying which type of input we have (and therefore which loading methods we should try).

@ctb
Copy link
Collaborator Author

ctb commented Aug 24, 2024

There's actually two different problems here - one is the confusion from the printed out messages during load_collection, and the other is that multisearch continues past load_sketches, when load_sketches ends up returning 0 actual sketches that match (b/c of selection, or whatever).

Tackling this in #430.

@ctb
Copy link
Collaborator Author

ctb commented Aug 24, 2024

@ctb ctb closed this as completed in #430 Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants