should we add an Index.get method (or something similar) to retrieve signatures? #1848

Open
ctb opened this issue Feb 20, 2022 · 2 comments

Comments

@ctb
Contributor

ctb commented Feb 20, 2022

In #1837, we change sourmash sig extract to identify the signatures to extract using manifest rows. To actually extract the sketches, we then have to convert those rows into a manifest, and from there into a picklist. This seems circuitous.

It also means that sourmash sig extract --picklist does not work on database types that do not support multiple picklists: LCA DBs, SBTs, and zipfiles without a manifest, for example.

Two ideas, not mutually exclusive -

one, we could have Index classes provide a signature getter that works on internal locations in manifests.

two, we could directly provide a method for retrieving many signatures, given a manifest (or, really, just a list of internal locations).
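
A rough sketch of what the two ideas could look like side by side. This is not sourmash code: `ToySketch`, `ToyIndex`, `get`, and `get_many` are hypothetical names, and the "storage" is just a dict mapping internal locations to lists of sketches. It only assumes that manifest rows carry `internal_location` and `md5` columns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySketch:
    """Hypothetical stand-in for a signature; only md5 and name are modeled."""
    md5: str
    name: str


class ToyIndex:
    """Hypothetical stand-in for an Index class.

    `storage` maps internal_location -> list of sketches stored there.
    """

    def __init__(self, storage):
        self.storage = storage

    def get(self, row):
        """Idea one: return the single sketch described by one manifest row."""
        matches = [ss for ss in self.storage[row["internal_location"]]
                   if ss.md5 == row["md5"]]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one match, found {len(matches)}")
        return matches[0]

    def get_many(self, rows):
        """Idea two: yield sketches for many rows, visiting each location once."""
        wanted = defaultdict(set)
        for row in rows:
            wanted[row["internal_location"]].add(row["md5"])
        for location, md5s in wanted.items():
            for ss in self.storage[location]:
                if ss.md5 in md5s:
                    yield ss


index = ToyIndex({
    "signatures/a.sig.gz": [ToySketch("aaa", "genome A k=21"),
                            ToySketch("bbb", "genome A k=31")],
    "signatures/c.sig.gz": [ToySketch("ccc", "genome C k=31")],
})
rows = [
    {"internal_location": "signatures/a.sig.gz", "md5": "bbb"},
    {"internal_location": "signatures/c.sig.gz", "md5": "ccc"},
]

one = index.get(rows[0])
many = list(index.get_many(rows))
```

Grouping rows by location in `get_many` is a design choice for the sketch: it avoids re-reading a location that holds several requested sketches. Neither method goes through a picklist.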

What I don't remember offhand is whether all Index classes support internal locations. If not, that would be a problem.

@ctb
Contributor Author

ctb commented Apr 6, 2022

Some things going on in the SqliteIndex PR #1808 make me think that we should enable individual retrieval via manifest row. That gives storages the ability to figure out what collection of information is best, including off-label manifest row columns like primary keys in sqlite databases...
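
To make the primary-key point concrete, here is a minimal sketch using only the standard library `sqlite3` module. The table layout, the `_id` column name, and the `manifest_rows` / `get_by_row` helpers are all assumptions made for illustration; they are not taken from the SqliteIndex PR.

```python
import sqlite3

# Toy database: one row per sketch, with an integer primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sketches "
    "(id INTEGER PRIMARY KEY, md5 TEXT, name TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO sketches (md5, name, payload) VALUES (?, ?, ?)",
    [("aaa", "genome A", "payload-a"), ("bbb", "genome B", "payload-b")],
)


def manifest_rows(conn):
    """Build manifest-like rows that carry the primary key as an extra column."""
    cur = conn.execute("SELECT id, md5, name FROM sketches ORDER BY id")
    return [{"_id": id_, "md5": md5, "name": name} for id_, md5, name in cur]


def get_by_row(conn, row):
    """Fetch one sketch directly by the primary key stored in its manifest row."""
    cur = conn.execute("SELECT payload FROM sketches WHERE id = ?", (row["_id"],))
    result = cur.fetchone()
    if result is None:
        raise ValueError("no match to requested row in db")
    return result[0]


rows = manifest_rows(conn)
payload = get_by_row(conn, rows[1])
```

Here the storage decides which column is the cheapest lookup key; a caller only hands back the row it was given.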

@ctb
Contributor Author

ctb commented Jan 29, 2024

Over in calc-full-gather (https://github.com/ctb/2024-calc-full-gather; ref sourmash-bio/sourmash_plugin_branchwater#187), I wrote a generic function that uses manifest rows to load specific sketches from a zip file:

import sourmash

def zipfile_load_ss_from_row(db, row):
    # read the raw signature data stored at this row's internal location
    data = db.storage.load(row['internal_location'])
    sigs = sourmash.signature.load_signatures(data)

    # a location may hold several sketches; keep the one whose md5 matches
    return_sig = None
    for ss in sigs:
        if ss.md5sum() == row['md5']:
            assert return_sig is None # there can only be one!
            return_sig = ss

    if return_sig is None:
        raise ValueError("no match to requested row in db")
    return return_sig

Curious how this approach would generalize to all Index classes, and also how it would interact with the Rust Collection layer.
