sparse similarity/containment matrix from sourmash compare #1750

Open
taylorreiter opened this issue Oct 8, 2021 · 1 comment

Comments

@taylorreiter
Contributor

Suggested by @rob-p on twitter: https://twitter.com/nomad421/status/1444837857382764551
Is there any way in sourmash (compare) to get a sparse similarity/containment matrix? Say I'm only interested in query pairs (i, j) where containment(i, j) > 0.9 or containment(j, i) > 0.9. Is there a way to avoid writing out what I'm not interested in?

@luizirber replied: heh, I did discuss something similar with @ReiterTaylor a few weeks ago (building a kNN graph, which would also be sparse). The biggest issue in the current compare is that it builds a dense matrix (very large) and then discards a lot of the results. What I was planning is to use search to find the top k matches (or, in your case, C > 0.9) and build the sparse matrix on the fly. The Counter approach in greyhound and newer sourmash versions works great for that. If done in Rust, it can also run in parallel by opening the index as read-only and doing many searches at once. I'll try to concoct something, but an issue on sourmash like @ctitusbrown mentioned would be great to keep track =]
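
To make the idea in the reply above concrete, here is a minimal sketch of building the sparse result on the fly with a `Counter`. It is not sourmash's or greyhound's implementation: each sketch is assumed to be a plain Python set of integer hashes, and the function name `sparse_containment` and the inverted-index layout are hypothetical. Only pairs with containment above the threshold are ever stored.

```python
from collections import Counter, defaultdict


def sparse_containment(sketches, threshold=0.9):
    """Return {(i, j): containment(i, j)} for pairs above `threshold`.

    `sketches` is a list of sets of integer hashes (an assumed stand-in
    for real signatures). containment(i, j) = |A_i & A_j| / |A_i|.
    """
    # Inverted index: hash -> ids of the sketches that contain it.
    index = defaultdict(list)
    for j, hashes in enumerate(sketches):
        for h in hashes:
            index[h].append(j)

    result = {}
    for i, hashes in enumerate(sketches):
        if not hashes:
            continue  # containment is undefined for an empty sketch
        # Count shared hashes only for sketches that overlap at all.
        overlaps = Counter()
        for h in hashes:
            overlaps.update(index[h])
        for j, shared in overlaps.items():
            if j == i:
                continue  # skip the trivial self-comparison
            c = shared / len(hashes)
            if c > threshold:
                result[(i, j)] = c
    return result


# Toy data: sketch 1 is a subset of sketch 0; sketch 2 is disjoint.
sketches = [set(range(100)), set(range(95)), set(range(200, 300))]
pairs = sparse_containment(sketches)
print(pairs)  # {(0, 1): 0.95, (1, 0): 1.0}
```

Because both directions are computed, a pair shows up whenever containment(i, j) or containment(j, i) passes the cutoff, and non-overlapping sketches never produce an entry. The outer loop over `i` only reads the index, which is what would make the read-only, parallel search mentioned in the reply possible.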

Separately, I ran into a problem where I'm comparing 19,000 signatures and I run out of RAM even when I've given the job 900 GB.

@luizirber
Member

worth taking a closer look: https://github.com/dcjones/turbocor
