Suggested by @rob-p on twitter: https://twitter.com/nomad421/status/1444837857382764551
Is there any way in sourmash (compare) to get a sparse similarity/containment matrix? Say I'm only interested in query pairs (i, j) where containment(i, j) > 0.9 or containment(j, i) > 0.9: is there a way to avoid writing out what I'm not interested in?
@luizirber replied: heh, I did discuss something similar with @ReiterTaylor a few weeks ago (building a kNN graph, which would also be sparse). The biggest issue in the current compare is that it builds a dense matrix (very large) and then discards a lot of the results. What I was planning is using search to find the top k matches (or, in your case, C > 0.9) and building the sparse matrix on the fly. The Counter approach in greyhound and newer sourmash versions works great for that. If done in Rust, it can also be done in parallel by opening the index as read-only and running many searches in parallel. I'll try to concoct something, but an issue on sourmash like @ctitusbrown mentioned would be great to keep track =]
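
To make the idea in the reply concrete, here is a minimal sketch of a Counter-based sparse containment computation. It is not sourmash code and does not use the sourmash API: sketches are modeled as plain Python sets of integer hashes, the function name `sparse_containment` and its parameters are hypothetical, and containment(i, j) is assumed to mean |i ∩ j| / |i|. Only pairs above the threshold are ever stored, so no dense matrix is built.

```python
from collections import Counter, defaultdict


def sparse_containment(sketches, threshold=0.9):
    """Return {(i, j): containment(i, j)} for pairs above `threshold`.

    `sketches` maps a name to a set of integer hashes (a stand-in for a
    real MinHash sketch; this is an assumption for illustration).
    containment(i, j) is taken to be |i & j| / |i|.
    """
    # Inverted index: hash -> names of all sketches that contain it.
    index = defaultdict(list)
    for name, hashes in sketches.items():
        for h in hashes:
            index[h].append(name)

    result = {}
    for query, hashes in sketches.items():
        if not hashes:
            continue  # containment is undefined for an empty sketch
        # Count, for every other sketch, how many hashes it shares with
        # the query. Sketches with no shared hashes never appear here.
        shared = Counter()
        for h in hashes:
            shared.update(index[h])
        del shared[query]  # drop the self-match
        for other, n_shared in shared.items():
            c = n_shared / len(hashes)
            if c > threshold:
                result[(query, other)] = c
    return result


if __name__ == "__main__":
    example = {
        "a": set(range(10)),       # fully contained in "b"
        "b": set(range(20)),       # only half contained in "a"
        "c": {100, 101},           # shares nothing with the others
    }
    for pair, value in sorted(sparse_containment(example).items()):
        print(pair, value)
```

A pair (i, j) is of interest in the sense of the original question if either `(i, j)` or `(j, i)` appears as a key in the result. Each query is processed independently against a read-only index, which is what would make the parallel version mentioned in the reply possible.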
Separately, I ran into a problem: I have 19,000 signatures that I'm comparing, and I run out of RAM even when I've given the job 900 GB.