MRG: panic when FSStorage::load_sig encounters more than one Signature in a JSON record #3333

Merged
merged 51 commits into from
Nov 4, 2024

ctb
Contributor

@ctb ctb commented Sep 17, 2024

This PR was originally about debugging sourmash-bio/sourmash_plugin_branchwater#445, but that's going to require more work to fix properly. For now, I would like to nominate it for merge because sourmash fails silently in this situation, and that's Bad.

In brief, the main thing this PR does is panic via unimplemented! when FSStorage::load_sig encounters more than one Signature in a JSON record.

This PR also adds a bit of documentation to InnerStorage, per the bottom of this comment.


The problem at hand: when loading a SigStore/Signature from a Storage, sourmash only loads the first one and ignores any others.

    fn load_sig(&self, path: &str) -> Result<SigStore> {
        let raw = self.load(path)?;
        let sig = Signature::from_reader(&mut &raw[..])?
            // TODO: select the right sig?
            .swap_remove(0);
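
For illustration only, the guard this PR describes could look roughly like the sketch below. It is self-contained and does not use the real sourmash types: the `Signature` struct here is a hypothetical stand-in, and `take_single` is an invented helper name, not a function in the sourmash core.

```rust
// Hypothetical stand-in for sourmash's `Signature`; only the name comes from the PR.
#[derive(Debug, Clone, PartialEq)]
struct Signature {
    name: String,
    ksizes: Vec<u32>,
}

/// Return the only signature decoded from a record.
/// Panics via `unimplemented!` if the record held more than one,
/// instead of silently discarding everything after the first.
/// (An empty vector still panics inside `swap_remove`, as in the original snippet.)
fn take_single(mut sigs: Vec<Signature>) -> Signature {
    if sigs.len() > 1 {
        unimplemented!(
            "loading more than one Signature from a record is not supported; found {}",
            sigs.len()
        );
    }
    sigs.swap_remove(0)
}

fn main() {
    let record = vec![Signature {
        name: "a".to_string(),
        ksizes: vec![21, 31, 51],
    }];
    let sig = take_single(record);
    println!("loaded {} with ksizes {:?}", sig.name, sig.ksizes);
}
```

The point of the sketch is the ordering: the length check happens before `swap_remove(0)`, so a multi-signature record fails loudly rather than returning a partial result.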

This results from the concept of a Signature as containing one or more sketches; the history of this is described here, and it leads to some interesting silliness in the Python layer.

The flip side is that, in Rust, a single Signature can include multiple sketches, e.g. with different ksizes. So this works fine for the wort case, where we have a single .sig file with k=21, k=31, and k=51.
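
To make the distinction concrete, here is a minimal sketch of the two record shapes. The `Sketch` and `Signature` structs and the `is_single_signature` and `ksizes` helpers are hypothetical simplifications for this example, not the actual sourmash definitions.

```rust
// Hypothetical, simplified stand-ins; the real sourmash types carry much more.
#[derive(Debug)]
struct Sketch {
    ksize: u32,
}

#[derive(Debug)]
struct Signature {
    name: String,
    sketches: Vec<Sketch>,
}

/// True when a decoded record holds exactly one Signature,
/// regardless of how many sketches that Signature contains.
fn is_single_signature(record: &[Signature]) -> bool {
    record.len() == 1
}

/// Collect the ksizes of every sketch in one Signature.
fn ksizes(sig: &Signature) -> Vec<u32> {
    sig.sketches.iter().map(|s| s.ksize).collect()
}

fn main() {
    // wort-style record: one Signature holding three sketches.
    let wort = vec![Signature {
        name: "genome".to_string(),
        sketches: vec![Sketch { ksize: 21 }, Sketch { ksize: 31 }, Sketch { ksize: 51 }],
    }];
    // Multi-signature record: two Signatures, one sketch each.
    let multi = vec![
        Signature { name: "x".to_string(), sketches: vec![Sketch { ksize: 31 }] },
        Signature { name: "y".to_string(), sketches: vec![Sketch { ksize: 31 }] },
    ];

    println!("{}: single = {}, ksizes = {:?}",
             wort[0].name, is_single_signature(&wort), ksizes(&wort[0]));
    println!("multi: single = {}", is_single_signature(&multi));
}
```

Only the second shape, with more than one Signature per record, is the case that load_sig does not handle.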

Note that the Python layer (and hence the entire sourmash CLI) fully supports multiple Signatures in JSON: this is well tested and well covered behavior. The branchwater plugin runs into it because it is using the Rust layer and the API is not fully fleshed out there.


@ctb ctb changed the title EXP: debug multisigfile loading WIP: explicitly note that loading more than one Signature from a .sig file is unimplemented currently Oct 27, 2024
@ctb ctb changed the title WIP: explicitly note that loading more than one Signature from a .sig file is unimplemented currently WIP: panic when FSStorage::load_sig encounters more than one Signature in a JSON record Oct 27, 2024
@ctb ctb added the rust label Oct 27, 2024
@ctb
Contributor Author

ctb commented Oct 27, 2024

@luizirber @bluegenes thoughts welcome!

@ctb ctb requested review from luizirber and bluegenes October 27, 2024 19:14
@ctb ctb changed the title WIP: panic when FSStorage::load_sig encounters more than one Signature in a JSON record MRG: panic when FSStorage::load_sig encounters more than one Signature in a JSON record Oct 27, 2024
@ctb
Contributor Author

ctb commented Nov 1, 2024

@luizirber bump!

Contributor

@bluegenes bluegenes left a comment

This seems like a good plan to me. @luizirber, did you have any thoughts?

@ctb
Contributor Author

ctb commented Nov 4, 2024

🙏

@ctb ctb merged commit c9fb078 into latest Nov 4, 2024
42 of 44 checks passed
@ctb ctb deleted the debug_multisigfile branch November 4, 2024 16:57
ctb added a commit that referenced this pull request Nov 5, 2024
## [0.17.0] - 2024-11-05

Changes/additions:
* standardize on u32 for scaled, and introduce `ScaledType` (#3364)
* panic when `FSStorage::load_sig` encounters more than one `Signature`
in a JSON record (#3333)

Updates:

* Bump needletail from 0.5.1 to 0.6.0 (#3376)
* Bump histogram from 0.11.0 to 0.11.1 (#3377)
* Bump serde from 1.0.210 to 1.0.214 (#3368)
* Bump serde_json from 1.0.128 to 1.0.132 (#3358)
* Fix clippy lints from 1.83 beta (#3357)
@ctb ctb mentioned this pull request Dec 5, 2024
ctb added a commit that referenced this pull request Dec 5, 2024
Developer updates:

* build: move ORCID to metadata in pyproject.toml, fix pixi (#3416)
* build: simplify Rust release (#3392)
* fix: Avoid re-calculating md5sum on clone and conversion to
KmerMinHashBTree (#3385)
* r0.15.1 release (#3304)
* update sourmash core to r0.17.0 (#3381)
* Added union method to HLL (#3293)
* Build: upgrade to newer maturin (#3366)
* CI: use supported ubuntu for codspeed (#3350)
* Fix clippy lints from 1.83 beta (#3357)
* Implement resumability for revindex (#3275)
* add `Manifest::intersect_manifest` to Rust core (#3305)
* bump sourmash core to r0.17.2 (#3399)
* change `sig_from_record` to use scaled from `Record` to downsample
(#3387)
* derive Hash for `HashFunctions` (#3344)
* enforce a single scaled on a `CollectionSet` (#3397)
* fix formatting from #3306 (#3307)
* have ruff ignore ipynb so as to avoid triggering an error during CI
(#3325)
* improve downsampling behavior on `KmerMinHash`; fix `RevIndex::gather`
bug around `scaled`. (#3342)
* panic when `FSStorage::load_sig` encounters more than one `Signature`
in a JSON record (#3333)
* propagate error from `RocksDB::open` on bad directory (#3306)
* refactor `calculate_gather_stats` to disallow repeated downsampling
(#3352)
* release core r0.17.1 (#3388)
* release sourmash rust core r0.16.0 (#3356)
* standardize on u32 for scaled, and introduce `ScaledType` (#3364)
* update plugin documentation for users (#3286)
* update sourmash core to r0.15.2 (#3338)
* when lingroups are provided, use them for `csv_summary` (#3311)
* Misc Rust updates to core (#3297)
* Resolve issue for high precision MLE estimation (#3296)

Dependabot and pre-commit CI updates:

* Bump DeterminateSystems/magic-nix-cache-action from 7 to 8 (#3319)
* Bump DeterminateSystems/nix-installer-action from 13 to 14 (#3320)
* Bump DeterminateSystems/nix-installer-action from 14 to 15 (#3374)
* Bump DeterminateSystems/nix-installer-action from 15 to 16 (#3401)
* Bump camino from 1.1.7 to 1.1.9 (#3301)
* Bump codspeed-criterion-compat from 2.6.0 to 2.7.2 (#3324)
* Bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0 (#3373)
* Bump csv from 1.3.0 to 1.3.1 (#3390)
* Bump getset from 0.1.2 to 0.1.3 (#3328)
* Bump histogram from 0.11.0 to 0.11.1 (#3377)
* Bump js-sys from 0.3.72 to 0.3.74 (#3412)
* Bump memmap2 from 0.9.4 to 0.9.5 (#3326)
* Bump myst-parser from 3.0.1 to 4.0.0 (#3277)
* Bump needletail from 0.5.1 to 0.6.0 (#3376)
* Bump pypa/cibuildwheel from 2.19.2 to 2.20.0 (#3278)
* Bump pypa/cibuildwheel from 2.20.0 to 2.21.1 (#3332)
* Bump pypa/cibuildwheel from 2.21.1 to 2.21.2 (#3345)
* Bump pypa/cibuildwheel from 2.21.2 to 2.21.3 (#3353)
* Bump pypa/cibuildwheel from 2.21.3 to 2.22.0 (#3408)
* Bump roaring from 0.10.6 to 0.10.7 (#3413)
* Bump serde from 1.0.204 to 1.0.207 (#3289)
* Bump serde from 1.0.207 to 1.0.208 (#3298)
* Bump serde from 1.0.208 to 1.0.209 (#3310)
* Bump serde from 1.0.209 to 1.0.210 (#3318)
* Bump serde from 1.0.210 to 1.0.214 (#3368)
* Bump serde from 1.0.214 to 1.0.215 (#3403)
* Bump serde_json from 1.0.120 to 1.0.121 (#3267)
* Bump serde_json from 1.0.121 to 1.0.122 (#3280)
* Bump serde_json from 1.0.122 to 1.0.124 (#3288)
* Bump serde_json from 1.0.124 to 1.0.125 (#3302)
* Bump serde_json from 1.0.125 to 1.0.127 (#3309)
* Bump serde_json from 1.0.127 to 1.0.128 (#3316)
* Bump serde_json from 1.0.128 to 1.0.132 (#3358)
* Bump serde_json from 1.0.132 to 1.0.133 (#3402)
* Bump sphinx-design from 0.5.0 to 0.6.0 (#3268)
* Bump sphinx-design from 0.6.0 to 0.6.1 (#3276)
* Bump tempfile from 3.10.1 to 3.11.0 (#3279)
* Bump tempfile from 3.11.0 to 3.12.0 (#3287)
* Bump tempfile from 3.12.0 to 3.13.0 (#3340)
* Bump tempfile from 3.13.0 to 3.14.0 (#3391)
* Bump thiserror from 1.0.63 to 1.0.64 (#3335)
* Bump thiserror from 1.0.64 to 1.0.65 (#3367)
* Bump thiserror from 1.0.65 to 1.0.68 (#3379)
* Bump thiserror from 1.0.68 to 2.0.3 (#3389)
* Bump web-sys from 0.3.69 to 0.3.70 (#3299)
* Bump web-sys from 0.3.70 to 0.3.72 (#3354)
* Bump web-sys from 0.3.72 to 0.3.74 (#3411)
* Update pytest-cov requirement from <6.0,>=4 to >=4,<7.0 (#3375)
* Update sphinx requirement from <8,>=6 to >=6,<9 (#3269)
* Upgrade rocksdb to 0.22.0, bump MSRV to 1.66  (#3383)
* [pre-commit.ci] pre-commit autoupdate (#3281)
* [pre-commit.ci] pre-commit autoupdate (#3290)
* [pre-commit.ci] pre-commit autoupdate (#3312)
* [pre-commit.ci] pre-commit autoupdate (#3330)
* [pre-commit.ci] pre-commit autoupdate (#3336)
* [pre-commit.ci] pre-commit autoupdate (#3341)
* [pre-commit.ci] pre-commit autoupdate (#3346)
* [pre-commit.ci] pre-commit autoupdate (#3360)
* [pre-commit.ci] pre-commit autoupdate (#3369)
* [pre-commit.ci] pre-commit autoupdate (#3380)
* [pre-commit.ci] pre-commit autoupdate (#3393)
* [pre-commit.ci] pre-commit autoupdate (#3404)
* [pre-commit.ci] pre-commit autoupdate (#3409)
* [pre-commit.ci] pre-commit autoupdate (#3414)