Fix/prediction upload #298

j279li · 2025-08-06T14:28:55Z

This pull request introduces improvements to how Zarr object codecs and chunking are handled for prediction outputs, as well as some minor serialization and documentation fixes. The main focus is on making object codec support explicit and robust, especially for custom codecs, and ensuring correct serialization behavior for non-JSON-serializable fields.

Enhancements to Zarr object codec and chunking support:

Added a utility function detect_object_codec_and_chunking in polaris/utils/zarr/codecs.py to determine the correct object codec, filter list, and chunking compatibility from template filters.
Made codec chunking support explicit by adding a supports_chunking attribute to RDKitMolCodec (set to True) and AtomArrayCodec (set to False).
Improved handling of None values in AtomArrayCodec.encode to explicitly set missing values in packed arrays.
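
To make the codec detection described above concrete, here is a minimal sketch. It only illustrates the idea: the classes are stand-ins rather than the real Polaris codecs, and the function body, the `codec_id` values, and the three-part return value are assumptions; only the function name and the two `supports_chunking` values come from the PR description.

```python
class SketchCodec:
    """Stand-in for a numcodecs-style codec; not the real Polaris classes."""

    codec_id = "base"
    supports_chunking = True


class RDKitMolCodecSketch(SketchCodec):
    codec_id = "rdkit_mol"  # hypothetical identifier
    supports_chunking = True  # value stated in the PR description


class AtomArrayCodecSketch(SketchCodec):
    codec_id = "atom_array"  # hypothetical identifier
    supports_chunking = False  # value stated in the PR description


class CompressionFilterSketch:
    """Hypothetical non-object filter, such as a compressor."""

    codec_id = "compress"


# Assumed set of identifiers that mark a filter as an object codec.
OBJECT_CODEC_IDS = {"rdkit_mol", "atom_array"}


def detect_object_codec_and_chunking(filters):
    """Split template filters into (object_codec, other_filters, chunkable).

    Hypothetical body: the real signature and return shape may differ.
    """
    object_codec = None
    other_filters = []
    for f in filters or []:
        is_object_codec = getattr(f, "codec_id", None) in OBJECT_CODEC_IDS
        if object_codec is None and is_object_codec:
            object_codec = f
        else:
            other_filters.append(f)
    if object_codec is None:
        # No object codec: assume a plain array that can be chunked freely.
        chunkable = True
    else:
        chunkable = getattr(object_codec, "supports_chunking", True)
    return object_codec, other_filters, chunkable
```

Reading the flag from the codec keeps the chunking decision next to the codec definition, so a caller does not need a hard-coded list of codecs that cannot be chunked.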

Serialization and model improvements:

Updated BenchmarkPredictionsV2 to exclude the non-serializable dataset_zarr_root attribute from JSON serialization using a Pydantic Field directive, and modified __repr__ to omit the predictions field, since the predictions are also stored in the Zarr archive.
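
As a rough illustration of the serialization change, the sketch below uses Pydantic v2 field options. The model name and its fields are assumptions, not the actual BenchmarkPredictionsV2 definition, and `repr=False` is only one possible way to keep a field out of `__repr__`; the PR may instead override `__repr__` directly.

```python
from typing import Any

from pydantic import BaseModel, Field


class PredictionsSketch(BaseModel):
    """Illustrative stand-in for BenchmarkPredictionsV2; the field set is assumed."""

    name: str
    # Large payload that also lives in the Zarr archive, so hide it from repr.
    predictions: dict[str, list[float]] = Field(default_factory=dict, repr=False)
    # Live handle that cannot be JSON-serialized, so drop it from dumps.
    dataset_zarr_root: Any = Field(default=None, exclude=True)


preds = PredictionsSketch(
    name="demo",
    predictions={"target": [0.1, 0.2]},
    dataset_zarr_root=object(),  # placeholder for an open Zarr group
)

# The excluded field is skipped, so the dump does not fail on the raw object.
as_json = preds.model_dump_json()
```

With this setup `as_json` contains `name` and `predictions` but not `dataset_zarr_root`, while `repr(preds)` leaves out the predictions payload.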

Minor documentation cleanup:

Removed an outdated docstring reference to splits in BenchmarkV2Specification.

cwognum

In line with my comments in the Hub PR:

Rather than having these workarounds, I think it makes more sense to ditch the custom codecs. We can use default codecs (i.e. MsgPack for AtomArrays and VLenBytes for RDKit mols), to ensure it remains a valid default Zarr archive that any machine can open, and then convert from these formats to the objects we need internally.

This would be a bigger change than you've signed up for, though, as it also requires non-trivial changes to the dataset class. Let's do the following:

For datasets, we keep using the custom codecs
For predictions, we use default Zarr codecs and add conversion code on the client and service side.

Does that make sense?
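
A minimal sketch of the conversion idea proposed above, assuming the rich objects can be turned into plain bytes and back. `MolSketch` is a hypothetical stand-in for an RDKit molecule, and `to_standard` and `from_standard` are illustrative helper names. No Zarr calls are shown: in a real archive the byte strings would be written through a default variable-length bytes codec such as the VLenBytes codec mentioned in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolSketch:
    """Hypothetical stand-in for an RDKit molecule."""

    smiles: str

    def to_binary(self) -> bytes:
        # Stand-in for a real binary molecule serialization.
        return self.smiles.encode("utf-8")

    @classmethod
    def from_binary(cls, blob: bytes) -> "MolSketch":
        return cls(blob.decode("utf-8"))


def to_standard(values):
    """Writer side: convert rich objects to plain bytes before storing."""
    return [value.to_binary() for value in values]


def from_standard(blobs):
    """Reader side: rebuild the rich objects from the stored bytes."""
    return [MolSketch.from_binary(blob) for blob in blobs]


mols = [MolSketch("CCO"), MolSketch("c1ccccc1")]
stored = to_standard(mols)  # what the archive would hold
restored = from_standard(stored)  # what the client or service works with
```

Because the archive only ever holds plain bytes, any Zarr reader can open it without the custom codecs installed; the cost is one explicit conversion step on each side.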

cwognum

Hi @j279li, I left some comments. Most are pretty minor! Take a look and let me know what you think.

polaris/utils/zarr/codecs.py

polaris/prediction/_predictions_v2.py

Co-authored-by: Cas Wognum <caswognum@outlook.com>

j279li added 2 commits August 6, 2025 10:24

fixes to prediction upload

cec0240

formatting

2b626f7

j279li requested review from danielpeng1 and Andrewq11 August 6, 2025 14:28

j279li self-assigned this Aug 6, 2025

j279li requested a review from cwognum as a code owner August 6, 2025 14:28

j279li added the bug (Something isn't working) and fix (Annotates any PR that fixes bugs) labels Aug 6, 2025

cwognum reviewed Aug 6, 2025

j279li added 2 commits August 6, 2025 16:11

refactor for the zarr archive to use standard codecs with conversion …

56672b2

…to and from custom codecs handled by the client

removed chunking check

f4121e0

cwognum reviewed Aug 7, 2025

j279li and others added 11 commits August 7, 2025 12:29

Update polaris/utils/zarr/codecs.py

ac3700f

Co-authored-by: Cas Wognum <caswognum@outlook.com>

Update polaris/utils/zarr/codecs.py

16927fe

Co-authored-by: Cas Wognum <caswognum@outlook.com>

Update polaris/utils/zarr/codecs.py

0375efb

Co-authored-by: Cas Wognum <caswognum@outlook.com>

Update polaris/utils/zarr/codecs.py

88caf59

Co-authored-by: Cas Wognum <caswognum@outlook.com>

Update polaris/utils/zarr/codecs.py

94e6cc3

Co-authored-by: Cas Wognum <caswognum@outlook.com>

various fixes

5c49dc0

use enum instead of strEnum

829f130

format

fce35ee

remove unnecessary get prediction method and setting metadata from upload

1c079ed

updated to_zarr for future zarr v3 compatibility

de3b8ae

format

009d66e
