Recommendations for model authors and data providers #343

Open
ots22 opened this issue Oct 7, 2022 · 0 comments

Scivision is intended to work with a wider range of models and data than we can anticipate, so a lot of freedom is left to creators of models and datasets, particularly around model dependencies and data formats.

Despite this, there are certainly some recommendations we could make, even if it would be hard to make them requirements.

We can link to recommendations from others (general advice or community/library specific).

Some ideas below - please update the list with more!

General

  • Create a page in the docs for collecting these (or update model and data pages)

Model authors

  • platform portability
  • Package dependencies: ideally pin all primary dependencies either to a version range (with both lower and upper bounds) or to the exact current version known to work
  • TensorFlow-specific advice
    • ...
  • PyTorch-specific advice
    • ...
  • Testing
    • Include a test that runs the model on toy data, checking the output at the right level: e.g. check for NaN, but probably don't insist on bitwise reproducibility; a classifier could check the most probable class
    • Insist on pytest?
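As a concrete illustration of the toy-data test suggested above, a minimal pytest sketch. The `predict` function here is a hypothetical stand-in for a real model's entry point (e.g. `from my_model import predict`); the tests show the recommended level of checking — finite values, valid shape, stable argmax — rather than bitwise reproducibility:

```python
import numpy as np
import pytest

# Stand-in for a real model's predict function (hypothetical: replace with
# your model's actual entry point).
def predict(image: np.ndarray) -> np.ndarray:
    """Return class probabilities for a single image."""
    flat = image.mean(axis=(0, 1))          # toy model: average over pixels
    scores = np.exp(flat - flat.max())      # softmax, for a valid distribution
    return scores / scores.sum()

def test_runs_on_toy_data():
    # A small random image stands in for real input data.
    rng = np.random.default_rng(0)
    image = rng.uniform(size=(8, 8, 3))
    probs = predict(image)
    # Check the output at the right level: finite values, expected shape,
    # probabilities summing to ~1 -- not bitwise reproducibility.
    assert probs.shape == (3,)
    assert np.all(np.isfinite(probs))       # no NaN or inf in the output
    assert probs.sum() == pytest.approx(1.0)

def test_most_probable_class_is_stable():
    # For a classifier, checking the argmax is a robust, tolerance-free test.
    image = np.zeros((8, 8, 3))
    image[..., 1] = 1.0                     # make channel 1 dominate
    assert int(np.argmax(predict(image))) == 1
```

Running `pytest` on a file like this gives a quick smoke test that the model loads and produces sane output, without being brittle across platforms or library versions.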

Data providers

  • Some suggested options for data storage (e.g. [ENH] Investigate HuggingFace for data storage #317)
  • DOI creation
  • Size considerations: the expectation is that datasets can be tried out quickly, fit on available services, and be downloaded to users' machines.
  • If their dataset is 'large', to include a "sample" dataset (e.g. hosted on Zenodo)
    • Should have an option to try out a dataset with a download limit of 10-100 MB
    • potentially in addition to a larger version of the data, also in the catalog (consider how to link these - via a 'project'?)
    • 'available on request' option (via 'homepage'/'contact' url - not currently in data catalog)