JSON and HDF5 output data not reproducible due to timestamp #895

Closed
peterjc opened this issue Mar 6, 2023 · 1 comment · Fixed by #923

Comments

@peterjc
Contributor

peterjc commented Mar 6, 2023

Quoting table.py, both methods to_json and to_hdf5 use the following:

date = '"date": "%s",' % datetime.now().isoformat()

Using a live date means otherwise reproducible analysis will fail a simple diff due to the time stamp.

Quoting https://biom-format.org/documentation/format_versions/biom-1.0.html

date : <datetime> Date the table was built (ISO 8601 format)

Quoting https://biom-format.org/documentation/format_versions/biom-2.0.html and https://biom-format.org/documentation/format_versions/biom-2.1.html

creation-date : <datetime> Date the table was built (ISO 8601 format)

In both cases, this is clearly a required field, so I think the best solution is to allow the date to be passed as an optional argument (defaulting to the current default of now). The user could then explicitly use (for example) the last modified date of their input data and metadata. It would also facilitate using diff for continuous integration testing.
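A minimal sketch of what such an optional argument could look like. The helper name `format_date_field` and its `creation_date` parameter are hypothetical, chosen only for illustration; they are not part of the biom-format API, and the actual fix may look different.

```python
import os
from datetime import datetime


def format_date_field(creation_date=None):
    """Hypothetical helper: build the JSON date entry from an optional datetime.

    Falls back to the current time when no date is given, which matches
    the existing behaviour quoted above.
    """
    if creation_date is None:
        creation_date = datetime.now()
    return '"date": "%s",' % creation_date.isoformat()


def mtime_as_datetime(path):
    """Hypothetical helper: last-modified time of an input file as a datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path))


# A fixed date gives byte-identical output on every run, so diff works.
fixed = datetime(2023, 3, 6, 12, 0, 0)
print(format_date_field(fixed))
```

With a fixed date, two runs over the same input would produce the same date field, while callers who pass nothing would see no change in behaviour.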

In comparison, although the BAM format for sequencing data uses the GZIP header, most implementations deliberately do not fill in the MTIME field, ensuring full reproducibility.
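The same idea can be illustrated with Python's standard `gzip` module, which accepts an explicit `mtime` (this is only an analogy using plain gzip, not BAM tooling, and it assumes Python 3.8 or later for `gzip.compress(..., mtime=...)`):

```python
import gzip

data = b"example payload"

# Pinning mtime to 0 leaves the MTIME header field empty, so compressing
# the same bytes twice gives identical output.
first = gzip.compress(data, mtime=0)
second = gzip.compress(data, mtime=0)

print(first == second)
print(first[4:8])  # bytes 4-7 of the gzip header hold MTIME
```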

@wasade
Member

wasade commented Mar 6, 2023

Thanks, @peterjc! I completely agree with this proposal. For additional context, the exact lines impacted are here and here.

These should be pretty minor changes to make. I'll add them in the next release, and I think cutting a minor release relatively quickly to support this is valuable.

@wasade wasade mentioned this issue Apr 11, 2023