Base Attributes: Software Deps & Machine #170
Discussion result today: we will keep it When we have concrete examples on how to automate this in various programming languages, e.g. build systems, we can reconsider making it recommended. I will update the PR soon and make it "work in progress" (WIP) for now - so please do not merge yet.
Actually, would it be fine to postpone this for
I would not make this required nor recommended for a number of reasons: 1) To be useful the writer program will have to collect the necessary information in some automated way and I do not see how to do this. 2) I suspect that this information will only rarely be useful. 3) What information to include is rather vague.
I think this option would have quite a lot of value for no implementor cost attached, provided it is optional and handled by an openPMD helper library, so that one only remembers about its existence once a bug hunting session ensues. And yeah, the premise is valid, the recreation input and conditions are something that implementors eventually want to add to the output.
'Ideal' is a strong word, eh. A script to run, a complete set of input data and, dunno, a Nix expression for a fully reproducible environment would be what I call 'ideal' (and, of course, unattainable). Let's stop using 'ideal' early on.
That's not mutually exclusive, like, at all. Does that mean that we want a whole tree of system attributes or something?
@RemiLehe Since we make it optional and do not impose anything The topic on how to add something and what needs to be added when is a totally different one imho and cannot be answered in the scope of openPMD alone :)
@RemiLehe @t184256 how we fill this with helper libraries would be a wonderful contribution to one of the openPMD projects that we can add. Currently, I would really like to reserve the keyword so people can try it in reality. There is no harm, it's purely informational and we will find various great solutions for various use cases. For example, in PIConGPU we will just serialize our Reproducible software environments are a very domain- and application-specific topic that we, e.g., discuss regularly in Helmholtz Open Science. We should discuss the how and what in such environments to not go OT here. Why I want this in openPMD is just to draw an informational connection so one can already pinpoint the major components that an app developer recognizes as central for the data creation. Not more and not less so far.
@RemiLehe I updated the PR with the discussed change to make the keyword optional and purely informational. I hope the description above clarifies what it is intended for and I am interested in what people will use it for and what solutions will develop! I also removed the restriction on a format besides "comma-separated list" so people can also decide to add container URIs or something :)
Implements openPMD#116 and openPMD#137 for reproducible data creation. The individual software alone is not sufficient for proper documentation and reproduction. We therefore reserve new attributes for both dependencies of the `software` and hardware.
OK, I guess if this is optional, it is fine. I'm fine with merging this.
Just FYI: While updating the validator, I added in the example creation script a possible way to collect The function
Define new base attributes for software dependencies and involved machine.
(Required would also be possible but earliest in 2.0 since it would break existing files and might be too strict compared to other base attributes such as `software`. We can also add it as optional now and upgrade it to recommended in later major versions in case general workflows develop around it from the community.)
Implements issues: #116 #137
Description
The individual software alone is not sufficient for proper documentation and reproducible data creation. We therefore reserve new attributes for both the dependencies of the `software` and the involved machinery.
For many data files, reproducing how they were created is increasingly complicated. Besides the need to share input, the environment with and on which a software was built can have a dramatic influence on the outcome, e.g. due to changes or later-discovered bugs in dependencies such as writer libraries or linear algebra libraries.
Examples
For pythonic software, the semicolon-joined output of `pip freeze` or `conda list --export` would be a good start for the attribute `softwareDependencies`.
On HPC systems, the output of `module list` would be a good starting point.
For CMake-based builds, the versions of software in a target's `INTERFACE_LINK_LIBRARIES` property could be used to auto-generate a list.
For `machine`, the simple `hostname`, the name of a scientific instrument (camera type, etc.) or the cluster name is a good value. For hardware-centric projects, a list of relevant hardware and versioning could also be used.
Affected Components
base
Logic Changes
None.
Writer Changes
Writers should (recommended) write the `machine` (e.g. `hostname`) and the software dependencies of `software` to new openPMD files now.
We should also update our example files to include the two new attributes:
Reader Changes
No effects besides additional information that can be read.
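On the reader side, the only work is tolerating files that lack the new attributes. The following is a small sketch, assuming the file's root attributes have already been loaded into a plain string-to-string mapping and that the dependencies use the semicolon-joined form from the Examples section. The helper names are hypothetical.

```python
from typing import List, Mapping, Optional


def read_software_dependencies(attrs: Mapping[str, str]) -> List[str]:
    """Split `softwareDependencies` into entries; empty list if absent."""
    raw = attrs.get("softwareDependencies")
    if raw is None:
        return []  # older files do not carry the attribute
    return [entry.strip() for entry in raw.split(";") if entry.strip()]


def read_machine(attrs: Mapping[str, str]) -> Optional[str]:
    """Return the `machine` attribute, or None if the file predates it."""
    return attrs.get("machine")


if __name__ == "__main__":
    new_file = {"softwareDependencies": "numpy==1.0;h5py==2.0", "machine": "node01"}
    old_file = {"software": "someCode"}
    print(read_software_dependencies(new_file), read_machine(new_file))
    print(read_software_dependencies(old_file), read_machine(old_file))
```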
Data Converter
No effect, old files are forward compatible to this change.