-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use multiformat reader #74
Conversation
I see. We could also pass the config dict key down to the
I implemented the notation differently. You need to write
Could something like this work: https://github.com/FAIRmat-NFDI/pynxtools-mpes/blob/25983bae61f04ad379d77fd5b335fe4759ba4766/tests/data/config_file.json#L113 I like the approach of exposing the config dict for manipulation and being able to do some post-processing on it. So let's go for your approach.
One possibility I also thought about also notation wise is that we actually not write But I'm not sure if this is the use case you are aiming at. |
Btw. there is also the |
Yeah, I think that would be quite helpful, also for some of the other callbacks (like in the
Can you spell this out again? There may be an error in what you wrote. I tried
The problem for me is that depending on the measurement settings, more or less detectors may be used, which I can infer from the actual data. So I need to dynamically replace them and cannot use the
the
and remove the keys with the generic detector from the dict. For the XPS reader, this works (after merging my PR in pynxtools). We can think if we want a generic solution for this in the MultiFormatReader or if each reader shall implement this itself.
My plan was to do it like here, have some generic ELN metadata and then some metadata specific to entry Su1s. Do you want this |
This can be easily added. Edit: It's added in the multiformatreader PR.
If I understand my code correctly it expects something like
I think it's fine if this is a specialised version. The MultiFormatReader is rather an aid tool to quickly build a reader for different formats. I think what we do in mpes and xps is comparably involved already (and we mainly want the config dict standardisation there).
This should work with |
I also update the prefix-value parsing now and it should work as you tried it, e.g., |
Pull Request Test Coverage Report for Build 10354752021Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks all good. Found a few parts which can be made simpler. But nothing as big as blocking this PR.
I did not look into all details, but I think it's well tested now anyways :)
Hi @domna, I have I started to implement the multi format reader (first focussing on the phi parser).
There are multiple things that are currently working nicely already.
@attrs
-> had to implement a special logic so that I can keep working with my existing notation@data
-> works for theaverage
andraw_data
keys, which write theentry/data/data
and.../detector/raw_data/raw
fields. Also works for any axisnames that are stored as coordinates of my data xarray.I write the individual scans also as DATA fields to NXdata using theget_data_dims
function.Then, I have some questions that are probably already possible for the existing solution in FAIRmat-NFDI/pynxtools#250. Would be thankful for your input on these.
filling template from
@eln
-> previously, I was simply writing@eln
in the config file because for a given config key like/ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_xray]/distance
, there is a corresponding/ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_xray]/distance
key in the ELN dict after parsing (using the convert_dict). Thus, I did not need to give any path in the config. However, theget_eln_data
requires a path str and if I simply write@eln
, it doesn't work. I would probably need to write the whole path (incl. upper- and lowercase notation), which is quite cumbersome.EDIT: works now as config keys are passed to
get_*
methodsAllow lists as config value to provide different options depending if the data exists in the specified position. As an example, for
["@attrs:title", "@eln, "no title"]
, the converter should check (in order) the@attrs
and@eln
tokens and write the respective value if it exists there. If not, it defaults to "no title". However, not it writes this list["@attrs:title", "@eln, "no title"]
into the field.EDIT: works now through Allow lists of possible values in config file pynxtools#390
Finally, there are some things that I have no idea how to implement with the current solution:
multiple detectors, analyser, etc. -> for this to work, I would need to replace these in the keys in the
parse_json_config
method (like for the entries:key = key.replace("/ENTRY[entry]/", f"/ENTRY[{entry_name}]/")
). However, this is currently not possible. A workaround would be if we replaced them in the config file before we fill from the template (i.e. in thepost_process
function). I have implemented something here, but then we need to change the read function in the MultiFormatReader like I implemented here.EDIT: works as config dict is exposed to post processing
In my ELN file, I would like to give the possibility to have some metadata for all entries and then some only for specific entries. I can do this now using some special logic in the eln file parsing, but I am not sure that is even possible with the NOMAD ELNs. Just wanted to get your opinion on this if it is worth working in this direction.
EDIT: works, but only for XPS reader
And then, finally finally, there could be some additional functionalities added. In the XpsReader, I check that any unit in the final template is a valid pint unit, but for this I need to iterate the whole array. We could implement this earlier and make it a generic functionality in the multi format reader as well.
Sorry for the long text, but these are all the issues I encountered.