Use hdf5 or nexus file in XRD #113
@hampusnasstrom @aalbino2 I merged the implementation of the HDF5Handler and support for The Plotly plots are removed in favor of the plots from H5Web. @budschi's current viewpoint is that Plotly plots give better visualizations, so it might be a good idea to preserve them for 1D scans. This can be a point of discussion when we review this PR after the vacations. @RubelMozumder will soon merge his implementations from #147 which will allow to use |
@RubelMozumder I have combined the common functionality from |
TODO
|
Have you checked what is the root cause of the issue? |
@TLCFEM I wasn't able to investigate it yet. But this will be among the first things I do in the new year and will reach out to you with my findings. Happy Holidays! |
If it is not the case, then all discussions are not valid anymore. |
Let me explain the situation, which may lay bare the scenario. Issue: In the second attempt at reprocessing the entire upload ( Temporary solution: |
@RubelMozumder what prevents you from checking the existence of the .nxs file and create a new one only in the case it doesn't exist yet? |
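The check suggested above could look roughly like the sketch below. The helper name `ensure_nexus_file` and the `write_file` callback are hypothetical and only illustrate the idea; they are not taken from this PR.

```python
from pathlib import Path
from typing import Callable


def ensure_nexus_file(nxs_path: Path, write_file: Callable[[Path], None]) -> bool:
    """Create the file only when it is missing; return True if it was created."""
    if nxs_path.exists():
        # Reprocessing: reuse the file from the first run, do not rewrite it.
        return False
    write_file(nxs_path)
    return True
```

Note that the check and the write are two separate steps, so this alone does not protect against two processes reaching the check at the same time.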
Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com>
Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com>
It may resolve the race condition between functions reading and writing the same file. There is another issue that I think needs to be fixed by area-D: deleting a (corrupted) entry and its related file from the single process thread running the normalizer. PR #157 can help; you can see that the test fails completely. |
@lauri-codes, is there any functionality that deletes an entry, the associated mainfile, and the residue of that deleted entry (e.g. ES data, if there is any)? This deletion must happen inside the ELN normalization process. Just a quick overview of the implementation:
Currently, You may want to take a quick look at the code in function
I have created a small function to delete the mainfile, entry and ES (here:
Could you please suggest any functionality that is available in NOMAD? |
@RubelMozumder: There is no such functionality, and I doubt there ever will be. Deleting entries during processing is not something we can really endorse in any way: there are too many ways to screw this up (what happens if the entry is deleted and then an exception happens before the new data is stored? What happens when some other processed entry tries to read the deleted entry simultaneously? What happens if the file is opened by another process and there is a lock on it when someone tries to delete it?). I would instead want to try and understand what goal you are trying to achieve with this normalizer. It is reasonable to create temporary files during normalization and also reasonable to create new entries at the end of normalization (assuming there are no circular processing steps or parallel processes that might cause issues). |
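One common way to use a temporary file so that an exception never leaves a half-written file behind is to write to a temporary sibling and swap it into place afterwards. This is a generic technique, not something proposed in this thread, and `write_atomically` and `write_file` are hypothetical names. A minimal sketch, assuming the temporary and target files live on the same filesystem:

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_atomically(target: Path, write_file: Callable[[Path], None]) -> None:
    """Write to a temporary sibling file, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=target.suffix)
    os.close(fd)  # the writer reopens the path itself
    tmp_path = Path(tmp_name)
    try:
        write_file(tmp_path)
        # os.replace swaps the file in one step on POSIX filesystems.
        os.replace(tmp_path, target)
    except BaseException:
        # The previous target (if any) is left untouched on failure.
        tmp_path.unlink(missing_ok=True)
        raise
```

If the writer fails, the old file stays as it was. On Windows the swap may still fail when another process holds the target open, so this does not answer the locking question raised above.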
First processing:
Reprocessing the upload:
One way to avoid this is to control the nexus file access using an overwrite nexus file switch (BoolEditQuantity) in the ELN. In the first processing, the ELN generates the nexus file and sets the switch to |
The above solution does not work as intended due to the following issue: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/merge_requests/2301 |
The changes made here are backward compatible. However, the Oasis admins must re-process all the |
Reviewer's Guide by Sourcery
This pull request introduces the use of HDF5 or NeXus files to store array data from XRD measurements, which reduces the archive size and loading time. It implements the use of
Sequence diagram for XRD data processing with HDF5/Nexus storage
sequenceDiagram
participant User
participant XRD as XRDMeasurement
participant Handler as HDF5Handler
participant Storage as HDF5/Nexus File
participant Archive as NOMAD Archive
User->>XRD: Upload XRD data
XRD->>Handler: Create HDF5Handler
Handler->>Handler: add_dataset()
Handler->>Handler: add_attribute()
Handler->>Storage: write_file()
Note over Handler,Storage: Creates .h5 or .nxs file
Handler->>Archive: set_hdf5_references()
Note over Handler,Archive: Updates archive with references
Archive-->>User: Return processed data
Class diagram for the updated XRD data handling
classDiagram
class HDF5Handler {
+data_file: str
+archive: EntryArchive
+logger: BoundLogger
+nexus: bool
+add_dataset()
+add_attribute()
+read_dataset()
+write_file()
-_write_nx_file()
-_write_hdf5_file()
+set_hdf5_references()
}
class XRDResult {
+intensity: HDF5Reference
+two_theta: HDF5Reference
+q_norm: HDF5Reference
+omega: HDF5Reference
+phi: HDF5Reference
+chi: HDF5Reference
+plot_intensity: XRDResultPlotIntensity
+plot_intensity_scattering_vector: XRDResultPlotIntensityScatteringVector
}
class XRDResultPlotIntensity {
+intensity: HDF5Reference
+two_theta: HDF5Reference
+omega: HDF5Reference
+phi: HDF5Reference
+chi: HDF5Reference
+normalize()
}
XRDResult --> XRDResultPlotIntensity
XRDResult --> XRDResultPlotIntensityScatteringVector
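The flow in the two diagrams (stage datasets and attributes first, write once, then hand references back to the archive) can be sketched as below. `StagedHandler`, the injected `writer` callback and the `file#path` reference string are assumptions made for illustration; this is not the PR's HDF5Handler and not NOMAD's exact reference format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class StagedHandler:
    """Hypothetical stand-in for the staging flow; not the PR's HDF5Handler."""

    data_file: str
    datasets: Dict[str, Any] = field(default_factory=dict)
    attributes: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_dataset(self, path: str, values: Any) -> None:
        # Stage only; nothing is written until write_file() is called.
        self.datasets[path] = values

    def add_attribute(self, path: str, attrs: Dict[str, Any]) -> None:
        self.attributes.setdefault(path, {}).update(attrs)

    def write_file(self, writer: Callable[[str, Dict, Dict], None]) -> Dict[str, str]:
        # One write for all staged data, then return "file#path" references.
        writer(self.data_file, self.datasets, self.attributes)
        refs = {path: f"{self.data_file}#{path}" for path in self.datasets}
        self.datasets, self.attributes = {}, {}
        return refs
```

Keeping the actual file writer as a callback means the same staging code could serve both the .h5 and the .nxs branch, at the cost of one extra indirection.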
Hey @ka-sarthak - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
@hampusnasstrom If an old entry is opened without being re-processed, it is not broken, and the HDF5Reference quantity shows the array data. Here's a screenshot: If I now make some changes and save the entry, it raises the error "Shape mismatch for " and the data section goes away. This isn't good because it does not get fixed if I reprocess the upload. The safe way is to trigger the reprocess of the upload, rather than doing it from inside the entry. |
When array data from XRD measurements is added to the archives, the loading time increases as the archives become heavier (especially in the case of RSM, which stores multiple 2D arrays). One solution is to use an auxiliary file to offload the heavy data and only save references to the auxiliary files in the archives.
To implement this, we can use .h5 files to store the data and make references to the offloaded datasets using HDF5Reference. Additionally, we can also generate a Nexus .nxs file instead of the .h5 file. A Nexus file uses the .h5 file as the base file type and validates the data with the data models built by the Nexus community.
The current plots are generated using Plotly. The .json files containing the plot data are also stored in the archive. These also need to be offloaded to make the archives lighter. Using the H5WebAnnotations of NOMAD, we can leverage H5Web to generate plots from the .h5 or .nxs files.
To this end, the following steps are needed:
- HDF5Reference as the type of the Quantity for array data: intensity, two_theta, q_parallel, q_perpendicular, q_norm, omega, phi, chi.
- HDF5Handler or functions to create auxiliary files from the normalizers of the schema.
- .h5 to store the data and save references to its datasets in HDF5Reference quantities.
- .nxs file based on the archive. This happens in the HDF5Handler and uses pynxtools.
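To give a feeling for why offloading helps, the sketch below compares the serialized size of an archive-like dictionary that stores a 2D array inline with one that stores only a reference string. The array shape, the dictionary layout and the `file#dataset` string are all made up for illustration and do not reproduce NOMAD's archive or reference format.

```python
import json

# Hypothetical 2D map standing in for RSM intensity data (200 x 200 floats).
intensity = [[float(i * j) for j in range(200)] for i in range(200)]

# Archive-like dict with the array stored inline.
inline_archive = {"results": {"intensity": intensity}}

# Archive-like dict that keeps only a reference to a dataset in an auxiliary file.
# The "file#dataset" string is an illustrative format, not NOMAD's exact one.
reference_archive = {"results": {"intensity": "scan.h5#/entry/result/intensity"}}

inline_size = len(json.dumps(inline_archive))
reference_size = len(json.dumps(reference_archive))
print(inline_size, reference_size)
```

The reference version stays the same size no matter how large the scan is, which is what keeps the archive light for RSM data.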
Summary by Sourcery
Implement support for storing XRD array data in external HDF5 or Nexus files, and generate plots using H5WebAnnotations.
New Features:
Tests: