-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handeling ES, mongo DB and raw files. #156
base: write-nexus-section
Are you sure you want to change the base?
Conversation
Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com>
Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com>
* Implement write nexus section based on the populated nomad archive * app def missing. * mapping nomad_measurement. * All concept are connected, creates nexus file and subsection. * adding links in hdf5 file. * Remove the nxs file. * back to the previous design. * Include pynxtools plugins in nomad.yaml and extend dependencies including pynxtools ans pnxtools-xrd. * PR review correction. * Remove the entry_type overwtitten. * Remove comments. * Replace __str__ function. * RUFF * Update pyproject.toml Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> * Update src/nomad_measurements/xrd/schema.py Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> * Update src/nomad_measurements/xrd/nx.py * Replace Try-block. --------- Co-authored-by: Sarthak Kapoor <sarthakkapoor@hotmail.com> Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com>
* updated plugin structure * added pynxtools dependency * Apply suggestions from code review Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com> * Add sections for RSM and 1D which uses HDF5 references * Abstract out data interaction using setter and getter; allows to use same methods for classes with hdf5 refs * Use arrays, not references, in the `archive.results` section * Lock the state for using nexus file and corresponding references * Populate results without references * Make a general reader for raw files * Remove nexus flags * Add quantity for auxialiary file * Fix rebase * Make integration_time as hdf5reference * Reset results (refactor) * Add backward compatibility * Refactor reader * add missing imports * AttrDict class * Make concept map global * Add function to remove nexus annotations in concept map * Move try block inside walk_through_object * Fix imports * Add methods for generating hdf5 file * Rename auxiliary file * Expect aux file to be .nxs in the beginning * Add attributes for hdf5: data_dict, dataset_paths * Method for adding a quantity to hdf5_data_dict * Abstract out methods for creating files based on hdf5_data_dict * Add dataset_paths for nexus * Some reverting back * Minor fixes * Refactor populate_hdf5_data_dict: store a reference to be made later * Handle shift from nxs to hdf5 * Set hdf5 references after aux file is created * Cleaning * Fixing * Redefine result sections instead of extending * Remove plotly plots from ELN * Read util for hdf5 ref * Fixing * Move hdf5 handling into a util class * Refactor instance variables * Reset data dicts and reference after each writing * Fixing * Overwrite dataset if it already exists * Refactor add_dataset * Reorganize and doctrings * Rename variable * Add read_dataset method * Cleaning * Adapting schema with hdf5 handler * Cooments, minor refactoring * Fixing; add `hdf5_handler` as an attribute for archive * Reorganization * Fixing * Refactoring * Cleaning * Try block for using hdf5 handler: dont fail early, as later normalization steps will have the handler! * Extract units from dataset attrs when reading * Fixing * Linting * Make archive_path optional in add_dataset * Rename class * attrs for add_dataset; use it for units * Add add_attribute method * Refactor add_attribute * Add plot attributes: 1D * Refactor hdf5 states * Add back plotly figures * rename auxiliary file name if changed by handler * Add referenced plots * Allow hard link using internel reference * Add sections for plots * Comment out validation * Add archive paths for the plot subsections * Add back validation with flag * Use nexus flag * Add interpolated intensity data into h5 for qspace plots * Use prefix to reduce len of string * Store regularized linespace of q vectors; revise descriptions * Remove plotly plots * Bring plots to overview * Fix tests * Linting; remove attr arg from add_dataset * Review: move none check into method * Review: use 'with' for opening h5 file * Review: make internal states as private vars * Add pydantic basemodel for dataset * Use data from variables if available for reading * Review: remove lazy arg * Move DatasetModel outside Handler class * Remove None from get, as it is already a default * Merge if conditions --------- Co-authored-by: Andrea Albino <andrea.albino@physik.hu-berlin.de> Co-authored-by: Andrea Albino <95371554+aalbino2@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com>
* Remove the Nexus file before regenerating it. * Reference to the NeXus entry. * PR review comments.
Reviewer's Guide by SourceryThis pull request refactors the file writing process to handle Elasticsearch, MongoDB, and raw files more efficiently. It introduces a new function Sequence diagram for updated file writing processsequenceDiagram
participant Client
participant WriteFile
participant DeleteEntry
participant Storage
participant ES
participant MongoDB
Client->>WriteFile: write_file()
alt nexus file
WriteFile->>DeleteEntry: delete_entry_file()
DeleteEntry->>Storage: Check if file exists
DeleteEntry->>Storage: Delete raw files
DeleteEntry->>MongoDB: Delete entry
DeleteEntry->>ES: Delete by query
WriteFile->>Storage: Write NeXus file
WriteFile->>Storage: Process updated raw file
else hdf5 file
WriteFile->>Storage: Write HDF5 file
end
Class diagram for file handling componentsclassDiagram
class WriteFile {
+write_file()
-_write_nx_file()
-_write_hdf5_file()
}
class DeleteEntryFile {
+delete_entry_file(archive, mainfile, delete_entry)
}
class Upload {
+upload_id
+published
}
class Entry {
+entry_id
+upload_id
+mainfile
}
class StagingUploadFiles {
+delete_rawfiles(path)
}
WriteFile ..> DeleteEntryFile : uses
DeleteEntryFile ..> Upload : checks
DeleteEntryFile ..> Entry : manages
DeleteEntryFile ..> StagingUploadFiles : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @RubelMozumder - I've reviewed your changes - here's some feedback:
Overall Comments:
- The new
delete_entry_file()
function needs proper documentation - please add a docstring describing its purpose, parameters, and possible exceptions (especially the PermissionError for published entries).
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
@@ -569,6 +564,29 @@ def _set_hdf5_reference( | |||
resolved_section.m_set(quantity_name, ref) | |||
|
|||
|
|||
def delete_entry_file(archive, mainfile, delete_entry=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Consider handling the case where Entry.first() returns None to avoid potential AttributeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this whole thing relies on deleting things to make things work.
This often indicates other parts are wrongly implemented.
This kind of functionality should be discouraged and shall not be implemented in the first place, because it encourages potential risky and faulty code.
You have been warned.
|
||
stagingUploadFiles = StagingUploadFiles(archive.m_context.upload_id) | ||
stagingUploadFiles.delete_rawfiles(path=mainfile) | ||
entry = Entry.objects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to pay the cost if not used.
So changes like this is better.
if delete_entry and (entry := Entry.objects(
upload_id=archive.m_context.upload_id, mainfile=mainfile
).first()):
delete_by_query({'entry_id': entry.entry_id})
entry.delete()
@@ -569,6 +564,29 @@ def _set_hdf5_reference( | |||
resolved_section.m_set(quantity_name, ref) | |||
|
|||
|
|||
def delete_entry_file(archive, mainfile, delete_entry=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first input argument can directly be upload_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer explicit key-value argument for optional arguments.
2c838b5
to
6e47b5b
Compare
@ka-sarthak, I think it would fix deleting a nomad entry from Mongo, ES, and staging file systems. Please, take a look into it.