Handeling ES, mongo DB and raw files. #156

RubelMozumder · 2025-01-10T16:18:16Z

@ka-sarthak, I think it would fix deleting a nomad entry from Mongo, ES, and staging file systems. Please, take a look into it.

Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com>

Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com>

* Implement write nexus section based on the populated nomad archive * app def missing. * mapping nomad_measurement. * All concept are connected, creates nexus file and subsection. * adding links in hdf5 file. * Remove the nxs file. * back to the previous design. * Include pynxtools plugins in nomad.yaml and extend dependencies including pynxtools ans pnxtools-xrd. * PR review correction. * Remove the entry_type overwtitten. * Remove comments. * Replace __str__ function. * RUFF * Update pyproject.toml Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> * Update src/nomad_measurements/xrd/schema.py Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> * Update src/nomad_measurements/xrd/nx.py * Replace Try-block. --------- Co-authored-by: Sarthak Kapoor <sarthakkapoor@hotmail.com> Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com>

* updated plugin structure * added pynxtools dependency * Apply suggestions from code review Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com> * Add sections for RSM and 1D which uses HDF5 references * Abstract out data interaction using setter and getter; allows to use same methods for classes with hdf5 refs * Use arrays, not references, in the `archive.results` section * Lock the state for using nexus file and corresponding references * Populate results without references * Make a general reader for raw files * Remove nexus flags * Add quantity for auxialiary file * Fix rebase * Make integration_time as hdf5reference * Reset results (refactor) * Add backward compatibility * Refactor reader * add missing imports * AttrDict class * Make concept map global * Add function to remove nexus annotations in concept map * Move try block inside walk_through_object * Fix imports * Add methods for generating hdf5 file * Rename auxiliary file * Expect aux file to be .nxs in the beginning * Add attributes for hdf5: data_dict, dataset_paths * Method for adding a quantity to hdf5_data_dict * Abstract out methods for creating files based on hdf5_data_dict * Add dataset_paths for nexus * Some reverting back * Minor fixes * Refactor populate_hdf5_data_dict: store a reference to be made later * Handle shift from nxs to hdf5 * Set hdf5 references after aux file is created * Cleaning * Fixing * Redefine result sections instead of extending * Remove plotly plots from ELN * Read util for hdf5 ref * Fixing * Move hdf5 handling into a util class * Refactor instance variables * Reset data dicts and reference after each writing * Fixing * Overwrite dataset if it already exists * Refactor add_dataset * Reorganize and doctrings * Rename variable * Add read_dataset method * Cleaning * Adapting schema with hdf5 handler * Cooments, minor refactoring * Fixing; add `hdf5_handler` as an attribute for archive * Reorganization * Fixing * Refactoring * Cleaning * Try block for using hdf5 handler: dont fail early, as later normalization steps will have the handler! * Extract units from dataset attrs when reading * Fixing * Linting * Make archive_path optional in add_dataset * Rename class * attrs for add_dataset; use it for units * Add add_attribute method * Refactor add_attribute * Add plot attributes: 1D * Refactor hdf5 states * Add back plotly figures * rename auxiliary file name if changed by handler * Add referenced plots * Allow hard link using internel reference * Add sections for plots * Comment out validation * Add archive paths for the plot subsections * Add back validation with flag * Use nexus flag * Add interpolated intensity data into h5 for qspace plots * Use prefix to reduce len of string * Store regularized linespace of q vectors; revise descriptions * Remove plotly plots * Bring plots to overview * Fix tests * Linting; remove attr arg from add_dataset * Review: move none check into method * Review: use 'with' for opening h5 file * Review: make internal states as private vars * Add pydantic basemodel for dataset * Use data from variables if available for reading * Review: remove lazy arg * Move DatasetModel outside Handler class * Remove None from get, as it is already a default * Merge if conditions --------- Co-authored-by: Andrea Albino <andrea.albino@physik.hu-berlin.de> Co-authored-by: Andrea Albino <95371554+aalbino2@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com>

* Remove the Nexus file before regenerating it. * Reference to the NeXus entry. * PR review comments.

sourcery-ai · 2025-01-10T16:18:21Z

Reviewer's Guide by Sourcery

This pull request refactors the file writing process to handle Elasticsearch, MongoDB, and raw files more efficiently. It introduces a new function delete_entry_file to manage file deletion and updates the write_file function to use this new function. The changes also improve error handling and logging, and simplify the logic for writing different file types.

Sequence diagram for updated file writing process

sequenceDiagram
    participant Client
    participant WriteFile
    participant DeleteEntry
    participant Storage
    participant ES
    participant MongoDB

    Client->>WriteFile: write_file()
    alt nexus file
        WriteFile->>DeleteEntry: delete_entry_file()
        DeleteEntry->>Storage: Check if file exists
        DeleteEntry->>Storage: Delete raw files
        DeleteEntry->>MongoDB: Delete entry
        DeleteEntry->>ES: Delete by query
        WriteFile->>Storage: Write NeXus file
        WriteFile->>Storage: Process updated raw file
    else hdf5 file
        WriteFile->>Storage: Write HDF5 file
    end

Class diagram for file handling components

classDiagram
    class WriteFile {
        +write_file()
        -_write_nx_file()
        -_write_hdf5_file()
    }

    class DeleteEntryFile {
        +delete_entry_file(archive, mainfile, delete_entry)
    }

    class Upload {
        +upload_id
        +published
    }

    class Entry {
        +entry_id
        +upload_id
        +mainfile
    }

    class StagingUploadFiles {
        +delete_rawfiles(path)
    }

    WriteFile ..> DeleteEntryFile : uses
    DeleteEntryFile ..> Upload : checks
    DeleteEntryFile ..> Entry : manages
    DeleteEntryFile ..> StagingUploadFiles : uses

File-Level Changes

Change	Details	Files
Introduction of a new `delete_entry_file` function.	A new function `delete_entry_file` is introduced to handle the deletion of entry files and associated data in Elasticsearch and MongoDB. The function takes the archive, mainfile, and an optional delete_entry flag as arguments. It checks if the file exists and if the upload is published before proceeding with deletion. It uses `StagingUploadFiles` to delete raw files and `delete_by_query` to delete associated entries in Elasticsearch. If the `delete_entry` flag is set, it also deletes the entry from MongoDB.	`src/nomad_measurements/utils.py`
Refactoring of the `write_file` function.	The `write_file` function is updated to use the new `delete_entry_file` function. The logic for handling NeXus and HDF5 file writing is simplified. Error handling is improved by calling `delete_entry_file` with appropriate arguments in case of exceptions. The `process_updated_raw_file` function is called after writing the NeXus file. Redundant code for deleting files is removed.	`src/nomad_measurements/utils.py`
Removal of unused imports and code.	The unused import of `Entry` in the `_write_nx_file` function is removed. Redundant code for checking and deleting existing files before writing new ones is removed from `_write_nx_file`.	`src/nomad_measurements/utils.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time. You can also use
this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey @RubelMozumder - I've reviewed your changes - here's some feedback:

Overall Comments:

The new delete_entry_file() function needs proper documentation - please add a docstring describing its purpose, parameters, and possible exceptions (especially the PermissionError for published entries).

Here's what I looked at during the review

🟡 General issues: 1 issue found
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-01-10T16:19:17Z

src/nomad_measurements/utils.py

@@ -569,6 +564,29 @@ def _set_hdf5_reference(
            resolved_section.m_set(quantity_name, ref)


+def delete_entry_file(archive, mainfile, delete_entry=False):


issue: Consider handling the case where Entry.first() returns None to avoid potential AttributeError

TLCFEM

Again, this whole thing relies on deleting things to make things work.
This often indicates other parts are wrongly implemented.
This kind of functionality should be discouraged and shall not be implemented in the first place, because it encourages potential risky and faulty code.

You have been warned.

TLCFEM · 2025-01-14T09:28:57Z

src/nomad_measurements/utils.py

+
+        stagingUploadFiles = StagingUploadFiles(archive.m_context.upload_id)
+        stagingUploadFiles.delete_rawfiles(path=mainfile)
+        entry = Entry.objects(


No need to pay the cost if not used.
So changes like this is better.

if delete_entry and (entry := Entry.objects( upload_id=archive.m_context.upload_id, mainfile=mainfile ).first()): delete_by_query({'entry_id': entry.entry_id}) entry.delete()

TLCFEM · 2025-01-14T09:29:52Z

src/nomad_measurements/utils.py

@@ -569,6 +564,29 @@ def _set_hdf5_reference(
            resolved_section.m_set(quantity_name, ref)


+def delete_entry_file(archive, mainfile, delete_entry=False):


The first input argument can directly be upload_id.

Prefer explicit key-value argument for optional arguments.

aalbino2 and others added 19 commits December 19, 2024 16:57

updated plugin structure

717df82

added pynxtools dependency

53d1ecf

Apply suggestions from Sarthak's code review

20c32bf

Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com>

ruff linting

974e5f8

Ruff linting 2

fcdd502

Apply suggestions from code review

60e7edb

Co-authored-by: Sarthak Kapoor <57119427+ka-sarthak@users.noreply.github.com> Co-authored-by: Hampus Näsström <hampus.nasstrom@gmail.com>

changed xrd parser folder

58f8912

last fixes and descriptions

ce9a0ee

description of general schema

4f15052

changed general package into a module

98cbb69

Implement write nexus section based on the populated nomad archive

d78ce22

Update path of populate_nexus_subsection

09d21eb

Run Python test action for all PR

437c9b1

Fix for rebasing issues

6d04d02

Adding nexus in ref (#150)

90d07a1

* Remove the Nexus file before regenerating it. * Reference to the NeXus entry. * PR review comments.

Move common functionality to resolve_path

2c838b5

Handeling ES, mongo DB and raw files.

0d2f812

sourcery-ai bot reviewed Jan 10, 2025

View reviewed changes

RubelMozumder requested review from TLCFEM and removed request for TLCFEM January 12, 2025 17:57

TLCFEM requested changes Jan 14, 2025

View reviewed changes

ka-sarthak force-pushed the write-nexus-section branch from 2c838b5 to 6e47b5b Compare January 14, 2025 09:59

RubelMozumder mentioned this pull request Jan 15, 2025

Handeling ES, mongo DB and raw files. #157

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handeling ES, mongo DB and raw files. #156

Handeling ES, mongo DB and raw files. #156

RubelMozumder commented Jan 10, 2025 •

edited

Loading

sourcery-ai bot commented Jan 10, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

sourcery-ai bot Jan 10, 2025

TLCFEM left a comment

TLCFEM Jan 14, 2025

TLCFEM Jan 14, 2025

TLCFEM Jan 14, 2025

		@@ -569,6 +564,29 @@ def _set_hdf5_reference(
		resolved_section.m_set(quantity_name, ref)


		def delete_entry_file(archive, mainfile, delete_entry=False):

Handeling ES, mongo DB and raw files. #156

Are you sure you want to change the base?

Handeling ES, mongo DB and raw files. #156

Conversation

RubelMozumder commented Jan 10, 2025 • edited Loading

sourcery-ai bot commented Jan 10, 2025 • edited Loading

Reviewer's Guide by Sourcery

Sequence diagram for updated file writing process

Class diagram for file handling components

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Jan 10, 2025

Choose a reason for hiding this comment

TLCFEM left a comment

Choose a reason for hiding this comment

TLCFEM Jan 14, 2025

Choose a reason for hiding this comment

TLCFEM Jan 14, 2025

Choose a reason for hiding this comment

TLCFEM Jan 14, 2025

Choose a reason for hiding this comment

RubelMozumder commented Jan 10, 2025 •

edited

Loading

sourcery-ai bot commented Jan 10, 2025 •

edited

Loading