
Handling ES, MongoDB and raw files. #157

Open
wants to merge 1 commit into write-nexus-section from ReprocessingNexusFile_2

Conversation

@RubelMozumder (Contributor) commented Jan 15, 2025

Duplicate of PR #156.

Summary by Sourcery

Enhancements:

  • Improved handling of existing files during file writing operations.


sourcery-ai bot commented Jan 15, 2025

Reviewer's Guide by Sourcery

This pull request implements handling of Elasticsearch, MongoDB, and raw files by adding a new function delete_entry_file and modifying the write_file method. The delete_entry_file function handles the deletion of entries and associated files from MongoDB, Elasticsearch, and the file system. The write_file method is updated to use this new function for file deletion and to handle NeXus files correctly.
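For orientation, here is a minimal sketch of what such a delete_entry_file function could look like, assembled from the diagrams and diff excerpts in this review; the import locations and the raw-file deletion step are assumptions, not taken from the PR.

from nomad.processing import Entry, Upload
from nomad.search import delete_by_query  # assumed import location


def delete_entry_file(archive, mainfile, delete_entry=True):
    """Remove the entry and raw file associated with ``mainfile`` from the upload."""
    entry = Entry.objects(
        upload_id=archive.m_context.upload_id, mainfile=mainfile
    ).first()
    if entry is None:
        return
    upload = Upload.objects(upload_id=archive.m_context.upload_id).first()
    if upload.published:
        raise PermissionError('Published entry cannot be deleted.')
    # Remove the entry's document from the Elasticsearch index.
    delete_by_query({'entry_id': entry.entry_id})
    # Remove the processing entry from MongoDB.
    if delete_entry:
        entry.delete()
    # Finally, remove the raw file itself from the upload's file system
    # (the step the PR refers to as delete_rawfiles); the exact call depends
    # on the NOMAD file-handling API and is left out of this sketch.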

Sequence diagram for file writing and deletion process

sequenceDiagram
    participant Client
    participant WriteFile
    participant DeleteEntry
    participant ES as ElasticSearch
    participant MongoDB
    participant FileSystem

    Client->>WriteFile: write_file()
    alt nexus file
        WriteFile->>DeleteEntry: delete_entry_file()
        DeleteEntry->>ES: delete_by_query()
        DeleteEntry->>MongoDB: delete entry
        DeleteEntry->>FileSystem: delete raw files
        WriteFile->>WriteFile: _write_nx_file()
        WriteFile->>FileSystem: process_updated_raw_file()
    else h5 file
        WriteFile->>WriteFile: _write_hdf5_file()
    end

Class diagram showing updated file handling structure

classDiagram
    class NXFileGenerationError{
        +Exception
    }
    class FileWriter{
        -archive
        -data_file
        -nexus: bool
        -hdf5_data_dict
        +write_file()
        -_write_nx_file()
        -_write_hdf5_file()
    }
    class DeleteEntryFile{
        +delete_entry_file(archive, mainfile, delete_entry)
    }

    FileWriter ..> DeleteEntryFile: uses
    FileWriter --> NXFileGenerationError: throws

File-Level Changes

Change: Implement file deletion and entry removal logic.
Files: src/nomad_measurements/utils.py
Details:
  • Added the delete_entry_file function to delete entries and files from MongoDB, Elasticsearch, and the file system.
  • Modified the write_file method to use the delete_entry_file function for deleting files.
  • Updated error handling in write_file to correctly handle file deletion when NeXus file generation fails.
  • Added logic to process updated raw files when a NeXus file is written.
  • Removed direct file deletion and entry processing logic from the _write_nx_file and _write_hdf5_file methods.
  • Added handling for deleting published entries in the delete_entry_file function.
  • Added import statements for the classes required by the delete_entry_file function.

Change: Refactor file writing logic.
Files: src/nomad_measurements/utils.py
Details:
  • Removed debugging code from utils.py.
  • Removed an unused import statement in the _write_nx_file method.
  • Simplified the file writing logic by removing redundant checks and operations.
  • Improved code readability by reorganizing and streamlining the file writing process.


@sourcery-ai bot left a comment

Hey @RubelMozumder - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Remove the commented-out debug code at the top of the file.
  • The delete_entry_file function is missing a docstring. Please add documentation explaining the purpose, parameters, and behavior of this function, especially since it handles critical file deletion operations; a possible shape is sketched below.
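A possible shape for such a docstring, using the parameters shown in the class diagram above (hypothetical wording, to be checked against the actual implementation):

def delete_entry_file(archive, mainfile, delete_entry=True):
    """
    Delete the entry and raw file associated with ``mainfile``.

    Removes the entry's document from Elasticsearch, the entry itself from
    MongoDB (when ``delete_entry`` is True), and the raw file from the
    upload's file system. Raises ``PermissionError`` if the upload is
    already published.

    Args:
        archive: Archive whose ``m_context`` identifies the upload.
        mainfile: Path of the raw file, relative to the upload, to remove.
        delete_entry: Whether to also remove the MongoDB entry.
    """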
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


).first()):
    upload = Upload.objects(upload_id=archive.m_context.upload_id).first()
    if upload.published:
        raise PermissionError('Published entry cannot be deleted.')

issue (bug_risk): Handle PermissionError when attempting to modify published entries

The PermissionError thrown when attempting to modify published entries isn't caught in write_file(). This could cause unexpected failures in production.
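As an illustration only, a minimal sketch of how write_file could catch this error (attribute and method names follow the diagrams and the refactoring suggestion below, and self.logger is assumed to exist; this is not the PR's actual implementation):

def write_file(self):
    if self.nexus:
        try:
            delete_entry_file(archive=self.archive, mainfile=self.data_file)
            self._write_nx_file()
            self.archive.m_context.process_updated_raw_file(
                self.data_file, allow_modify=True
            )
        except PermissionError:
            # Published entries must not be modified; fall back without deleting.
            self.logger.warning('Entry is published; writing HDF5 instead of NeXus.')
            self.nexus = False
            self._write_hdf5_file()
        except Exception:
            self.logger.warning('NeXus file generation failed; falling back to HDF5.')
            self.nexus = False
            self._write_hdf5_file()
    else:
        self._write_hdf5_file()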

@@ -52,6 +52,8 @@
)


# import debugpy

issue (complexity): Consider simplifying file handling by separating entry cleanup and improving error handling logic.

The file handling could be simplified while maintaining all functionality by:

  1. Separating entry cleanup from file operations:
def cleanup_entry(archive, mainfile):
    """Handle entry cleanup in the database."""
    from nomad.processing import Entry, Upload
    from nomad.search import delete_by_query  # assumed import location

    entry = Entry.objects(
        upload_id=archive.m_context.upload_id, mainfile=mainfile
    ).first()
    if entry:
        if Upload.objects(upload_id=archive.m_context.upload_id).first().published:
            raise PermissionError('Published entry cannot be deleted.')
        # Remove the search index document, then the MongoDB entry.
        delete_by_query({'entry_id': entry.entry_id})
        entry.delete()

def write_file(self):
    """Simplified file writing with clear error handling."""
    try:
        if self.nexus:
            # Clean slate: remove the existing raw file before regenerating it.
            self.archive.m_context.delete_raw_file(self.data_file)
            self._write_nx_file()
            self.archive.m_context.process_updated_raw_file(
                self.data_file, allow_modify=True
            )
        else:
            self._write_hdf5_file()
    except Exception:
        self.logger.warning('NeXus file creation failed, falling back to HDF5.')
        self.nexus = False
        cleanup_entry(self.archive, self.data_file)
        self._write_hdf5_file()

This refactoring:

  • Separates concerns between file operations and entry management
  • Simplifies the error handling flow
  • Makes the fallback process more explicit
  • Reduces code duplication
  • Maintains all existing functionality and safeguards

    upload_id=archive.m_context.upload_id, mainfile=mainfile
).first()):
    upload = Upload.objects(upload_id=archive.m_context.upload_id).first()
    if upload.published:
Collaborator

I wonder at which point this condition will be met. If the upload is published, I think the user won't be able to run the normalization anymore.

Contributor Author

I assume the same: since we remove the raw file and delete the entry, this condition should not normally be met, unless something goes wrong.

        self._write_hdf5_file()
else:
    self._write_hdf5_file()

if self.data_file.endswith('.nxs'):
Collaborator

Maybe we can do this right after self._write_nx_file() in line 377

Contributor Author

If _write_nx_file fails, there is no need to run process_updated_raw_file; the code stops and falls back to writing the HDF5 file.

@ka-sarthak (Collaborator) left a comment

I still have my doubts about deleting the raw file and entry before creating a new one. If the re-creation fails for some reason, anything relying on the file will break. But in the current situation, with resource contention issues, this seems to be the only approach that works.

We should definitely revisit this and raise the issue in general.

Also, for future reference, can you explain the delete_entry_file function (as PR comments or in the doc string perhaps)? For example, what things are being deleted by these: delete_by_query, entry.delete, delete_rawfiles.

@ka-sarthak force-pushed the ReprocessingNexusFile_2 branch from cf5b34e to 2d2709d on January 15, 2025 15:30
@@ -371,28 +373,32 @@ def write_file(self):
    """
    if self.nexus:
        try:
            delete_entry_file(archive=self.archive, mainfile=self.data_file)

How can you ensure the caller is not appending new data to existing data?
How can you ensure the caller always provides a complete file?
When the existing file is deleted, how can you tell what contents are valid and may need to be retained?
