Handling ES, MongoDB and raw files. #157
base: write-nexus-section
Conversation
Reviewer's Guide by Sourcery
This pull request implements handling of Elasticsearch, MongoDB, and raw files by adding a new function.
Sequence diagram for file writing and deletion process
sequenceDiagram
participant Client
participant WriteFile
participant DeleteEntry
participant ES as ElasticSearch
participant MongoDB
participant FileSystem
Client->>WriteFile: write_file()
alt nexus file
WriteFile->>DeleteEntry: delete_entry_file()
DeleteEntry->>ES: delete_by_query()
DeleteEntry->>MongoDB: delete entry
DeleteEntry->>FileSystem: delete raw files
WriteFile->>WriteFile: _write_nx_file()
WriteFile->>FileSystem: process_updated_raw_file()
else h5 file
WriteFile->>WriteFile: _write_hdf5_file()
end
Class diagram showing updated file handling structure
classDiagram
class NXFileGenerationError{
+Exception
}
class FileWriter{
-archive
-data_file
-nexus: bool
-hdf5_data_dict
+write_file()
-_write_nx_file()
-_write_hdf5_file()
}
class DeleteEntryFile{
+delete_entry_file(archive, mainfile, delete_entry)
}
FileWriter ..> DeleteEntryFile: uses
FileWriter --> NXFileGenerationError: throws
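To make the structure above concrete, here is a minimal Python sketch of what the two diagrams imply. This is not the PR's actual code: the constructor signature, the placeholder bodies, and the stub for delete_entry_file are assumptions for illustration only.

def delete_entry_file(archive, mainfile, delete_entry=True):
    """Stub for the helper shown in the class diagram (see the PR for the real body)."""


class NXFileGenerationError(Exception):
    """Raised when NeXus file generation fails."""


class FileWriter:
    def __init__(self, archive, data_file, nexus, hdf5_data_dict):
        self.archive = archive
        self.data_file = data_file
        self.nexus = nexus                    # True -> write a NeXus (.nxs) file
        self.hdf5_data_dict = hdf5_data_dict  # data to dump into the file

    def write_file(self):
        if self.nexus:
            # Drop the stale ES document, MongoDB entry, and raw files first,
            # then regenerate the NeXus file and reprocess it.
            delete_entry_file(archive=self.archive, mainfile=self.data_file)
            self._write_nx_file()
            self.archive.m_context.process_updated_raw_file(
                self.data_file, allow_modify=True
            )
        else:
            self._write_hdf5_file()

    def _write_nx_file(self):
        raise NotImplementedError  # placeholder

    def _write_hdf5_file(self):
        raise NotImplementedError  # placeholder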
Hey @RubelMozumder - I've reviewed your changes - here's some feedback:
Overall Comments:
- Remove the commented out debug code at the top of the file
- The delete_entry_file function is missing a docstring. Please add documentation explaining the purpose, parameters, and behavior of this function, especially since it handles critical file deletion operations (a docstring sketch follows below).
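For reference, a hedged sketch of what such a docstring could look like, based only on the signature shown in the class diagram; the described behavior is an assumption and should be checked against the actual implementation:

def delete_entry_file(archive, mainfile, delete_entry=True):
    """Remove the stale artifacts associated with `mainfile` so it can be regenerated.

    Assumed behavior (verify against the implementation):
    - removes the entry's documents from the Elasticsearch index (delete_by_query),
    - deletes the corresponding processing entry from MongoDB (entry.delete),
    - deletes the raw file(s) belonging to the entry from the upload.

    Raises PermissionError if the upload is already published.
    """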
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
).first()):
    upload = Upload.objects(upload_id=archive.m_context.upload_id).first()
    if upload.published:
        raise PermissionError('Published entry cannot be deleted.')
issue (bug_risk): Handle PermissionError when attempting to modify published entries
The PermissionError thrown when attempting to modify published entries isn't caught in write_file(). This could cause unexpected failures in production.
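A hedged sketch of one way write_file() could handle this; the logger attribute, the message text, and the HDF5 fallback are assumptions for illustration, not the PR's actual behavior:

# Inside write_file(); only the exception handling is shown here.
try:
    delete_entry_file(archive=self.archive, mainfile=self.data_file)
    self._write_nx_file()
    self.archive.m_context.process_updated_raw_file(self.data_file, allow_modify=True)
except PermissionError as exc:
    # Published uploads cannot be modified; log and fall back instead of
    # letting normalization fail with an unhandled exception.
    self.logger.warning(f'Cannot replace entry file {self.data_file}: {exc}')
    self.nexus = False
    self._write_hdf5_file()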
@@ -52,6 +52,8 @@
)

# import debugpy
issue (complexity): Consider simplifying file handling by separating entry cleanup and improving error handling logic.
The file handling could be simplified while maintaining all functionality by:
- Separating entry cleanup from file operations:
def cleanup_entry(archive, mainfile):
    """Handle entry cleanup in database"""
    # Upload and delete_by_query also need to be imported; module paths assumed.
    from nomad.processing import Entry, Upload
    from nomad.search import delete_by_query

    entry = Entry.objects(
        upload_id=archive.m_context.upload_id, mainfile=mainfile
    ).first()
    if entry:
        if Upload.objects(upload_id=archive.m_context.upload_id).first().published:
            raise PermissionError('Published entry cannot be deleted.')
        delete_by_query({'entry_id': entry.entry_id})
        entry.delete()


def write_file(self):
    """Simplified file writing with clear error handling"""
    try:
        if self.nexus:
            self.archive.m_context.delete_raw_file(self.data_file)  # Clean slate
            self._write_nx_file()
            self.archive.m_context.process_updated_raw_file(
                self.data_file, allow_modify=True
            )
        else:
            self._write_hdf5_file()
    except Exception:
        self.logger.warning("NeXus file creation failed, falling back to HDF5")
        self.nexus = False
        cleanup_entry(self.archive, self.data_file)
        self._write_hdf5_file()
This refactoring:
- Separates concerns between file operations and entry management
- Simplifies the error handling flow
- Makes the fallback process more explicit
- Reduces code duplication
- Maintains all existing functionality and safeguards
    upload_id=archive.m_context.upload_id, mainfile=mainfile
).first()):
    upload = Upload.objects(upload_id=archive.m_context.upload_id).first()
    if upload.published:
I wonder at which point this condition will be met. If the upload is published, I think the user won't be able to run the normalization anymore.
I also assume the same as you do, since we are removing and deleting the entry, provided no blunder happens.
        self._write_hdf5_file()
    else:
        self._write_hdf5_file()

if self.data_file.endswith('.nxs'):
Maybe we can do this right after self._write_nx_file() in line 377.
If _write_nx_file fails, there is no need to run process_updated_raw_file; the flow stops there and falls back to writing the HDF5 file.
I still have my doubts about deleting the raw file and entry before creating a new one. In case the re-creation fails for some reason, things that might be relying on the file will break. But, in the current situation with resource contention issues, this seems to be the only thing that works.
We should definitely revisit this and raise the issue in general.
Also, for future reference, can you explain the delete_entry_file function (as PR comments or in the doc string perhaps)? For example, what things are being deleted by these: delete_by_query, entry.delete, delete_rawfiles?
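For context, here is a hedged sketch of what the three operations appear to touch, pieced together from the sequence diagram and the snippets in this thread; the module paths, the handling of the delete_entry flag, and the raw-file call are assumptions that should be verified against the real implementation:

def delete_entry_file(archive, mainfile, delete_entry=True):
    # Assumed imports; the actual module paths may differ.
    from nomad.processing import Entry, Upload
    from nomad.search import delete_by_query

    upload_id = archive.m_context.upload_id
    entry = Entry.objects(upload_id=upload_id, mainfile=mainfile).first()
    if entry is None:
        return

    if Upload.objects(upload_id=upload_id).first().published:
        raise PermissionError('Published entry cannot be deleted.')

    # delete_by_query: removes the entry's documents from the Elasticsearch
    # index, so stale search results for the old file disappear.
    delete_by_query({'entry_id': entry.entry_id})

    # entry.delete: removes the processing entry from MongoDB.
    if delete_entry:
        entry.delete()

    # delete raw files: removes the generated file from the upload's raw
    # directory so it can be rewritten from scratch (call name taken from the
    # suggestions above; it may differ in the actual code).
    archive.m_context.delete_raw_file(mainfile)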
cf5b34e to 2d2709d (Compare)
@@ -371,28 +373,32 @@ def write_file(self):
        """
        if self.nexus:
            try:
                delete_entry_file(archive=self.archive, mainfile=self.data_file)
How can you ensure the caller is not appending new data to existing data?
How can you ensure the caller always provides a complete file?
When the existing file is deleted, how can you tell what contents are valid and may need to be retained?
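One possible way to address these questions (explicitly not what the PR currently does) would be to generate the new file at a temporary path first and only remove the old entry and raw file once generation has succeeded. A rough sketch; regenerate_nexus_file and a _write_nx_file variant that accepts a target path are hypothetical:

import os
import tempfile


def regenerate_nexus_file(writer):
    """Hypothetical write-then-swap flow for a FileWriter-like object."""
    target = writer.data_file
    # Create the temporary file next to the target so os.replace stays on
    # the same filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix='.nxs', dir=os.path.dirname(target) or '.')
    os.close(fd)
    try:
        writer._write_nx_file(tmp_path)  # hypothetical: writes to an explicit path
    except Exception:
        os.unlink(tmp_path)              # keep the old file and entry untouched
        raise
    # Only after a successful write is it safe to drop the old entry/raw file.
    delete_entry_file(archive=writer.archive, mainfile=target)  # helper from this PR
    os.replace(tmp_path, target)
    writer.archive.m_context.process_updated_raw_file(target, allow_modify=True)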
Duplicate of PR #156.