fix: extensive loading harmonization config #80
When an `Event` is initialized, it automatically loads the harmonization config file. This means that for every event uploaded, the entire harmonization config file is loaded and parsed, which dramatically extends the time needed to parse a single event. This PR ensures that the harmonization file is loaded only once and then reused for generating all `Event` objects. Fixes: certat#79
Thank you for finding this performance bottleneck and providing the fix!
I suggest loading the harmonization directly from the file instead of extracting it from the class.
Ensure that the harmonization config is loaded directly from the file instead of from the `Event` object.
Would any further improvement in the efficiency of submitting the files be appreciated? I'm mainly thinking about using Python's concurrency features to parallelize the generating of the
Keep in mind that there's a fork which does not have all the features of this repo (e.g. in the fork the field names are fixed, whereas here arbitrary extra.* names can be set), but its backend is completely rewritten.
I'm familiar with the fork. However, because it is more JS-heavy, we encountered more issues with it when handling very large files. We therefore shifted our focus to the original.
Events were previously serialized serially. However, by using Python's concurrency features, event serialization can be executed across multiple cores, greatly decreasing overall execution time. The actual sending of the serialized events is done without parallelization due to threading lock issues in the Python Redis implementation. Relies on PR: certat#80
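A rough sketch of that split, serializing in parallel and sending one by one, could look like the following. This is not the PR's code: `serialize_event`, `serialize_all`, `send_all`, and the `send` callable are hypothetical names, and JSON stands in for whatever wire format is actually used.

```python
import concurrent.futures
import json


def serialize_event(event):
    """CPU-bound step: turn one event dict into its wire format (JSON here)."""
    return json.dumps(event, sort_keys=True)


def serialize_all(events, max_workers=None, chunksize=64,
                  executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Serialize events in a worker pool; results keep the input order.

    A process pool is the default so the work can spread over several cores.
    executor_cls is only a parameter so the pool type can be swapped.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(serialize_event, events, chunksize=chunksize))


def send_all(payloads, send):
    """Send the serialized payloads one at a time, without any parallelism.

    send is a placeholder for the real transport call, assumed to accept
    a single serialized payload.
    """
    for payload in payloads:
        send(payload)
```

When the default process pool is used from a script, the calling code should sit under an `if __name__ == "__main__":` guard, because platforms that spawn worker processes re-import the main module.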
Another reason why it's "only" a fork. However, it has CSRF protection and offers authentication (same as the manager), which is a mandatory criterion in some organizations.
Yup, but I guess you can wrap authentication around it via reverse proxies.
@royk: I'd be super interested in an improvement. In my experience, especially the parser's detection of which field has which type is sometimes problematic.
I'd say go for it if you have the resources. Thanks a lot!!
… On 01.11.2022, at 21:26, Sebastian ***@***.***> wrote:
I'm familiar with the fork. However, because it is more JS-heavy, we encountered more issues with it when handling very large files.
Another reason why it's "only" a fork. However, it has CSRF protection and offers authentication (same as the manager), which is a mandatory criterion in some organizations.
Okay awesome, could you then maybe look into the already open PR, which will at least greatly reduce the overall processing time for uploading events into IntelMQ?