Refactor invocation of Action listeners in correlations #880
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #880 +/- ##
============================================
+ Coverage 24.79% 25.04% +0.25%
- Complexity 1026 1029 +3
============================================
Files 277 277
Lines 12702 12579 -123
Branches 1394 1373 -21
============================================
+ Hits 3149 3151 +2
+ Misses 9288 9164 -124
+ Partials 265 264 -1
Iterator<SearchHit> hits = response.getHits().iterator();
List<CorrelationRule> correlationRules = new ArrayList<>();
while (hits.hasNext()) {
try {
nit: Try-catch is redundant as we are using ActionListener.wrap() which catches generic Exception
Removed.
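For readers following the nit above, here is a minimal sketch of the behavior the reviewer describes. `Listener` and `CheckedConsumer` are simplified, hypothetical stand-ins written for this example, not the OpenSearch classes. The only assumption carried over is that wrap() routes any exception thrown by the response handler to the failure handler, while the failure handler itself runs unguarded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class WrapSketch {
    /** Hypothetical response handler type that is allowed to throw checked exceptions. */
    interface CheckedConsumer<T> {
        void accept(T value) throws Exception;
    }

    /** Hypothetical, simplified listener; not the OpenSearch interface itself. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    static <T> Listener<T> wrap(CheckedConsumer<T> responseHandler, Consumer<Exception> failureHandler) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                try {
                    responseHandler.accept(response);
                } catch (Exception e) {
                    // Anything thrown by the response handler is routed to the failure handler.
                    onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                // No try-catch here: an exception thrown by the failure handler propagates.
                failureHandler.accept(e);
            }
        };
    }

    /** Runs a response handler that throws and returns the failure messages that were recorded. */
    static List<String> demo() {
        List<String> failures = new ArrayList<>();
        Listener<String> listener = wrap(
            response -> { throw new IOException("parse error in " + response); },
            e -> failures.add(e.getMessage()));
        listener.onResponse("hit-1");
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Under these assumptions, an inner try-catch that only forwards to onFailure() adds nothing over what the wrapper already does.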
CorrelationRule rule = CorrelationRule.parse(xcp, hit.getId(), hit.getVersion());
correlationRules.add(rule);
} catch (IOException e) {
onFailure(e);
This catch block is wrongly placed: after onFailure() we will continue iterating the loop. Please remove this try-catch.
Umm, that's not true. Once an exception is raised, the loop will break for this thread. Since we are closing the parent listener, we should be good.
Anyway, I removed it.
The try-catch was inside the loop, so the flow of control would have returned to this thread.
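The disagreement above comes down to where the catch sits relative to the loop. The following sketch contrasts the two placements. The `parse` helper and the counters are hypothetical and stand in for CorrelationRule.parse() and onFailure(), so this is an illustration of plain Java control flow rather than the PR's code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LoopCatchSketch {
    /** Hypothetical parser: fails on inputs that start with "bad". */
    static String parse(String raw) throws IOException {
        if (raw.startsWith("bad")) {
            throw new IOException("cannot parse " + raw);
        }
        return raw.toUpperCase();
    }

    /** Catch inside the loop: the failure callback fires, then iteration carries on. */
    static int[] catchInsideLoop(List<String> hits) {
        int failures = 0;
        List<String> parsed = new ArrayList<>();
        for (String hit : hits) {
            try {
                parsed.add(parse(hit));
            } catch (IOException e) {
                failures++; // stands in for onFailure(e)
            }
        }
        return new int[] {parsed.size(), failures};
    }

    /** Catch outside the loop: the first failure ends the iteration. */
    static int[] letItPropagate(List<String> hits) {
        int failures = 0;
        List<String> parsed = new ArrayList<>();
        try {
            for (String hit : hits) {
                parsed.add(parse(hit));
            }
        } catch (IOException e) {
            failures++; // stands in for the single onFailure(e) issued by the wrapper
        }
        return new int[] {parsed.size(), failures};
    }

    public static void main(String[] args) {
        List<String> hits = List.of("a", "bad1", "b", "bad2");
        int[] inside = catchInsideLoop(hits);
        int[] outside = letItPropagate(hits);
        System.out.println("inside loop:  parsed=" + inside[0] + " failures=" + inside[1]);
        System.out.println("outside loop: parsed=" + outside[0] + " failures=" + outside[1]);
    }
}
```

With the catch inside the loop, the failure callback can fire more than once and later hits are still processed, which is the behavior the reviewer was pointing at.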
CorrelationRule rule = CorrelationRule.parse(xcp, hit.getId(), hit.getVersion());
correlationRules.add(rule);
} catch (IOException e) {
add error log
onFailure() calls TransportCorrelateFindings.onFailures(), which makes sure that we log this, and only once in the lifetime of the task.
We don't need to log in generic failure-handling code blocks. We should log in the method itself, so that we can communicate where the exception came from, or pass a custom error log message to onFailure. (I would prefer the second approach, but that would require a lot more change in the implementation, so I was suggesting a simple error log.)
Makes sense. For now, I have logged the following information:
- Exception and stack trace
- Monitor and finding id of that request
- Trace should tell us where the exception originated from.
I'm not reverting this for now, as we are missing logging completely at this point. As a long-term fix, I created a GitHub issue to take this up: #883
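The "log once in the lifetime of the task" behavior mentioned in this thread can be sketched as follows. This is an assumption about one way to get that guarantee, using an `AtomicBoolean` guard; the class name, the counter, and the log message are hypothetical and are not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class NotifyOnceSketch {
    // Flips to true the first time the task is failed; later calls become no-ops.
    private final AtomicBoolean failed = new AtomicBoolean(false);
    // Counts how often the (hypothetical) parent listener was actually notified.
    private final AtomicInteger notifications = new AtomicInteger();

    /** Hypothetical central failure handler: logs and notifies at most once per task. */
    void onFailures(Exception e) {
        if (failed.compareAndSet(false, true)) {
            System.err.println("Correlation task failed: " + e.getMessage());
            notifications.incrementAndGet(); // stands in for listener.onFailure(e)
        }
    }

    int notificationCount() {
        return notifications.get();
    }

    /** Fails the same task twice and reports how many notifications went out. */
    static int demo() {
        NotifyOnceSketch task = new NotifyOnceSketch();
        task.onFailures(new IllegalStateException("search timed out"));
        task.onFailures(new IllegalStateException("bulk indexing failed"));
        return task.notificationCount();
    }

    public static void main(String[] args) {
        System.out.println("notifications sent: " + demo());
    }
}
```

A guard like this keeps the parent listener from being closed twice, but it also means only the first exception is reported, which is one reason to log context at the call site as the reviewer suggests.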
getValidDocuments(detectorType, indices, correlationRules, relatedDocIds, autoCorrelations);
client.search(searchRequest, ActionListener.wrap(response -> {
if (response.isTimedOut()) {
onFailure(new OpenSearchStatusException("Search request timed out", RestStatus.REQUEST_TIMEOUT));
can we throw this exception
onFailure() will inadvertently throw the exception in finishHim() function in TransportCorrelateFindingsAction.java
});
getValidDocuments(detectorType, indices, correlationRules, relatedDocIds, autoCorrelations);
}, e -> {
log.error("[CORRELATIONS] Exception encountered while searching correlation rule index for finding id {}",
The actionListener.onFailure() operation is not surrounded by a try-catch, as it is terminal, unlike onResponse, which catches generic Exception. You need to add a try-catch if you are not directly invoking onFailure.
Please add a try-catch here and anywhere else where you have additional business logic in the failure consumer.
Makes sense. Added in getTimesStampFeature() too.
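One possible shape for the guarded failure consumer requested above is sketched here. The `guarded` helper and its parameter names are hypothetical and not part of the PR. The sketch assumes the goal is that the parent listener is always notified, even when the extra logic in the failure path (for example logging) throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GuardedFailureSketch {
    /**
     * Hypothetical helper: runs extra failure-path logic under a try-catch,
     * then always hands the original exception to the parent failure handler.
     */
    static Consumer<Exception> guarded(Consumer<Exception> extraLogic, Consumer<Exception> parentOnFailure) {
        return e -> {
            try {
                extraLogic.accept(e);
            } catch (Exception ex) {
                // Keep the secondary failure attached instead of losing it.
                if (ex != e) {
                    e.addSuppressed(ex);
                }
            }
            parentOnFailure.accept(e);
        };
    }

    /** The extra logic throws, yet the parent still receives the original failure. */
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        Consumer<Exception> handler = guarded(
            e -> { throw new IllegalStateException("logging failed"); },
            e -> received.add(e.getMessage() + " / suppressed=" + e.getSuppressed().length));
        handler.accept(new RuntimeException("search failed"));
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the guard, an exception thrown by the extra logic would escape the failure consumer and the parent listener would never be closed.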
response.getResponse().getHits().getHits(), validFields.get(idx)));
}
++idx;
if (response.getResponse().getHits().getTotalHits().value > 0L) {
should we be checking for hits.length here also?
Corrected.
onFailures(e);
SearchHits hits = response.getHits();
// Detectors Index hits count could be more even if we fetch one
if (hits.getTotalHits().value >= 1 && hits.getHits().length > 0) {
This check looks different in every place.
Why can't we just iterate over hits with a for loop and skip these checks?
I think the problem is that the usage is different everywhere. This usage only expects a single doc and probably why it was written like this. I can take a quick pass at all usages of getTotalHits() in a follow-up and fix this everywhere to make it consistent.
Removing the check on total hits entirely.
onFailures(e);
}
} else {
onFailures(new OpenSearchStatusException("detector not found given monitor id", RestStatus.INTERNAL_SERVER_ERROR));
log the monitor id
}
correlationIndices.setupCorrelationIndex(indexTimeout, setupTimestamp, ActionListener.wrap(bulkResponse -> {
if (bulkResponse.hasFailures()) {
log.error(new OpenSearchStatusException(bulkResponse.toString(), RestStatus.INTERNAL_SERVER_ERROR));
onfailure()?
We haven't been following this practice for most bulk requests. But I checked that setupCorrelationIndex() simply indexes two docs, which are needed later as well. Modifying this behavior to fail here itself. @sbcd90 please verify. This is one of the cases where we observed exceptions caused by the metadata index not having the right docs.
Rectifying this for other calls too.
});
}
client.search(searchMetadataIndexRequest, ActionListener.wrap(searchMetadataResponse -> {
String id = searchMetadataResponse.getHits().getHits()[0].getId();
size check on hits array?
Added.
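The size check requested here guards against indexing into an empty hits array. A minimal sketch follows, using a plain `String[]` of ids as a hypothetical stand-in for the array returned by getHits(). The total-hit count is deliberately not consulted, since it counts all matches and can be non-zero even when the returned array is empty.

```java
import java.util.Optional;

public class HitsGuardSketch {
    /** Returns the first id, or empty when the array is null or has no elements. */
    static Optional<String> firstId(String[] hitIds) {
        if (hitIds == null || hitIds.length == 0) {
            return Optional.empty();
        }
        return Optional.ofNullable(hitIds[0]);
    }

    public static void main(String[] args) {
        // An empty result no longer triggers ArrayIndexOutOfBoundsException.
        System.out.println(firstId(new String[0]).isPresent());
        System.out.println(firstId(new String[] {"meta-1", "meta-2"}).orElse("none"));
    }
}
```

The caller can then decide whether an empty result should fail the listener or simply be skipped.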
}, this::onFailures));
}, this::onFailures));
} catch (Exception ex) {
onFailures(ex);
log exception
Same comment as before
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
f653429 to 5c77d65
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
getValidDocuments(detectorType, indices, correlationRules, relatedDocIds, autoCorrelations);
}, e -> {
try {
log.error("[CORRELATIONS] Exception encountered while searching correlation rule index for finding id {}",
Minor: we should remove the [CORRELATIONS] prefix
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-880-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ec0657d74a3b147f304e5985250f0e3d8e0e3e4b
# Push it to GitHub
git push --set-upstream origin backport/backport-880-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.11 2.11
# Navigate to the new working tree
cd .worktrees/backport-2.11
# Create a new branch
git switch --create backport/backport-880-to-2.11
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ec0657d74a3b147f304e5985250f0e3d8e0e3e4b
# Push it to GitHub
git push --set-upstream origin backport/backport-880-to-2.11
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.11
Then, create a pull request where the
…roject#880)
* Refactor invocation of Action listeners in correlations
* Close hanging tasks in correlations workflow
* Logging finding id and monitor id in error logs
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
Signed-off-by: Joanne Wang <jowg@amazon.com>
(cherry picked from commit 4d4f5e3)
Co-authored-by: Joanne Wang <jowg@amazon.com>

Reduce log level for informative message (opensearch-project#203) (opensearch-project#833)
Signed-off-by: Enrico Tröger <enrico.troeger@uvena.de>
Co-authored-by: Enrico Tröger <enrico.troeger@uvena.de>

Updated alert creation following common-utils PR 584. (opensearch-project#837) (opensearch-project#839)
Signed-off-by: AWSHurneyt <hurneyt@amazon.com>
(cherry picked from commit 8adb9c3)
Co-authored-by: AWSHurneyt <hurneyt@amazon.com>

Release notes for 2.12.0 (opensearch-project#834) (opensearch-project#841)
* release notes for 2.12
* update release notes
* update release notes
Signed-off-by: Joanne Wang <jowg@amazon.com>
(cherry picked from commit 414484a)
Co-authored-by: Joanne Wang <jowg@amazon.com>

Remove blocking calls and change threat intel feed flow to event driven (opensearch-project#871) (opensearch-project#876)
* remove actionGet() and change threat intel feed flow to event driven
* fix javadocs
* revert try catch removals
* use action listener wrap() in detector threat intel code paths
* add try catch
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
(cherry picked from commit 172d58d)
Co-authored-by: Surya Sashank Nistala <snistala@amazon.com>

Fail the flow the when detectot type is missing in the log types index (opensearch-project#845) (opensearch-project#857)
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
(cherry picked from commit 8d19912)
Co-authored-by: Megha Goyal <56077967+goyamegh@users.noreply.github.com>

[BUG] ArrayIndexOutOfBoundsException for inconsistent detector index behavior (opensearch-project#843) (opensearch-project#858)
* Catch ArrayIndexOutOfBoundsException when detector is missing
* Add a check on SearchHits.getHits() length
* Remove index out of bounds exception
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
(cherry picked from commit 0ef8543)
Co-authored-by: Megha Goyal <56077967+goyamegh@users.noreply.github.com>

Backport opensearch-project#873 and opensearch-project#789 (opensearch-project#895)
* support object fields in aggregation based sigma rules (opensearch-project#789)
  Signed-off-by: Subhobrata Dey <sbcd90@gmail.com>
* Pass rule field names in doc level queries during monitor/creation. Remove blocking actionGet() calls (opensearch-project#873)
  * pass query field names in doc level queries during monitor creation/updation
  * remove actionGet() and change get index mapping call to event driven flow
  * fix chained findings monitor
  * add finding mappings
  * remove test messages from logs
  * revert build.gradle change
  Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: Subhobrata Dey <sbcd90@gmail.com>
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Co-authored-by: Subhobrata Dey <sbcd90@gmail.com>

Fix duplicate ecs mappings which returns incorrect log index field in mapping view API (opensearch-project#786) (opensearch-project#788) (opensearch-project#898)
* field mapping changes
* add integ test
* turn unmappedfieldaliases as set and add integ test
* add comments
* fix integ tests
* moved logic to method for better readability
Signed-off-by: Joanne Wang <jowg@amazon.com>

Add throw for empty strings in rules with modifier contains, startwith, and endswith (opensearch-project#860) (opensearch-project#896)
* add validation for empty strings with contains, startswith and endswith modifiers
* throw exception if empty string with contains, startswith, or endswith
* change var name
* add modifiers to log
Signed-off-by: Joanne Wang <jowg@amazon.com>

Add an "exists" check for "not" condition in sigma rules (opensearch-project#852) (opensearch-project#897)
* test design
* working version
* cleaning up
* testing
* working version
* working version
* refactored querybackend
* working on tests
* fixed alerting and finding tests
* fix correlation tests
* working all tests
* moved test and changed alias for adldap
* added more tests
* cleanup code
* remove exists flag
Signed-off-by: Joanne Wang <jowg@amazon.com>
(cherry picked from commit 656a5fe)
Co-authored-by: Joanne Wang <jowg@amazon.com>

Add goyamegh as a maintainer (opensearch-project#868) (opensearch-project#899)
Signed-off-by: Megha Goyal <goyamegh@amazon.com>

Refactor invocation of Action listeners in correlations (opensearch-project#880) (opensearch-project#900)
* Refactor invocation of Action listeners in correlations
* Close hanging tasks in correlations workflow
* Logging finding id and monitor id in error logs
Signed-off-by: Megha Goyal <goyamegh@amazon.com>

Add search request timeouts for correlations workflows (opensearch-project#893) (opensearch-project#901)
* Reinstating more leaks plugged-in for correlations workflows
* Add search timeouts to all correlation searches
* Fix logging and exception messages
* Change search timeout to 30 seconds
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
(cherry picked from commit 75c4429)
Co-authored-by: Megha Goyal <56077967+goyamegh@users.noreply.github.com>
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.9 2.9
# Navigate to the new working tree
cd .worktrees/backport-2.9
# Create a new branch
git switch --create backport/backport-880-to-2.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ec0657d74a3b147f304e5985250f0e3d8e0e3e4b
# Push it to GitHub
git push --set-upstream origin backport/backport-880-to-2.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.9
Then, create a pull request where the
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.7 2.7
# Navigate to the new working tree
cd .worktrees/backport-2.7
# Create a new branch
git switch --create backport/backport-880-to-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ec0657d74a3b147f304e5985250f0e3d8e0e3e4b
# Push it to GitHub
git push --set-upstream origin backport/backport-880-to-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.7
Then, create a pull request where the
* Refactor invocation of Action listeners in correlations
* Close hanging tasks in correlations workflow
* Logging finding id and monitor id in error logs
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
…earch-project#878) (opensearch-project#880) Signed-off-by: Subhobrata Dey <sbcd90@gmail.com>
Description
This PR is intended to fix the hanging tasks observed in _cat/tasks by refactoring the correlation workflow to ensure timely closure of parent action listeners upon successful completion or encountering exceptions, and consolidating exception handling logic into a centralized function. The aim is to optimize task management efficiency and enhance the overall reliability of our system.
This logic has been tested against a high indexing workload (approx. 1M docs/minute), where the issue was observed prominently in a cluster of 3 or more data nodes, with findings generated by a CloudTrail logs detector running all 32 pre-packaged rules at a frequency of 1 minute. Further, the correlations were generated with a single rule on the same log type for testing, with findings generated at a rate of 1~2k per minute.
Issues Resolved
#879
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.