Add Feature Filtering in Model Validation #1258

kaituo · 2024-07-02T21:27:41Z

Description

Previously, we used date histogram aggregation for feature sparsity validation and interval recommendation. But it is possible a feature itself has filters besides the detector's filter. This can cause incorrect interval recommendations. We may fail to root cause feature definition itself caused data sparsity. This PR improves feature sparsity validation and interval recommendation by incorporating feature aggregation. This prevents incorrect interval recommendations and helps identify if feature definitions are causing data sparsity. The same date range aggregation used in the cold start is now employed to retrieve data and verify sparsity.

Detailed Changes

Feature Sparsity Validation: Integrated feature aggregation in the sparsity validation and interval recommendation processes. When a feature is defined, we utilize date range aggregation, consistent with cold start methodology, to ensure accurate data retrieval and sparsity verification.
Coverly report: To fix [BUG] CodeCov is failing project-tools#87, I followed [CI] CodeCov no longer works after upgrade to v4 of the action flow-framework#490 to upgrade codecov GitHub Action dependency from v3 to v4 and and add the standard-named token.
Avoid redundant coverly report: Limited coverage upload to the macOS environment to avoid redundant uploads across Linux, macOS, and Windows builds.
Error Message Refactoring: Transferred shareable error messages from ADCommonMessages to CommonMessages.
Code Refactoring: Refactored ModelColdStart and SearchFeatureDao for reusability in model validation and interval recommendation.
Moved HistogramAggregationHelper to AggregationPrep, which now includes both histogram and date_range aggregations.
Top Entities Retrieval Logic: Improved LatestTimeRetriever logic: Previously in LatestTimeRetriever we retrieve top entities first before retrieving latest time based on current time. But it is possible that the window delay is high and we could get no top entities. We now retrieve the latest time first, then fetches top entities based on this time. The change increases the likelihood of obtaining top entities, especially when window delay is high.
Bug Fix in ModelValidationActionHandler: Fixed an issue in checkFeatureQueryDelegate where missing colons in messages caused an index out of bounds exception in BaseValidateConfigTransportAction.getFeatureSubIssuesFromErrorMessage.
Previously, in HistogramAggregationHelper.getNumberOfSamples, we queried up to 24 hours of data. This approach could result in an incorrect bucket coverage rate, especially when the historical data is limited. Even if the limited history provides 100% coverage, the bucket coverage rate might still appear low within the one day time frame. This PR updates HistogramAggregationHelper.getNumberOfSamples to limit data queries to config.getHistoryIntervals(), ensuring that the queried interval is at least the shingle size plus 32. This adjustment improves the accuracy of bucket coverage rate calculations.
Improved password generation in security tests: Current implementation may cause "Password is similar to user name". This PR improve the randomness and ensure the generated password is not similar to the username by using a Secure Random Generator and ensure that the password includes a mix of uppercase, lowercase, digits, and special characters.

Testing done:

added various integ and unit tests. Test coverage increased from 71.83% to 77.05%.

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

codecov · 2024-07-02T23:46:30Z

Codecov Report

Attention: Patch coverage is 68.12227% with 73 lines in your changes missing coverage. Please review.

Project coverage is 77.05%. Comparing base (3f0fc8c) to head (1e8be88).
Report is 2 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1258      +/-   ##
============================================
+ Coverage     71.83%   77.05%   +5.22%     
- Complexity     4898     5287     +389     
============================================
  Files           518      517       -1     
  Lines         22879    22890      +11     
  Branches       2245     2239       -6     
============================================
+ Hits          16434    17639    +1205     
+ Misses         5410     4216    -1194     
  Partials       1035     1035

Flag	Coverage Δ
plugin	`77.05% <68.12%> (+5.22%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
...a/org/opensearch/ad/constant/ADCommonMessages.java	`95.83% <ø> (ø)`
...d/rest/handler/ADModelValidationActionHandler.java	`100.00% <ø> (+100.00%)`	⬆️
...nsport/ValidateAnomalyDetectorTransportAction.java	`100.00% <ø> (ø)`
.../handler/ForecastModelValidationActionHandler.java	`100.00% <ø> (+100.00%)`	⬆️
...t/transport/ValidateForecasterTransportAction.java	`100.00% <ø> (+70.00%)`	⬆️
...opensearch/timeseries/constant/CommonMessages.java	`97.43% <ø> (ø)`
...a/org/opensearch/timeseries/ml/ModelColdStart.java	`79.25% <100.00%> (-0.95%)`	⬇️
...ch/timeseries/model/IntervalTimeConfiguration.java	`92.85% <ø> (+7.14%)`	⬆️
...opensearch/timeseries/model/TimeConfiguration.java	`100.00% <ø> (ø)`
...ansport/BaseSuggestConfigParamTransportAction.java	`68.51% <100.00%> (+44.98%)`	⬆️
... and 8 more

... and 88 files with indirect coverage changes

kaituo · 2024-07-03T00:46:37Z

WhiteSource Security Check is failing. Same for other repo (e.g., KNN) as well. Error:

Oops! An error occurred while running the Security Check.The Scanner was unable to connect to the repository or branch and clone it.
ref: https://github.com/opensearch-project/anomaly-detection/pull/1258/checks?check_run_id=26966747402

Reached out to infra team to get this resolved.

Signed-off-by: Kaituo Li <kaituo@amazon.com> (cherry picked from commit 63dacaa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Consider feature filter in model validation (#1258) Signed-off-by: Kaituo Li <kaituo@amazon.com> * follow opensearch-project/k-NN#1795 Signed-off-by: Kaituo Li <kaituo@amazon.com> --------- Signed-off-by: Kaituo Li <kaituo@amazon.com>

kaituo requested review from jmazanec15, jngz-es, saratvemulapalli, ohltyler, vamshin, VijayanB, ylwu-amzn, amitgalitz, jackiehanyang, sean-zheng-amazon, dbwiddis, owaiskazi19 and joshpalis as code owners July 2, 2024 21:27

opensearch-trigger-bot bot added infra Changes to infrastructure, testing, CI/CD, pipelines, etc. backport 2.x labels Jul 2, 2024

kaituo force-pushed the bash5 branch 2 times, most recently from df241c4 to be78b86 Compare July 2, 2024 22:16

kaituo changed the title ~~Consider feature filter in model validation~~ Add Feature Filtering in Model Validation Jul 2, 2024

Consider feature filter in model validation

1e8be88

Signed-off-by: Kaituo Li <kaituo@amazon.com>

kaituo force-pushed the bash5 branch from be78b86 to 1e8be88 Compare July 2, 2024 23:19

jackiehanyang approved these changes Jul 8, 2024

View reviewed changes

kaituo mentioned this pull request Jul 8, 2024

Checking feature queries as part of Validate API recommendation #445

Closed

kaituo merged commit 63dacaa into opensearch-project:main Jul 8, 2024
16 of 17 checks passed

opensearch-trigger-bot bot mentioned this pull request Jul 8, 2024

[Backport 2.x] Add Feature Filtering in Model Validation #1260

Closed

kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Jul 8, 2024

Consider feature filter in model validation (opensearch-project#1258)

f056543

Signed-off-by: Kaituo Li <kaituo@amazon.com>

kaituo mentioned this pull request Jul 8, 2024

[Backport 2.x] Add Feature Filtering in Model Validation #1261

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Feature Filtering in Model Validation #1258

Add Feature Filtering in Model Validation #1258

kaituo commented Jul 2, 2024 •

edited

Loading

codecov bot commented Jul 2, 2024

kaituo commented Jul 3, 2024

Add Feature Filtering in Model Validation #1258

Add Feature Filtering in Model Validation #1258

Conversation

kaituo commented Jul 2, 2024 • edited Loading

Description

Issues Resolved

codecov bot commented Jul 2, 2024

Codecov Report

kaituo commented Jul 3, 2024

kaituo commented Jul 2, 2024 •

edited

Loading