Improve performance of encoding composite keys in multi-term aggregations #9412

Merged on Aug 18, 2023 (4 commits into opensearch-project:main)

@ketanv3 (Contributor) commented Aug 17, 2023

Description

Composite keys of arbitrary Java objects are encoded using the StreamOutput::writeGenericValue method, which is expensive: every call runs a series of "instanceof" checks and map lookups to find the registered writer for the value's type. The cost adds up because the number of composite keys is proportional to the number of hits, the number of multi-term fields, and the number of field values.
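A back-of-the-envelope model of why this cost adds up. It rests on an assumption not spelled out in the PR: each document contributes one composite key per combination of its field values (the Cartesian product), and the original encoder makes one generic-writer call for every field of every key. The class name and the sample value counts are hypothetical; the total for a query also scales with the number of hits.

```java
import java.util.List;

/**
 * Back-of-the-envelope cost model (an assumption, not OpenSearch code):
 * one composite key per combination of a document's field values, and one
 * generic-writer call per field of every key when keys are encoded from scratch.
 */
final class CompositeKeyCost {

    /** Composite keys for one document: the product of per-field value counts. */
    static long keysPerDocument(List<Integer> valuesPerField) {
        long keys = 1;
        for (int count : valuesPerField) {
            keys *= count;
        }
        return keys;
    }

    /** Generic-writer calls for one document when every key is encoded from scratch. */
    static long naiveWritesPerDocument(List<Integer> valuesPerField) {
        return keysPerDocument(valuesPerField) * valuesPerField.size();
    }

    public static void main(String[] args) {
        // Hypothetical document: three multi-term fields with 2, 3 and 4 values.
        List<Integer> valuesPerField = List.of(2, 3, 4);
        System.out.println(keysPerDocument(valuesPerField));        // 24
        System.out.println(naiveWritesPerDocument(valuesPerField)); // 72
    }
}
```

Under this model a single document with modest multi-valued fields already triggers dozens of dispatching writes, each paying the lookup cost described above.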

This PR improves the encoding performance by:

  1. Removing unnecessary writer lookups.
  2. Reusing the partial encodings of shared key prefixes instead of re-encoding them for every composite key.
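A minimal sketch of both ideas, under stated assumptions rather than as the PR's actual code: it uses java.io's DataOutputStream in place of OpenSearch's StreamOutput, supports only Long and String values, and the ValueWriter interface, the writerFor helper, and the one-byte type tags are all hypothetical. Writers are resolved once per value up front, and each prefix is encoded once and then copied into every longer key that shares it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: java.io stands in for OpenSearch's StreamOutput. */
final class PrefixReuseEncoder {

    /** Writes one value of an already-known type (hypothetical interface). */
    interface ValueWriter {
        void write(DataOutputStream out, Object value) throws IOException;
    }

    // Hypothetical one-byte type tags followed by the value itself.
    static final ValueWriter LONG = (out, v) -> { out.writeByte(1); out.writeLong((Long) v); };
    static final ValueWriter STRING = (out, v) -> { out.writeByte(2); out.writeUTF((String) v); };

    /** The type dispatch: paid once per value instead of once per composite key. */
    static ValueWriter writerFor(Object value) {
        if (value instanceof Long) return LONG;
        if (value instanceof String) return STRING;
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    /** Returns one encoded key per combination of field values, in index order. */
    static List<byte[]> encodeAll(List<? extends List<?>> fieldValues) {
        // Idea 1: resolve every value's writer up front.
        List<List<ValueWriter>> writers = new ArrayList<>();
        for (List<?> values : fieldValues) {
            List<ValueWriter> fieldWriters = new ArrayList<>();
            for (Object value : values) {
                fieldWriters.add(writerFor(value));
            }
            writers.add(fieldWriters);
        }
        List<byte[]> keys = new ArrayList<>();
        try {
            encode(fieldValues, writers, 0, new byte[0], keys);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }

    // Idea 2: extend an already-encoded prefix by one value and recurse,
    // so prefix values are never re-encoded for each longer key.
    private static void encode(List<? extends List<?>> fieldValues, List<List<ValueWriter>> writers,
                               int field, byte[] prefix, List<byte[]> keys) throws IOException {
        if (field == fieldValues.size()) {
            keys.add(prefix);
            return;
        }
        List<?> values = fieldValues.get(field);
        List<ValueWriter> fieldWriters = writers.get(field);
        for (int i = 0; i < values.size(); i++) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(prefix.length + 16);
            buffer.write(prefix, 0, prefix.length); // plain byte copy, no type dispatch
            DataOutputStream out = new DataOutputStream(buffer);
            fieldWriters.get(i).write(out, values.get(i));
            out.flush();
            encode(fieldValues, writers, field + 1, buffer.toByteArray(), keys);
        }
    }
}
```

With three fields holding 2, 3, and 4 values, this sketch writes 2 + 6 + 24 = 32 values instead of the 72 needed when every key is encoded from scratch. The shared prefix bytes are still copied into each key, but that copy involves no instanceof checks or map lookups.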

Improvement

The improvement is proportional to the number of hits, fields, and field values. In the benchmarks below, p50 latency drops by roughly 27% to 68% and p90 latency by roughly 25% to 66%.

| Test | p50 before | p90 before | p50 after | p90 after |
|---|---|---|---|---|
| PR #2687 / mix | 5500 ms | 5605 ms | 4023 ms | 4185 ms |
| PR #2687 / numeric | 1688 ms | 1756 ms | 538 ms | 591 ms |
| OSB / http_logs / multi_term_agg | 3204 ms | 3243 ms | 2241 ms | 2282 ms |
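As a quick check on the size of the improvement, this small program (the class name is hypothetical) computes the relative latency reduction, (before - after) / before, for each row of the benchmark table.

```java
import java.util.Locale;

/** Relative latency reduction for each benchmark row (p50 and p90). */
final class LatencyReduction {

    static double reductionPercent(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        String[] tests = {"PR #2687 / mix", "PR #2687 / numeric", "OSB / http_logs / multi_term_agg"};
        // Columns: p50 before, p90 before, p50 after, p90 after (ms), copied from the table.
        double[][] ms = {
            {5500, 5605, 4023, 4185},
            {1688, 1756, 538, 591},
            {3204, 3243, 2241, 2282},
        };
        for (int i = 0; i < tests.length; i++) {
            System.out.printf(Locale.ROOT, "%s: p50 -%.1f%%, p90 -%.1f%%%n", tests[i],
                reductionPercent(ms[i][0], ms[i][2]), reductionPercent(ms[i][1], ms[i][3]));
        }
        // Prints:
        // PR #2687 / mix: p50 -26.9%, p90 -25.3%
        // PR #2687 / numeric: p50 -68.1%, p90 -66.3%
        // OSB / http_logs / multi_term_agg: p50 -30.1%, p90 -29.6%
    }
}
```

The numeric benchmark gains the most: its p50 goes from 1688 ms to 538 ms, roughly 3.1x faster.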

Related Issues

#8710

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…ions

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
@opensearch-trigger-bot
Contributor

Compatibility status:

Checks if related components are compatible with change e1c40b4

Incompatible components

Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

codecov bot commented Aug 17, 2023

Codecov Report

Merging #9412 (3178f37) into main (6a5b464) will decrease coverage by 0.04%.
The diff coverage is 94.64%.

@@             Coverage Diff              @@
##               main    #9412      +/-   ##
============================================
- Coverage     71.12%   71.08%   -0.04%     
  Complexity    57417    57417              
============================================
  Files          4776     4776              
  Lines        270742   270759      +17     
  Branches      39578    39577       -1     
============================================
- Hits         192558   192478      -80     
- Misses        62044    62120      +76     
- Partials      16140    16161      +21     
| Files Changed | Coverage Δ |
|---|---|
| ...ggregations/bucket/terms/MultiTermsAggregator.java | 82.07% <93.75%> (+2.88%) ⬆️ |
| ...opensearch/core/common/io/stream/StreamOutput.java | 94.88% <100.00%> (-0.79%) ⬇️ |

... and 428 files with indirect coverage changes

@nknize (Collaborator) left a comment

Big heart here for this change! Nice clean up not having to deal with all of this introspection.

I left one minor nitpick purely out of preference, but I won't block the PR for it.

Thx for this clean change!

…Class

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@backslasht (Contributor) left a comment

Nice improvement 🥇
Looks good to me. Thanks @ketanv3

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Contributor

Compatibility status:

Checks if related components are compatible with change 6a5b464

Incompatible components

Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/asynchronous-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/reporting.git]

@reta merged commit 46f2bd0 into opensearch-project:main on Aug 18, 2023
11 checks passed
@reta added the backport 2.x (Backport to 2.x branch) label on Aug 18, 2023
opensearch-trigger-bot pushed a commit that referenced this pull request on Aug 18, 2023
…ions (#9412)

* Improve performance of encoding composite keys in multi-term aggregations

Signed-off-by: Ketan Verma <ketan9495@gmail.com>

* Rename StreamOutput::getGenericValueWriter to getGenericValueWriterByClass

Signed-off-by: Ketan Verma <ketan9495@gmail.com>

* Rename StreamOutput::getGenericValueWriterByClass to getWriter and remove unused code

Signed-off-by: Ketan Verma <ketan9495@gmail.com>

---------

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
(cherry picked from commit 46f2bd0)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request Aug 18, 2023
…ions (#9412) (#9434)

* Improve performance of encoding composite keys in multi-term aggregations

* Rename StreamOutput::getGenericValueWriter to getGenericValueWriterByClass

* Rename StreamOutput::getGenericValueWriterByClass to getWriter and remove unused code

---------

(cherry picked from commit 46f2bd0)

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Aug 25, 2023
Gaganjuneja pushed a commit to Gaganjuneja/OpenSearch that referenced this pull request Aug 28, 2023
Gaganjuneja pushed a commit to Gaganjuneja/OpenSearch that referenced this pull request Aug 28, 2023
Gaganjuneja pushed a commit to Gaganjuneja/OpenSearch that referenced this pull request Aug 28, 2023
Gaganjuneja pushed a commit to Gaganjuneja/OpenSearch that referenced this pull request Aug 28, 2023
kkmr pushed a commit to kkmr/OpenSearch that referenced this pull request Aug 28, 2023
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels: backport 2.x (Backport to 2.x branch)
Projects: None yet

4 participants