-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes log deduplication when mutating Labels using LogQL #5289
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
jeschkies
reviewed
Feb 1, 2022
chaudum
reviewed
Feb 2, 2022
owen-d
approved these changes
Feb 3, 2022
sandeepsukhani
approved these changes
Feb 4, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nice work!
cyriltovena
force-pushed
the
label-hash
branch
from
February 4, 2022 08:16
cdf0012
to
1c3936f
Compare
3 tasks
KMiller-Grafana
pushed a commit
to KMiller-Grafana/loki
that referenced
this pull request
Feb 4, 2022
* Changes the iterator interface to include labelshash. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve unused args and fixes build Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Implement the new hash in the memchunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add the hash in logproto Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Insert the hash from ingester-querier Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Rename LabelsHash to StreamHash Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Working through fixing tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Refactor HeapIterator into merge and sort Iterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * lint Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Order alphabetically when ordering samples. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Do not JSON encode the hash for legacy model. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add more regression tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update CHANGELOG.md Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Review feedback
3 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
Labels for each sample and log were used to find and remove duplicates when iterating over them. However since those labels can be mutated by the LogQL engine for optimization or by the users (
label_format
,sum()
) the deduplication wasn't consistent anymore.This introduce a new StreamHash function that contains the original stream label hash and can be used for deduplication purpose. This ensure the deduplication is consistent and independent of the queries made.
This PR is built on #5281 which separate the sorting logic to the dedupe logic allowing to have different behaviour, this was required to make sharding work.
Which issue(s) this PR fixes:
Fixes #4500
Example of metric queries with duplicates before and after:
The last hour is still buggy because I did not deploy the ingesters during my test.
Special notes for your reviewer:
Checklist
CHANGELOG.md
about the changes.