Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Deduplicate spans based upon their hashcode #6009

Merged
merged 19 commits into from
Sep 24, 2024

Conversation

cdanis
Copy link
Contributor

@cdanis cdanis commented Sep 23, 2024

Which problem is this PR solving?

Description of the changes

  • Rename the Zipkin-legacy span "deduper" to a less confusing name
  • Add a real span deduper that deduplicates based on span hashcodes
  • Sort tags in spans before we attempt deduplication, so hashcode is deterministic

How was this change tested?

  • Unit tested

Checklist

@cdanis cdanis requested a review from a team as a code owner September 23, 2024 14:33
@dosubot dosubot bot added the enhancement label Sep 23, 2024
@cdanis cdanis force-pushed the dupe-dedupe branch 2 times, most recently from fd74590 to 888ca59 Compare September 23, 2024 16:52
@cdanis
Copy link
Contributor Author

cdanis commented Sep 23, 2024

@yurishkuro as discussed! let me know if you want anything else changed

@cdanis cdanis force-pushed the dupe-dedupe branch 2 times, most recently from fa8c498 to 8022f67 Compare September 23, 2024 17:12
Copy link

codecov bot commented Sep 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.79%. Comparing base (d243452) to head (3dcc05c).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #6009       +/-   ##
===========================================
+ Coverage   51.48%   96.79%   +45.31%     
===========================================
  Files         176      349      +173     
  Lines        8875    16582     +7707     
===========================================
+ Hits         4569    16050    +11481     
+ Misses       3863      343     -3520     
+ Partials      443      189      -254     
Flag Coverage Δ
badger_v1 8.00% <0.00%> (-0.02%) ⬇️
badger_v2 1.82% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v1 16.56% <0.00%> (-0.04%) ⬇️
cassandra-4.x-v2 1.75% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v1 16.56% <0.00%> (-0.04%) ⬇️
cassandra-5.x-v2 1.75% <0.00%> (-0.01%) ⬇️
elasticsearch-6.x-v1 18.72% <0.00%> (-0.06%) ⬇️
elasticsearch-7.x-v1 18.79% <0.00%> (-0.05%) ⬇️
elasticsearch-8.x-v1 18.98% <0.00%> (-0.04%) ⬇️
elasticsearch-8.x-v2 1.82% <0.00%> (-0.01%) ⬇️
grpc_v1 9.52% <7.40%> (-0.04%) ⬇️
grpc_v2 7.13% <7.40%> (-0.02%) ⬇️
kafka-v1 9.71% <7.40%> (-0.03%) ⬇️
kafka-v2 1.82% <0.00%> (-0.01%) ⬇️
memory_v2 1.82% <0.00%> (-0.01%) ⬇️
opensearch-1.x-v1 18.83% <0.00%> (-0.05%) ⬇️
opensearch-2.x-v1 18.84% <0.00%> (-0.05%) ⬇️
opensearch-2.x-v2 1.81% <0.00%> (-0.01%) ⬇️
tailsampling-processor 0.46% <0.00%> (-0.01%) ⬇️
unittests 95.27% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cdanis cdanis changed the title Deduplicate spans based upon their spanID [bugfix] Deduplicate spans based upon their hashcode Sep 23, 2024
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
@cdanis
Copy link
Contributor Author

cdanis commented Sep 23, 2024

PTAL :D

Signed-off-by: Chris Danis <cdanis@wikimedia.org>
model/adjuster/span_hash_deduper_test.go Outdated Show resolved Hide resolved
model/adjuster/span_hash_deduper.go Outdated Show resolved Hide resolved
model/adjuster/span_hash_deduper.go Outdated Show resolved Hide resolved
cdanis and others added 3 commits September 24, 2024 11:12
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Chris Danis <cdanis@gmail.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Chris Danis <cdanis@gmail.com>
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
@yurishkuro yurishkuro changed the title [bugfix] Deduplicate spans based upon their hashcode [feat] Deduplicate spans based upon their hashcode Sep 24, 2024
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Chris Danis <cdanis@gmail.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yurishkuro yurishkuro enabled auto-merge (squash) September 24, 2024 15:44
Signed-off-by: Chris Danis <cdanis@wikimedia.org>
auto-merge was automatically disabled September 24, 2024 15:59

Head branch was pushed to by a user without write access

@cdanis
Copy link
Contributor Author

cdanis commented Sep 24, 2024

added additional tests because patch code coverage failed

@yurishkuro yurishkuro merged commit 7c6ed87 into jaegertracing:main Sep 24, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: ElasticSearch storage not deduplicating multiply-archived spans
2 participants