[Enhancement]Added a metric for geo replication for tracking replicated subscriptions snapshot timeouts #22381

Merged on Oct 10, 2024 (5 commits)

nikam14
Contributor

@nikam14 nikam14 commented Mar 29, 2024

Fixes #21793

Motivation

Snapshot creation for geo-replication replicated subscriptions (PIP-33) can time out.
The code logs a debug message when this happens.
After a timeout, the subscription state is not reflected on the remote side and a backlog builds up.
There is currently no metric to detect this situation.

Modifications

Add a new counter metric, pulsar_replicated_subscriptions_snapshot_timeouts, which resets only when the broker restarts.
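For readers who want a concrete picture of the counter semantics, here is a minimal sketch using only the JDK. It is not the code from this PR: the class `SnapshotTimeoutTracker`, its methods, and the timeout handling are hypothetical stand-ins, and the broker would register the counter with its own metrics library rather than a `LongAdder`. Only the metric name comes from the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the component that tracks pending snapshots. */
class SnapshotTimeoutTracker {
    // Metric name taken from the PR description.
    static final String METRIC_NAME = "pulsar_replicated_subscriptions_snapshot_timeouts";

    // Monotonic counter: it is never decremented, so it only returns to zero
    // when the process (the broker, in the real system) restarts.
    private final LongAdder snapshotTimeouts = new LongAdder();

    // Start timestamps (in milliseconds) of snapshots that are still pending.
    private final List<Long> pendingSnapshotStartTimes = new ArrayList<>();

    private final long timeoutMillis;

    SnapshotTimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a snapshot attempt started at the given time. */
    void snapshotStarted(long nowMillis) {
        pendingSnapshotStartTimes.add(nowMillis);
    }

    /**
     * Drops every pending snapshot older than the timeout and increments the
     * counter once per dropped snapshot. Returns how many were dropped.
     */
    int expireTimedOutSnapshots(long nowMillis) {
        int expired = 0;
        Iterator<Long> it = pendingSnapshotStartTimes.iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next() > timeoutMillis) {
                it.remove();
                snapshotTimeouts.increment();
                expired++;
            }
        }
        return expired;
    }

    /** Total number of timeouts observed since this tracker was created. */
    long timeoutCount() {
        return snapshotTimeouts.sum();
    }
}
```

Because the value only ever increases, an operator would normally alert on its rate of change rather than on the absolute value.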

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:


@nikam14 Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@dao-jun dao-jun requested a review from asafm March 29, 2024 07:43
@dao-jun dao-jun added the doc-required (Your PR changes impact docs and you will update later.), type/enhancement (The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages), area/metrics, and area/broker labels and removed the doc-label-missing label Mar 29, 2024
@dao-jun dao-jun added this to the 3.3.0 milestone Mar 29, 2024
@dao-jun dao-jun requested a review from lhotari March 29, 2024 07:44
@github-actions github-actions bot added the doc-label-missing label and removed the doc-required (Your PR changes impact docs and you will update later.) label Mar 29, 2024
@dao-jun
Member

dao-jun commented Mar 29, 2024

  1. it's better to use OpenTelemetry
  2. we need a proposal

@lhotari
Member

lhotari commented Mar 29, 2024

  > 1. it's better to use OpenTelemetry
  > 2. we need a proposal

@dao-jun we don't have Otel in use yet. Yes, we can handle this in a proposal.

@github-actions github-actions bot added the doc-required (Your PR changes impact docs and you will update later.) label and removed the doc-label-missing label Mar 29, 2024
Member

@lhotari lhotari left a comment


LGTM. I think that the lack of this metric could be considered a significant problem in replicated subscription observability, and that the metric should be added to the LTS version.

@lhotari
Member

lhotari commented Mar 29, 2024

Thanks for the contribution @nikam14 !

Contributor

@codelipenghui codelipenghui left a comment


Could you please add a unit test to guard against regressions?

@asafm
Contributor

asafm commented Mar 31, 2024

FYI @dragosvictor

@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari lhotari added the release/blocker (Indicate the PR or issue that should block the release until it gets resolved) label Oct 10, 2024
@codecov-commenter

codecov-commenter commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.33%. Comparing base (bbc6224) to head (5b69685).
Report is 662 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../persistent/ReplicatedSubscriptionsController.java | 75.00% | 1 Missing ⚠️ |

@@             Coverage Diff              @@
##             master   #22381      +/-   ##
============================================
+ Coverage     73.57%   74.33%   +0.75%     
- Complexity    32624    34963    +2339     
============================================
  Files          1877     1952      +75     
  Lines        139502   147139    +7637     
  Branches      15299    16197     +898     
============================================
+ Hits         102638   109369    +6731     
- Misses        28908    29335     +427     
- Partials       7956     8435     +479     
| Flag | Coverage Δ | |
|---|---|---|
| inttests | 27.58% <0.00%> (+2.99%) | ⬆️ |
| systests | 24.36% <0.00%> (+0.04%) | ⬆️ |
| unittests | 73.69% <75.00%> (+0.84%) | ⬆️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| .../persistent/ReplicatedSubscriptionsController.java | 70.54% <75.00%> (-2.05%) | ⬇️ |

... and 634 files with indirect coverage changes

@lhotari lhotari merged commit 667904c into apache:master Oct 10, 2024
52 checks passed
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
…ed subscriptions snapshot timeouts (apache#22381)

Co-authored-by: Lari Hotari <lhotari@apache.org>
Labels
  • area/broker
  • area/metrics
  • doc-required (Your PR changes impact docs and you will update later.)
  • release/blocker (Indicate the PR or issue that should block the release until it gets resolved)
  • type/enhancement (The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages)
Development

Successfully merging this pull request may close these issues.

Add a metric for geo replication for tracking replicated subscriptions snapshot timeouts
7 participants