Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-18733: Updating share group metrics (1/N) #18826

Merged
merged 9 commits into from
Feb 14, 2025

Conversation

apoorvmittal10
Copy link
Collaborator

@apoorvmittal10 apoorvmittal10 commented Feb 6, 2025

The PR does the following:

  • Refactors ShareGroupMetrics out of SharePartitionManager as there are additional metrics need to be defined, as per KIP-1103, where ShareGroupMetrics is required to be pass to DelayedShareFetch, etc.
  • As the KIP-1103 defines metrics which are Histogram in ShareGroupMetrics, also the BrokerTopicMetrics are Yammer metrics hence to maintian consistency, I moved existing ShareGroupMetrics (defined in KIP-932) also to Yammer metrics following Yammer metrics group naming convention i.e. from share-group-metrics to ShareGroupMetrics (this will require KIP-932 upadate where share-acknowledgement metric is already moved to BrokerTopicMetrics and other 2 in this PR to Yammer metrics).
  • Moved record-acknowledgement-rate|count to Yammer meter metric with name RecordAcknowledgementPerSec.
  • Moved partition-load-time-avg|max to Yammer histogram metric with name PartitionLoadTimeMs.
  • Corrects the way partition load time metrics was calculated. The load time of share partition should be considered when the state of share partition moves from empty which happens once mayBeInitialize finishes. The existing implementation is not much helpful now since share partition initialization is now async, at the time of this metric implementation the initialization of share partition was sync hence the metric needed correction.

This PR is first step to start implementing other ShareGroupMetrics as defined in KIP-1103.

Screenshot 2025-02-13 at 12 53 46

@github-actions github-actions bot added triage PRs from the community core Kafka Broker KIP-932 Queues for Kafka labels Feb 6, 2025
@AndrewJSchofield AndrewJSchofield removed the triage PRs from the community label Feb 7, 2025
Copy link
Contributor

@smjn smjn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Could we plz also attach a jconsole screenshot of relevant metrics for posterity?

Copy link
Member

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. As discussed, I would remove the share-group-metrics from KIP-932 and put them into KIP-1103 as part of moving them to Yammer. Then the related metrics are all together which keeps things easier to understand.

when(sp1.maybeInitialize()).thenReturn(pendingInitializationFuture2);
when(sp1.loadStartTimeMs()).thenReturn(40L);

DelayedOperationPurgatory<DelayedShareFetch> delayedShareFetchPurgatory = new DelayedOperationPurgatory<>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect a call to delayedShareFetchPurgatory.shutdown(), ideally whether the test fails or not. Because the reaper is not enabled, I suppose the risk of leaking a thread is not present, but it would still be preferable to leave things tidy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added thread related fixes here in this PR and additional checks: #18862. The shutdown of delayedShareFetchPurgatory will close the reaper thread which is also closed by SharePartitionManager.
Invoking close again on timer, sometimes gives exception as add while close fails. So I have tied things with SharePartitionManager close and added check for no pending threads, so in future if some test case fails to close things then we can catch the error early.

@apoorvmittal10
Copy link
Collaborator Author

LGTM Could we plz also attach a jconsole screenshot of relevant metrics for posterity?

Attached screenshot in description.

@apoorvmittal10
Copy link
Collaborator Author

Thanks for the PR. As discussed, I would remove the share-group-metrics from KIP-932 and put them into KIP-1103 as part of moving them to Yammer. Then the related metrics are all together which keeps things easier to understand.

I have updated both KIPs.

@AndrewJSchofield AndrewJSchofield merged commit 53543bc into apache:trunk Feb 14, 2025
9 checks passed
ijuma added a commit to ijuma/kafka that referenced this pull request Feb 14, 2025
…ssume-baseline-3.3-for-server

* apache-github/trunk: (36 commits)
  MINOR: Add KIP-848's metric to the doc (apache#18890)
  KAFKA-18772 Define share group config defaults for Docker (apache#18899)
  KAFKA-18733: Updating share group metrics (1/N) (apache#18826)
  KAFKA-18634: Fix ELR metadata version issues (apache#18680)
  KAFKA-18298 Fix flaky testConsumerGroupsDeprecatedConsumerGroupState and testConsumerGroups in PlaintextAdminIntegrationTest (apache#18513)
  MINOR: Marking testVerifyFetchAndCloseImplicit flaky (apache#18893)
  MINOR: Adjust javadoc to reflect the correct status of standby task TopicPartition (apache#18892)
  KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (apache#18795)
  KAFKA-16720: Support multiple groups in DescribeShareGroupOffsets RPC (apache#18834)
  KAFKA-18654[2/2]: Transction V2 retry add partitions on the server side when handling produce request. (apache#18810)
  MINOR: fix warn log message in Kafka Streams (apache#18878)
  KAFKA-18776: Fix flaky coordinator disconnect test & fix log level (apache#18866)
  KAFKA-17298: Update upgrade notes for 4.0 KIP-848 (apache#18756)
  MINOR bump setup-gradle (apache#18879)
  MINOR: Run javadoc as part of check task (apache#18863)
  MINOR: Updated share partition manager tests to close and other fixes (apache#18862)
  MINOR: Fix typo in ClusterControlManager (apache#18886)
  KAFKA-18728 Move ListOffsetsPartitionStatus to server module (apache#18807)
  MINOR: cleanup KStream JavaDocs (14/N) - stream-globalTable-left-join (apache#18867)
  MINOR: cleanup KStream JavaDocs (13/N) - stream-stream-outer-join (apache#18865)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Kafka Broker KIP-932 Queues for Kafka
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants