FISH-5827 Stuck Thread count as MicroProfile Metric Gauge (Payara 5.x) #5957

Merged: 4 commits, Oct 3, 2022

@jGauravGupta (Contributor)

Description

This PR exposes a HealthCheck statistic, the stuck thread count, as a MicroProfile Metrics gauge.

Payara 6 PR: #5952

Testing

Testing Performed

asadmin start-domain
asadmin set-healthcheck-service-configuration --serviceName=stuck-thread --enabled=true
asadmin restart-domain

Open http://localhost:8080/metrics/vendor, which should contain the following result:

# TYPE vendor_system_cpu_load gauge
# HELP vendor_system_cpu_load Display the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
vendor_system_cpu_load 0.2421606522975489
# TYPE vendor_thread_stuck_count gauge
# HELP vendor_thread_stuck_count Displays the stuck thread count which is blocked, and can't return to the threadpool for a certain amount of time.
vendor_thread_stuck_count 1

Now disable the stuck-thread health check:

asadmin set-healthcheck-service-configuration --serviceName=stuck-thread --enabled=false
asadmin restart-domain

Open http://localhost:8080/metrics/vendor, which should contain the following result:

# TYPE vendor_system_cpu_load gauge
# HELP vendor_system_cpu_load Display the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
vendor_system_cpu_load 0.2421606522975489
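
The manual check above could also be scripted. The following is a minimal sketch, not part of this PR: it assumes the response body of http://localhost:8080/metrics/vendor has already been fetched as a string, and the class and method names (`VendorMetricsCheck`, `gaugeValue`) are hypothetical. It scans the Prometheus-style text for an unlabelled sample line with the given metric name and returns its value if present.

```java
import java.util.OptionalDouble;

// Hypothetical helper for illustration; not part of Payara or this PR.
final class VendorMetricsCheck {

    private VendorMetricsCheck() {
    }

    /**
     * Looks for an unlabelled sample line "<name> <value>" in the metrics
     * text and returns the value, or empty if the metric is absent.
     */
    static OptionalDouble gaugeValue(String metricsText, String name) {
        for (String line : metricsText.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and "# TYPE" / "# HELP" comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2 && parts[0].equals(name)) {
                return OptionalDouble.of(Double.parseDouble(parts[1]));
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Sample bodies abbreviated from the output shown in this PR.
        String enabled = String.join("\n",
                "# TYPE vendor_system_cpu_load gauge",
                "vendor_system_cpu_load 0.2421606522975489",
                "# TYPE vendor_thread_stuck_count gauge",
                "vendor_thread_stuck_count 1");
        String disabled = String.join("\n",
                "# TYPE vendor_system_cpu_load gauge",
                "vendor_system_cpu_load 0.2421606522975489");

        System.out.println(gaugeValue(enabled, "vendor_thread_stuck_count"));
        System.out.println(gaugeValue(disabled, "vendor_thread_stuck_count").isPresent());
    }
}
```

With the stuck-thread check enabled, `gaugeValue` should find `vendor_thread_stuck_count`; with it disabled, the result should be empty.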

Testing Environment

Windows 11, JDK 11.0.16

@jGauravGupta (Contributor, Author)

jenkins test please

@jGauravGupta changed the title from "FISH-5827 Stuck Thread count as MicroProfile Metric Gauge" to "FISH-5827 Stuck Thread count as MicroProfile Metric Gauge (Payara 5.x)" on Sep 23, 2022
@Pandrex247 (Member) left a comment

See the review on the other PR. I believe there's a bug: it always shows at least 1 stuck thread.

@jGauravGupta (Contributor, Author)

jenkins test please

@jGauravGupta (Contributor, Author)

jenkins test please

@Pandrex247 (Member) left a comment

Working better, but one issue: it's still running even when the health check service is disabled.
This seems counterintuitive given how it works elsewhere in the server.

@jGauravGupta (Contributor, Author)

Hi Andrew,

> it's still running even when the health check service is disabled.

I checked and can't reproduce it.
With the stuck-thread check disabled, http://localhost:8080/metrics/vendor returns:

# TYPE vendor_system_cpu_load gauge
# HELP vendor_system_cpu_load Display the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
vendor_system_cpu_load 0.388351853542798

With the stuck-thread check enabled, http://localhost:8080/metrics/vendor returns:

# TYPE vendor_system_cpu_load gauge
# HELP vendor_system_cpu_load Display the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
vendor_system_cpu_load 0.16640233851087813
# TYPE vendor_thread_stuck_count gauge
# HELP vendor_thread_stuck_count Displays the stuck thread count which is blocked, and can't return to the threadpool for a certain amount of time.
vendor_thread_stuck_count 0
# TYPE vendor_thread_stuck_maxDuration gauge
# HELP vendor_thread_stuck_maxDuration Displays the maximum duration of stuck thread which is blocked, and can't return to the threadpool for a certain amount of time.
vendor_thread_stuck_maxDuration 0

@Pandrex247 (Member)

Sorry, I should have been more explicit: I meant the "overall" health check service.

Stuck thread, hogging thread, CPU, etc. are all subsystems of the overall health check service.
Each of the subsystems can be on or off, and the overall health check service itself can also be on or off.

So the way it should work, to be consistent with how the healthcheck system itself works, is:

  • HealthCheck enabled, stuck thread disabled - no metric
  • HealthCheck enabled, stuck thread enabled - metric
  • HealthCheck disabled, stuck thread disabled - no metric
  • HealthCheck disabled, stuck thread enabled - no metric
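
The four cases above reduce to a single condition: the metric is exposed only when both the overall health check service and the stuck-thread checker are enabled. A minimal sketch of that rule follows; the class and method names are hypothetical, and this is not the code merged in the PR.

```java
// Hypothetical illustration of the rule in the list above; not Payara code.
final class StuckThreadMetricGate {

    private StuckThreadMetricGate() {
    }

    /**
     * Returns true only when the overall health check service AND the
     * stuck-thread checker are both enabled.
     */
    static boolean shouldExposeMetric(boolean healthCheckEnabled, boolean stuckThreadEnabled) {
        return healthCheckEnabled && stuckThreadEnabled;
    }

    public static void main(String[] args) {
        // Print the decision for each of the four combinations.
        boolean[] states = {true, false};
        for (boolean healthCheck : states) {
            for (boolean stuckThread : states) {
                System.out.printf("HealthCheck=%b, stuck thread=%b -> metric=%b%n",
                        healthCheck, stuckThread,
                        shouldExposeMetric(healthCheck, stuckThread));
            }
        }
    }
}
```

Keeping the decision in one predicate makes it straightforward to cover all four combinations in a unit test.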


@jGauravGupta (Contributor, Author)

jenkins test please

@jGauravGupta merged commit 15d60cf into payara:master on Oct 3, 2022
pzygielo pushed a commit to pzygielo/Payara that referenced this pull request Dec 25, 2022
pzygielo added a commit to pzygielo/Payara that referenced this pull request Jan 19, 2023
JamesHillyard pushed a commit to JamesHillyard/Payara that referenced this pull request Jan 24, 2023
pzygielo added a commit to pzygielo/Payara that referenced this pull request Jan 24, 2023
JamesHillyard added a commit that referenced this pull request Jan 25, 2023
FISH-6962 Follow the rename from #5957/FISH-5827
arieki pushed a commit to arieki/Payara that referenced this pull request Mar 28, 2023