FISH-5827 Stuck Thread count as MicroProfile Metric Gauge #5952

Merged: 8 commits, Oct 3, 2022

jGauravGupta
Contributor

Description

This PR adds HealthCheck stats (Stuck thread count) to MicroProfile Metrics.

Important Info

StuckThreadsHealthCheck is required by MetricsServiceImpl.
MetricsServiceImpl is required by FaultToleranceServiceImpl and MicroProfileMetricsChecker.
All of these services start at @RunLevel(StartupRunLevel.VAL).
The enabled status of the health check cannot be checked while the Metrics metadata is being registered, so the metadata is registered by default; the health check status is checked later, during the response writer operation.
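
The split between registration time and write time described above can be illustrated with a small model. This is only a sketch in Python, not Payara's implementation: `VendorRegistry`, `register_gauge`, and `write` are hypothetical names, and the `enabled` callback stands in for the health check's enabled status.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a gauge is registered unconditionally, and a
# per-gauge "enabled" callback is consulted only when output is written.
Gauge = Tuple[Callable[[], float], Callable[[], bool]]


class VendorRegistry:
    def __init__(self) -> None:
        self._gauges: Dict[str, Gauge] = {}

    def register_gauge(
        self,
        name: str,
        supplier: Callable[[], float],
        enabled: Callable[[], bool] = lambda: True,
    ) -> None:
        # Registration stores the callback but never calls it.
        self._gauges[name] = (supplier, enabled)

    def write(self) -> List[str]:
        lines: List[str] = []
        for name, (supplier, enabled) in self._gauges.items():
            # The enabled status is evaluated here, at write time.
            if not enabled():
                continue
            lines.append(f"# TYPE {name} gauge")
            lines.append(f"{name} {supplier()}")
        return lines
```

In this model a gauge whose `enabled` callback returns `False` stays registered but is left out of the written output, which matches the enabled and disabled results shown under Testing.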

Testing

Testing Performed

asadmin start-domain
asadmin set-healthcheck-service-configuration --serviceName=stuck-thread --enabled=true
asadmin restart-domain

Open http://localhost:8080/metrics/vendor, which should contain the following result:

# TYPE vendor_system_cpu_load gauge
# HELP vendor_system_cpu_load Display the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
vendor_system_cpu_load 0.2421606522975489
# TYPE vendor_thread_stuck_count gauge
# HELP vendor_thread_stuck_count Displays the stuck thread count which is blocked, and can't return to the threadpool for a certain amount of time.
vendor_thread_stuck_count 1
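
To check the gauge from a script rather than by eye, the output above can be parsed line by line. The following is a minimal sketch that handles only simple `name value` samples without timestamps; `parse_gauges` and `SAMPLE` are hypothetical names introduced for illustration.

```python
from typing import Dict


def parse_gauges(text: str) -> Dict[str, float]:
    """Parse simple `name value` samples from Prometheus-style text output."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# TYPE" / "# HELP" comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        if not name:
            continue
        try:
            values[name] = float(value)
        except ValueError:
            # Ignore anything that is not a plain numeric sample.
            continue
    return values


# Abbreviated copy of the output shown above (HELP lines omitted).
SAMPLE = """\
# TYPE vendor_system_cpu_load gauge
vendor_system_cpu_load 0.2421606522975489
# TYPE vendor_thread_stuck_count gauge
vendor_thread_stuck_count 1
"""

gauges = parse_gauges(SAMPLE)
print(gauges.get("vendor_thread_stuck_count"))
```

Against a running domain, the text could be fetched with `urllib.request.urlopen("http://localhost:8080/metrics/vendor")` and decoded before being passed to `parse_gauges`.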

Now disable health check stuck-thread:

asadmin set-healthcheck-service-configuration --serviceName=stuck-thread --enabled=false
asadmin restart-domain

Open http://localhost:8080/metrics/vendor, which should contain the following result:

# TYPE vendor_system_cpu_load gauge
# HELP vendor_system_cpu_load Display the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
vendor_system_cpu_load 0.2421606522975489

Testing Environment

Windows 11, JDK 11.0.16

Member

@Pandrex247 Pandrex247 left a comment

Is there a reason it seems to always say there is a stuck thread?

@jGauravGupta jGauravGupta changed the base branch from Payara6 to Payara6-tck September 23, 2022 05:26
@jGauravGupta
Contributor Author

Changing the base branch to Payara6-tck as JavaEE8 Samples requires CDI 4.0, see payara/patched-src-javaee8-samples@5956e84#diff-d26e10d72530f9acdf8b358d1aa078c602ed4aa38dea26ca966117cb1857e3a2

@jGauravGupta
Contributor Author

jenkins test please

@jGauravGupta
Contributor Author

Hi @Pandrex247,

At the start of a request to http://localhost:8080/metrics/vendor, the stuck thread count is incremented by 1, and at the end of the request it is decremented. Because the metric writer runs between those two points, the gauge always shows at least 1.

(org.glassfish.grizzly.threadpool.DefaultWorkerThread) Thread[http-thread-pool::http-listener-1(4),5,main]
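
The behaviour described in this comment can be modelled as follows. This is a sketch of the explanation only and is not taken from Payara's source; `InFlightTracker` and `handle_metrics_request` are hypothetical names.

```python
import threading


class InFlightTracker:
    """Hypothetical tracker: counts threads currently serving a request."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = set()

    def request_started(self) -> None:
        with self._lock:
            self._active.add(threading.get_ident())

    def request_finished(self) -> None:
        with self._lock:
            self._active.discard(threading.get_ident())

    def count(self) -> int:
        with self._lock:
            return len(self._active)


def handle_metrics_request(tracker: InFlightTracker) -> int:
    tracker.request_started()
    try:
        # The "metric writer" reads the gauge here, while the thread that
        # serves this very request is still registered with the tracker.
        return tracker.count()
    finally:
        tracker.request_finished()
```

In this model the gauge is read between the increment and the decrement, so the request that reads it always counts itself and the value never drops below 1.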

@Pandrex247
Member

I don't understand.
If I turn on the log notifier, no stuck threads are listed. I don't see why a call to this endpoint shows a discrepancy; it seems like a bug.

Member

@Pandrex247 Pandrex247 left a comment

Yeah I think this is a bug.
Setting the stuck thread threshold to 8 minutes and poking the endpoint still has it show 1 stuck thread.
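
The expectation behind this comment is that a request which has only just started should not count against an 8-minute threshold. A minimal sketch of that threshold semantics follows; it is an assumption about the intended behaviour, not Payara's code, and `count_stuck` and `start_times` are hypothetical names.

```python
import time
from typing import Dict, Optional


def count_stuck(
    start_times: Dict[int, float],
    threshold_seconds: float,
    now: Optional[float] = None,
) -> int:
    """Count threads whose request has run longer than the threshold.

    `start_times` maps a thread id to the monotonic time at which that
    thread began its current request.
    """
    if now is None:
        now = time.monotonic()
    return sum(
        1 for started in start_times.values()
        if now - started > threshold_seconds
    )
```

With a threshold of 480 seconds, a thread that started its request half a second ago contributes 0 under this definition, which is why a constant value of 1 looks like a bug.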

@Pandrex247 Pandrex247 changed the base branch from Payara6-tck to Payara6 September 23, 2022 15:50
@Pandrex247
Copy link
Member

We're closing the Payara6-tck branch, changing base.

@jGauravGupta
Contributor Author

jenkins test please

@jGauravGupta
Contributor Author

jenkins test please

@jGauravGupta jGauravGupta merged commit a1b3891 into payara:Payara6 Oct 3, 2022