DefaultExecutionRequestObservationConvention does not produce INTERNAL_ERROR outcomes #1058
Labels
in: core
Issues related to config and core support
status: backported
An issue that has been backported to maintenance branches
type: bug
A general bug
Milestone
Project details:
Spring Boot project, having:
According to this doc: https://docs.spring.io/spring-graphql/reference/observability.html#observability.server.request, we should get graphql.outcome INTERNAL_ERROR if no valid GraphQL response could be produced. It's not really specified what exactly that means. When I first read the docu I understood that's when there is unchecked exception thrown and not being handled by exception handler.
from the code I see that's actually in the case when there is general throwable error set in observable or there are no data to produce.
Main "throwable error" from ExecutionRequestObservationContext never gets set in case there was an unchecked exception thrown from application code. In that scenario the error is handled via ExceptionResolversExceptionHandler and translated to 'regular error' from executionResult:
ExecutionResultImpl{errors=[INTERNAL_ERROR for 2ee7db16-1015-ae28-d84e-768be03a8e6b], data={calculateProductionCost=null}, dataPresent=true, extensions=null}
Though it is reported as part of GraphQlObservationInstrumentation as "graphql.request.INTERNAL_SERVER_ERROR" counter metric.
As a result we can see this behaviour in datadog:
graphql.request.count metric which has two tags: graphql.outcome either SUCCESS or REQUEST_ERROR, which is consisting of all possible error responses, even for internal server errors. OUTCOME_INTERNAL_ERROR Is never really reported in our case.
graphql.request.INTERNAL_SERVER_ERROR metric reported as count for all INTERNAL_SERVER_ERRORs.
I'm finding this really confusing. It would be much more useful to produce INTERNAL_ERROR observation outcome in case the actual server internal error (aka uncatched exception, not covered by user-defined exception handlers) happens.
It would be also more inline with HTTP world and status codes (I know gql always respond with http 200, it's just an inspiration for metric outcomes):
2xx codes -> outcome success
4xx codes -> outcome request error
5xx codes -> outcome internal/server error
If that's not the desired behaviour, I'd make it very explicit in documentation as current description is IMHO misleading.
EDIT://
for now, we found a workaround by doing this:
The text was updated successfully, but these errors were encountered: