[Telemetry] Custom logger appender? #89839

Open
Tracked by #119466
afharo opened this issue Feb 1, 2021 · 7 comments
Labels
enhancement (New value added to drive a business result), Feature:Telemetry, impact:medium (Addressing this issue will have a medium level of impact on the quality/strength of our product), loe:small (Small Level of Effort), Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)

Comments

afharo (Member) commented Feb 1, 2021

Follow up to #89588.

We keep switching warn/error logs in the usage-collection/telemetry plugins to debug to avoid generating noise for our end users. It's an easy way to silence those errors. However, it feels to me like debug-level errors make it harder to identify any issues we may introduce (they are not noisy enough).

How about we configure the loggers for the UsageCollection & Telemetry plugins to be silent in production mode, but warn properly in dev mode?
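Something along these lines, just to illustrate the idea (a minimal sketch; `logCollectionError` and the `isDev` flag are hypothetical, not existing APIs):

```ts
import type { Logger } from '@kbn/logging';

// Hypothetical helper: route "noisy" collection errors to warn in dev mode
// and keep them at debug in production.
export function logCollectionError(logger: Logger, isDev: boolean, message: string) {
  if (isDev) {
    logger.warn(message);
  } else {
    logger.debug(message);
  }
}
```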

afharo added the discuss, Team:Core, and Feature:Telemetry labels on Feb 1, 2021
elasticmachine (Contributor):

Pinging @elastic/kibana-core (Team:Core)

Bamieh (Member) commented Feb 1, 2021

+1 on this. We could add an explicit flag to the config and default it to true when env.dev is set.
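Roughly something like this in the plugin's config schema (just a sketch; the `logCollectionErrors` key is made up and the conditional/contextRef usage would need to be double-checked):

```ts
import { schema, TypeOf } from '@kbn/config-schema';

export const configSchema = schema.object({
  // Hypothetical flag: defaults to true in dev mode, false otherwise.
  logCollectionErrors: schema.conditional(
    schema.contextRef('dev'),
    true,
    schema.boolean({ defaultValue: true }),
    schema.boolean({ defaultValue: false })
  ),
});

export type TelemetryConfigType = TypeOf<typeof configSchema>;
```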

rudolf (Contributor) commented Feb 1, 2021

+1

I think we should aim to explicitly ignore expected errors, though. E.g. reading a saved object can time out when there's network instability. If we know the call will be repeated at some point in the future, we should always ignore this error and not print any logs; not even a developer needs to see that it happened, since it's part of the expected behaviour of the plugin.

Then, for the scenarios we're not sure we're handling correctly, or where we don't know whether we can recover, we can log in development.
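To make it concrete, this is the kind of handling I have in mind (illustrative only; the timeout check and the `fetchUsageSafely` name are made up, not the actual classification we'd ship):

```ts
import type { Logger } from '@kbn/logging';

// Illustrative check: treat request timeouts as an expected, retriable condition.
const isExpectedError = (error: Error) => error.name === 'TimeoutError';

export async function fetchUsageSafely(
  logger: Logger,
  isDev: boolean,
  fetchUsage: () => Promise<unknown>
) {
  try {
    return await fetchUsage();
  } catch (error: any) {
    if (isExpectedError(error)) {
      // Expected and retried later: swallow it without logging anything.
      return undefined;
    }
    // Not sure we can recover: surface it in dev, keep quiet in production.
    if (isDev) {
      logger.warn(`Failed to fetch usage: ${error.message}`);
    } else {
      logger.debug(`Failed to fetch usage: ${error.message}`);
    }
    return undefined;
  }
}
```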

LeeDr (Contributor) commented Apr 27, 2021

-1. Wait, why would we have network instability when running Elasticsearch and Kibana on the same host?

Bamieh (Member) commented Apr 28, 2021

@LeeDr Elasticsearch can be unresponsive for multiple reasons, even on the same host: out of memory/disk space, the node is restarting, the Kibana index is locked or misconfigured, etc.

Our telemetry collectors are usually the first to scream when Kibana is unable to reach ES for any reason. This usually means that customers open tickets claiming telemetry is causing Kibana to fail, although telemetry was only the first plugin to log an error in the server logs.

If we silence the telemetry/collection logs in production, or put them behind a config option, we'd avoid these situations, which helps users identify the real cause of the issue.

I'm also assuming users would try disabling telemetry as a first step to debug such cases even though the root cause is not related to telemetry, hence we'd miss out on receiving usage data from these clusters.

afharo added the enhancement label and removed the discuss label on Jan 18, 2022
exalate-issue-sync bot added the impact:needs-assessment and loe:small labels on Jan 18, 2022
afharo (Member, Author) commented Jan 18, 2022

For implementation details: the POC #95960 already had this implemented. I think the effort would be to:

  1. Copy that piece of logic from the POC into the telemetry and usageCollection plugins (a rough sketch of the idea follows the list).
  2. Review the current .debug logs and set them to their appropriate levels.
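I haven't re-checked the exact code in the POC, but the shape of the logic is roughly the following (hypothetical names and structure, not the POC's actual implementation):

```ts
import type { Logger, LogMeta } from '@kbn/logging';

/**
 * Sketch of a wrapper that keeps warn/error/fatal in dev mode but downgrades
 * them to debug in production, leaving the other levels untouched.
 */
export function createQuietProductionLogger(logger: Logger, isDev: boolean): Logger {
  if (isDev) {
    return logger;
  }

  const downgrade = (errorOrMessage: string | Error, meta?: LogMeta) =>
    logger.debug(
      typeof errorOrMessage === 'string' ? errorOrMessage : errorOrMessage.message,
      meta
    );

  return {
    ...logger,
    trace: logger.trace.bind(logger),
    debug: logger.debug.bind(logger),
    info: logger.info.bind(logger),
    log: logger.log.bind(logger),
    get: (...paths) => createQuietProductionLogger(logger.get(...paths), isDev),
    warn: downgrade,
    error: downgrade,
    fatal: downgrade,
  };
}
```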

exalate-issue-sync bot added the impact:medium label and removed the impact:needs-assessment label on Feb 8, 2022
afharo (Member, Author) commented Mar 21, 2022

@pjhampton added an important point: we should also enable these logs by default on Cloud so we can catch any potential bugs in those controlled environments.
