[Telemetry] Custom logger appender? #89839

Open
Tracked by #119466
afharo opened this issue Feb 1, 2021 · 7 comments
Labels
enhancement (New value added to drive a business result), Feature:Telemetry, impact:medium (Addressing this issue will have a medium level of impact on the quality/strength of our product), loe:small (Small Level of Effort), Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)

Comments

afharo (Member) commented Feb 1, 2021

Follow up to #89588.

We keep switching warn/error logs in the usage-collection/telemetry plugins to debug to avoid generating noise for our end users. It's an easy way to silence those errors. However, it feels to me like debug-level errors make it harder to identify any issues we may introduce (they are not noisy enough).

How about we configure the loggers for the UsageCollection & Telemetry plugins to be silent in production mode, but warn properly in dev mode?
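Something along these lines, just to illustrate the idea (a minimal sketch; `logCollectionError` and the `isDev` flag are hypothetical, not existing APIs):

```ts
import type { Logger } from '@kbn/logging';

// Hypothetical helper: route "noisy" collection errors to warn in dev mode
// and keep them at debug in production.
export function logCollectionError(logger: Logger, isDev: boolean, message: string) {
  if (isDev) {
    logger.warn(message);
  } else {
    logger.debug(message);
  }
}
```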

afharo added the discuss, Team:Core, and Feature:Telemetry labels on Feb 1, 2021
elasticmachine (Contributor):

Pinging @elastic/kibana-core (Team:Core)

Bamieh (Member) commented Feb 1, 2021

+1 on this. We could add an explicit flag to the config and default it to true when env.dev is set.
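Roughly something like this in the plugin's config schema (just a sketch; the `logCollectionErrors` key is made up and the conditional/contextRef usage would need to be double-checked):

```ts
import { schema, TypeOf } from '@kbn/config-schema';

export const configSchema = schema.object({
  // Hypothetical flag: defaults to true in dev mode, false otherwise.
  logCollectionErrors: schema.conditional(
    schema.contextRef('dev'),
    true,
    schema.boolean({ defaultValue: true }),
    schema.boolean({ defaultValue: false })
  ),
});

export type TelemetryConfigType = TypeOf<typeof configSchema>;
```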

rudolf (Contributor) commented Feb 1, 2021

+1

I think we should aim to explicitly ignore expected errors, though. E.g. reading a saved object can time out when there's network instability. If we know the call will be repeated at some point in the future, we should always ignore this error and not print any logs; not even a developer needs to see that it happened, since it's part of the expected behaviour of the plugin.

Then, for the scenarios we're not sure we're handling correctly, or where we don't know whether we can recover, we can log in development.
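To make it concrete, this is the kind of handling I have in mind (illustrative only; the timeout check and the `fetchUsageSafely` name are made up, not the actual classification we'd ship):

```ts
import type { Logger } from '@kbn/logging';

// Illustrative check: treat request timeouts as an expected, retriable condition.
const isExpectedError = (error: Error) => error.name === 'TimeoutError';

export async function fetchUsageSafely(
  logger: Logger,
  isDev: boolean,
  fetchUsage: () => Promise<unknown>
) {
  try {
    return await fetchUsage();
  } catch (error: any) {
    if (isExpectedError(error)) {
      // Expected and retried later: swallow it without logging anything.
      return undefined;
    }
    // Not sure we can recover: surface it in dev, keep quiet in production.
    if (isDev) {
      logger.warn(`Failed to fetch usage: ${error.message}`);
    } else {
      logger.debug(`Failed to fetch usage: ${error.message}`);
    }
    return undefined;
  }
}
```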

LeeDr (Contributor) commented Apr 27, 2021

-1. Wait, why would we have network instability when running Elasticsearch and Kibana on the same host?

Bamieh (Member) commented Apr 28, 2021

@LeeDr Elasticsearch can be unresponsive for multiple reasons, even on the same host: out of memory/disk space, the node is restarting, the Kibana index is locked or misconfigured, etc.

Our telemetry collectors are usually the first to scream when Kibana is unable to reach ES for any reason. This usually means that customers open tickets claiming telemetry is causing Kibana to fail, although telemetry was only the first plugin to log an error in the server logs.

If we silence the telemetry/collection logs in production, or put them behind a config option, we'd avoid these situations, which helps users identify the real cause of the issue.

I'm also assuming users would try disabling telemetry as a first step to debug such cases even though the root cause is not related to telemetry, hence we'd miss out on receiving usage data from these clusters.

afharo added the enhancement label and removed the discuss label on Jan 18, 2022
exalate-issue-sync bot added the impact:needs-assessment and loe:small labels on Jan 18, 2022
afharo (Member, Author) commented Jan 18, 2022

For implementation details: the POC #95960 already had this implemented. I think the effort would be to:

  1. Copy that piece of logic from the POC into the telemetry and usageCollection plugins (a rough sketch of the idea follows the list).
  2. Review the current .debug logs and set them to their appropriate levels.
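I haven't re-checked the exact code in the POC, but the shape of the logic is roughly the following (hypothetical names and structure, not the POC's actual implementation):

```ts
import type { Logger, LogMeta } from '@kbn/logging';

/**
 * Sketch of a wrapper that keeps warn/error/fatal in dev mode but downgrades
 * them to debug in production, leaving the other levels untouched.
 */
export function createQuietProductionLogger(logger: Logger, isDev: boolean): Logger {
  if (isDev) {
    return logger;
  }

  const downgrade = (errorOrMessage: string | Error, meta?: LogMeta) =>
    logger.debug(
      typeof errorOrMessage === 'string' ? errorOrMessage : errorOrMessage.message,
      meta
    );

  return {
    ...logger,
    trace: logger.trace.bind(logger),
    debug: logger.debug.bind(logger),
    info: logger.info.bind(logger),
    log: logger.log.bind(logger),
    get: (...paths) => createQuietProductionLogger(logger.get(...paths), isDev),
    warn: downgrade,
    error: downgrade,
    fatal: downgrade,
  };
}
```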

exalate-issue-sync bot added the impact:medium label and removed the impact:needs-assessment label on Feb 8, 2022
afharo (Member, Author) commented Mar 21, 2022

@pjhampton added an important point: we should also enable these logs by default on Cloud so we can catch any potential bugs in those controlled environments.
