Elasticsearch Exporter

Status
Stability: development (metrics), beta (traces, logs)
Distributions: contrib
Issues: Open issues, Closed issues
Code Owners: @JaredTan95, @carsonip, @lahsivjar

This exporter supports sending logs, metrics and traces to Elasticsearch.

Configuration options

Exactly one of the following settings is required:

  • endpoint (no default): The target Elasticsearch URL to which data will be sent (e.g. https://elasticsearch:9200)
  • endpoints (no default): A list of Elasticsearch URLs to which data will be sent, attempted in round-robin order
  • cloudid (no default): The Elastic Cloud ID of the Elastic Cloud Cluster to which data will be sent (e.g. foo:YmFyLmNsb3VkLmVzLmlvJGFiYzEyMyRkZWY0NTY=)

When none of the above settings is set, endpoints defaults to the value of the ELASTICSEARCH_URL environment variable, which is read as a comma-separated list of URLs.
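As a minimal sketch of the list form, the configuration below uses endpoints instead of endpoint. The host names are hypothetical placeholders, not values from this document.

```yaml
exporters:
  elasticsearch:
    # Hypothetical node URLs; requests are attempted in round-robin order.
    endpoints:
      - https://es-node-1.example.com:9200
      - https://es-node-2.example.com:9200
```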

Elasticsearch credentials may be configured via Authentication configuration settings. As a shortcut, the following settings are also supported:

  • user (optional): Username used for HTTP Basic Authentication.
  • password (optional): Password used for HTTP Basic Authentication.
  • api_key (optional): Elasticsearch API Key in "encoded" format.

Example:

exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    auth:
      authenticator: basicauth

extensions:
  basicauth:
    client_auth:
      username: elastic
      password: changeme

······

service:
  extensions: [basicauth]
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [elasticsearch]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [elasticsearch]
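The shortcut settings described above can be used in place of an authenticator extension. The following is a sketch only; it assumes the encoded API key is supplied through an environment variable whose name (ELASTICSEARCH_API_KEY) is made up for this example.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    # Encoded API key read from a hypothetical environment variable.
    api_key: ${env:ELASTICSEARCH_API_KEY}
    # Alternatively, HTTP Basic Authentication via the shortcut settings:
    # user: elastic
    # password: changeme
```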

Advanced configuration

HTTP settings

The Elasticsearch exporter supports common HTTP Configuration Settings. Gzip compression is enabled by default. To disable compression, set compression to none. As a consequence of supporting confighttp, the Elasticsearch exporter also supports common TLS Configuration Settings.

The Elasticsearch exporter sets timeout (HTTP request timeout) to 90s by default. All other defaults are as defined by confighttp.
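As an illustration of the HTTP settings described above, the sketch below disables compression, lowers the request timeout, and points TLS at a custom CA certificate. The timeout value and the certificate path are placeholders chosen for the example.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    # Gzip is the default; "none" disables compression.
    compression: none
    # Overrides the exporter's 90s default request timeout.
    timeout: 30s
    tls:
      # Placeholder path to a CA certificate used to verify the server.
      ca_file: /path/to/ca.pem
```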

Queuing

The Elasticsearch exporter supports the common sending_queue settings. However, the sending queue is currently disabled by default.
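Because the queue is off by default, it has to be enabled explicitly. The following sketch uses the common sending_queue settings; the consumer count and queue size are arbitrary example values, not recommendations.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    sending_queue:
      enabled: true
      # Example values only; tune for your workload.
      num_consumers: 10
      queue_size: 1000
```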

Batching

Warning

The batcher config is experimental and may change without notice.

The Elasticsearch exporter supports the common batcher settings.

  • batcher:
    • enabled (default=unset): Enable batching of requests into 1 or more bulk requests. On a batcher flush, it is possible for a batched request to be translated to more than 1 bulk request due to flush::bytes.
    • min_size_items (default=5000): Minimum number of log records / spans / data points in the batched request to immediately trigger a batcher flush.
    • max_size_items (default=0): Maximum number of log records / spans / data points in a batched request. To limit bulk request size, configure flush::bytes instead. ⚠️ It is recommended to keep max_size_items as 0 as a non-zero value may lead to broken metrics grouping and indexing rejections.
    • flush_timeout (default=30s): Maximum time the oldest item may spend inside the batcher buffer, aka "max age of batcher buffer". Once this time elapses, a batcher flush will happen regardless of the size of the content in the batcher buffer.

By default, the exporter will perform its own buffering and batching, as configured through the flush config, and batcher will be unused. By setting batcher::enabled to either true or false, the exporter will not perform any of its own buffering or batching, and the flush::interval config will be ignored. In a future release when the batcher config is stable, and has feature parity with the exporter's existing flush config, it will be enabled by default.

Using the common batcher functionality provides several benefits over the default behavior:

  • Combined with a persistent queue, or no queue at all, batcher enables at-least-once delivery. With the default behavior, the exporter will accept data and process it asynchronously, which interacts poorly with queuing.
  • By ensuring the exporter makes requests to Elasticsearch synchronously, client metadata can be passed through to Elasticsearch requests, e.g. by using the headers_setter extension.
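A minimal sketch of opting in to the experimental batcher, spelling out the documented defaults explicitly. With this in place the exporter's own buffering is bypassed and flush::interval is ignored, as described above.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    batcher:
      # Any explicit value (true or false) turns off the exporter's own buffering.
      enabled: true
      min_size_items: 5000
      # Keep at 0; bulk request size is limited through flush::bytes instead.
      max_size_items: 0
      flush_timeout: 30s
```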

Elasticsearch document routing

Telemetry data will be written to signal specific data streams by default: logs to logs-generic-default, metrics to metrics-generic-default, and traces to traces-generic-default. This can be customised through the following settings:

  • index (DEPRECATED, please use logs_index for logs, metrics_index for metrics, traces_index for traces): The index or data stream name to publish events to. The default value is logs-generic-default.

  • logs_index: The index or data stream name to publish logs to. The default value is logs-generic-default.

  • logs_dynamic_index (optional): uses resource, scope, or log record attributes to dynamically construct index name.

    • enabled (default=false): Enable/disable dynamic index for log records. The index name is resolved as follows:
      • If data_stream.dataset or data_stream.namespace exist in attributes (precedence: log record attribute > scope attribute > resource attribute), they will be used to dynamically construct the index name in the form logs-${data_stream.dataset}-${data_stream.namespace}.
      • Otherwise, if elasticsearch.index.prefix or elasticsearch.index.suffix exist in attributes (precedence: resource attribute > scope attribute > log record attribute), they will be used to dynamically construct the index name in the form ${elasticsearch.index.prefix}${logs_index}${elasticsearch.index.suffix}.
      • Otherwise, if the scope name matches the regex /receiver/(\w*receiver), data_stream.dataset will be capture group #1.
      • Otherwise, the index name falls back to logs-generic-default, and the logs_index config will be ignored.
      Except for prefix/suffix attribute presence, the resulting docs will contain the corresponding data_stream.* fields, see restrictions applied to Data Stream Fields.
  • metrics_index (optional): The index or data stream name to publish metrics to. The default value is metrics-generic-default. ⚠️ Note that metrics support is currently in development.

  • metrics_dynamic_index (optional): uses resource, scope or data point attributes to dynamically construct index name. ⚠️ Note that metrics support is currently in development.

    • enabled (default=true): Enable/disable dynamic index for metrics. The index name is resolved as follows:
      • If data_stream.dataset or data_stream.namespace exist in attributes (precedence: data point attribute > scope attribute > resource attribute), they will be used to dynamically construct the index name in the form metrics-${data_stream.dataset}-${data_stream.namespace}.
      • Otherwise, if elasticsearch.index.prefix or elasticsearch.index.suffix exist in attributes (precedence: resource attribute > scope attribute > data point attribute), they will be used to dynamically construct the index name in the form ${elasticsearch.index.prefix}${metrics_index}${elasticsearch.index.suffix}.
      • Otherwise, if the scope name matches the regex /receiver/(\w*receiver), data_stream.dataset will be capture group #1.
      • Otherwise, the index name falls back to metrics-generic-default, and the metrics_index config will be ignored.
      Except for prefix/suffix attribute presence, the resulting docs will contain the corresponding data_stream.* fields, see restrictions applied to Data Stream Fields.
  • traces_index: The index or data stream name to publish traces to. The default value is traces-generic-default.

  • traces_dynamic_index (optional): uses resource, scope, or span attributes to dynamically construct index name.

    • enabled (default=false): Enable/disable dynamic index for trace spans. The index name is resolved as follows:
      • If data_stream.dataset or data_stream.namespace exist in attributes (precedence: span attribute > scope attribute > resource attribute), they will be used to dynamically construct the index name in the form traces-${data_stream.dataset}-${data_stream.namespace}.
      • Otherwise, if elasticsearch.index.prefix or elasticsearch.index.suffix exist in attributes (precedence: resource attribute > scope attribute > span attribute), they will be used to dynamically construct the index name in the form ${elasticsearch.index.prefix}${traces_index}${elasticsearch.index.suffix}.
      • Otherwise, if the scope name matches the regex /receiver/(\w*receiver), data_stream.dataset will be capture group #1.
      • Otherwise, the index name falls back to traces-generic-default, and the traces_index config will be ignored.
      Except for prefix/suffix attribute presence, the resulting docs will contain the corresponding data_stream.* fields, see restrictions applied to Data Stream Fields. There is an exception for span events under OTel mapping mode (mapping::mode: otel), where span event attributes are considered instead of span attributes, and data_stream.type is always logs instead of traces, so that documents are routed to logs-${data_stream.dataset}-${data_stream.namespace}.
  • logstash_format (optional): Logstash format compatibility. Logs, metrics and traces can be written into an index in Logstash format.

    • enabled (default=false): Enable/disable Logstash format compatibility. When logstash_format.enabled is true, the index name is composed using (logs|metrics|traces)_index or (logs|metrics|traces)_dynamic_index as the prefix and the date as the suffix. For example, if logs_index or logs_dynamic_index resolves to logs-generic-default, the index will become logs-generic-default-YYYY.MM.DD. The appended date is the date on which the data is generated.
    • prefix_separator (default=-): Separator placed between the index name prefix and the date.
    • date_format (default=%Y.%m.%d): Time format (based on strftime) used to generate the date suffix of the index name.
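The routing settings above can be sketched with two exporter instances. The instance names (dynamic, logstash) and the index name my-logs are made up for this example. Under the rules described above, a log record carrying data_stream.dataset=nginx and data_stream.namespace=prod would be routed by the first instance to logs-nginx-prod, while the second instance would write to an index of the form my-logs-YYYY.MM.DD.

```yaml
exporters:
  # Routes each log record based on its data_stream.* attributes.
  elasticsearch/dynamic:
    endpoint: https://elastic.example.com:9200
    logs_dynamic_index:
      enabled: true

  # Writes logs to date-suffixed indices, Logstash style.
  elasticsearch/logstash:
    endpoint: https://elastic.example.com:9200
    logs_index: my-logs        # hypothetical index name
    logstash_format:
      enabled: true
      prefix_separator: "-"
      date_format: "%Y.%m.%d"
```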

Elasticsearch document mapping

The Elasticsearch exporter supports several document schemas and preprocessing behaviours, which may be configured through the following settings:

  • mapping: Events are encoded to JSON. The mapping allows users to configure additional mapping rules.
    • mode (default=none): The field naming mode. Valid modes are:
      • none: Use original fields and event structure from the OTLP event.

      • ecs: Try to map fields to Elastic Common Schema (ECS)

      • otel: Elastic's preferred "OTel-native" mapping mode. Uses original fields and event structure from the OTLP event.

        • ⚠️ This mode's behavior is unstable; it is currently experimental and undergoing changes.
        • There's a special treatment for the following attributes: data_stream.type, data_stream.dataset, data_stream.namespace. Instead of serializing these values under the *attributes.* namespace, they're put at the root of the document, to conform with the conventions of the data stream naming scheme that maps these as constant_keyword fields.
        • data_stream.dataset will always be appended with .otel. It is recommended to use with *_dynamic_index.enabled: true to route documents to data stream ${data_stream.type}-${data_stream.dataset}-${data_stream.namespace}.
        • Span events are stored in separate documents. They will be routed with data_stream.type set to logs if traces_dynamic_index::enabled is true.
      • raw: Omit the Attributes. string prefixed to field names for log and span attributes as well as omit the Events. string prefixed to field names for span events.

      • bodymap: Provides fine-grained control over the final documents to be ingested. ⚠️ This mode's behavior is unstable; it is currently experimental and undergoing changes. It works only for logs where the log record body is a map. Each LogRecord body is serialized to JSON as-is and becomes a separate document for ingestion. If the log record body is not a map, the exporter will log a warning and drop the log record.

    • dedup (DEPRECATED). This configuration is deprecated and non-operational, and will be removed in the future. Object keys are always deduplicated to avoid Elasticsearch rejecting documents.
    • dedot (default=true; DEPRECATED, in future dedotting will always be enabled for ECS mode, and never for other modes): When enabled, attributes with . in their names will be split into proper JSON objects.
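A sketch of selecting the OTel-native mapping mode together with dynamic index routing for every signal, following the recommendation above to combine mode: otel with *_dynamic_index.enabled: true. Swapping the mode for ecs, raw, or bodymap selects one of the other schemas.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    mapping:
      mode: otel
    # Route documents to ${data_stream.type}-${data_stream.dataset}-${data_stream.namespace}.
    logs_dynamic_index:
      enabled: true
    metrics_dynamic_index:
      enabled: true
    traces_dynamic_index:
      enabled: true
```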

ECS mapping mode

Warning

The ECS mapping mode is currently undergoing changes, and its behaviour is unstable.

In ECS mapping mode, the Elasticsearch Exporter attempts to map fields from OpenTelemetry Semantic Conventions (version 1.22.0) to Elastic Common Schema. This mode may be used for compatibility with existing dashboards that work with ECS.

Elasticsearch ingest pipeline

Documents may be optionally passed through an Elasticsearch Ingest pipeline prior to indexing. This can be configured through the following settings:

Elasticsearch bulk indexing

The Elasticsearch exporter uses the Elasticsearch Bulk API for indexing documents. The behaviour of this bulk indexing can be configured with the following settings:

  • num_workers (default=runtime.NumCPU()): Number of workers publishing bulk requests concurrently.
  • flush: Event bulk indexer buffer flush settings
    • bytes (default=5000000): Write buffer flush size limit before compression. A bulk request will be sent immediately when its buffer exceeds this limit. This value should be much lower than Elasticsearch's http.max_content_length config to avoid HTTP 413 Entity Too Large error. It is recommended to keep this value under 5MB.
    • interval (default=30s): Write buffer flush time limit.
  • retry: Elasticsearch bulk request retry settings
    • enabled (default=true): Enable/Disable request retry on error. Failed requests are retried with exponential backoff.
    • max_requests (DEPRECATED, use retry::max_retries instead): Number of HTTP request retries including the initial attempt. If used, retry::max_retries will be set to max_requests - 1.
    • max_retries (default=2): Number of HTTP request retries. To disable retries, set retry::enabled to false instead of setting max_retries to 0.
    • initial_interval (default=100ms): Initial waiting time after an HTTP request fails.
    • max_interval (default=1m): Maximum waiting time after an HTTP request fails.
    • retry_on_status (default=[429]): Status codes that trigger request or document level retries. Request level retry and document level retry status codes are shared and cannot be configured separately. To avoid duplicates, it defaults to [429].

Note

The flush::interval config will be ignored when batcher::enabled config is explicitly set to true or false.
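The bulk indexing settings above can be combined as in the following sketch. All numeric values are illustrative examples rather than recommendations; flush::bytes is kept below the 5MB guidance given above.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    # Number of workers publishing bulk requests concurrently.
    num_workers: 4
    flush:
      bytes: 2000000     # flush once the uncompressed buffer exceeds ~2 MB
      interval: 10s      # ignored if batcher::enabled is set explicitly
    retry:
      enabled: true
      max_retries: 5
      initial_interval: 200ms
      max_interval: 30s
      retry_on_status: [429]
```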

Elasticsearch node discovery

The Elasticsearch Exporter will regularly check Elasticsearch for available nodes. Newly discovered nodes will automatically be used for load balancing. Settings related to node discovery are:

  • discover:
    • on_start (optional): If enabled the exporter queries Elasticsearch for all known nodes in the cluster on startup.
    • interval (optional): Interval to update the list of Elasticsearch nodes.

Node discovery can be disabled by setting discover.interval to 0.
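A short sketch of the discovery settings described above; the five-minute interval is an arbitrary example value.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    discover:
      # Query the cluster for all known nodes at startup.
      on_start: true
      # Refresh the node list every 5 minutes; 0 disables discovery.
      interval: 5m
```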

Telemetry settings

The following settings control the Elasticsearch Exporter's own telemetry, for testing and debugging purposes.

⚠️ This is experimental and may change at any time.

  • telemetry:
    • log_request_body (default=false): Logs Elasticsearch client request body as a field in a log line at DEBUG level. It requires service::telemetry::logs::level to be set to debug. WARNING: Enabling this config may expose sensitive data.
    • log_response_body (default=false): Logs Elasticsearch client response body as a field in a log line at DEBUG level. It requires service::telemetry::logs::level to be set to debug. WARNING: Enabling this config may expose sensitive data.
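Since both options only take effect at debug log level, a debugging setup needs two pieces, sketched below: the exporter's telemetry block and the collector's own log level. This is intended for test environments only, given the warning about sensitive data.

```yaml
exporters:
  elasticsearch:
    endpoint: https://elastic.example.com:9200
    telemetry:
      # May expose sensitive data; do not enable in production.
      log_request_body: true
      log_response_body: true

service:
  telemetry:
    logs:
      # Required for the request/response bodies to be logged.
      level: debug
```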

Exporting metrics

Metrics support is currently in development. The metric types supported are:

  • Gauge
  • Sum
  • Histogram (Delta temporality only)
  • Exponential histogram (Delta temporality only)
  • Summary

ECS Mapping

elasticsearchexporter follows ECS mapping defined here: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model-appendix.md#elastic-common-schema

When mode is set to ecs, elasticsearchexporter performs conversions for resource-level attributes from their Semantic Conventions (SemConv) names to equivalent Elastic Common Schema (ECS) names.

If the target ECS field name is specified as an empty string (""), the converter will neither convert the SemConv key to the equivalent ECS name nor pass through the SemConv key as-is to become the ECS name.

When "Preserve" is true, the attribute is kept in the payload under its original name and also duplicated under its ECS equivalent.

| Semantic Convention Name | ECS Name | Preserve |
| --- | --- | --- |
| cloud.platform | cloud.service.name | false |
| container.image.tags | container.image.tag | false |
| deployment.environment | service.environment | false |
| host.arch | host.architecture | false |
| host.name | host.hostname | true |
| k8s.cluster.name | orchestrator.cluster.name | false |
| k8s.container.name | kubernetes.container.name | false |
| k8s.cronjob.name | kubernetes.cronjob.name | false |
| k8s.daemonset.name | kubernetes.daemonset.name | false |
| k8s.deployment.name | kubernetes.deployment.name | false |
| k8s.job.name | kubernetes.job.name | false |
| k8s.namespace.name | kubernetes.namespace | false |
| k8s.node.name | kubernetes.node.name | false |
| k8s.pod.name | kubernetes.pod.name | false |
| k8s.pod.uid | kubernetes.pod.uid | false |
| k8s.replicaset.name | kubernetes.replicaset.name | false |
| k8s.statefulset.name | kubernetes.statefulset.name | false |
| os.description | host.os.full | false |
| os.name | host.os.name | false |
| os.type | host.os.platform | false |
| os.version | host.os.version | false |
| process.executable.path | process.executable | false |
| process.runtime.name | service.runtime.name | false |
| process.runtime.version | service.runtime.version | false |
| service.instance.id | service.node.name | false |
| telemetry.distro.name | "" | false |
| telemetry.distro.version | "" | false |
| telemetry.sdk.language | "" | false |
| telemetry.sdk.name | "" | false |
| telemetry.sdk.version | "" | false |

Compound Mapping

Some ECS fields cannot be mapped one-to-one and require more advanced logic.

agent.name

The agent name takes the form of a compound name consisting of 3 components:

  • telemetry.sdk.name or, if not present, defaults to otlp,
  • telemetry.sdk.language, defaulting to unknown in case it is missing,
  • telemetry.distro.name, which is allowed to be empty.

These values are all valid:

| telemetry.sdk.name | telemetry.sdk.language | telemetry.distro.name | agent.name |
| --- | --- | --- | --- |
| "" | "" | "" | otlp/unknown |
| "" | dotnet | "" | otlp/dotnet |
| opentelemetry | dotnet | "" | opentelemetry/dotnet |
| "" | java | parts-unlimited-java | otlp/java/parts-unlimited-java |
| "" | "" | parts-unlimited-java | otlp/unknown/parts-unlimited-java |

agent.version

Takes the value of telemetry.distro.version or telemetry.sdk.version. If both telemetry.distro.version and telemetry.sdk.version are present, telemetry.distro.version takes precedence.

host.os.type

Maps values of os.type in the following manner:

| SemConv Value | ECS Value |
| --- | --- |
| windows | windows |
| linux | linux |
| darwin | macos |
| aix | unix |
| hpux | unix |
| solaris | unix |

If os.name is present and has one of the following values, it is mapped as follows:

| SemConv Value | ECS Value |
| --- | --- |
| Android | android |
| iOS | ios |

Otherwise, it is mapped to an empty string ("").

@timestamp

If the record contains a timestamp, that value is used. Otherwise, the observed timestamp is used.

Known issues

version_conflict_engine_exception

When sending a high volume of metrics to a TSDB metrics data stream, e.g. using OTel mapping mode with Elasticsearch 8.16, it is possible to get error logs "failed to index document" with error.type "version_conflict_engine_exception" and error.reason containing "version conflict, document already exists". This happens because Elasticsearch groups metrics with the same dimensions, regardless of whether the metric names are the same or different, using @timestamp in millisecond precision, whereas elasticsearchexporter uses nanosecond precision.

This will be fixed in a future version of Elasticsearch. A possible workaround would be to use a transform processor to truncate the timestamp, but this will cause duplicate data to be dropped silently.

However, if @timestamp precision is not the problem, check your metrics pipeline setup for misconfiguration that causes an actual violation of the single writer principle.
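The timestamp-truncation workaround mentioned above might look like the following sketch. It assumes the contrib transform processor with the OTTL TruncateTime and Duration functions, applied in the datapoint context; the processor instance name (truncate_timestamp) and the pipeline wiring are made up for this example. As noted above, data points that collide after truncation will be dropped silently.

```yaml
processors:
  transform/truncate_timestamp:
    metric_statements:
      - context: datapoint
        statements:
          # Round each data point timestamp down to millisecond precision.
          - set(time, TruncateTime(time, Duration("1ms")))

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [transform/truncate_timestamp]
      exporters: [elasticsearch]
```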