
Che Logging #10290

Closed
yarivlifchuk opened this issue Jul 5, 2018 · 2 comments
Labels
kind/epic: A long-lived, PM-driven feature request. Must include a checklist of items that must be completed.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments


yarivlifchuk commented Jul 5, 2018

Summary

Logging provides system administrators with information useful for diagnostics and auditing. We propose a logging mechanism that does not require changes to existing Che code. However, we do recommend standardizing the format in which log events are written.
In addition, we propose an option for providing additional parameters to log entries in a standard way, to improve supportability.

Technically, the logging mechanism is decoupled from the code by reading standard output at the K8S pod level. To support this, additional industry-accepted open source components must be deployed to the K8S cluster, with special attention to security.

Description

Che epics [Complementary]:
Tracing - #10298, #10288
Monitoring - #10329

Che epics [to be reevaluated]:
Logging - #5483
Logstash - #6537, #7566

Background

Access to the logs of Che agents and of applications running within the workspace (aka WS) is required for supportability (analysis, application behavior, monitoring), even after the WS has been evicted.
Logs should have separate storage and a lifecycle independent of nodes and pods.
This concept is called cluster-level logging, and it has several common approaches:

  1. Use a node-level logging agent that runs on every node.
  2. Include a dedicated sidecar container for logging in an application pod.
  3. Push logs directly to a backend from within an application.

Using a node-level logging agent is the most common and encouraged approach for a K8S cluster because it creates only one agent per node and requires no installation in each pod (where the logged applications are running). It is based on the applications' standard output and standard error.
https://kubernetes.io/docs/concepts/cluster-administration/logging

Logging agents (not to be confused with Che agents)

Common K8S logging agent options:

  1. Stackdriver Logging
  2. Elasticsearch

They both use fluentd as an agent on the node.

In the open source world, the two most popular data collectors are Logstash and Fluentd. Logstash is best known as part of the ELK Stack, while Fluentd has become increasingly used by communities around software such as Docker, GCP, and Elasticsearch.

Logstash and Fluentd are data processing pipelines that ingest data from a multitude of sources simultaneously, transform it, and then send it on.
Main differences:

  1. Event Routing – Logstash uses algorithmic (conditional) statements while Fluentd uses tags
  2. Performance – Logstash uses more memory in the node agents

However, the similarities between Logstash and Fluentd are greater than their differences.
https://logz.io/blog/fluentd-logstash
https://www.elastic.co/guide/en/logstash/current/introduction.html
https://docs.fluentd.org/v0.12/articles/quickstart
http://larmog.github.io/2016/03/13/elk-cluster-on-kubernetes-on-arm---part-1
http://larmog.github.io/2016/05/02/efk-cluster-on-kubernetes-on-arm---part-2

Common node level agents available for Fluentd

Common node level agents available for Logstash

Container Logs Collection

Cluster level logging collects the standard output and error of the applications running in the containers.
K8S logs the content of the stdout and stderr streams of a pod to a file. It creates one file for each container in a pod. The default location for these files is /var/log/containers. The filename contains the pod name, pod namespace, container name, and container id. The file contains one JSON object per line for the two streams stdout and stderr. K8S exposes the content of the log file to clients via its API.

Taking Fluentd as an example, the collection process works as follows:
Fluentd parses the filename of the log file and uses this information to fetch additional metadata from the K8S API. Metadata such as labels and annotations is attached to the log event as additional fields so that it can be used for searching and filtering.

The Fluentd pod mounts the /var/lib/containers/ host volume to access the logs of all pods scheduled to that kubelet, as well as a host volume for a Fluentd position file. The position file records which log lines have already been shipped to the central log store.
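
As an illustration of this mechanism, here is a minimal, self-contained sketch of how a collector could recover pod metadata from a container log filename. The exact filename pattern and the sample values are assumptions for illustration (based on the description above), not code taken from Che or Fluentd.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the metadata extraction described above.
// Assumption: filenames under /var/log/containers follow the pattern
//   <pod-name>_<namespace>_<container-name>-<container-id>.log
// and each line of the file is a JSON object such as
//   {"log":"message\n","stream":"stdout","time":"2018-07-05T12:00:00Z"}
public class ContainerLogFilename {

    private static final Pattern NAME = Pattern.compile(
        "(?<pod>[^_]+)_(?<ns>[^_]+)_(?<container>.+)-(?<id>[0-9a-f]+)\\.log");

    public static void main(String[] args) {
        String fileName = "ws-pod-abc123_che_ws-agent-4f2a9c.log"; // illustrative name only

        Matcher m = NAME.matcher(fileName);
        if (m.matches()) {
            // With the pod name and namespace, the collector can query the K8S API
            // (e.g. GET /api/v1/namespaces/{ns}/pods/{pod}) and attach the returned
            // labels and annotations to every log event as extra fields.
            System.out.printf("pod=%s namespace=%s container=%s id=%s%n",
                m.group("pod"), m.group("ns"), m.group("container"), m.group("id"));
        }
    }
}
```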

Implementation recommendation

  1. Che agents and any relevant applications write their logs to stdout and stderr.
  2. Add custom environment params to the log records (a sketch follows this list).
    There are two kinds of custom environment params:
    • Mandatory params, added to every log record, e.g. the user's tenant id.
    • Optional params, added only to specific log records, e.g. the API name (if relevant) or trace_id.
      If the log format is delimiter-based (CSV-like) rather than an enriched JSON or XML format, these
      params need to be added to every log record (with an empty value if not relevant).
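
A minimal sketch of point 2, assuming the Che agents log through SLF4J (Che's Java components generally do); the field names tenant_id and trace_id and the handleRequest method are illustrative, not part of any existing Che API. Mandatory params go into the MDC once per request so every record carries them; optional params are set only around the relevant call.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Illustrative sketch only: attaching mandatory and optional custom params
// to log records via the SLF4J MDC.
public class LoggingParamsExample {

    private static final Logger LOG = LoggerFactory.getLogger(LoggingParamsExample.class);

    void handleRequest(String tenantId, String traceId) {
        MDC.put("tenant_id", tenantId);  // mandatory param: present on every record
        MDC.put("trace_id", traceId);    // optional param: only while this request runs
        try {
            LOG.info("workspace start requested"); // record carries tenant_id and trace_id
        } finally {
            MDC.clear(); // avoid leaking the params into unrelated log records
        }
    }
}
```

With a JSON layout (for example the logstash-logback-encoder library, mentioned here only as one possible choice), MDC entries become top-level JSON fields that Fluentd or Logstash can index directly; with a delimiter-based (CSV-like) pattern layout, each key has to appear in the pattern for every record, left empty when not relevant, as noted in point 2 above.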

ibuziuk commented Jul 5, 2018

Opentracing epic has been created - #10288

@slemeur slemeur added the kind/epic A long-lived, PM-driven feature request. Must include a checklist of items that must be completed. label Jul 5, 2018
@skabashnyuk skabashnyuk changed the title K8S Che6 Logging Che Logging Jan 28, 2019

che-bot commented Sep 7, 2019

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2019
@che-bot che-bot closed this as completed Sep 17, 2019