
Add Elasticsearch, Kibana and fluentd to collect, store and view logs. #327

Merged (10 commits) · Mar 12, 2018
Conversation

mdemirhan (Contributor):

  • Removed the fluentd sidecar container from ela pods
  • Added a fluentd daemonset with a fixed configuration to collect ela and build container logs.
  • Added a "dev" option to setup log collection for ela controllers (see telemetry.md for more information)
  • Refactored the existing configurations into a clearer folder structure.

@mdemirhan mdemirhan requested review from grantr, mchmarny, mattmoor, vaikas, akyyy and a team March 8, 2018 21:41
@google-prow-robot google-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 8, 2018
@mattmoor (Member) left a comment:

I left a few comments, but I'd find this a bit easier to review if we broke this down directory by directory.

template = "prometheus-operator/service.yaml",
)

k8s_objects(
Member:

If you'd like, I can show you how to use a Bazel glob() to make this a bit more concise?
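For reference, a `glob()`-based version of the repeated `k8s_object` rules might look like the following sketch (the slice-based rule naming is illustrative, not from the PR):

```bzl
load("@k8s_object//:defaults.bzl", "k8s_object")

# One k8s_object per yaml template under prometheus-operator/,
# with the rule name derived from the file name.
[k8s_object(
    name = f[len("prometheus-operator/"):-len(".yaml")],
    template = f,
) for f in glob(["prometheus-operator/*.yaml"])]
```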


k8s_object(
name = "prometheus-operator-service",
template = "prometheus-operator/service.yaml",
Member:

This file starts to get pretty long. Can we move these rules into BUILD.bazel files in each directory, and they can export their :prometheus-operator k8s_objects group?
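The suggested layout could look roughly like this sketch (package paths, target names, and the load location are assumptions for illustration):

```bzl
# config/monitoring/prometheus-operator/BUILD.bazel (sketch)
load("@k8s_object//:defaults.bzl", "k8s_object", "k8s_objects")

k8s_object(
    name = "service",
    template = "service.yaml",
)

# Group exported for the top-level config/monitoring BUILD file to consume.
k8s_objects(
    name = "prometheus-operator",
    objects = [":service"],
    visibility = ["//config/monitoring:__pkg__"],
)
```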

Contributor (Author):

Updated PR with individual BUILD files. Thanks!

1. Forward the Grafana server to your machine:
2. **everything-dev**: This configuration collects everything in (1) plus Elafros controller logs.
```shell
bazel run config/monitoring:everything-dev.apply
Member:

We should talk sometime about how we release these configs.

Contributor (Author):

Agreed.

```
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*it might take a couple of minutes for the proxy to work*).
When Kibana is opened the first time, it will ask you to crete an index. Accept the default options as is. As logs get ingested,
Member:

typo: crete

Contributor (Author):

Fixed.

## Default metrics
Following metrics are collected by default:
* Elafros controller metrics
* Istio metrics (mixer, envoy and pilot)
Member:

We should talk about how we get Build metrics into here.

@imjasonh FYI

Contributor (Author):

Sure thing. I added two sample metrics to ela-controller and I'd be more than happy to do the same for build-controller. Let's sync up on this one.

To enable log collection from other containers and destinations, edit fluentd-es-configmap.yaml (search for "fluentd-containers.log" for the starting point). Then run the following:
```shell
kubectl delete -f config/monitoring/fluentd/fluentd-es-ds.yaml
kubectl delete -f config/monitoring/fluentd/fluentd-es-configmap.yaml
Member:

Maybe what you want is kubectl replace?
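A sketch of the replace-based variant of the commands quoted above, using the same file paths (requires a running cluster, so it is shown for illustration only):

```shell
# Update the objects in place instead of deleting and re-creating them:
kubectl replace -f config/monitoring/fluentd/fluentd-es-configmap.yaml
kubectl replace -f config/monitoring/fluentd/fluentd-es-ds.yaml
```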

Contributor (Author):

Yup, didn't know such a command existed :-)


## Generating logs
Use glog to write logs in your code. In your container spec, add the following args to redirect the logs to stderr:
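A minimal sketch of such a container spec fragment, assuming glog's standard `-logtostderr` flag (the exact args in the PR may differ):

```yaml
# Fragment of a container spec; redirect glog output to stderr:
args:
- "-logtostderr=true"
```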
Member:

link to glog godoc?

Contributor (Author):

Will add. Thanks!

@grantr (Contributor) left a comment:

Thanks @mdemirhan! I have a few questions, but nothing major. I haven't reviewed any of the yaml files for correctness. I trust you've verified they work 😃

@@ -170,6 +171,7 @@ func handler(w http.ResponseWriter, r *http.Request) {
}

func main() {
flag.Parse()
Contributor:

Were any flags added or did they already exist?

Contributor (Author):

Command line arguments are already being passed to the executable. This is to ensure that the arguments are passed and not ignored.

@@ -183,6 +184,7 @@ func handler(w http.ResponseWriter, r *http.Request) {
}

func main() {
flag.Parse()
Contributor:

Were any flags added or did they already exist?

Contributor (Author):

Command line arguments are already being passed to the executable. This is to ensure that the arguments are passed and not ignored.

```
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
(*it might take a couple of minutes for the proxy to work*).
When Kibana is opened the first time, it will ask you to crete an index. Accept the default options as is. As logs get ingested,
Contributor:

Typo nit: "create an index"

Contributor (Author):

Thanks, I will fix that.

@@ -31,6 +35,9 @@ func handler(w http.ResponseWriter, r *http.Request) {
}

func main() {
flag.Parse()
Contributor:

Is this for glog flags?

Contributor (Author):

This is for all command line flags - glog is one consumer of the command line flags. And if flags are not parsed before any glog calls, glog writes an error message to stderr and that was causing a lot of noise in the logs.

@@ -142,10 +128,6 @@ func MakeElaPodSpec(u *v1alpha1.Revision) *corev1.PodSpec {
Name: nginxConfigMapName,
ReadOnly: true,
},
{
MountPath: nginxLogVolumeMountPath,
Contributor:

Do we need to change the nginx config now that its log destination isn't this volume?

Contributor (Author):

I removed the fluentd sidecar container that was watching the nginx log volume. That volume is no longer of use, so I removed it from the pod definition as well. As the next step, I will remove nginx completely - waiting on Ian to enable Istio's Envoy sidecar injection.

```
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana)
Contributor:

Do the internal kibana links work when accessed via this url?

Contributor (Author):

Yes, they should work as long as you have kubectl proxy running.

@mdemirhan (Contributor, Author):

/retest

@mattmoor (Member):

/buildify

@mattmoor (Member) left a comment:

Thanks for splitting things up. A few comments inline.

load("@k8s_object//:defaults.bzl", "k8s_object")


k8s_object(
Member:

There is a tool called buildifier that will format these for you. I added a plugin to Prow so that /buildify will report when files aren't properly formatted and tell you the command to run to fix them, but I think it's struggling with our private repo :(

Contributor (Author):

I will run the tool. Thanks for pointing that out!

k8s_object(
name = "istio",
template = "istio.yaml",
visibility = ["//visibility:public"],
Member:

Why is this specifically public vs. everything?

Contributor (Author):

Will fix.

)

k8s_objects(
name = "everything-extended",
Member:

Can you add comments to these groupings?

Contributor (Author):

Will do.

k8s_object(
name = "prometheus-obj",
template = "prometheus.yaml",
visibility = ["//visibility:public"],
Member:

why public?

Contributor (Author):

On a brand new cluster, installing prometheus fails with an error that says "monitoring.coreos.com/v1" not found. I found that installing prometheus after installing service monitors and grafana mitigates this issue. I have yet to find the root cause of this issue (I have seen others hitting this issue with other components and they were asked to open a bug to k8s team - not sure if it is the same).

@mdemirhan (Contributor, Author):

/retest

name = "namespace",
template = "namespace.yaml",
visibility = ["//visibility:public"],
)
Member:

I wonder if this should just move up to config/monitoring?

@@ -0,0 +1,122 @@
# Copyright 2018 Google LLC
Member:

This file has a bit more than a StatefulSet. Perhaps break it apart or rename?

@mdemirhan (Contributor, Author):

/retest

mdemirhan added 3 commits March 12, 2018 11:20
@mattmoor (Member) left a comment:

I'm eager to try this out :)

## Default logs
Deployment above enables collection of the following logs:
* stdout & stderr from all ela-container
* stdout & stderr from build-controller
Member:

Do you really mean build-controller and not the user's build container?

@mdemirhan mdemirhan merged commit cf463ba into knative:master Mar 12, 2018
@mdemirhan mdemirhan deleted the mdemirhan/logging branch March 12, 2018 23:03
4 participants