-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Elasticsearch, Kibana and fluentd to collect, store and view logs. #327
Conversation
mdemirhan
commented
Mar 8, 2018
- Removed the fluentd sidecar container from ela pods
- Added a fluentd daemonset with a fixed configuration to collect ela and build container logs.
- Added a "dev" option to setup log collection for ela controllers (see telemetry.md for more information)
- Refactored the existing configurations into a clearer folder structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, but I'd find this a bit easier to review if we broke this down directory by directory.
config/monitoring/BUILD
Outdated
template = "prometheus-operator/service.yaml", | ||
) | ||
|
||
k8s_objects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like, I can show you how to use a Bazel glob()
to make this a bit more concise?
config/monitoring/BUILD
Outdated
|
||
k8s_object( | ||
name = "prometheus-operator-service", | ||
template = "prometheus-operator/service.yaml", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file starts to get pretty long. Can we move these rules into BUILD.bazel
files in each directory, and they can export their :prometheus-operator
k8s_objects
group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR with individual BUILD files. Thanks!
1. Forward the Grafana server to your machine: | ||
2. **everything-dev**: This configuration collects everything in (1) plus Elafros controller logs. | ||
```shell | ||
bazel run config/monitoring:everything-dev.apply |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should talk sometime about how we release these configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
docs/telemetry.md
Outdated
``` | ||
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana) | ||
(*it might take a couple of minutes for the proxy to work*). | ||
When Kibana is opened the first time, it will ask you to crete an index. Accept the default options as is. As logs get ingested, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: crete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
## Default metrics | ||
Following metrics are collected by default: | ||
* Elafros controller metrics | ||
* Istio metrics (mixer, envoy and pilot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should talk about how we get Build metrics into here.
@imjasonh FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. I added two sample metrics into ela-controller and I will be more than happy to make the same for build-controller. Let's sync up on this one.
docs/telemetry.md
Outdated
To enable log collection from other containers and destinations, edit fluentd-es-configmap.yaml (search for "fluentd-containers.log" for the starting point). Then run the following: | ||
```shell | ||
kubectl delete -f config/monitoring/fluentd/fluentd-es-ds.yaml | ||
kubectl delete -f config/monitoring/fluentd/fluentd-es-configmap.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what you want is kubectl replace
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, didn't know such a command existed :-)
docs/telemetry.md
Outdated
|
||
## Generating logs | ||
Use glog to write logs in your code. In your container spec, add the following args to redirect the logs to stderr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link to glog
godoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mdemirhan! I have a few questions, but nothing major. I haven't reviewed any of the yaml files for correctness. I trust you've verified they work 😃
cmd/ela-autoscaler/main.go
Outdated
@@ -170,6 +171,7 @@ func handler(w http.ResponseWriter, r *http.Request) { | |||
} | |||
|
|||
func main() { | |||
flag.Parse() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were any flags added or did they already exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command line arguments are already being passed to the executable. This is to ensure that the arguments are passed and not ignored.
cmd/ela-queue/main.go
Outdated
@@ -183,6 +184,7 @@ func handler(w http.ResponseWriter, r *http.Request) { | |||
} | |||
|
|||
func main() { | |||
flag.Parse() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were any flags added or did they already exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command line arguments are already being passed to the executable. This is to ensure that the arguments are passed and not ignored.
docs/telemetry.md
Outdated
``` | ||
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana) | ||
(*it might take a couple of minutes for the proxy to work*). | ||
When Kibana is opened the first time, it will ask you to crete an index. Accept the default options as is. As logs get ingested, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo nit: "create an index"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will fix that.
sample/helloworld/helloworld.go
Outdated
@@ -31,6 +35,9 @@ func handler(w http.ResponseWriter, r *http.Request) { | |||
} | |||
|
|||
func main() { | |||
flag.Parse() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for glog flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for all command line flags - glog is one consumer of the command line flags. And if flags are not parsed before any glog calls, glog writes an error message to stderr and that was causing a lot of noise in the logs.
pkg/controller/revision/ela_pod.go
Outdated
@@ -142,10 +128,6 @@ func MakeElaPodSpec(u *v1alpha1.Revision) *corev1.PodSpec { | |||
Name: nginxConfigMapName, | |||
ReadOnly: true, | |||
}, | |||
{ | |||
MountPath: nginxLogVolumeMountPath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change the nginx config now that its log destination isn't this volume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed fluentd sidecar container that is watching for nginx log volume. That volume is no longer of use and I removed it from the pod definition as well. As the next step, I will remove nginx completely - waiting on Ian for enabling Istio's Envoy sidecar injection.
``` | ||
Then open Kibana UI at this [link](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the internal kibana links work when accessed via this url?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they should work as long as you have kubectl proxy running.
# Conflicts: # pkg/controller/revision/ela_pod.go
/retest |
/buildify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for splitting things up. A few comments inline
config/monitoring/fluentd/BUILD
Outdated
load("@k8s_object//:defaults.bzl", "k8s_object") | ||
|
||
|
||
k8s_object( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a tool call buildifier
that will format these for you. I added a plugin to Prow so that /buildify
will report when files aren't properly formed and tell you the command to run them, but I think it's struggling with our private repo :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will run the tool. Thanks for pointing that out!
config/monitoring/fluentd/BUILD
Outdated
k8s_object( | ||
name = "istio", | ||
template = "istio.yaml", | ||
visibility = ["//visibility:public"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this specifically public vs. everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix.
) | ||
|
||
k8s_objects( | ||
name = "everything-extended", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments to these groupings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
k8s_object( | ||
name = "prometheus-obj", | ||
template = "prometheus.yaml", | ||
visibility = ["//visibility:public"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a brand new cluster, installing prometheus fails with an error that says "monitoring.coreos.com/v1" not found. I found that installing prometheus after installing service monitors and grafana mitigates this issue. I have yet to find the root cause of this issue (I have seen others hitting this issue with other components and they were asked to open a bug to k8s team - not sure if it is the same).
/retest |
config/monitoring/common/BUILD
Outdated
name = "namespace", | ||
template = "namespace.yaml", | ||
visibility = ["//visibility:public"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should just move up to config/monitoring
?
@@ -0,0 +1,122 @@ | |||
# Copyright 2018 Google LLC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has a bit more than a StatefulSet
. Perhaps break it apart or rename?
/retest |
# Conflicts: # pkg/controller/revision/ela_pod.go # sample/helloworld/helloworld.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm eager to try this out :)
## Default logs | ||
Deployment above enables collection of the following logs: | ||
* stdout & stderr from all ela-container | ||
* stdout & stderr from build-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really mean build-controller
and not the user's build container?