Rules in case of server=<UUID> #12

filippog · 2016-12-16T20:40:52Z

Following up from #8

In Varnish 4, varnishstat -j returns the UUIDs of VCLs in a non-cold state (e.g. busy or warm, as listed by varnishadm vcl.list), and the exporter in turn parses each UUID as the value of the server Prometheus label.
In Varnish 4, each time a VCL is compiled and loaded it is assigned a new UUID. Where VCLs are reloaded frequently, this can cause metric churn in the Prometheus server, and the server label itself isn't very meaningful in most cases.

varnish_backend_req{backend="be_ununpentium_wikimedia_org",server="02610319-7c2e-4154-8298-b991fc0b1042"} 0
varnish_backend_req{backend="be_ununpentium_wikimedia_org",server="9ffc72df-697e-42d1-8598-a06497f56ace"} 0
varnish_backend_req{backend="be_ununpentium_wikimedia_org",server="c558b103-d1ca-4a27-b529-5492e21de38e"} 0

In these cases I suggest recommending a set of recording rules that give users meaningful per-backend aggregated metrics instead, e.g.

backend:varnish_backend_bereq_bodybytes:sum = sum(varnish_backend_bereq_bodybytes) without (server)
backend:varnish_backend_bereq_hdrbytes:sum = sum(varnish_backend_bereq_hdrbytes) without (server)
backend:varnish_backend_beresp_bodybytes:sum = sum(varnish_backend_beresp_bodybytes) without (server)
backend:varnish_backend_beresp_hdrbytes:sum = sum(varnish_backend_beresp_hdrbytes) without (server)
backend:varnish_backend_conn:sum = sum(varnish_backend_conn) without (server)
backend:varnish_backend_happy:sum = sum(varnish_backend_happy) without (server)
backend:varnish_backend_pipe_hdrbytes:sum = sum(varnish_backend_pipe_hdrbytes) without (server)
backend:varnish_backend_pipe_in:sum = sum(varnish_backend_pipe_in) without (server)
backend:varnish_backend_pipe_out:sum = sum(varnish_backend_pipe_out) without (server)
backend:varnish_backend_req:sum = sum(varnish_backend_req) without (server)
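
For readers on Prometheus 2.0 or later, where rule files are YAML rather than the `name = expr` syntax used above, the same aggregation might look like the sketch below. The group name `varnish_backend` is an assumption made for illustration, and only two of the ten rules are shown; the rest would follow the same pattern.

```yaml
# Sketch of a Prometheus 2.x rule file; the group name is hypothetical.
groups:
  - name: varnish_backend
    rules:
      # Sum across all VCL instances, dropping the per-VCL server label.
      - record: backend:varnish_backend_req:sum
        expr: sum without (server) (varnish_backend_req)
      - record: backend:varnish_backend_conn:sum
        expr: sum without (server) (varnish_backend_conn)
```

Dashboards and alerts can then query the recorded `backend:...:sum` series, which keep the backend label but no longer change identity on every VCL reload.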


jonnenauha · 2016-12-19T10:27:26Z

Where do these recording rules go? I presume they go in the Prometheus config?

Would you like to make a pull request to README.md that explains this and details where to put those recording rules to avoid metrics churn?

There could be a new title:

P.S. I now think I made the Grafana queries do this merge for me, or at least I dropped the server label so it did not show up in the dashboards. But your way is probably better.

Configuration considerations

Varnish reports inactive VCLs for some time after VCL reloads. This exporter identifies VCL instances with the server label. For setups that frequently reload VCLs, this can cause metrics churn and hard-to-read dashboards. Here is how to sum up all VCLs by backend name, ignoring the server label. This is optional if you rarely reload VCLs, since Varnish removes the inactive VCLs after some time.

the rules and info where they should be put
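
As a rough illustration of where such rules could live: Prometheus loads recording rules from separate files named under `rule_files` in its main configuration. The file name `varnish.rules` below is a hypothetical choice, not something this thread specifies.

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical file name; the file would hold the recording rules
  # proposed in this issue.
  - "varnish.rules"
```

Prometheus reads the listed files at startup and again on a configuration reload, and evaluates the rules at the interval set by `evaluation_interval` in the `global` section.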

filippog mentioned this issue Dec 16, 2016

Varnish 4 backend UUID causing metric churn #8

Closed

filippog mentioned this issue Dec 19, 2016

Recommend recording rules for backend metrics #13

Merged

jonnenauha closed this as completed in 87700bf Dec 27, 2016
