Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a public registry interface and separate out HTTP exposition #214

Merged
merged 11 commits into from
Aug 15, 2016

Conversation

beorn7
Copy link
Member

@beorn7 beorn7 commented Aug 1, 2016

@grobie who might be most keen on this.
@juliusv whom I gave a short preview in person.
But then probably everybody else is interested, @fabxc , @brian-brazil , …

General context and approch

This is the first part of the long awaited wider refurbishment of
client_golang/prometheus/.... After a lot of struggling, I decided
to not go for one breaking big-bang, but cut things into smaller steps
after all, mostly to keep the changes manageable and easy to
review. I'm aiming for having the invasive breaking changes
concentrated in as few steps as possible (ideally one). Some steps
will not be breaking at all, but typically there will be breaking
changes that only affect quite special cases so that 95+% of users
will not be affected. This first step is an example for that, see
details below.

What's happening in this commit?

This step is about finally creating an exported registry
interface. This could not be done by simply export the existing
internal implementation because the interface would be way too
fat. This commit introduces a qutie lean Registry interface
(compared to the previous interval implementation). The functions that
act on the default registry are retained (with very few exceptions) so
that most use cases won't see a change. However, several of those are
deprecated now to clean up the namespace in the future.

The default registry is kept in the public variable
DefaultRegistry. This follows the example of the http package in the
standard library (cf. http.DefaultServeMux, http.DefaultClient)
with the same implications. (This pattern is somewhat disputed within
the Go community but I chose to go with the devil you know instead of
creating something more complex or even disallowing any changes to the
default registry. The current approach gives everybody the freedom to
not touch DefaultRegistry or to do everything with a custom registry
to play save.)

Another important part in making the registry lean is the extraction
of the HTTP exposition, which also allows for customization of the
HTTP exposition. Note that the separation of metric collection and
exposition has the side effect that managing the MetricFamily and
Metric protobuf objects in a free-list or pool isn't really feasible
anymore. By now (with better GC in more recent Go versions), the
returns were anyway dimisishing. To be effective at all, scrapes had
to happen more often than GC cycles, and even then most elements of
the protobufs (everything excetp the MetricFamily and Metric structs
themselves) would still cause allocation churn. In a future breaking
change, the signature of the Write method in the Metric interface will
be adjusted accordingly. In this commit, avoiding breakage is more
important.

The following issues are fixed by this commit (some solved "on the
fly" now that I was touching the code anyway and it would have been
stupid to port the bugs):
#46
#100
#170
#205

Documentation including examples have been amended as required.

What future changes does this commit enable?

The following items are not yet implemented, but this commit opens the
possibility of implementing these independently.

  • The separation of the HTTP exposition allows the implementation of
    other exposition methods based on the Registry interface, as known
    from other Prometheus client libraries, e.g. sending the metrics to
    Graphite.
    Cf. Provide graphite bridge #197
  • The public Registry interface allows the implementation of
    convenience tools for testing metrics collection. Those tools can
    inspect the collected MetricFamily protobufs and compare them to
    expectation. Also, tests can use their own testing instance of a
    registry.
    Cf. Improve testability #58

Notable non-goals of this commit

Non-goals that will be tackled later

The following two issues are quite closely connected to the changes in
this commit but the line has been drawn deliberately to address them
in later steps of the refurbishment:

  • InstrumentHandler has many known problems. The plan is to create a
    saner way to conveniently intrument HTTP handlers and remove the old
    InstrumentHandler altogether. To keep breakage low for now, even
    the default handler to expose metrics is still using the old
    InstrumentHandler. This leads to weird naming inconsistencies but
    I have deemed it better to not break the world right now but do it
    in the change that provides better ways of instrumenting HTTP
    handlers.
    Cf. Race with httputil.ReverseProxy #200
  • There is work underway to make the whole handling of metric
    descriptors (Desc) more intuitive and transparent for the user
    (including an ability for less strict checking,
    cf. Allow more flexible metric exposition. #47). That's
    quite invasive from the perspective of the internal code, namely the
    registry. I deliberately kept those changes out of this commit.
  • While this commit adds new external dependency, the effort to vendor
    anything within the library that is not visible in any exported
    types will have to be done later.

Non-goals that might be tackled later

There is a strong and understandable urge to divide the prometheus
package into a number of sub-packages (like registry, collectors,
http, metrics, …). However, to not run into a multitude of
circular import chains, this would need to break every single existing
usage of the library. (As just one example, if the ubiquitious
prometheus.MustRegister (with more than 2,000 uses on GitHub alone)
is kept in the prometheus package, but the other registry concerns
go into a new registry package, then the prometheus package would
import the registry package (to call the actual register method),
while at the same time the registry package needs to import the
prometheus package to access Collector, Metric, Desc and
more. If we moved MustRegister into the registry package,
thousands of code lines would have to be fixed (which would be easy if
the world was a mono repo, but it is not). If we moved everything else
the proposed registry package needs into packages of their own, we
would break thousands of other code lines.)

The main problem is really the top-level functions like
MustRegister, Handler, …, which effectively pull everything into
one package. Those functions are however very convenient for the easy
and very frequent use-cases.

This problem has to be revisited later.

For now, I'm trying to keep the amount of exported names in the
package as low as possible (e.g. I unexported expvarCollector in this
commit because the NewExpvarCollector constructor is enough to export,
and it is now consistent with other collectors, like the goCollector).

Non-goals that won't be tackled anytime soon

Something that I have played with a lot is "streaming collection",
i.e. allow an implementation of the Registry interface that collects
metrics incrementally and serves them while doing so. As it has turned
out, this has many many issues and makes the Registry interface very
clunky. Eventually, I made the call that it is unlikely we will really
implement streaming collection; and making the interface more clunky
for something that might not even happen is really a big no-no. Note
that the Registry interface only creates the in-memory
representation of the metric family protobufs in one go. The
serializaton onto the wire can still be handled in a streaming fashion
(which hasn't been done so far, without causing any trouble, but might
be done in the future without breaking any interfaces).

What are the breaking changes?

  • Signatures of functions pushing to Pushgateway have changed to allow
    arbitrary grouping (which was planned for a long time anyway, and
    now that I had to work on the Push code anyway for the registry
    refurbishment, I finally did it,
    cf. Change signature for functions pushing to pushgateway. #100).
    With the gained insight that pushing to the default registry is almost
    never the right thing, and now that we are breaking the Push call
    anyway, all the Push functions were moved to their own package,
    which cleans up the namespace and is more idiomatic (pushing
    Collectors is now literally done by push.Collectors(...)).
  • The registry is doing more consistency checks by default now. Past
    creators of inconsistent metrics could have masked the problem by
    not setting EnableCollectChecks. Those inconsistencies will now be
    detected. (But note that a "best effort" metrics collection is now
    possible with HandlerOpts.ErrorHandling = ContinueOnError.)
  • EnableCollectChecks is gone. The registry is now performing some
    of those checks anyway (see previous item), and a registry with all
    of those checks can now be created with NewPedanticRegistry (only
    used for testing).
  • PanicOnCollectError is gone. This behavior can now be configured
    when creating a custom HTTP handler.

This change is Reviewable

@beorn7
Copy link
Member Author

beorn7 commented Aug 1, 2016

Oops... will fix test soon.

// protobufs are consistent and valid for Prometheus to ingest (e.g. no
// duplicate metrics, no invalid identifiers). In scenarios where
// complete collection is critical, the returned MetricFamily protobufs
// should be disregareded if the returned error is non-nil.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disregarded

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@brian-brazil
Copy link
Contributor

brian-brazil commented Aug 1, 2016

It's great to finally see this.

There is a strong and understandable urge to divide the prometheus package into a number of sub-packages (like registry, collectors, http, metrics, …). However, to not run into a multitude of

It boils down mostly to a question of dependencies. Like Python, Go comes with HTTP support out of the box so having it all in one package is practical. By contrast that's not the case with Java where we need to pull in external dependencies to work.

Where we run into issues here is protobuf, as that shouldn't be a dependency of the instrumentation (registry+collector+metrics+http instrumentation) part of the library.

As just one example, if the ubiquitious prometheus.MustRegister (with more than 2,000 uses on GitHub alone)

To be honest, we should be replacing most of those uses in favour of the metrics auto-registering by default.

What I've done elsewhere is to have the register method on the registry, but also a utility method on the metric for convenience.

// default registry. The other global functions are not methods in the Registry
// interface but there are utility functions that implement the same
// functionality on top of the interface and take a Registry instance as a
// parameter. Those are: MustRegister → MustRegisterWith, RegisterOrGet →
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a rather big interface, and not following a OOP model by having them as standalone functions rather than methods.

Can we remove the Get methods? The only time I've seen them used is in cases where a ConstMetric should have been used instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a problem with Go's desire for lean interfaces and at the same time not being a "proper" OOP language. This is essentially a side effect of the absence of implementation inheritance. The standalone functions contain the code that would otherwise be repeated in every implementation of the Registry interface. While this pattern might appear weird, it's very common in the standard library, e.g. ioutiles.ReadAll(io.Reader) instead of having a ReadAll method in the Reader interface.

WRT removing the Get methods: We can definitely consider that. I had a use case, but if it's rare, it can still be implemented by the few users in the same way as it is done here.

However, I don't want to have any avoidable breaking changes in this PR. So that people can have the public registry without breaking changes. Future iterations will be more aggressive WRT breaking existing usage, and then we can consider removal.

Copy link
Contributor

@brian-brazil brian-brazil Aug 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many registry implementations are you expecting? We've not seen a need in the other languages, and of all the other instrumentation libraries out there I can think of only one which is even vaguely this general.

I think we can simplify by having Registry as a type rather than an interface.

However, I don't want to have any avoidable breaking changes in this PR. So that people can have the public registry without breaking changes. Future iterations will be more aggressive WRT breaking existing usage, and then we can consider removal.

I'm expecting us to have to leave a lot of cruft lying around for a while for the purposes of transition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an interface is the "fail safe" approach. Because this library is used quite widely, I assume a lot of requirements. I have observed the recent tendency to go for interfaces in doubt, and may people complaining about std library types that were not done as interfaces.

The normal use case is anyway to use the default registry. So only power user are even concerned with the registry as an own type. So I expect a higher percentage of "weird" use cases there, which we cannot anticipate right now. In any case, we don't have to optimize the registry itself for the naive user, as the latter will use the top-level functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My presumption would be that anything custom could be done without needing an interface. You could accept collectors and then be a standard collector yourself, and go out via a normal Registry to the exposition formats.

I don't at all like all these top-level functions, it feels very disjointed. A user should never have to touch the top-level functions, only instrumentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the number of top-level identifiers, either.
However, bundling them up in classes in a language that doesn't even have proper classes is not really improving anything.

A real solution would be a meaningful re-packaging and of course avoiding many top-level identifiers in the first place. As re-packaging is only an option later (at the price of breaking the world), let's apply extra scrutiny to reducing the number of identifiers.

In that spirit, I'll declare the old ...OrGet top level function as deprecated, and will not have a utility function for custom registries anymore (will be done in the next push).

Making the exported Registry a struct rather than an interface is a different question. I'll think about it... Other opinions welcome.

@beorn7
Copy link
Member Author

beorn7 commented Aug 1, 2016

Where we run into issues here is protobuf, as that shouldn't be a dependency of the instrumentation (registry+collector+metrics+http instrumentation) part of the library.

In the view of both the server and the Go client, the protobuf is the central data model of Prometheus.

As just one example, if the ubiquitious prometheus.MustRegister (with more than 2,000 uses on GitHub alone)
To be honest, we should be replacing most of those uses in favour of the metrics auto-registering by default.

I'm planning something like that for the next iteration. However, as Go has no exceptions, and panicking in a library if not explicitly called out (via Must…) is frowned upon in Go. So we cannot really auto-register in the same way as in Python because there is no way to report the error.

What I've done elsewhere is to have the register method on the registry, but also on the metric for convenience.

My plan is essentially to have a convenience .MustRegister() method on the metrics that return the registered metric so that you can write

var (
    cpuTemp = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "cpu_temperature_celsius",
        Help: "Current temperature of the CPU.",
    }).MustRegister()
)

But yeah, not really part of this PR.

// parameter if metrics should be pushed with the hostname as label. The
// returned map is created upon each call so that the caller is free to add more
// labels to the map.
func GroupByHostname() map[string]string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called instance_ip_grouping_key and instanceIPGroupingKey in other clients, we should be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also the IP in other clients, as this is meant to be replacing the old pgw behaviour of adding in the client IP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely.

Question: Did you intentionally pick the IP number rather than the hostname? That needed to be changed here then, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for attaching the IP number was that we had no other information on the Pushgateway.

Picking the IP-number on the side of the client is a bit hard. It's not trivial through with IP-number the push will go out. And IP-numbers might get translated, or they might be long IPv6 numbers...

I'd actually favor the hostname here. (It's a convenience function anyway, and might even not be used a lot.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect IPs to be more unique than hostnames, as it's not a FQDN.

I think we should aim for consistency here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How have you picked the IP number in the other clients? The first one that doesn't look like localhost?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java has methods for it. For Python we bind to a socket and see what the IP is (which turns out not to be terribly portable).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so the Go stdlibrary allows to list all network interfaces. Picking one seems really adventurous to me. I dislike the IP number idea more and more. The hostname may be pretty arbitrary, too, but at least it's well defined. I like that more than the IP number. But I'm tempted to remove this convenience function completely (it is rarely useful anyway).

Does anybody else have an opinion on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I strongly prefer hostname over IP.

With mutliple NICs, bonding, docker bridges, etc. there is an endless possibility to pick the wrong IP. On the other side, the hostname has a clear definition and retrieval interface. I've had to write code to retrieve the internal network IP of all our nodes before and it required many exceptions and knowledge about our network to not pick a wrong one. I've seen the same issues with other software trying to auto-pick the right IP (e.g. Consul).

@brian-brazil
Copy link
Contributor

Where we run into issues here is protobuf, as that shouldn't be a dependency of the instrumentation (registry+collector+metrics+http instrumentation) part of the library.
In the view of both the server and the Go client, the protobuf is the central data model of Prometheus.

Basically if I'm exporting to Graphite via a Prometheus client library, why should I be pulling in protobuf?

This is a really important use case for me, I don't want any friction to users using our instrumentation library.

To be honest, we should be replacing most of those uses in favour of the metrics auto-registering by default.
I'm planning something like that for the next iteration. However, as Go has no exceptions, and panicking in a library if not explicitly called out (via Must…) is frowned upon in Go. So we cannot really auto-register in the same way as in Python because there is no way to report the error.

I don't think it'd be inappropriate to fatal when this happens. We should design the library with the presumption that this is the way things are meant to be used.

My plan is essentially to have a convenience .MustRegister() method on the metrics that return the registered metric so that you can write

That's how Java works, but that's because of how we have to structure the options in Java.

In Go we don't have that limitation, so I expect that there'd be no such a method. Rather you'd pass a registry in with the rest of the options, and provide a special no-registry value in case that's needed.

// the injected protobufs.
//
// The hook function must be callable at any time and concurrently.
func NewRegistryWithInjectionHook(hook func() []*dto.MetricFamily) Registry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. This would require a custom registry for the node exporter, which doesn't feel right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid concern.

But I really don't want the super-special use-case of the injection hook in the interface.

On the other hand, we probably want error logging in the node exporter anyway, with all the possible things that can go wrong during collection. In that case, you need a custom registry anyway to configure the logger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could adjust the node exporter not to use it, and then remove the feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushgateway uses it, too.

Conclusion for now: NodeExporter will need to use a custom registry anyway (for logging).
So I'll leave it here as it is. Fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also remove it from the pgw then, and use a collector instead. Given how the node exporter already works it doesn't require a custom registry, just a custom collector.

Needing 2^n factory functions depending on options doesn't seem like a good idea. We've already two options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 2 August 2016 at 14:28, Brian Brazil notifications@github.com wrote:

We're creating a new API, it's the time to look any any ways we can simplify that interface. Two use cases that could be implemented in another fashion doesn't justify a public API to me.

Yes, that mirrors an argument we had in the discussion back in early
2013. As said, I thought about changes in the circumstances, but I
couldn't detect any relevant changes, so I don't see a different
outcome of the discussion.

I tried to keep it out of the default path, but as you have brought up yourself, it would require a custom registry for the node exporter.

I never stated it must require a custom registry, merely that the current proposed implementation forces it. I want to continue using use the default registry in the node exporter.

Yes, that's what I meant. By using the default registry in the node
exporter, we need a way to inject metric families with the default
registry. Since the conclusion after a discussion with many
stakeholders was that we need an entry point for that separate from
regular registration and collection, a first-class support in the
interface has to be the way until somebody comes up with a better idea
that changes the odds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That discussion was 3 years ago, lots has happened since then.

The Collect interface is already defined in terms of protos via Metric, converting doesn't seem like it should be a big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 2 August 2016 at 14:56, Brian Brazil notifications@github.com wrote:

That discussion was 3 years ago, lots has happened since then.

The Collect interface is already defined in terms of protos via Metric, converting doesn't seem like it should be a big deal.

To have a meaningful discussion, you have to trust me. I'm not trying
to deceive you.

I considered exactly what you said. While the code back then (as
sketched out by Bernerd) is not line-by-line exactly the same as it
would be today, it would be very similar in essence. We would still
convert a good part of a protobuf object into a different
representation, stuff it into (nowadays) a ConstMetric of the
appropriate type, just to convert everything back then.

Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would still convert a good part of a protobuf object into a different
representation, stuff it into (nowadays) a ConstMetric of the
appropriate type, just to convert everything back then.

You can guess what my opinion on how we resolve that is :)

Given the current code, it's not that bad. You need to create a Desc from the MetricFamily, and can then send each Metric down the channel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 2 August 2016 at 15:29, Brian Brazil notifications@github.com wrote:

Given the current code, it's not that bad. You need to create a Desc from
the MetricFamily, and can then send each Metric down the channel.

Yes, I'm a decent enough programmer to know how I would implement that. It
was considered in my decision.

Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales with
Company No. 6343600 | Local Branch Office | AG Charlottenburg | HRB 110657B

@beorn7
Copy link
Member Author

beorn7 commented Aug 1, 2016

Basically if I'm exporting to Graphite via a Prometheus client library, why should I be pulling in protobuf?

Because the canonical Prometheus data model is destribed in metrics.proto.

This is a really important use case for me, I don't want any friction to users using our instrumentation library.

Writing a consumer of the new Registry interface that calls Collect and sends the metrics to Graphite will be trivial now. The only friction is that the Go representation of protobuf is (arguably) difficult to work with, but it would only affect the one programmer who write the code exporting to Graphite (i.e. probably myself), and not our users.

The discussion if metrics.proto should really be the canonical representation of the Prometheus data model is a legitimate one, but way way out of scope of this PR.

@beorn7
Copy link
Member Author

beorn7 commented Aug 1, 2016

I don't think it'd be inappropriate to fatal when this happens. We should design the library with the presumption that this is the way things are meant to be used.

We are creating a library that aspires to be used in a massive amount of Go programs. I don't think we can work against fundamental conventions in the Go community while doing so.

In Go we don't have that limitation, so I expect that there'd be no such a method. Rather you'd pass a registry in with the rest of the options, and provide a special no-registry value in case that's needed.

Yes, that was my initial thought, too, but I couldn't find a way to do this in agreement with Go conventions.

But this is another discussion I deliberately didn't want to have now. Let's have it later when I going for the metrics themselves.

@brian-brazil
Copy link
Contributor

The discussion if metrics.proto should really be the canonical representation of the Prometheus data model is a legitimate one, but way way out of scope of this PR.

The question here is what dependencies the instrumentation library should have, which is a different question to how we mentally think about Prometheus data.

We need to accept that some users have aversions to protobuf (in the same way you or I would about XML), and aversions to seemingly unnecessary dependencies as that's a sign of a bloated library. As you say "We are creating a library that aspires to be used in a massive amount of Go programs."

// Metric instances “on the fly” using NewConstMetric, NewConstHistogram, and
// NewConstSummary (and their respective Must… versions). That will happen in
// the Collect method, while the Describe method has to return separate Desc
// instances, the very same you later use to create the constant “throw-away”
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hard to parse

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to reword.

(Note that this part will definitely be changed in later iterations. No need to touch Desc directly for const metrics.)

@beorn7
Copy link
Member Author

beorn7 commented Aug 1, 2016

To be clear: The question if protobuf should disappear from client_golang is not on the table right now. That discussion has to happen outside of this PR (and in fact outside of the refurbishment I'm planning).

type HandlerOpts struct {
// ErrorLog specifies an optional logger for errors collecting and
// serving metrics. If nil, errors are not logged at all.
ErrorLog *log.Logger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take an interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should.

Sadly, the standard library only offers the log.Logger type, which is a struct. I'd love to use an interface like https://github.com/go-kit/kit/blob/master/log/log.go#L10 but I don't think we can coerce users into using a particular logging library. By using the standard library log.Logger, we allow everybody to use it with any library that provides an adapter, e.g. https://github.com/go-kit/kit/blob/master/log/stdlib.go#L54 .

Full example:

package main

import (
    stdlog "log"
    "os"

    "github.com/go-kit/kit/log"
    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    logger := log.NewLogfmtLogger(os.Stdout)

    handler := prometheus.HandlerFor(prometheus.DefaultRegistry, prometheus.HandlerOpts{
        ErrorLog: stdlog.New(log.NewStdlibAdapter(logger), "", 0),
    })
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also in line with https://golang.org/pkg/net/http/#Server , i.e. it's the devil people know.

Copy link
Contributor

@fabxc fabxc Aug 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Known problem that stdlib should've made it an interface. I think so far a good practice was just adding your own interface with the minimal methods being used, i.e. type Logger interface { Println(...interface{}) error }.

Allows passing in almost all loggers without relying on go-kit (but can still be used).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And BTW, @brian-brazil , this is an example, where historically, a struct was chosen over an interface, and now everybody is sad about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabxc Since we are only using Println, that's indeed doable. I'll rework in the next push. (There is the potential for a future problem should we decide to use more methods of the logger. But that's unlikely.)

General context and approch
===========================

This is the first part of the long awaited wider refurbishment of
`client_golang/prometheus/...`. After a lot of struggling, I decided
to not go for one breaking big-bang, but cut things into smaller steps
after all, mostly to keep the changes manageable and easy to
review. I'm aiming for having the invasive breaking changes
concentrated in as few steps as possible (ideally one). Some steps
will not be breaking at all, but typically there will be breaking
changes that only affect quite special cases so that 95+% of users
will not be affected. This first step is an example for that, see
details below.

What's happening in this commit?
================================

This step is about finally creating an exported registry
interface. This could not be done by simply export the existing
internal implementation because the interface would be _way_ too
fat. This commit introduces a qutie lean `Registry` interface
(compared to the previous interval implementation). The functions that
act on the default registry are retained (with very few exceptions) so
that most use cases won't see a change. However, several of those are
deprecated now to clean up the namespace in the future.

The default registry is kept in the public variable
`DefaultRegistry`. This follows the example of the http package in the
standard library (cf. `http.DefaultServeMux`, `http.DefaultClient`)
with the same implications. (This pattern is somewhat disputed within
the Go community but I chose to go with the devil you know instead of
creating something more complex or even disallowing any changes to the
default registry. The current approach gives everybody the freedom to
not touch DefaultRegistry or to do everything with a custom registry
to play save.)

Another important part in making the registry lean is the extraction
of the HTTP exposition, which also allows for customization of the
HTTP exposition. Note that the separation of metric collection and
exposition has the side effect that managing the MetricFamily and
Metric protobuf objects in a free-list or pool isn't really feasible
anymore. By now (with better GC in more recent Go versions), the
returns were anyway dimisishing. To be effective at all, scrapes had
to happen more often than GC cycles, and even then most elements of
the protobufs (everything excetp the MetricFamily and Metric structs
themselves) would still cause allocation churn. In a future breaking
change, the signature of the Write method in the Metric interface will
be adjusted accordingly. In this commit, avoiding breakage is more
important.

The following issues are fixed by this commit (some solved "on the
fly" now that I was touching the code anyway and it would have been
stupid to port the bugs):

#46
#100
#170
#205

Documentation including examples have been amended as required.

What future changes does this commit enable?
============================================

The following items are not yet implemented, but this commit opens the
possibility of implementing these independently.

- The separation of the HTTP exposition allows the implementation of
  other exposition methods based on the Registry interface, as known
  from other Prometheus client libraries, e.g. sending the metrics to
  Graphite.
  Cf. #197

- The public `Registry` interface allows the implementation of
  convenience tools for testing metrics collection. Those tools can
  inspect the collected MetricFamily protobufs and compare them to
  expectation. Also, tests can use their own testing instance of a
  registry.
  Cf. #58

Notable non-goals of this commit
================================

Non-goals that will be tackled later
------------------------------------

The following two issues are quite closely connected to the changes in
this commit but the line has been drawn deliberately to address them
in later steps of the refurbishment:

- `InstrumentHandler` has many known problems. The plan is to create a
  saner way to conveniently intrument HTTP handlers and remove the old
  `InstrumentHandler` altogether. To keep breakage low for now, even
  the default handler to expose metrics is still using the old
  `InstrumentHandler`. This leads to weird naming inconsistencies but
  I have deemed it better to not break the world right now but do it
  in the change that provides better ways of instrumenting HTTP
  handlers.
  Cf. #200

- There is work underway to make the whole handling of metric
  descriptors (`Desc`) more intuitive and transparent for the user
  (including an ability for less strict checking,
  cf. #47). That's
  quite invasive from the perspective of the internal code, namely the
  registry. I deliberately kept those changes out of this commit.

- While this commit adds new external dependency, the effort to vendor
  anything within the library that is not visible in any exported
  types will have to be done later.

Non-goals that _might_ be tackled later
---------------------------------------

There is a strong and understandable urge to divide the `prometheus`
package into a number of sub-packages (like `registry`, `collectors`,
`http`, `metrics`, …). However, to not run into a multitude of
circular import chains, this would need to break every single existing
usage of the library. (As just one example, if the ubiquitious
`prometheus.MustRegister` (with more than 2,000 uses on GitHub alone)
is kept in the `prometheus` package, but the other registry concerns
go into a new `registry` package, then the `prometheus` package would
import the `registry` package (to call the actual register method),
while at the same time the `registry` package needs to import the
`prometheus` package to access `Collector`, `Metric`, `Desc` and
more. If we moved `MustRegister` into the `registry` package,
thousands of code lines would have to be fixed (which would be easy if
the world was a mono repo, but it is not). If we moved everything else
the proposed registry package needs into packages of their own, we
would break thousands of other code lines.)

The main problem is really the top-level functions like
`MustRegister`, `Handler`, …, which effectively pull everything into
one package. Those functions are however very convenient for the easy
and very frequent use-cases.

This problem has to be revisited later.

For now, I'm trying to keep the amount of exported names in the
package as low as possible (e.g. I unexported expvarCollector in this
commit because the NewExpvarCollector constructor is enough to export,
and it is now consistent with other collectors, like the goCollector).

Non-goals that won't be tackled anytime soon
--------------------------------------------

Something that I have played with a lot is "streaming collection",
i.e. allow an implementation of the `Registry` interface that collects
metrics incrementally and serves them while doing so. As it has turned
out, this has many many issues and makes the `Registry` interface very
clunky. Eventually, I made the call that it is unlikely we will really
implement streaming collection; and making the interface more clunky
for something that might not even happen is really a big no-no. Note
that the `Registry` interface only creates the in-memory
representation of the metric family protobufs in one go. The
serializaton onto the wire can still be handled in a streaming fashion
(which hasn't been done so far, without causing any trouble, but might
be done in the future without breaking any interfaces).

What are the breaking changes?
==============================

- Signatures of functions pushing to Pushgateway have changed to allow
  arbitrary grouping (which was planned for a long time anyway, and
  now that I had to work on the Push code anyway for the registry
  refurbishment, I finally did it,
  cf. #100).
  With the gained insight that pushing to the default registry is almost
  never the right thing, and now that we are breaking the Push call
  anyway, all the Push functions were moved to their own package,
  which cleans up the namespace and is more idiomatic (pushing
  Collectors is now literally done by `push.Collectors(...)`).

- The registry is doing more consistency checks by default now. Past
  creators of inconsistent metrics could have masked the problem by
  not setting `EnableCollectChecks`. Those inconsistencies will now be
  detected. (But note that a "best effort" metrics collection is now
  possible with `HandlerOpts.ErrorHandling = ContinueOnError`.)

- `EnableCollectChecks` is gone. The registry is now performing some
  of those checks anyway (see previous item), and a registry with all
  of those checks can now be created with `NewPedanticRegistry` (only
  used for testing).

- `PanicOnCollectError` is gone. This behavior can now be configured
  when creating a custom HTTP handler.
@beorn7
Copy link
Member Author

beorn7 commented Aug 2, 2016

Sooo, several things updated:

  • Setting the injection hook is now a 1st class method again. The discussion convinced me that it is required.
  • The ...OrGet type of functions are now deprecated, and no versions for a custom registry are provided.
  • I added a bit of notes about the messy deprecation situation with Handler vs. UninstrumentedHandler.
  • I implemented @fabxc suggestion about the logger interface.
  • I moved the push related stuff into its own package (which I really like).

That leaves us with really only one function that could be a method on the Registry interface (or struct, see below): MustRegister. In the struct situation, I would totally add it as a method, no controversy about that. Now that it's only one method left, we might even add it to the interface to avoid the one oddball. I'm still reluctant to make the interface a struct as this rules out things like creating test mocks or really plain implementation of your own registry, even if we don't see that as likely. Still thinking about that one. If more people than just Brian want to chime in, please do so.

A number of other top-level identifiers will disappear with future iterations (like NewDesc, BuildFQName, ...). If you also mentally remove the now deprecated functions and then look at the Godoc, it doesn't look that bad anymore.

Which leaves us with three remaining open questions (if I haven't missed anything):

  1. Registry interface vs. struct, see above
  2. The hostname (or IP) convenience grouping key for pushing (I renamed it HostnameGroupingKey for now, but we have to decide if we (a) leave it like it is now (b) remove it entirely or (c) try to guess a meaningful IP number like the other client libraries. (BTW: This is my order of preference, where (a) and (b) are not much different, but I'd really hate (c) as it seems very fragile.)
  3. The multi-error question: Should we (a) keep it as now (pulling in two hashicorp packages and potentially internally vendor them later) (b) copying over the code (which is poor-man's vendoring) or (c) implement our own solution. (This is again my preference, but here I tend quite strongly towards (a) and would really not like (b) and (c).

Since Brian and I have apparently exchanged all our arguments already, it would be good if others could chime in.

Note that I have changed the commit description according to all of this. I'll update the PR description manually soon. The breaking changes are really quite minimal by now.

@beorn7
Copy link
Member Author

beorn7 commented Aug 2, 2016

So this would fix #216, too. :)

@beorn7
Copy link
Member Author

beorn7 commented Aug 4, 2016

Minor doc improvement in the latest commit.

@beorn7
Copy link
Member Author

beorn7 commented Aug 4, 2016

In person discussion with @fabxc @juliusv @brancz resulted in:

  • let's have a trivial own multi-error implementation
  • Deliver is not a good name. It's kind of 2nd order collecting and merging. Brainstorm resulted in Gather to imply the collect-and-then-merge-together aspect.

I'll change that in two follow-up commits.

beorn7 added 2 commits August 4, 2016 15:26
The own implementation is much easier as it only has to serve our one
use case.
@beorn7
Copy link
Member Author

beorn7 commented Aug 4, 2016

OK, done. This should settle the fundamentals.
Please review at your earliest convenience as this fixes the most pressing issues of users.
I'd like to release quickly so that I can then work on those parts that will create more breakage (while the users can still benefit from the many low hanging fruit harvested here).

@crosbymichael
Copy link

This looks good to me. I was just coming to this repo to ask for these types to be exported so we don't always have this global registry.

// the injected protobufs.
//
// The hook function must be callable at any time and concurrently.
func (r *Registry) SetInjectionHook(hook func() []*dto.MetricFamily) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about just using a custom gatherer instead of adding another function to the interface?

Copy link
Member Author

@beorn7 beorn7 Aug 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is not part of any of the interfaces. It's a use case so special that we don't need to offer that level of abstraction.

For the real question: What do you mean by "custom gatherer"? A gatherer is doing the job of gathering and merging metric families (and return the result). An injection hook is the source of more metric families to merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse the ambiguous wording, I was referring to the package interface of client library.

I've never used the injection hook so far, so excuse if I'm missing something, but from looking at the signatures of the current functions, I assumed one could just write a MergeGatherer or so which would call Gather() on two provided gatherer objects (or one gatherer object, one function) and do the merging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Yeah, that's actually a neat idea. The injection hook is essentially a gatherer. This sounds so neat that I'm afraid people will abuse it... ;) But it will make the semantics quite clear. I'll play with it. Thanks for the inspiration.

@grobie
Copy link
Member

grobie commented Aug 5, 2016

Reviewed 13 of 28 files at r1, 2 of 15 files at r2, 4 of 6 files at r4, 2 of 4 files at r5, 1 of 7 files at r6, 9 of 9 files at r8.
Review status: all files reviewed at latest revision, 9 unresolved discussions.


Comments from Reviewable

@beorn7
Copy link
Member Author

beorn7 commented Aug 5, 2016

I implemented @grobie 's idea about the "merge gatherer". That turned out really nice. On the way, I refactored and then improved the consistency checking.

@beorn7
Copy link
Member Author

beorn7 commented Aug 9, 2016

I'm on vacation with only intermittent internet access. I'd still appreciate feedback about the latest version so that I can work on it when I find time and in particular wrap this up soon after being back. Thanks everybody.

@beorn7 beorn7 mentioned this pull request Aug 9, 2016
// uses HTTP method 'POST' to push to the Pushgateway.)
func RegistryAdd(job string, grouping map[string]string, url string, reg prometheus.Gatherer) error {
return push(job, grouping, url, reg, "POST")
// AddFromGatherer works like Gatherer, but only previously pushed metrics with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FromGatherer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@grobie
Copy link
Member

grobie commented Aug 10, 2016

Cool! The only thing I don't like are the function names of the push package, but maybe others have some opinions or suggestions here. Otherwise LGTM.

This allows to finally get rid of the infamous injection hook in the
interface. The old SetMetricFamilyInjectionHook still exist as a
deprecated function but is now implemented with the new plumbing under
the hood.

Now that we have multiple Gatherer implementation, I renamed
push.Registry to push.FromGatherer.

This commit also improves the consistency checks, which happened as a
byproduct of the refactoring to allow checking in both the "merge
gatherer" Gatherers as well as in the normal Registry.
@beorn7
Copy link
Member Author

beorn7 commented Aug 12, 2016

Thanks, @grobie . I'll wait for ideas and objections from others over the weekend.

@beorn7
Copy link
Member Author

beorn7 commented Aug 15, 2016

Apparently, we have found the least bad name. ;) Merging now.

@beorn7 beorn7 merged commit 66058aa into master Aug 15, 2016
@beorn7 beorn7 deleted the beorn7/registry branch August 15, 2016 11:47
vbehar added a commit to vbehar/caddy-prometheus that referenced this pull request Sep 28, 2016
Following prometheus/client_golang#214
the new recommended way of using the prometheus Handler is through the promhttp package
miekg pushed a commit to miekg/caddy-prometheus that referenced this pull request Sep 28, 2016
Following prometheus/client_golang#214
the new recommended way of using the prometheus Handler is through the promhttp package
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants