
Add support for Prometheus remote read and write API. #8784

Merged
merged 8 commits into master on Sep 7, 2017

Conversation

pauldix
Member

@pauldix pauldix commented Sep 3, 2017

Adds a new package prometheus for converting from remote reads and writes to Influx queries and points. Adds two new endpoints to the httpd handler to support prometheus remote read at /api/v1/prom/read and remote write at /api/v1/prom/write.

I've only tested against a local Prometheus server. We'll have to prove it out on a large installation, so we'd probably want to release it as a beta feature.

I wanted to get this up for review of the approach. I could add unit tests to the prometheus package, but the handler test should hit the main points. I was also thinking I may hold this and base the query side on what @stuartcarnie is doing for the v2 read API, which will have a similar structure to Prometheus to support the v2 data model.

@phemmer
Contributor

phemmer commented Sep 4, 2017

Just curious, why add this here instead of as a telegraf input?
Does this mean that we might see current telegraf listener inputs ported to InfluxDB in the future?

@pauldix
Member Author

pauldix commented Sep 4, 2017

@phemmer It's just another input method kind of like we support OpenTSDB, Carbon, and CollectD. It's writing data directly to the DB, where Telegraf is more of a collector. Although admittedly Telegraf has gained more than that over time. Which listener inputs were you thinking of in Telegraf?

Contributor

@e-dard e-dard left a comment

Looks pretty good to me @pauldix 👍 . I just left some suggestions, but nothing significant to the workings of the PR.

@@ -0,0 +1,170 @@
package prometheus
Contributor

Maybe this should be under services/prometheus to match the location of other packages such as opentsdb, graphite etc?

Member Author

Maybe? I didn't put it under there because there wasn't an associated service running. I think of things under services as things that have network listeners.

Contributor

That's a good point.

FieldName = "f64"
)

var ErrNaNDropped = errors.New("")
Contributor

I think a non-empty error string would be more helpful 😄
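A minimal sketch of what a non-empty sentinel error could look like. The message text here is an assumption for illustration; the actual wording was chosen in the follow-up commit:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNaNDropped signals that NaN samples were dropped on write.
// The message below is illustrative, not the one used in the PR.
var ErrNaNDropped = errors.New("dropped NaN samples from prometheus remote write")

func main() {
	err := ErrNaNDropped
	// Sentinel comparison still works, and the message is now useful in logs.
	if err == ErrNaNDropped {
		fmt.Println(err)
	}
}
```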

Member Author

Addressed in a2d7024

// WriteRequestToPoints converts a Prometheus remote write request of time series and their
// samples into Points that can be written into Influx
func WriteRequestToPoints(req *remote.WriteRequest) ([]models.Point, error) {
var points []models.Point
Contributor

Maybe worth pre-allocating points? Since it's simple to get the total number of samples, you could do:

var maxPoints int
for _, ts := range req.Timeseries {
    // each sample becomes one point
    maxPoints += len(ts.Samples)
}
points := make([]models.Point, 0, maxPoints)


const (
// MeasurementName is where all prometheus time series go to
MeasurementName = "_"
Contributor

Will this be needed outside of the package? Could it be unexported?

Member Author

Probably :)

Database: db,
RetentionPolicy: rp,
}},
Dimensions: []*influxql.Dimension{{Expr: &influxql.Wildcard{}}},
Contributor

Isn't a Wildcard expression only used when doing a SELECT *?

Member Author

Also used when doing a GROUP BY *, which this needs to do.

}

for _, v := range s.Values {
timestamp := v[0].(time.Time).UnixNano() / int64(time.Millisecond) / int64(time.Nanosecond)
Contributor

s.Values is a type from the prometheus package right? If so, I would be inclined to avoid the unsafe type assertion on elements in Values. Only in case a third party bug causes v[0] to not be a time.Time (which would cause a panic).

I'd use ts, ok := v[0].(time.Time) I think and check ok.

One other thing: time.Nanosecond == 1, so you don't need to divide by it.

Member Author

Good call on the check. On the divide by time.Nanosecond, I did that even though it's a 1 because apparently those underlying things aren't guaranteed. Based on the pedantic conversation in the answers of this post: https://stackoverflow.com/questions/24122821/go-golang-time-now-unixnano-convert-to-milliseconds

Member Author

Addressed in f0ffe20

timestamp := v[0].(time.Time).UnixNano() / int64(time.Millisecond) / int64(time.Nanosecond)
ts.Samples = append(ts.Samples, &remote.Sample{
TimestampMs: timestamp,
Value: v[1].(float64),
Contributor

Similar concern here with v[1].(float64)

Member Author

Addressed in f0ffe20


data, err := proto.Marshal(resp)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
Contributor

h.httpError


compressed = snappy.Encode(nil, data)
if _, err := w.Write(compressed); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
Contributor

h.httpError

h.PointsWriter.WritePointsFn = func(db, rp string, _ models.ConsistencyLevel, _ meta.User, points []models.Point) error {
called = true
point := points[0]
if point.UnixNano() != 1*int64(time.Millisecond) {
Contributor

1* is redundant.

Member Author

ah, was because that was the timestamp that was passed in, but will remove

@e-dard
Contributor

e-dard commented Sep 5, 2017

@pauldix I updated LICENSE_OF_DEPENDENCIES.md in e7a85a1

// servePromRead will convert a Prometheus remote read request into an InfluxQL query and
// return data in Prometheus remote read protobuf format.
func (h *Handler) servePromRead(w http.ResponseWriter, r *http.Request, user meta.User) {
compressed, err := ioutil.ReadAll(r.Body)
Contributor

Is it appropriate to use h.httpError here? That will output the error in a format for InfluxDB. While at the moment we use JSON, it could be any format like CSV. If we are attempting to send an error back that matches what Prometheus is expecting, it probably isn't correct to use h.httpError.

}

// Execute query.
results := h.QueryExecutor.ExecuteQuery(q, opts, closing)
Contributor

This entire section here should probably be refactored because it is about 95% in common with serveQuery, but that doesn't have to be done now. I think #8725 makes it a lot easier.

So no action on this, but I just wanted to get it on the record.

Member Author

Yeah, the same is true of the servePromWrite method, but I wanted to get this in without modifying any of the existing code paths to make it a very low risk merge.

}
promQuery := req.Queries[0]

q := &influxql.Query{}
Contributor

It would probably be easier to just create this at the bottom when you need it with a literal. The literal will likely allocate less memory to boot since it doesn't use append.

Contributor

@stuartcarnie stuartcarnie left a comment

@pauldix, really just a nit, but given our only dependency is the .proto definition file, we could pull in only remote.proto. That would allow us to generate our own .pb.go using gogo, which is more efficient, and would avoid pulling in the entire Prometheus repository.

The only thing used from Prometheus was the storage/remote files that are generated from the remote.proto file. Copied that file into the prometheus package and removed the dependency.
@pauldix
Member Author

pauldix commented Sep 7, 2017

@stuartcarnie @e-dard @jsternberg ok, removed the prometheus dependency and just copied the remote.proto file. Should be ready for final review.

Contributor

@stuartcarnie stuartcarnie left a comment

LGTM 👍🏻


syntax = "proto3";

package prometheus; // change package from remote to prometheus
Contributor

Can we put this file in an internal package named remote? I do not want us doing any manual modifications to this file if at all possible. Something like this should work:

prometheus/
  internal/
    remote/
      remote.proto

And then modify the go generate command to output the remote.pb.go file to this internal directory. Since it's in the remote folder, the package name will still be remote and it will be accessible to the prometheus package, but it won't be accessible to anyone else. That will make updating that file easier when we need to because there are no modifications.

Member Author

Why the internal before? Why not just put it in remote?

Contributor

It means it's not an interface we expose. While not necessary, I think it would be a good habit for us to start getting into.

Member Author

That feels odd to me. Go already has the public vs. private in a package. If we go down this route we'll end up having internal in every package in the project, no?

"github.com/influxdata/influxdb/models"
)

//go:generate protoc -I$GOPATH/src -I. --gogofaster_out=. remote.proto
Contributor

In influxql, we just use --gogo_out. Are those different generators and is this adding a new dependency? We likely should avoid using multiple protoc generators if we can. Also, is the -I needed? I don't see any included dependencies in the proto file.

Contributor

gogofaster is a superset of gogo's functionality, with additional optimizations in the generated code and fewer pointer fields. This generally makes the structs easier and more efficient to work with in Go.

Contributor

The protobuf wire format is unchanged.

Contributor

Should we be using it instead of gogo everywhere and avoid having multiple generators? I'm not opposed to switching generators, I would just like it if we used one. It also doesn't have to be this PR that switches everything to a new generator.

Contributor

@stuartcarnie stuartcarnie Sep 7, 2017

Agreed that we should be consistent and move to the new generators. In many cases we will need a significant amount of refactoring, but the effort will be worthwhile for performance and ease of use.

Contributor

Just to make sure the other part of the comment isn't lost, is the -I needed?

Member Author

ok, so use the vanilla generator for now and we can evaluate gogofaster later?

Contributor

Go with gogofaster. We are using it for the new storage APIs and should plan on migrating the others over. It does not introduce additional dependencies. The -I includes mean that if future versions of the .proto file import other proto files, you won't have to update the generate line, so it is safe to leave it in.

Member Author

ok, should be all moved around and good now

@pauldix pauldix merged commit f30eba3 into master Sep 7, 2017
@pauldix pauldix deleted the pd-prometheus-remote branch September 7, 2017 17:52