Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make logging output location more programmatically configurable #6213

Merged
merged 25 commits into from
Apr 20, 2016

Conversation

emidoots
Copy link
Contributor

@emidoots emidoots commented Apr 5, 2016

Overview

This PR includes various improvements to make InfluxDB's logging output location more programmatically configurable in general. This feature is only important for users who embed InfluxDB into their Go programs, and was already half-supported. This change just improves the existing feature.

For an in-depth example of why this change is important, please see the description of sourcegraph/appdash#131

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

@emidoots
Copy link
Contributor Author

emidoots commented Apr 5, 2016

The CI failure is now listed as:

--- FAIL: TestServer_BackupAndRestore (2.05s)
    backup_restore_test.go:97: query results wrong:
            exp: {"results":[{"series":[{"name":"myseries","columns":["time","host","value"],"values":[["1970-01-01T00:00:00.001Z","A",23]]}]}]}
            got: {"results":[{}]}

But unless I am crazy wrong, I don't see how this change could cause it exactly? Is it perhaps a flaky test?

@jonseymour
Copy link
Contributor

@slimsag might be worth repushing your change - it seems the CI failure affected more than just your test and I wasn't able to reproduce it locally.

I am intrigued by your use case. Are you intending to imbed a full influxdb server in instrumented apps rather than just an influx client? Or is influxdb just an example of an app that you could instrument with your solution?

@e-dard
Copy link
Contributor

e-dard commented Apr 6, 2016

@slimsag that's a flaky test. I'll run a rebuild of your branch.

@@ -184,7 +184,10 @@ func (e *Engine) Close() error {
}

// SetLogOutput is a no-op.
func (e *Engine) SetLogOutput(w io.Writer) {}
func (e *Engine) SetLogOutput(w io.Writer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no longer a no-op.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, either need a note on usage (that SetLogOutput should not be called once the shard is open) or we're going to need a lot more locking around access to e.logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d0283c9

@e-dard
Copy link
Contributor

e-dard commented Apr 6, 2016

@slimsag I like the idea of making it easier to make the logging output more configurable.

Since I doubt anyone would want to dynamically update the logging locations once influx is running, some notes about calling SetLogOutput on each service, before calling Open on the server would be useful.

@e-dard
Copy link
Contributor

e-dard commented Apr 15, 2016

@slimsag needs a rebase. Happy to look at this again when you've taken a look through my comments.

@emidoots
Copy link
Contributor Author

Thanks for the CI help! I apologize for the delays in updating this PR -- really sorry about that :)

I've rebased against master and will address all comments now.

@emidoots
Copy link
Contributor Author

emidoots commented Apr 15, 2016

@jonseymour

I am intrigued by your use case. Are you intending to imbed a full influxdb server in instrumented apps rather than just an influx client? Or is influxdb just an example of an app that you could instrument with your solution?

So Appdash, as you may or may not know, is an open-source project for tracing the performance of applications (both standalone and distributed ones). Basically we measure real-time performance of operations for e.g. everything that happens in serving an HTTP request, and then we need a place to store those numbers for later viewing by the user (typically a developer, but sometimes even the end user of a product). In turn, we have a massive number of writes (~1KB of data per HTTP request), and not very many reads (maybe 5% of data inserted is actually read).

In the past, we used some rather simple but primitive storage mechanisms. Like literally just a Go map[int64][]byte in memory. As the project has grown, we want to answer more important queries like "what are the slowest HTTP endpoints in the entire system as a whole?", so we wrote some more primitive storage mechanisms to handle this sort of data-aggregation, but it quickly became very complex and very non-performant. Of course, I know of one other open-source time series database written in Go and so @chris-ramon has been working hard over the past few months to see how compatible it is with Appdash, and the results are looking very promising.

One major constraint for Appdash, however, is that it is typically no-setup-required. For example, Appdash should ideally scale to multi-node distributed setups like what you might observe inside high-throughput environments like Google, Facebook, etc. But very importantly, it also needs to be literally zero-setup for use within small web servers, CLI tools, and in the case of Sourcegraph, the company I work at, easy to setup on-premises and local dev machine deployments (because when users see our app going slow, asking them to set up tracing software + database software is not a nice response compared to "just copy+paste the JSON trace" or something like that =) ).

This last no-setup-required case, is exactly where Appdash needs to be able to embed InfluxDB within the process itself. To do this we're using github.com/influxdata/influxdb/cmd/influxd/run.NewServer to spin up a new InfluxDB server in-process; and this works surprisingly well! The only issue is that InfluxDB logs pollute our app's logs, and thus PR will let us direct them to a log file and make debugging issues much easier.

Hopefully this clarifies everything and, while I know it was very verbose, I hope this wasn't an unwarranted response! Hopefully I didn't come across as trying to pitch anything, and if this type of comment isn't right to put here just let me know and I'll be glad to remove it.

@emidoots
Copy link
Contributor Author

CHANGELOG.md updated

It's not 100% clear to me if I should update the changelog for this PR; and would I update the Features or Bug Fixes section?

@emidoots
Copy link
Contributor Author

The AppVeyor failure looks fishy:

=== RUN TestTimer_Elapsed
--- FAIL: TestTimer_Elapsed (2.00s)
    stress_test.go:58: expected around 2s got 1.9998922s

Is this an open issue somewhere else?

@e-dard
Copy link
Contributor

e-dard commented Apr 16, 2016

@slimsag you can update CHANGELOG to say you added this as a feature. Yes the flaky Windows test is known to us. We will sort it next week.

@e-dard
Copy link
Contributor

e-dard commented Apr 16, 2016

@slimsag I just need to get another reviewer to look at this before it can be accepted. Unless you decide to do it yourself and repush, we will also squash your commits down when we merge.

@emidoots
Copy link
Contributor Author

Got it; I've updated the changelog. Please squash as you see fit :)

@jwilder
Copy link
Contributor

jwilder commented Apr 19, 2016

👍

@jwilder jwilder added this to the 0.13.0 milestone Apr 19, 2016
@e-dard
Copy link
Contributor

e-dard commented Apr 19, 2016

@slimsag we're ready to merge this. You may want to diff with master and see if anything has landed in the last two weeks that involves logging stuff you might want to make more customisable.

When you're ready ping me and I'll merge this (or any updates you add).

@e-dard e-dard self-assigned this Apr 19, 2016
…SetLogger

This change causes all services to be added at the time `NewServer` is called.
Effectively, this enables writing code that calls `NewServer`, iterates over
each `Server.Services` in order to specify where log output should go via `Service.SetLogger`.

Without this change, it is not possible to catch the output of various startup
logs like these:

```
[subscriber] 2016/04/04 18:33:59 opened service
[monitor] 2016/04/04 18:33:59 Starting monitor system
[monitor] 2016/04/04 18:33:59 'build' registered for diagnostics monitoring
[monitor] 2016/04/04 18:33:59 'runtime' registered for diagnostics monitoring
[monitor] 2016/04/04 18:33:59 'network' registered for diagnostics monitoring
[monitor] 2016/04/04 18:33:59 'system' registered for diagnostics monitoring
[cluster] 2016/04/04 18:33:59 Starting cluster service
[shard-precreation] 2016/04/04 18:33:59 Starting precreation service with check interval of 10m0s, advance period of 30m0s
[snapshot] 2016/04/04 18:33:59 Starting snapshot service
[copier] 2016/04/04 18:33:59 Starting copier service
[admin] 2016/04/04 18:33:59 Starting admin service
[admin] 2016/04/04 18:33:59 Listening on HTTP: [::]:8083
[monitor] 2016/04/04 18:33:59 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
[continuous_querier] 2016/04/04 18:33:59 Starting continuous query service
```

because these services were previously appended at, and wrote these log
messages at, `Server.Open` time.
This adds a top-level `Server.Logger` field which can be overridden after a call
to `NewServer` such that all `Server`-related logs end up being sent to this
`Logger`. This is primarily useful for embedding InfluxDB within other Go
programs.
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
… Logger too

Prior to this change, you could invoke `SetLogger` but it would only omit certain
logs, not inclusive of the underlying HTTP handler logs for basic route logging
or panic logging. These logs looked like:

```
[http] 2016/04/04 19:27:15 127.0.0.1 - demo [04/Apr/2016:19:27:15 -0700] GET /query?db=&q=CREATE+DATABASE+IF+NOT+EXISTS+appdash+WITH+DURATION+1d+NAME+one_day_only HTTP/1.1 200 40 - InfluxDBClient e9147081-fad5-11e5-8001-000000000000 165.773786ms
```

After this change, invoking `SetLogger` properly sets the underlying `Handler.Logger`
too, such that it is possible to change the HTTPD logger post-creation and direct its
logs somewhere else (this is valuable when embedding InfluxDB in other Go programs).
Prior to this change the log output location for tsdb was fixated to `os.Stderr`
in various locations. After this change, a `SetLogOutput` on `tsdb.Store` will
effectively set the log output location for underlying shards / tsdb engines.

In an ideal world, the `tsdb.Store.Logger` would bubble down, but because `tsdb/engine`
can be an abstract implementation, these engines need their own log prefix and
thus this is not easily achievable.
emidoots added 18 commits April 19, 2016 18:23
In 9cf4baea223ae972b24f34eceab20980f24e2467 I mistakenly copied the wrong
code around. Bring back the exact service appension code from latest `master`
as it should have been originally.
Given that tsdb/engine is an abstract implementation and that it is not
really valuable to strip out the `[servicename]` prefix to all logs, instead
of adding `SetLogger` everywhere add a `SetLogOutput` everywhere instead.
This lets the service still choose their name prefix, and clients can choose
at a high level (or lower level, if they choose) where logs go.

Actual implementation of services SetLogger -> SetLogOutput change left for
followup commit for readability purposes.
@emidoots
Copy link
Contributor Author

Hi @e-dard, and thanks for the help on this! I'm sorry to leave a larger change for later, I hope it's okay, but I've made a somewhat verbose change that I think is better than having SetLogger methods.

At this point I can easily call Server.SetLogOutput(somefile) to direct all my InfluxDB server output to a log file; so I am 100% content with this PR as-is.

I've rebase, so PTAL at my new commits starting with commit 8c43c75 (tsdb/engine/tsm1: add and use CacheLoader.SetLogOutput).

@emidoots
Copy link
Contributor Author

I'm thinking this is unrelated, let me know if not:

--- FAIL: TestServer_BackupAndRestore (2.03s)
    backup_restore_test.go:97: query results wrong:
            exp: {"results":[{"series":[{"name":"myseries","columns":["time","host","value"],"values":[["1970-01-01T00:00:00.001Z","A",23]]}]}]}
            got: {"results":[{}]}

@e-dard e-dard merged commit 9dc09c5 into influxdata:master Apr 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants