Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up shutdown #7441

Merged
merged 1 commit into from
Oct 10, 2016
Merged

Speed up shutdown #7441

merged 1 commit into from
Oct 10, 2016

Conversation

mark-rushakoff
Copy link
Contributor

@mark-rushakoff mark-rushakoff commented Oct 10, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

On my machine with about 20 shards, it would take 10+ seconds to shut
down InfluxDB with SIGINT. After this change, it shuts down in nearly
instantly.

(*tsdb.Store).Close was shutting down each of its shards sequentially.
Each shard's engine would signal to its compaction goroutines to quit,
and because each compaction goroutine has a hardcoded 1-second sleep in
between checks, waiting for the goroutines would often block for up to a
second.

This change closes all of the TSDB store's shards in parallel. This
means it's possible that multiple close values could error at once, but
we're still only returning the first error, consistent with previous
behavior. That being said, the return value of (*tsdb.Store).Close is
ignored in (*cmd/influxd/run.Server).Close anyway.

@mark-rushakoff mark-rushakoff added this to the 1.1.0 milestone Oct 10, 2016
// Close all the shards in parallel.
// If we get any errors, return the first one.
var wg sync.WaitGroup
errs := make(chan error)
for _, sh := range s.shards {
Copy link
Contributor

@jwilder jwilder Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this could not use the walkShards funcs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of walkShard. I'll amend the commit to use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amended and pushed.

On my machine with about 20 shards, it would take 10+ seconds to shut
down InfluxDB with SIGINT. After this change, it shuts down in nearly
instantly.

(*tsdb.Store).Close was shutting down each of its shards sequentially.
Each shard's engine would signal to its compaction goroutines to quit,
and because each compaction goroutine has a hardcoded 1-second sleep in
between checks, waiting for the goroutines would often block for up to a
second.

This change closes all of the TSDB store's shards in parallel. This
means it's possible that multiple close values could error at once, but
we're still only returning the first error, consistent with previous
behavior. That being said, the return value of (*tsdb.Store).Close is
ignored in (*cmd/influxd/run.Server).Close anyway.
@mark-rushakoff mark-rushakoff merged commit 89c7572 into master Oct 10, 2016
@mark-rushakoff mark-rushakoff deleted the mr-speedup-shutdown branch October 10, 2016 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants