Merge CI.next into Master #161
Merged
Contributor
shermdog
commented
Jul 25, 2016
- all-or-nothing checkouts
- statsd metrics
- improved error handling and testing
* (QENG-3919) spike for implementation of all-or-nothing checkout
* Fix two botched variable references
* Aggregate API helper methods
* Add specs for failed multi-vm allocation API endpoints
* (QENG-3919) Add tests for multiple vm requests
* (QENG-3919) Add (failing) specs for POST /vm/pool1+pool2 usages
  This exposes the old (bad) behavior on this other code path. Will fix this up next.
* (QENG-3919) Bring query params version in line with JSON post version
  Not clear to me why these had to be implemented so differently.
* (QENG-3919) extract common method from both methods of VM allocation
* (QENG-3919) Naming fix, cosmetic cleanups
  I mean, I presume all these commits are going to get squashed away on merge anyway.
* (QENG-3919) Update API docs
  We consider it a bug that the actual behavior was not this behavior, but the documentation was also silent on this point.
* (QENG-3919) minor readability tweak in refactored method
* (QENG-3919) Clean up interim comments re: status codes
* (QENG-3919) Drop now-orphaned `checkout_vm` method
  We kept this up-to-date while we were upgrading and refactoring, but, turns out, this method is no longer called anywhere. 💀 🔥
* (QENG-3919) Return 503 status on failed allocation
  Making sure we go back to the original functionality, which was:
  - status 200 when vms successfully allocated
  - status 404 when a pool name is unknown
  - status 404 when no pool name is specified
  - status 503 when vm allocation failed
* (QENG-3919) add net-ldap to Gemfile
  Maybe we shouldn't foil-ball gems onto servers.
* (QENG-3919) Turns out, spush isn't a redis command
  And hence we see once again the weakness of mockist tests.
* (QENG-3919) Pin the net-ldap gem to 0.11 for the jrubies, etc.
* (QENG-3919) Correct an old spelling error in spec descriptions
* (QENG-3919) Further tweak net-ldap version
* (QENG-3919) return_single_vm -> return_vm_to_ready_state
  cc @shermdog
The way we were using graphite was incorrect for the type of data we were sending it. statsd is the appropriate mechanism for our needs. statsd and graphite are mutually exclusive, and configuring statsd will take precedence over graphite. See the example configuration in vmpooler.yaml.example.
Add the tracking of successful, failed, invalid, and empty pool vm gets. It is possible we may want to tweak this, but have validated with spec tests and pcaps.
```
vmpooler-tmp-dev.ready.debian-7-x86_64:1|c
vmpooler-tmp-dev.running.debian-7-x86_64:1|c
vmpooler-tmp-dev.checkout.invalid:1|c
vmpooler-tmp-dev.checkout.success.debian-7-x86_64:1|c
vmpooler-tmp-dev.checkout.empty:1|c
vmpooler-tmp-dev.running.debian-7-x86_64:1|c
vmpooler-tmp-dev.clone.debian-7-x86_64:12.10|ms
vmpooler-tmp-dev.ready.debian-7-x86_64:1|c
```
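For readers unfamiliar with the wire format in the capture above, here is a minimal sketch of how such datagrams are composed: `name:value|c` for a counter and `name:value|ms` for a timer. The helper names `counter_datagram` and `timing_datagram` are hypothetical and are not part of vmpooler.

```ruby
# Hypothetical helpers (not vmpooler code) that build statsd datagrams
# in the `name:value|type` wire format.
def counter_datagram(prefix, label, count = 1)
  "#{prefix}.#{label}:#{count}|c"
end

def timing_datagram(prefix, label, milliseconds)
  # Two decimal places, matching the clone timing shown in the capture.
  "#{prefix}.#{label}:#{format('%.2f', milliseconds)}|ms"
end

puts counter_datagram('vmpooler-tmp-dev', 'checkout.success.debian-7-x86_64')
puts timing_datagram('vmpooler-tmp-dev', 'clone.debian-7-x86_64', 12.1)
```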
Cleaned up some code review nitpicks and added pool_manager_spec for empty pool.
Previously was using increment which was incorrect for that particular application.
(RE-7014) add statsd support
There were several problems with how the pooler checked out vms with respect to empty pools, invalid pools, and aliases:
- If the vmpooler config did not contain any aliases and the caller requested a vm from an empty pool or a non-existent one, the vmpooler would error with: NoMethodError - undefined method `[]' for nil:NilClass
- If the config contained a non-nil alias section, then:
  - If the caller requested a vm from an empty pool and either the vm didn't have an alias or the aliased pool was empty or non-existent, then the request for that vm would be silently ignored. The vmpooler would return 200 if the caller asked for multiple vms and the vmpooler was able to checkout at least one vm. Otherwise it would return 404.
  - Similarly, if the caller requested a vm from a non-existent pool, then the request was silently ignored.

This commit adds a `pool_names` Set to the config containing all valid pool names including aliases. This is used to determine whether a requested template name is valid or not. This is necessary because redis does not distinguish between empty and non-existent sets, e.g. the following returns false in both cases: `backend.exists('vmpooler__ready__' + key)`

If the caller requests a vm (single or multiple), and any vm references an invalid pool name, we immediately return 404. Otherwise, we know the request is for valid pool names, since the vmpooler requires a restart to change pool names and counts. We then attempt to acquire each vm, trying to match on pool name or falling back to aliased pool name, as was the previous behavior.

The resulting behavior is:
- If the caller asks for at least one vm from an unknown pool, then don't try to checkout any vms and respond with 404.
- If the caller asks for a vm, and at least one pool is empty, then respond with 503, returning checked out vms back to the pool.
- Otherwise return 200 with the list of checked out vms.

This commit also makes `alias` optional again.
This commit also re-enables tests that were merged in from master, but originally commented out due to the bugs described above.
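The all-or-nothing behavior described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the vmpooler implementation: `FakePools` is a hypothetical in-memory stand-in for the redis-backed ready pools, `allocate` is a made-up name, and alias fallback is left out to keep the sketch short.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the redis-backed ready pools.
class FakePools
  def initialize(ready)
    @ready = ready # e.g. { 'pool1' => ['vm-a', 'vm-b'] }
  end

  # Take one vm from the pool, or nil if the pool is empty.
  def checkout(pool)
    (@ready[pool] ||= []).shift
  end

  # Put a vm back at the front of its pool.
  def return_vm(pool, vm)
    (@ready[pool] ||= []).unshift(vm)
  end

  def ready_count(pool)
    (@ready[pool] || []).size
  end
end

# requests: { 'pool-name' => count }. Returns [status, { pool => [vms] }].
def allocate(requests, pool_names, pools)
  # Any unknown pool name fails the whole request before touching a pool.
  return [404, {}] unless requests.keys.all? { |name| pool_names.include?(name) }

  checked_out = []
  requests.each do |pool, count|
    count.times do
      vm = pools.checkout(pool)
      if vm.nil?
        # A valid pool is empty: hand back everything taken so far.
        checked_out.reverse_each { |p, v| pools.return_vm(p, v) }
        return [503, {}]
      end
      checked_out << [pool, vm]
    end
  end

  vms = checked_out.each_with_object({}) { |(p, v), acc| (acc[p] ||= []) << v }
  [200, vms]
end
```

Checking names against a Set up front is what lets the sketch tell "unknown pool" (404) apart from "known but empty pool" (503), which a bare existence check on the ready set cannot do.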
json 2.0.x was released on July 1 and is not compatible with ruby < 2.0. Since we still support that version, add a pessimistic pin, which is what we were using prior to July 1.
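As a quick illustration of what a pessimistic pin does, the sketch below checks two versions against a `~>` constraint using RubyGems' own classes. The constraint `'~> 1.8'` is an assumed example; the commit message does not state the exact version that was pinned.

```ruby
require 'rubygems'

# '~> 1.8' is an assumed example constraint, not the pin from this PR.
# It allows 1.8 and later 1.x releases but excludes 2.0 and above.
requirement = Gem::Requirement.new('~> 1.8')

in_range  = requirement.satisfied_by?(Gem::Version.new('1.8.3'))
next_major = requirement.satisfied_by?(Gem::Version.new('2.0.1'))

puts "1.8.3 allowed: #{in_range}"
puts "2.0.1 allowed: #{next_major}"
```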
(QENG-4070) Consistently return 503 if valid pool is empty
This reverts commit 0fd6fff.
These were caused in part by dropping changes from the original PR when we dropped the v1_spec.rb master test file (in favor of the updated and separated versions).
We're returning [nil,nil] in this case, meaning that name will not be set. This means we'll get an error trying to concatenate the stats string. Use the requested template name here instead.
Prior to this we could easily run into situations where `statsd_prefix` would be `nil` (and possibly the `statsd` handle itself). There was some significant complexity and brittleness in how statsd was set up. Refactored so that:
- `statsd_prefix` is no longer exposed to any callers of statsd methods
- there is now a `Vmpooler::DummyStatsd` class which can be returned when we are not actually going to publish stats, but would like to keep the calling interface consistent
- setup of the statsd handle is via just passing in `config[:statsd]`; if `nil`, this will result in a dummy handle being returned
- defaulting of `server` values was fixed -- this did not actually work in the previous implementation. `config[:statsd][:server]` is now required.
- tests use a `DummyStatsd` instance instead of an rspec double.
- calls to `statsd.increment` were taking incorrect arguments (some our fault, some part of the prior implementation), and were not collecting data on which pools were "invalid" or "empty". Fixed this and are now explicitly tracking the invalid/empty pool names.
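A minimal sketch of the null-object arrangement described above, under assumptions: everything lives in a made-up `Sketch` module rather than `Vmpooler`, the `timing` method, the default prefix, and the metric label are illustrative guesses, and the statsd stand-in records datagrams in memory instead of opening a UDP socket.

```ruby
module Sketch
  # No-op handle: same calling interface, publishes nothing.
  class DummyStatsd
    def increment(_label); end

    def timing(_label, _duration); end
  end

  # Stand-in for a real statsd wrapper. It records datagrams in memory
  # instead of sending them, so the sketch runs anywhere.
  class Statsd
    attr_reader :server, :port, :prefix, :sent

    def initialize(params)
      server = params[:server]
      raise ArgumentError, 'statsd :server is required' if server.nil? || server.empty?

      @server = server
      @port   = params[:port] || 8125         # conventional statsd port
      @prefix = params[:prefix] || 'vmpooler' # assumed default
      @sent   = []
    end

    # Callers pass only the label; the prefix stays internal.
    def increment(label)
      @sent << "#{prefix}.#{label}:1|c"
    end

    def timing(label, duration)
      @sent << "#{prefix}.#{label}:#{duration}|ms"
    end
  end

  # A nil config (no statsd section) yields the dummy handle.
  def self.statsd_handle(params)
    params.nil? ? DummyStatsd.new : Statsd.new(params)
  end
end

metrics = Sketch.statsd_handle(nil)
metrics.increment('checkout.empty.debian-7-x86_64') # safe no-op
```

Because both classes answer the same methods, calling code never needs to check whether metrics are enabled or whether a prefix is set.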
Prior to this, the `pool_manager.rb` library could take handles for both graphite and statsd endpoints (which were considered mutually exclusive), and then would use one. There was a bevy of conditional logic around sending metrics to the graphite/statsd handles (and actually at least one bug of omission). Here we refactor more, building on earlier work:
- Our graphite class comes into line with the API of our Statsd and DummyStatsd classes
- In `pool_manager.rb` we now accept a single "metrics" handle, and we drop all the conditional logic around statsd vs. graphite
- We move the inconsistent error handling out of the calling classes and into our metrics classes, actually logging to `$stderr` when we can't publish metrics
- We unify the setup code to use `config` to determine whether statsd, graphite, or a dummy metrics handle should be used, and make that happen.
- Cleaned up some tests. We could probably stand to do a bit more work in this area.
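The unified setup might look roughly like the sketch below. It assumes the precedence stated elsewhere in this PR (statsd over graphite, otherwise a dummy handle); the `Sketch*` structs, `metrics_handle`, and `publish_safely` are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-ins for the three metrics backends.
SketchStatsd   = Struct.new(:params)
SketchGraphite = Struct.new(:params)
SketchDummy    = Struct.new(:params)

# Pick exactly one handle from config: statsd wins over graphite,
# and a dummy handle is used when neither is configured.
def metrics_handle(config)
  if config[:statsd]
    SketchStatsd.new(config[:statsd])
  elsif config[:graphite]
    SketchGraphite.new(config[:graphite])
  else
    SketchDummy.new(nil)
  end
end

# Error handling lives with the metrics code: a failed publish is
# logged to $stderr and never propagates to the caller.
def publish_safely(label)
  yield
rescue StandardError => e
  $stderr.puts "Failure publishing metric #{label}: #{e.message}"
end

handle = metrics_handle({ statsd: { server: 'statsd.example.com' },
                          graphite: { server: 'graphite.example.com' } })
puts handle.class
```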
Prior to this, `pool_manager.rb` allowed the `metrics` argument to be optional, but at this point it will always be an instance of `Vmpooler::Statsd`, `Vmpooler::Graphite`, or `Vmpooler::DummyStatsd`, so we make it non-optional. Cleaned up that file's tests cosmetically, and recognized that the behavioral difference between graphite and statsd does not depend on the pool manager.
This documents the changes to :server being mandatory for all metrics endpoints, as well as the graphite endpoint supporting an optional :port configuration value.
Really, let's just support a generic metrics interface.
We've managed to move mentions of this out of the calling code, so let's move the require.
We missed this during the refactoring. Bringing this up to date.
Prior to this the dashboard front-end would use the configuration settings for `graphite[:server]`/`graphite[:prefix]` to locate a graphite server to use for rendering graphs. Now that we have multiple possible metrics backends, the front-end graph host for the dashboard could be entirely different from the back-end metrics server that we publish to (if any). This decouples those settings:
- use `graphs[:server]` / `graphs[:prefix]` for the graphite-compatible web front-end to use for dashboard display graphs
- fall back to `graphite[:server]`/`graphite[:prefix]` if `graphs` is not specified, in order to support legacy `vmpooler.yaml` configurations.

Note that since `statsd` takes precedence over `graphite`, it's possible to specify both `statsd` (for publishing) and `graphite` (for reading). We still prefer `graphs` over `graphite`. Updated the example `vmpooler.yaml` config file.
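The fallback rule can be sketched in a few lines. The method name `graph_settings` is hypothetical, and the sketch assumes a config hash with symbol keys, which may not match how the dashboard actually receives its settings.

```ruby
# Prefer the `graphs` section; fall back to the legacy `graphite`
# section; return nil values when neither is present.
def graph_settings(config)
  source = config[:graphs] || config[:graphite] || {}
  { server: source[:server], prefix: source[:prefix] }
end

legacy = { graphite: { server: 'graphite.example.com', prefix: 'vmpooler' } }
split  = {
  statsd: { server: 'statsd.example.com' },   # publishing
  graphs: { server: 'graphs.example.com', prefix: 'vmpooler' } # dashboard reads
}

puts graph_settings(legacy)[:server]
puts graph_settings(split)[:server]
```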
This was referencing config directly, when what we want is for a hash to be passed in (derived from config).
The things you find through manual QA 🧌
Nested hash data comes back with string keys, not symbols. Be consistent.
This makes it visible to lib/vmpooler.rb, as well as putting this dummy metrics endpoint in its own file for easier discovery.
The library is actually required as 'statsd' and not 'ruby-statsd', best I can tell.
Because it's ambiguous in this scope, and, well, it doesn't actually work in production.
When we don't even get a pool name we still want metrics to be recorded.
[QENG-4075] Land statsd support after earlier ci.next changes