-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(SLV-653) Add a script for generating system stats #28
Conversation
@billclaytor @johnduarte I just put you two on as reviewers for now. Please don't merge. After you two approve I will add Reid. |
files/gen_sys_stats.rb
Outdated
# @author Randell Pelak | ||
# | ||
# @param [integer] polling_interval Time in seconds between calls to poll the system for data. | ||
# @param [integer] file_interval Tim in seconds between the creation of each output file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Tim /Time /
files/gen_sys_stats.rb
Outdated
@metrics_dir = metrics_dir | ||
@verbose = verbose | ||
|
||
@hostname = %x[hostname].strip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work as expected on the primary master? I thought the primary master was being defined at 127.0.0.1
by the metrics collector. This will use the hostname for that node instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reid pushed a change to fix the 127.0.0.1 thing... he considered that a bug. I don't think it got released yet though. The change was in the hosts_with_profile function there were using to get the hosts and put them in the params. I thought about making host a parameter to the script but decided it shouldn't really matter what the dir is called as long as it is mappable to the host. https://github.com/puppetlabs/puppetlabs-puppet_metrics_collector/pull/26/files
files/gen_sys_stats.rb
Outdated
|
||
@hostname = %x[hostname].strip | ||
puts "Hostname is: #{@hostname}" if @verbose | ||
FileUtils.mkdir_p(metrics_dir) unless File.directory?(metrics_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ Shouldn't the instance variable be used here since it has been assigned?
files/gen_sys_stats.rb
Outdated
# | ||
# @return [string] raw output from sar | ||
def run_sar | ||
times_to_poll = ((@file_interval / @polling_interval) + 0.5).to_i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the ceil
method rather than fudging the numbers
times_to_poll = (@file_interval / @polling_interval).ceil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use round. https://apidock.com/ruby/Float/round
files/gen_sys_stats.rb
Outdated
# | ||
# @return [boolean] | ||
def sar_output_valid?(sar_output) | ||
last_line = sar_output.split("\n")[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor suggestion: You are performing string manipulation to get data out of the sar output in several methods. You already have code to structure the data theconvert_sar_output_to_json
method. I suggest breaking out the code that creates the metrics_data
hash into its own method combined with the string manipulation. If the sar output changes in the future all code related to parsing is from string output should be encapsulated in one place.
This would make this method unnecessary (raise error during hash creation if headers != averages). The hash can then be passed to the get_* methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I didn't like the way this worked out. Thinking about it now, I am having trouble coming up with method names that make sense.
I feel like I want to write get_sar_data_hash... have it call run_sar, and then convert... with convert checking for errors and raising if needed. This of course eliminates "sar_output_valid?" But does that sound right? Feels like get_sar_data_hash doesn't communicate that it will actually run sar. Guess I could go with run_sar_and_get_data_hash???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried out somethings, didn't like them, and tried some more. I think I found something that works better. But the convert_sar_output_to_json ends up being a bit of a wall of text... but it isn't very complex so that might be okay. Tell me what you think.
files/gen_sys_stats.rb
Outdated
def convert_sar_output_to_json(sar_output, time_stamp_obj) | ||
header = get_header_fields(sar_output) | ||
averages = get_averages_fields(sar_output) | ||
hostkey = @hostname.gsub('.', '-') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the hostname? The key is a string, so the '.' characters should not cause any issue with Ruby or JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was to be consistent with the rest of PMC. The value is actually the default tag if the data gets loaded into influxdb... so maybe it cares... dunno for sure, but wanted it to be consistent.
files/gen_sys_stats.rb
Outdated
# @author Randell Pelak | ||
# | ||
# @return [void] | ||
def go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "generate_system_metrics"?
files/gen_sys_stats.rb
Outdated
|
||
options = {} | ||
|
||
OptionParser.new do |opts| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is kind of a path of least resistance at this point. We have used it in several places and are at least familiar with it. We could make a ticket to investigate others at some point...
script name and class should maybe be the same |
f628765
to
c801b23
Compare
@johnduarte Ready for another look |
files/generate_system_metrics
Outdated
hostkey = @hostname.gsub('.', '-') | ||
dataset = {'time_stamp_obj' => time_stamp_obj.utc.iso8601, 'servers' => {}} | ||
#combine the arrays into a hash and convert into json | ||
metrics_data = Hash[header.zip(averages)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is passing raw STDOUT string data to three methods. It would be preferable to encapsulate the conversion of the sar string output to a defined data set. That way if the output format of sar changes in the future only one method would need to be updated.
Suggestion:
# @return [hash] metrics_data
def parse_sar_output(sar_output)
sar_data = {}
sar_arr = sar_output.split(/\n+|\r+/).reject(&:empty?).map { |line| line.split }
averages = sar_arr.find { |e| e.include? "Average:" }
headers = sar_arr.find { |e| e.include? "kbmemfree" }
metrics_data = Hash[headers.zip(averages)]
end
Then pass the data to convert_metrics_to_json
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnduarte hmmm. I played around with that. I run into the problems getting useful error messages out. So it got a bit longer. But it encapsulated all the processing in one method. See what you think.
files/generate_system_metrics
Outdated
def parse_sar_output(sar_output) | ||
sar_output_arr = sar_output.split(/\n+|\r+/).reject(&:empty?).map { |line| line.split } | ||
|
||
headers_line = sar_output_arr.find { |e| e.include? "%user" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "%user"? This is not included in my version of sar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I see
[root@ip-10-227-0-89 ~]# sar 1 3
Linux 3.10.0-957.1.3.el7.x86_64 (ip-10-227-0-89.amz-dev.puppet.net) 11/01/2019 x86_64 (8 CPU)
04:47:43 PM CPU %user %nice %system %iowait %steal %idle
04:47:44 PM all 0.63 0.00 0.13 0.00 0.00 99.25
04:47:45 PM all 0.12 0.00 0.00 0.00 0.00 99.88
04:47:46 PM all 0.75 0.00 0.12 0.00 0.00 99.12
Average: all 0.50 0.00 0.08 0.00 0.00 99.42
[root@ip-10-227-0-89 ~]#
the kbmemfree you posted before sounded like you were doing the mem version "sar -r 1 3" But the point is good... cause the script supports both, and would have failed to find %user in the mem version... So doing your update for the reverse zip is critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, the reverse thing doesn't solve the %user... I will have to choose what to look for based on what metrics type is being run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sar_output_arr.find { |e| e.include? "%user" or e.include? "kbmemfree" }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it in a more wordy way. and tied it to @metric_type
files/generate_system_metrics
Outdated
"\nFull output:\n#{sar_output}" | ||
raise(sar_error_header_mismatch) unless headers.count == averages.count | ||
#combine the arrays into a hash | ||
Hash[headers.zip(averages)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are magic numbers for array segments and much validation. The later is fine. The former is undesirable.
There appear to be two issues that you are addressing.
- The headers and the averages start with some BS you don't want.
- Due to the BS, there is a risk of the fields being mismatched.
Given that all of the BS is at the beginning of the headers and the averages field lists, I think that zipping these fields in reverse and tossing away any results that do not have a numeric value solves both problems.
If no data arrays are found for the desired lines (e.g. headers, averages), then the call to zip
will throw an exception without checking them beforehand.
With those things in mind, I am wondering if the following refactor may simplify things. Please feel free to disregard if it does not or is not conservative enough.
headers_line = ... # as above
averages_line = ... # as above
# This will zip the data in reverse order to ensure a correct mapping.
# Since it is being converted to a hash, we don't care about order afterwards.
# If either the headers_line or averages_line is nil, then an error will be raised
sar_data = Hash[headers_line.reverse.zip(averages_line.reverse)]
# Now we throw away any data whose value is not numeric.
sar_data = sar_data.select{ |k,v| v =~ /\A[-+]?[0-9]*\.?[0-9]+\Z/ }
return sar_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea, I really like it. implemented.
@johnduarte ready for another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 no more magic numbers!
files/generate_system_metrics
Outdated
DESCRIPTION | ||
|
||
DEFAULTS = <<-DEFAULTS | ||
The following defaults values are used if the options are not specified: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following defaults values are used if the options are not specified: | |
The following default values are used if the options are not specified: |
files/generate_system_metrics
Outdated
|
||
polling_interval = options[:polling_interval] || POLLING_INTERVAL_DEFAULT | ||
file_interval = options[:file_interval] || FILE_INTERVAL_DEFAULT | ||
metric_type= options[:metric_type] || METRIC_TYPE_DEFAULT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metric_type= options[:metric_type] || METRIC_TYPE_DEFAULT | |
metric_type = options[:metric_type] || METRIC_TYPE_DEFAULT |
files/generate_system_metrics
Outdated
end | ||
end | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this extra line intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, removing
files/generate_system_metrics
Outdated
file_path = "#{dirname}/#{filename}" | ||
FileUtils.mkdir_p(dirname) unless File.directory?(dirname) | ||
puts "Creating json file: #{file_path}" if @verbose | ||
File.open(file_path, 'w') do |file| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the 'short version' be better here or is it better to use the longer form?
File.write(file_path, json_dataset)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often like the long version, but in this case I don't think it adds anything... so changing to the short version
@billclaytor updated based on your feedback... take another look, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
The script is intended to be used as part of an optionial feature to gate system stats the same way other stats are gathered and managed. It uses sar which will poll every x seconds and generate an average, which can then be captured into a file at the same interval as the puppet-metrics-collector polling interval. This keeps the total data down while not missing things that cause short duration spikes (like compiles that last 6 seconds). Initial use will be in performace testing. Had to disable GetText/DecorateString cop for the entire script in the rubocop.yml because inline disables aren't working for some reason. And rubocop is parsing for ruby 2.1 where the fix works, but the same fix doesn't work with later versions of ruby, like the one installed by puppet...
@RandellP This looks like a great addition. Is there supposed to be a |
@jarretlavallee that will be in the next PR... :) Trying to keep them small so our team reviewing is easier... We did several laps on this one to get it solid before sending it to you and Reid. |
BTW, John and Bill both approved... but it reset their status when I pushed up the changes to the comments that reid asked for. So it sounds like it is ok to merge by the next person who sees this. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is piecemeal, but I'm okay with merging it now since it doesn't conflict with existing content, and reviewing the part that makes use of it whenever you have that ready.
The script is intended to be used as part of an optional feature to get system stats the same way other stats are gathered and managed.
It uses sar which will poll every x seconds and generate an average, which can then be captured into a file at the same interval as the puppet-metrics-collector polling interval. This keeps the total data down while not missing things that cause short duration spikes (like compiles that last 6 seconds). Initial use will be in performance testing.