Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extract_log] Improve extract_log script #559

Merged
merged 3 commits into from
Apr 3, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions ansible/library/extract_log.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,12 @@ def extract_line(directory, filename, target_string):
file = open(path)
result = None
with file:
result = [(filename, line) for line in file if target_string in line and 'nsible' not in line]
# This might be a gunzip file or logrotate issue, there has
# been '\x00's in front of the log entry timestamp which
# messes up with the comparator.
# Prehandle lines to remove these sub-strings
result = [(filename, line.replace('\x00', '')) for line in file if target_string in line and 'nsible' not in line]

return result


Expand Down Expand Up @@ -159,9 +164,12 @@ def extract_latest_line_with_string(directory, filenames, start_string):
for filename in filenames:
target_lines.extend(extract_line(directory, filename, start_string))

sorted_target_lines = sorted(target_lines, cmp=comparator)
target = target_lines[0] if len(target_lines) > 0 else None
for line in target_lines:
if comparator(line, target) > 0:
target = line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference from

sorted(target_lines, cmp=comparator) 

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort is sorting the whole array. The best of sorting algorithm is O(N * Log(N)).

We only need the latest entry, which can be obtained by the loop in O(N).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a huge optimization by itself. I had to open the sort call to debug. And since we could get what we need faster, I figured why not :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case you can use max() function from python which is O(N) and just one-liner

Copy link
Collaborator Author

@yxieca yxieca Apr 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in person. max doesn't take cmp=... parameter and the (key= lamda) method needs to extend string class.


return sorted_target_lines[-1]
return target


def calculate_files_to_copy(filenames, file_with_latest_line):
Expand Down