Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DB scanner module (mongo support only) #99

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

tenrowd
Copy link
Contributor

@tenrowd tenrowd commented Apr 19, 2020

No description provided.

@manmolecular
Copy link
Collaborator

Thanks, I will take a look soon 👌

@manmolecular manmolecular self-requested a review April 20, 2020 06:56
Copy link
Collaborator

@manmolecular manmolecular left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of things that need to be re-done. The main problems are:

  1. The code is too over-complicated, we need to make it simpler, much simpler because there is a lot of useless checks.
  2. The nesting is too big and totally uncontrollable, separate big functions into small sub-functions and sub-methods
  3. The code is really hard-readable. Subfunctions, private methods, and other stuff, again.
  4. The output JSON is wrong, it has an improper format. Check the code review for more information.
  5. The JSON file (one file) is written in a cycle, in multiple processes, for hundreds of hosts. It is kinda dangerous in many ways, and totally not effective. Read the review for more information. We can save JSON per host, for example, or can do something else, idk.
  6. Recursive functions. Just in case.

That's all for the first iteration because without fixes of the stated problems code is really hard to read (and understand).

I will update the review when the problems above will be fixed.

custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
custom_scripts/py_scripts/db_scanner/db_scanner.py Outdated Show resolved Hide resolved
@manmolecular manmolecular added refactoring Something needs to be changed enhancement New feature or request labels May 26, 2020
@manmolecular
Copy link
Collaborator

Any news? @tenrowd

@manmolecular manmolecular marked this pull request as ready for review June 22, 2020 16:47
@manmolecular
Copy link
Collaborator

Thanks for the updates. 👍
I will take an accurate look at the code as soon as possible.

Copy link
Collaborator

@manmolecular manmolecular left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with some basic fixes because they are everywhere and can be seen through the rest of the code. This review is really small, so we can fix something step-by-step to not waste the time on big reviews (for now).

Also, please, take a look at the problem with the temporary JSON files (e.g. they are corrupted sometimes).
And think about database operations optimization - maybe it's better to use something like $in operators and so on to make one request instead of len(keys) requests.

:param filename: filename of resulting file
:return: None
"""
with open(filename, mode="a") as result_file:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the resulting file (the one that looks like /temporaries/ip.json) is not close properly, and in this case, the structure of the file is wrong - we have the invalid JSON file in this case. So, I don't know why, but maybe this happens because of the Mongo timeout, or any other timeout, and the file doesn't write properly. Please, think about the idea of how can we fix it.

ip_values = DBScannerDefaultValues.MODULE_RESULTS.keys()
for ip in ip_values:
filename = str(temporary_directory.joinpath(str(ip) + ".json"))
with open(filename, mode="a") as result_file:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry, I believe that note about the invalid output JSON belongs here. But still, please, check all the functions that are related to the file saving (it's important to have the valid JSONs).

"""
ip_values = DBScannerDefaultValues.MODULE_RESULTS.keys()
for ip in ip_values:
filename = str(temporary_directory.joinpath(str(ip) + ".json"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use format strings with .format() or f"{}"

Comment on lines +147 to +150
if result:
return result
else:
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if result:
    return result
return

You can use None in return statement if you want to do it, but you know that return by default returns None, sooooo... Yep. Think about it.

Also, you can skip else statement here, because if you skip if result: part (when you don't have result), you will automatically go to the part which is the next and the final in terms of the function (or maybe I missed something? Please tell me If so 😄 )

"""
for info in DBScannerDefaultValues.CRITICAL_INFORMATION:
regular = findall(info + DBScannerDefaultValues.NOT_INCLUDED, key)
if regular:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce the quantity of the nested loops, you can check the edge cases first, like:

if not regular:
    continue
if isinstance(document...):
    ...

result_list = search_in_list(document[key])
if result_list:
document_dict.update(result_list)
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can omit the else: here, please check it

Comment on lines +269 to +272
self.port = 27017
self.server_timeout = 10000 # serverSelectionTimeoutMS
self.socket_timeout = 300000 # socketTimeoutMS
self.connection_timeout = 10000 # connectTimeoutMs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to move this stuff in class like DbScannerDefaultConfiguration or something like this

try:
self.check_time_of_connection()
except DBScannerConnectionTimeout as err:
if err:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to check that? 😺
I believe that we almost always have the err object. Or something special here?

return None, err
if collection_list:
return collection_list, None
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit else: here, check it please

keys_to_take = []
regulars = []
projection = {}
for key in keys:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manmolecular manmolecular changed the title WIP: Db scanner Add DB scanner module (mongo support only) Jun 24, 2020
@manmolecular
Copy link
Collaborator

Wassup? How's your progress with fixes? Do you need any help? DM me on TG if you need anything or have any questions. ✌️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refactoring Something needs to be changed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants