-
Notifications
You must be signed in to change notification settings - Fork 792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mongo key encryption #1506
Mongo key encryption #1506
Conversation
…nd PasswordBasedByteEncryptor This change allows to encrypt strings and bytes without any additional conversion done on the caller
This change enables the encryption/decryption of mongo key with a custom secret
…ongo key initialization that happens during launch and initialization of key which happens after login or registration
@@ -19,21 +21,13 @@ def get(self): | |||
return {"needs_registration": is_registration_needed} | |||
|
|||
def post(self): | |||
credentials = _get_user_credentials_from_request(request) | |||
# TODO delete the old key here, before creating new one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth it? I think it is. It will allow us to remove the extra step of delete this mongo key
from credential reset procedure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, as long as it's appropriately encapsulated.
Codecov Report
@@ Coverage Diff @@
## develop #1506 +/- ##
===========================================
+ Coverage 42.69% 42.82% +0.12%
===========================================
Files 476 478 +2
Lines 14202 14242 +40
===========================================
+ Hits 6064 6099 +35
- Misses 8138 8143 +5
Continue to review full report at Codecov.
|
@mssalvatore take a look at commit 4f17693.
Separating these changes into different commits post-factum seems more work than it's worth. Doing refactoring first requires a lot of forethought. We could separate the things roughly by modified files, but the it will introducuse system-breaking commits. |
return f"{username}:{password}" | ||
|
||
|
||
def get_creds_from_request(_request: Request) -> Tuple[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dnt abbrv.
Or at least abbreviate consistently. This function is get_creds...()
, whereas above there is get_user_credentials...()
My preference is to use "credentials" in these names.
def get_secret_from_request(_request) -> str: | ||
username, password = get_creds_from_request(_request) | ||
return f"{username}:{password}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be handled in setup_datastore_key()
. Pass it the username and password and let it worry about what to do with them.
key_file = os.path.join(key_file_dir, self._KEY_FILENAME) | ||
def __init__(self, key_file_dir: str): | ||
self.key_file_path = os.path.join(key_file_dir, self._KEY_FILENAME) | ||
self._key_base_encryptor = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"_key_based"
class PasswordBasedStringEncryptor(IEncryptor): | ||
|
||
_BUFFER_SIZE = pyAesCrypt.crypto.bufferSizeDef | ||
|
||
def __init__(self, password: str): | ||
self.password = password | ||
|
||
def encrypt(self, plaintext: str) -> str: | ||
plaintext_stream = io.BytesIO(plaintext.encode()) | ||
ciphertext = PasswordBasedByteEncryptor(self.password).encrypt(plaintext_stream) | ||
|
||
return base64.b64encode(ciphertext.getvalue()).decode() | ||
|
||
def decrypt(self, ciphertext: str) -> str: | ||
ciphertext = base64.b64decode(ciphertext) | ||
ciphertext_stream = io.BytesIO(ciphertext) | ||
|
||
plaintext_stream = PasswordBasedByteEncryptor(self.password).decrypt(ciphertext_stream) | ||
return plaintext_stream.getvalue().decode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class PasswordBasedStringEncryptor(IEncryptor): | |
_BUFFER_SIZE = pyAesCrypt.crypto.bufferSizeDef | |
def __init__(self, password: str): | |
self.password = password | |
def encrypt(self, plaintext: str) -> str: | |
plaintext_stream = io.BytesIO(plaintext.encode()) | |
ciphertext = PasswordBasedByteEncryptor(self.password).encrypt(plaintext_stream) | |
return base64.b64encode(ciphertext.getvalue()).decode() | |
def decrypt(self, ciphertext: str) -> str: | |
ciphertext = base64.b64decode(ciphertext) | |
ciphertext_stream = io.BytesIO(ciphertext) | |
plaintext_stream = PasswordBasedByteEncryptor(self.password).decrypt(ciphertext_stream) | |
return plaintext_stream.getvalue().decode("utf-8") | |
class PasswordBasedStringEncryptor(IEncryptor): | |
_BUFFER_SIZE = pyAesCrypt.crypto.bufferSizeDef | |
def __init__(self, password: str): | |
self._bytes_encryptor = PasswordBasedByteEncryptor(password) | |
def encrypt(self, plaintext: str) -> str: | |
plaintext_stream = io.BytesIO(plaintext.encode()) | |
ciphertext = self._bytes_encryptor.encrypt(plaintext_stream) | |
return base64.b64encode(ciphertext.getvalue()).decode() | |
def decrypt(self, ciphertext: str) -> str: | |
ciphertext = base64.b64decode(ciphertext) | |
ciphertext_stream = io.BytesIO(ciphertext) | |
plaintext_stream = self._bytes_encryptor.decrypt(ciphertext_stream) | |
return plaintext_stream.getvalue().decode("utf-8") |
@@ -17,36 +17,28 @@ | |||
# Note: password != key | |||
|
|||
|
|||
class PasswordBasedEncryptor(IEncryptor): | |||
class PasswordBasedByteEncryptor(IEncryptor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "PasswordBasedBytes
Encryptor"? "Byte" makes it seem like it encrypts a single byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file name should match the class name.
|
||
_BUFFER_SIZE = pyAesCrypt.crypto.bufferSizeDef | ||
|
||
def __init__(self, password: str): | ||
self.password = password | ||
|
||
def encrypt(self, plaintext: str) -> str: | ||
plaintext_stream = io.BytesIO(plaintext.encode()) | ||
def encrypt(self, plaintext: BytesIO) -> BytesIO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should take bytes
, not BytesIO
. Anything that has to use it (such as PasswordBasedStringEncryptor
) has to convert first to bytes
and then to BytesIO
. Anything that already has data as bytes has to then convert it to BytesIO
.
The only reason that it needs to be BytesIO
is because that's what pyAesCrypt expects. This signature requiring BytesIO
leaks implementation details to the callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on this. It will also improve readability where we use the functions.
def init_key(self, secret: str): | ||
if os.path.exists(self.key_file_path): | ||
self._load_existing_key(secret) | ||
else: | ||
self._init_key(key_file) | ||
|
||
self._key_base_encryptor = KeyBasedEncryptor(self._cipher_key) | ||
|
||
def _init_key(self, password_file_path: str): | ||
self._cipher_key = Random.new().read(self._BLOCK_SIZE) | ||
with open_new_securely_permissioned_file(password_file_path, "wb") as f: | ||
f.write(self._cipher_key) | ||
|
||
def _load_existing_key(self, key_file): | ||
with open(key_file, "rb") as f: | ||
self._cipher_key = f.read() | ||
self._create_new_key(secret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this two-phased construction approach adds unnecessary complexity to the encryptor. I think we can solve this in a better way by using a factory. We can discuss options on Zoom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that database initializer is constructed first and the key is initialized later? Or that we either load the key or create one?
def test_create_new_password_file(tmpdir): | ||
initialize_datastore_encryptor(tmpdir) | ||
|
||
assert os.path.isfile(os.path.join(tmpdir, PASSWORD_FILENAME)) | ||
setup_datastore_key(ENCRYPTOR_SECRET) | ||
assert os.path.isfile(os.path.join(tmpdir, DataStoreEncryptor._KEY_FILENAME)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're here:
def test_create_new_password_file(tmp_path):
initialize_datastore_encryptor(tmp_path)
setup_datastore_key(ENCRYPTOR_SECRET)
assert (tmp_path / DataStoreEncryptor._KEY_FILENAME).is_file()
|
||
|
||
def _credentials_match_registered_user(username, password): | ||
def _credentials_match_registered_user(username: str, password: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _credentials_match_registered_user(username: str, password: str): | |
def _credentials_match_registered_user(username: str, password: str) -> bool: |
logger = logging.getLogger(__name__) | ||
|
||
|
||
class PasswordBasedStringEncryptor(IEncryptor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file name should match the class name.
…ests for data store encryptor Data store key needs to be deleted upon registration to create a new one.
…ptor from credential_utils.py
…hange PasswordBasedBytesEncryptor interface to use bytes instead of io.BytesIO
…e() syntax to check for presence of files
…s credentials.)
…thods into EncryptorFactory This makes encryptor initialization workflow more straight-forward and the files become smaller, easier to read
…equest This better indicates that get_username_password_from_request returns a username/password pair rather than UserCreds structure
assert (initialized_key_dir / EncryptorFactory._KEY_FILENAME).isfile() | ||
|
||
|
||
def test_key_removal(initialized_key_dir, monkeypatch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_key_removal(initialized_key_dir, monkeypatch): | |
def test_key_removal(initialized_key_dir): |
initialize_encryptor_factory(tmpdir) | ||
assert not (tmpdir / EncryptorFactory._KEY_FILENAME).isfile() | ||
initialize_datastore_encryptor(MOCK_USERNAME, MOCK_PASSWORD) | ||
assert (tmpdir / EncryptorFactory._KEY_FILENAME).isfile() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add an encryption/decryption test.
if not factory: | ||
raise FactoryNotInitializedError | ||
secret = get_secret_from_credentials(username, password) | ||
_encryptor = DataStoreEncryptor(factory.key_file_path, secret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factories are responsible for constructing objects. We should have factory.set_key_file_path()
and factory.set_secret()
and then call factory.construct_data_store_encryptor()
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is not a factory, it's a class that handles initialization parameters for data store encryptor. I'm not sure what would be a better name, maybe DataStoreEncryptorInitializer
? I don't see any benefits in making this an actual factory, since the output object will not be different depending on key file path and secret. What is more, the usage of such factory is less intuitive. The registration resource would have to factory.set_secret()
then built_encryptor = factory.construct_data_store_encryptor()
and finally set_encryptor(built_encryptor)
. Now it's only one call - initialize_datastore_encryptor(username, password)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify this way down. I'm going to dig into it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to refactor and see how it looks
_KEY_FILENAME = "mongo_key.bin" | ||
|
||
def __init__(self, key_file_dir: str): | ||
self.key_file_path = os.path.join(key_file_dir, self._KEY_FILENAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer pathlib
… data_store_encryptor.py
def __init__(self, key_based_encryptor: KeyBasedEncryptor): | ||
self._key_based_encryptor = key_based_encryptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the DataStoreEncryptor is a thin wrapper around KeyBasedEncryptor with no logic, I'd rather get_datastore_encryptor()
return an IEncryptor and just return a concrete KeyBasedEncryptor. We can refactor everything to use encrypt()
and decrypt()
instead of enc()
and dec()
, which is better anyway.
This separates encryptor classes from other encryption related infrastructure that we have cc\server_utils\encryption
…e encryptor creation related code to data_store_encryptor.py, move the reponsibility to initialize data store encryptor to AuthenticationService
Database initialization can not be done because island doesn't know the key needed for encrypting collections. Since the key only appears after registration, database setup also should happen only after registration
from monkey_island.cc.server_utils.encryption.encryptors.password_based_bytes_encryption import ( | ||
PasswordBasedBytesEncryptor, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't do from monkey_island.cc.server_utils.encryption import PasswordBasedBytesEncryptor
because of circular imports. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this iteration looks much better. I have some thoughts on cleaning this up a bit further, but if all of the automated and manual tests pass, let's merge it.
a8f875d
to
ddff2f0
Compare
What does this PR do?
Fixes #1463
PR Checklist
Testing Checklist