Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor scrypt hashing and source app session mgmt pt 1/3 #5692

Merged

Conversation

nabla-c0d3
Copy link
Contributor

@nabla-c0d3 nabla-c0d3 commented Jan 6, 2021

Status

Ready.

Description of Changes

This PR started with the goal of refactoring CryptoUtil.hash_codename() to move it out of CryptoUtil (for #5599).

However, I ran into some problems as I was trying to do this, and saw some opportunities for larger improvements. The downside is that the changes are bigger, so I split them into 3 PRs so they're a bit easier to review. While I know big changes are annoying to review, merge and ultimately deal with, I think these ones are important and worth it.

Overall I felt that a crucial concept was "missing"/invisible in the code base: a user who is currently logged into the source application.

So the main change of these PRs is the introduction of a SourceUser class, which ties it all together: how a user's session is managed (when they create an account, login, etc.), how the user's crypto works (filesystem_id, gpg_secret, etc.), and the user's corresponding "Source" record in the DB.

My overall goal is to make all this logic self-documenting so it is clear to the reader how these users (and more generally the source app) work. Currently, this logic is spread throughout the code base and also repeated a lot, which makes it hard to understand and hard to change. The main example of this is how hash_codename() is used in a lot of code locations, but in each of them it is hard to understand what is getting hashed, and why.

While I split the changes into 3 PRs, each PR brings something useful independently of the other ones:

If everything gets merged, the following issues will be completed or will see progress: #5455, #5599, #5613. I also think the source app's code base will be significantly easier to approach, understand, and maintain (at least that's the main goal).

About this PR which is part 1/3:

  • Implement SourceUser and create_source_user() to consolidate source creation. In the current code base there is some duplicate code (in the tests, the apps, etc.) for creating a new source. This PR replaces all of them with a call to create_source_user() so there's one consistent way of creating a source.
  • Fixes Improve logic for generate_unique_codename #5455.

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch 2 times, most recently from c82e8be to 9e2a118 Compare January 9, 2021 03:12
@nabla-c0d3 nabla-c0d3 changed the title [WIP] Refactor scrypt hashing pt 1 Refactor scrypt hashing and source app session mgmt pt 1/3 Jan 11, 2021
@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from 9e2a118 to 5ed7813 Compare January 12, 2021 04:09
passphrase=hashed_secret,
name_email=name,
passphrase=source_user.gpg_secret,
name_email=source_user.filesystem_id,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the logic self-documenting and the docstring is no longer needed.

p=1,
backend=backend,
)
return scrypt_instance.derive(password.encode("utf-8"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed for #5613

@@ -13,7 +13,7 @@
DicewarePassphrase = NewType("DicewarePassphrase", str)


Copy link
Contributor Author

@nabla-c0d3 nabla-c0d3 Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only updated a variable's name in this file.

# Issue 2386: don't log in on duplicates
del session['codename']
except (SourcePassphraseCollisionError, SourceDesignationCollisionError) as e:
current_app.logger.error("Could not create a source: {}".format(e))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This consolidates the app's behavior when something goes wrong during account creation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behavior change from the perspective of the user when SourcePassphraseCollisionError is raised since they will now get a flashed message, this is fine since the codename is no longer in the session when the exception is raised

filesystem_id=filesystem_id,
gpg_secret=crypto_util_.hash_codename(codename, salt=crypto_util_.scrypt_gpg_pepper)
)
crypto_util_.genkeypair(source_user)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genkeypair() now takes a source user.

return SourceUser(source_db_record, filesystem_id, gpg_secret)


class _SourceScryptManager:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scrypt hashing logic becomes private and an "implementation detail" of how the source app and users work.

assert expected_filesystem_id == filesystem_id

expected_gpg_secret = "AWCRZVPA6YTQ2A3552LZJW3VO7L3ZONDFT6A6VPRGPGQQSNENRLA3EVRW4LZYNSUV5QIKNFZMJ2BMOVORG43ZETV5ZCRQKLJNOC2BXY=" # noqa: E501
assert expected_gpg_secret == gpg_secret
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test vectors in this test were generated using the previous / non-cryptography implementation of scrypt, to ensure that the new implementation returns the same hashes when given the same input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sick

@emkll
Copy link
Contributor

emkll commented Jan 14, 2021

Hi @nabla-c0d3 , thank you for your continued contributions to the project. We are currently preparing a release this week and next, but will get around to providing comments/review to this PR after that. Given the size and the areas of the code this PR is affecting, it may take us a while to review this series of proposed changes.

@nabla-c0d3
Copy link
Contributor Author

Given the size and the areas of the code this PR is affecting, it may take us a while to review this series of proposed changes.

@emkll No problem and makes sense!

@eloquence
Copy link
Member

Hi @nabla-c0d3, thanks a lot for taking on this complex refactoring work. I just want to manage expectations on review -- we're currently very focused on finishing Ubuntu 20.04 support, and these PRs touch a lot of key areas of the code, so it may still take us several weeks to complete a thorough review of these changes. That said, if you have any questions in the meantime, please don't hesitate to ping.

@nabla-c0d3
Copy link
Contributor Author

No rush on my side

@eloquence
Copy link
Member

Thanks for rebasing @nabla-c0d3 -- we're just coming out of the 1.8.0 release cycle and should be able to take a closer look in the next few days, once tests are passing. (Let us know if we can help with that.)

@nabla-c0d3
Copy link
Contributor Author

@eloquence Sounds good.

The test that is failing doesn't seem to be related to my changes. The latest run on the develop branch failed in a similar fashion (different test but same error "Directory not empty": https://app.circleci.com/pipelines/github/freedomofpress/securedrop/2119/workflows/c6371546-78d2-4050-8513-b33960083a7d/jobs/52269).

I will try rebasing again on the next commit that gets merged into develop, and hopefully the tests will pass,

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from c2b8415 to 8b32fb5 Compare March 17, 2021 02:06
@nabla-c0d3
Copy link
Contributor Author

Only one test failed, but I don't think it's related to my changes. AFAIK I haven't touched apparmor or anything related to it.

Copy link
Contributor

@redshiftzero redshiftzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey thanks for this! I definitely agree the SourceUser object makes the logic clearer. dropped a few notes inline, nits are optional

# Issue 2386: don't log in on duplicates
del session['codename']
except (SourcePassphraseCollisionError, SourceDesignationCollisionError) as e:
current_app.logger.error("Could not create a source: {}".format(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behavior change from the perspective of the user when SourcePassphraseCollisionError is raised since they will now get a flashed message, this is fine since the codename is no longer in the session when the exception is raised

resp = app.post(url_for('main.create'), data={'tab_id': tab_id}, follow_redirects=True)
assert resp.status_code == 200
assert 'codename' not in session
assert 'logged_in' not in session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to other maintainers: this PR does preserve the duplicate tab behavior implemented in the test steps in #5075


# When deriving the passphrase's filesystem ID and GPG secret
scrypt_mgr = _SourceScryptManager(
salt_for_gpg_secret="YrPAwKMyWN66Y2WNSt+FS1KwfysMHwPISG0wmpb717k=".encode(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: slight preference for an all caps constant for clarity at the top of the file or at the top of the function, e.g. TEST_SALT_GPG_SECRET for the values for salt_for_gpg_secret and similar for salt_for_filesystem_id

assert expected_filesystem_id == filesystem_id

expected_gpg_secret = "AWCRZVPA6YTQ2A3552LZJW3VO7L3ZONDFT6A6VPRGPGQQSNENRLA3EVRW4LZYNSUV5QIKNFZMJ2BMOVORG43ZETV5ZCRQKLJNOC2BXY=" # noqa: E501
assert expected_gpg_secret == gpg_secret
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sick

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch 2 times, most recently from c847406 to 99dd2c8 Compare April 1, 2021 04:33
@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from 99dd2c8 to 83a026a Compare May 5, 2021 02:04
@nabla-c0d3 nabla-c0d3 requested a review from a team as a code owner May 5, 2021 02:04
@nabla-c0d3
Copy link
Contributor Author

@nabla-c0d3 If you have cycles, would you mind rebasing your branch against develop once more? Major code removal just landed which will also ensure that tests pass for future changes. Thanks, and sorry this is taking a while - big changes :)

No problem - I just did the rebase and it went smoothly. There are two tests failing in testinfra now, but as mentioned before I have not been able to setup a VM that can run make testinfra.

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from 83a026a to a7c8dc5 Compare August 14, 2021 12:26
@lgtm-com
Copy link

lgtm-com bot commented Aug 14, 2021

This pull request introduces 2 alerts when merging a7c8dc5 into 46e993b - view on LGTM.com

new alerts:

  • 2 for Unused import

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from a7c8dc5 to 0d988d5 Compare August 14, 2021 12:50
@lgtm-com
Copy link

lgtm-com bot commented Aug 14, 2021

This pull request introduces 2 alerts when merging 0d988d5 into 46e993b - view on LGTM.com

new alerts:

  • 2 for Unused import

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch 3 times, most recently from 3c5595f to 653357b Compare August 14, 2021 17:44
@nabla-c0d3
Copy link
Contributor Author

I have rebased this PR, but the same test_aa_no_denies_in_syslog() is still failing.

I haven't been able to figure it out because the CircleCI log does not show the full output (Full output truncated (4869 lines hidden), use '-vv' to show). I also haven't been able to get make testinfra to work in an Ubuntu VM - all the tests in testinfra just fail right away.
Being able to see the full output of the test would already help a lot, but I am not sure how to get to that.

@zenmonkeykstop
Copy link
Contributor

zenmonkeykstop commented Aug 17, 2021

Hi @nabla-c0d3 - looks like a file securedrop/source_user.py was added - for staging and prod environments, new files need to be explicitly included in the apache2 apparmor profile at install_files/ansible-base/roles/build-securedrop-app-code-deb-pkg/files/usr.sbin.apache2. Not sure what's going on with make testinfra, will look into it, but hopefully CI should get a little further now.

(Also rebased again to pull in an unrelated CI fix)

@zenmonkeykstop zenmonkeykstop force-pushed the refactor-scrypt-hashing-pt-1 branch from 5e722ce to e2a11fa Compare August 17, 2021 05:27
@nabla-c0d3
Copy link
Contributor Author

for staging and prod environments, new files need to be explicitly included in the apache2 apparmor profile

Thanks for looking into this. Looks like CI is now passing! 🎉

@eloquence eloquence added this to the 2.1.0 milestone Aug 18, 2021
@kushaldas kushaldas self-assigned this Aug 23, 2021
Copy link
Contributor

@kushaldas kushaldas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one comment from @redshiftzero https://github.com/freedomofpress/securedrop/pull/5692/files#r601688777 which can be addressed in this while doing the conflict fix. After that we can merge this.

To test

  • I created a staging instance from develop and created users.
  • created sources on that
  • build debian package from this branch and redeployed
  • old users can login, new users can be created and login properly

The code changes look good to me. @nabla-c0d3 after you update the branch, I will do one quick round of check and merge. For now moving to the next PR. Thank you once again for doing this.

@@ -224,7 +224,13 @@ def test_were_there_submissions_today(source_app, config):
args = argparse.Namespace(data_root=data_root, verbose=logging.DEBUG)

count_file = os.path.join(data_root, 'submissions_today.txt')
source, codename = db_helper.init_source_without_keypair()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codename was not used, so this is fine.

@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from e2a11fa to 436a804 Compare August 25, 2021 19:53
@nabla-c0d3
Copy link
Contributor Author

nabla-c0d3 commented Aug 25, 2021

@kushaldas Thanks for looking into this and testing it.

There is one comment from @redshiftzero https://github.com/freedomofpress/securedrop/pull/5692/files#r601688777 which can be addressed in this while doing the conflict fix. After that we can merge this.

Ok I have made the change and fixed the conflict.



TEST_SALT_GPG_SECRET = "YrPAwKMyWN66Y2WNSt+FS1KwfysMHwPISG0wmpb717k="
TEST_SALT_FOR_FILESYSTEM_ID = "mEFXIwvxoBqjyxc/JypLdvgMRNRjApoaM0OBNrxJM2E="
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these to the top of the file

Copy link
Contributor

@zenmonkeykstop zenmonkeykstop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor point re: exception error messages

Fix test

Fix tests

added securedrop/source_user.py to Apache Apparmor profile

minor fix to dev data loader
@nabla-c0d3 nabla-c0d3 force-pushed the refactor-scrypt-hashing-pt-1 branch from 50a7aaf to 203a873 Compare August 28, 2021 14:02
@zenmonkeykstop zenmonkeykstop dismissed kushaldas’s stale review August 30, 2021 13:59

changes applied since.

@zenmonkeykstop zenmonkeykstop merged commit 8fa4a1d into freedomofpress:develop Aug 30, 2021
@nabla-c0d3 nabla-c0d3 deleted the refactor-scrypt-hashing-pt-1 branch September 25, 2021 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve logic for generate_unique_codename
6 participants