Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encryption to diskqueue #31961

Merged
merged 7 commits into from
Jun 21, 2022
Merged

Conversation

leehinman
Copy link
Contributor

What does this PR do?

Adds support for AES 128 CTR encryption to disk queue.

Why is it important?

Necessary for endpoint running under elastic-agent

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

go test -bench=1M -benchtime 1x -count 10 -timeout 600m -benchmem > results.txt

benchstat results.txt
name               time/op
Async1M-16          22.0s ± 1%
EncryptAsync1M-16   20.5s ± 1%
Sync1M-16           25.2s ± 1%
EncryptSync1M-16    27.5s ± 0%

name               alloc/op
Async1M-16         3.43GB ± 0%
EncryptAsync1M-16  3.43GB ± 0%
Sync1M-16          3.42GB ± 0%
EncryptSync1M-16   3.42GB ± 0%

name               allocs/op
Async1M-16          40.6M ± 0%
EncryptAsync1M-16   40.5M ± 0%
Sync1M-16           39.5M ± 0%
EncryptSync1M-16    40.1M ± 0%

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 16, 2022
@leehinman leehinman added enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jun 16, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 16, 2022
@leehinman leehinman marked this pull request as ready for review June 16, 2022 21:54
@leehinman leehinman requested a review from a team as a code owner June 16, 2022 21:54
@leehinman leehinman requested review from fearful-symmetry and faec and removed request for a team June 16, 2022 21:54
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@leehinman leehinman mentioned this pull request Jun 16, 2022
5 tasks
@leehinman leehinman force-pushed the diskqueue_encryption_v2 branch from 8026f5e to 8e5d606 Compare June 16, 2022 22:41
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 16, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-06-20T22:17:37.742+0000

  • Duration: 92 min 12 sec

Test stats 🧪

Test Results
Failed 0
Passed 22282
Skipped 1937
Total 24219

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@cmacknz
Copy link
Member

cmacknz commented Jun 17, 2022

Benchmarks for this implementation:

name               time/op
Async1M-16          22.0s ± 1%
EncryptAsync1M-16   20.5s ± 1%
Sync1M-16           25.2s ± 1%
EncryptSync1M-16    27.5s ± 0%

name               alloc/op
Async1M-16         3.43GB ± 0%
EncryptAsync1M-16  3.43GB ± 0%
Sync1M-16          3.42GB ± 0%
EncryptSync1M-16   3.42GB ± 0%

name               allocs/op
Async1M-16          40.6M ± 0%
EncryptAsync1M-16   40.5M ± 0%
Sync1M-16           39.5M ± 0%
EncryptSync1M-16    40.1M ± 0%

Compared to the previous implementation: #31949

name           time/op
V1Async1M-16    38.8s ± 2%
V2Async1M-16    37.9s ± 8%
V1Sync1M-16     37.7s ± 1%
V2Sync1M-16     40.3s ± 0%

name           alloc/op
V1Async1M-16   3.47GB ± 0%
V2Async1M-16   3.46GB ± 0%
V1Sync1M-16    3.44GB ± 0%
V2Sync1M-16    3.45GB ± 0%


name           allocs/op
V1Async1M-16    43.4M ± 0%
V2Async1M-16    42.0M ± 0%
V1Sync1M-16     40.5M ± 0%
V2Sync1M-16     41.2M ± 0%

This version seems almost twice as fast ad allocates slightly less. What do you think caused this @leehinman? I want to make sure the improvement isn't a mirage :)

@cmacknz cmacknz requested a review from rdner June 17, 2022 19:51
@cmacknz
Copy link
Member

cmacknz commented Jun 17, 2022

go test -bench=1M -benchtime 1x -count 10 -timeout 600m -benchmem Can we make this a mage target or even just a script in the disk queue directory? You've already figured out a reasonable config for running tests and we should save future developers from having to do that themselves.


func (er *EncryptionReader) Read(buf []byte) (int, error) {
if cap(er.ciphertext) >= len(buf) {
er.ciphertext = er.ciphertext[:len(buf)]
Copy link
Member

@cmacknz cmacknz Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any security concerns with the re-slicing here? Part of the previous ciphertext will potentially hang around at the end of the underlying array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, having ciphertext around is acceptable. We expect Malory to be able to "see" cipher text. Also I would assume anyone that could view memory of the process could also view the segment file which would have the cipher text as well.

@@ -134,15 +135,26 @@ type segmentHeader struct {
// If the segment file has not been completely written, this field is zero.
// Only present in schema version >= 1.
frameCount uint32

// options holds flags to enable features, for example encryption.
options uint8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just make this larger for more future proofing? Probably nobody will notice 1-3 extra bytes in each segment file depending on uint16 or uint32 as the larger size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I waffled on this. In the end I picked uint8 because I couldn't think of even 8 options, but we could easily go with uint32, that would match version & count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to uint32

@leehinman
Copy link
Contributor Author

This version seems almost twice as fast ad allocates slightly less. What do you think caused this @leehinman? I want to make sure the improvement isn't a mirage :)

So this is because #31935 wasn't merged when the benchmark was run for #39149, that is what accounts for the speed up and drop in allocations.

One thing that is directly comparable between #31935 and this PR is

"1M_10k-16 21.7s ± 0%" is the same test as "Async1M-16 22.0s ± 1%" in this PR. The times are close enough that I would say the changes in this PR that add the options field, and switches to a segmentReader instead of direct io.File, don't change the speed significantly. The allocations are the same between the two.

leehinman and others added 3 commits June 17, 2022 19:19
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
@leehinman
Copy link
Contributor Author

go test -bench=1M -benchtime 1x -count 10 -timeout 600m -benchmem Can we make this a mage target or even just a script in the disk queue directory? You've already figured out a reasonable config for running tests and we should save future developers from having to do that themselves.

Short term, I added comment to header of benchmark_test.go about how to run. I'll work with team to see what people want long term.

@leehinman leehinman force-pushed the diskqueue_encryption_v2 branch from e7e45c6 to 3d28c63 Compare June 18, 2022 01:17
@@ -15,8 +15,18 @@
// specific language governing permissions and limitations
// under the License.

// example: go test -bench=. -benchtime 3x -timeout 60m
// Usage:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
dir, err := os.MkdirTemp("", t.Name())
assert.Nil(t, err)
// defer os.RemoveAll(dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this supposed to stay?

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, wait for @faec to review before merging.

Copy link
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks!

@leehinman leehinman merged commit 0e4a36b into elastic:main Jun 21, 2022
@leehinman leehinman mentioned this pull request Jun 21, 2022
5 tasks
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* Add encryption to diskqueue

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants