Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: fix SslSocket read resumption after readDisable when processing the SSL record that contains the last bytes of the HTTP message #13234

Closed

Conversation

antoniovicente
Copy link
Contributor

Commit Message:
tls: fix SslSocket read resumption after readDisable when processing the SSL record that contains the last bytes of the HTTP message

Drain SSL_pending additional bytes from RawSSL after hitting buffer high watermark to avoid read resumption error when all remaining bytes of an HTTP message are trapped in SSL internal buffers.
Additional Description:
Risk Level: medium, small localized fixed to IO resumption.
Testing: new unit and integration tests
Docs Changes: n/a
Release Notes: added
Fixes #12304

…fer high watermark to avoid read resumption error when all remaining bytes of an HTTP message are trapped in SSL internal buffers.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente
Copy link
Contributor Author

/assign @alyssawilk

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! thanks so much for sorting this out

ASSERT(pending_bytes < 16 * 1024, "SSL record should be at most 16KB");
if (pending_bytes > 0) {
ASSERT(keep_reading);
reserve_size = pending_bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I get the need to drain from the ssl buffer. What I'm less sure of is if we explicitly set pending_bytes to the bytes already buffered, if we're guaranteed that SSL_read won't read more from transport, but will only consume the current record which is already buffered. I was think of paging boringssl devs for that, but I think it'd be better to just regression test the behavior.

Is it possible to have a test where we send [lots of data][evenmoredata]
and verify that if the first read buffers [lots of data] and we tickle this corner case to drain it, that we won't also read and buffer evenmoredata? Otherwise I'm concerned we'll trade lack of resumption for lack of ceding. I think HighWatermarkReadResumptionProcessingHeaders probably lets you visually verify, but if SSL_read got greedy I don't think that test would fail (I think we'd need a unit test, but correct me if I'm wrong!)

Copy link
Contributor

@davidben davidben Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(NB: I don't know anything about how Envoy handles I/O.)

This is sufficient of an edge case on the abstraction that testing things seems worthwhile (for better or worse, OpenSSL's SSL API is supposed to look vaguely like a UNIX TCP socket which abstracts over most buffer handling). But SSL_read will currently never touch the transport if there's is already an unconsumed, decrypted record in the buffer. I wouldn't anticipate us changing that. Likewise, when readahead is off (currently always off and even if we did implement it, it'd be optional), non-zero SSL_pending implies there isn't also an undecrypted record in the buffer.

Though I think the intention of the API (especially if, like OpenSSL but unlike BoringSSL, you believe in renegotiation) is that callers not make as strong of assumptions about the correlation between SSL reads and transport reads. If there's a renegotiation (again, not applicable for BoringSSL), SSL_read may be blocked on the handshake, which may be blocked on transport write or some random callback. A more general implementation (closer to what Chromium does) might look something like:

RunReadStateMachine()

  • If we're not ready to consume read from SSL (application layer doesn't want data, too much is buffered already, etc.), return. Ensure that RunReadStateMachine() is called again when this changes.
  • Call SSL_read, independent of the state of the transport.
  • If SSL_read returned data, great! Pass along to the application. Go back to the top of RunReadStateMachine() in case the application still wants data.
  • If SSL_read failed with SSL_ERROR_WANT_READ, return. Ensure that RunReadStateMachine() is called again when the transport is readable.
  • Ditto for other SSL_ERROR_WANT_* codes that we care to support. Though without renego or client 0-RTT, I think only WANT_READ matters here.

This doesn't depend on the state of the buffers, and it's always safe to call RunReadStateMachine(). However, one difference is RunReadStateMachine() will not pass data to an overbuffered application, even if the data is secretly available in SSL_read already. I think this is fine and conceptually simpler, but Antonio mentioned to me the intent was to pass that data along if available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alyssawilk: SslReadBufferLimitTest.DrainToSslRecordBoundary attempts to cover the case you describe by doing 10 10kb writes and verifying that the calls to doRead on the server connection with a configured high-watermark of 16KB deliver the data in 20KB chunks.

I reached out to davidben about to help confirm my understanding of the current implementation and how it's likely to evolve going forward since the fix implemented here ends up depending on some internal details how boringssl deals with partial reads. Seems like the assumptions I'm making are fairly reasonable, but we may want to also think about implications they have on the transport socket contract and how likely it is for other transport socket implementations to fall on similar traps.

There is one other possible way to fix resumption in this case: explicitly schedule read resumption in cases where the read buffer is empty.
Change set: master...antoniovicente:ssl_read_resumption_alternate
Related comments: #12304 (comment)
This approach seems similar to the idea that davidben describes on how to ideally interact with SSL read/write.

A benefit of this alternate approach is that imposes less strict requirements on transport socket implementations but it does involve some possibly spurious synthetic fd readable events. I have an old draft PR that optimizes away fd re-registrations on readDisable transitions which seems to fix this resumption issue, but its blocked on performance data and debugging of some test failures: master...antoniovicente:optimize_read_disable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a preference for @davidben solution as it avoid assuming something about internal behavior that may change. What if we add a method to the transport socket to tell us if the has some buffered bytes, in this case you can avoid useless read events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanavlasov So it would be better to do something similar to this alternate solution? master...antoniovicente:ssl_read_resumption_alternate

The issue with a state machine that specific to SSL handling is that it doesn't generalize well to other transport socket implementations. I think the more eager event scheduling proposed in my alternate fix is the closest we can get to davidben's suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mildly inclined to avoid wake-ups every time. If we don't think we can do a clean and cheap "is there buffered data to process" check I'd lean towards the original solution with extensive testing in case boring or openssl ever change.

I'd definitely like alternate opinions here on which of the ugly options are the least ugly. cc @mattklein123 @snowp @ggreenway for thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to the my question above. I'm confused why we need this in the first place. If the old impl set read to ready, why wouldn't the data be sucked out the next time read is ready?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 In the normal case the current resumption logic works correctly. The problem is when we factor in calls to readDisable(true) + readDisable(false) done as we trigger and untrigger the buffer high watermark

Calls to readDisable unregister and re-register the fd for events. When fds are unregistered, any pending synthetic events on the fd are cleared, so the synthetic read event scheduled via setReadBufferReady is never delivered. Resumption after readDisable(false) depends on either there being some bytes in the user-space read buffer or some bytes in the kernel's socket receive buffer. When handling SSL traffic it is possible to have readable bytes in SSL internal buffers while there are 0 bytes in the stream read buffer and 0 bytes in the kernel buffer; when that happens fd re-registration does not lead to a wakeup despite the transport socket being able to produce bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calls to readDisable unregister and re-register the fd for events. When fds are unregistered, any pending synthetic events on the fd are cleared, so the synthetic read event scheduled via setReadBufferReady is never delivered. Resumption after readDisable(false) depends on either there being some bytes in the user-space read buffer or some bytes in the kernel's socket receive buffer. When handling SSL traffic it is possible to have readable bytes in SSL internal buffers while there are 0 bytes in the stream read buffer and 0 bytes in the kernel buffer; when that happens fd re-registration does not lead to a wakeup despite the transport socket being able to produce bytes.

Thanks this makes sense. At minimum, can you add more comments?

Another implementation idea: what if we had a transport socket specific way of asking "are there more bytes?" This could be overridden for TLS to look for kernel/user/within SSL. Then on readEnable() the check would pass and we would read again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #13772 instead. Your comment inspired me to think a bit more about cases where the resumption event is lost due to calls to readDisable. The connection impl can detect and remember when that happens and schedule resumption when needed.

// Regression test for https://github.com/envoyproxy/envoy/issues/12304
TEST_P(RawWriteSslIntegrationTest, HighWatermarkReadResumptionProcessingHeaders) {
// The raw writer will perform a separate SSL_write for each of the chunks below. Chunk sizes were
// picked such that the connection's high watermark will trigger while processing the last SSL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make sure this remains true even if we changed the default buffer limits / read limits? I think probably the unit test would do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test SslReadBufferLimitTest.DrainToSslRecordBoundary further down ensures that when processing a stream of 10KB SSL records with a read limit of 16KB we end up reading 2 full records per read operation (e.i. 20KB)

That was one of the better unit test ideas that I could come up with. Another potentially good unit test would be to create a full server connection with SSL transport and verify behavior when readDisable(true) + readDisable(false) is called after each of the read wakeups, and that the drain operation runs to completion over a sequence of wakeups. I'll try to get that implemented.

last_connection_event_ = event;
closed_ |= (event == Network::ConnectionEvent::RemoteClose ||
event == Network::ConnectionEvent::LocalClose);
connected_ |= (event == Network::ConnectionEvent::Connected);
}
void onAboveWriteBufferHighWatermark() override {}
void onBelowWriteBufferLowWatermark() override {}
void onAboveWriteBufferHighWatermark() override { high_watermark_triggered_ = true; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, or maybe you planned to use this to make sure it was regression-proof? Looks like it may not be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I needed to pass this info to the write function, but it ended up not being necessary. Cleaned up.

Signed-off-by: Antonio Vicente <avd@google.com>
@yanavlasov yanavlasov self-assigned this Sep 24, 2020
// important to ensure read resumption works correctly after calls to
// Network::Connection::readDisable().
int pending_bytes = SSL_pending(rawSsl());
ASSERT(pending_bytes < 16 * 1024, "SSL record should be at most 16KB");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be <= ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The record size is at most 16KB, but at least 1 byte from the record must have been consumed by an earlier SSL_read, so SSL_pending can return at most 16KB-1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK makes sense. Can you add a small comment and adjust the assert error to s/at most/less than?

Comment on lines +167 to +169
// internal buffers, and do one last read iteration if bytes are available. This is
// important to ensure read resumption works correctly after calls to
// Network::Connection::readDisable().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding a bit more color here on the control flow that requires this? Naively I would assume that in the previous logic we would signal to read again later, then we would attempt to read again, and get the data?

ASSERT(pending_bytes < 16 * 1024, "SSL record should be at most 16KB");
if (pending_bytes > 0) {
ASSERT(keep_reading);
reserve_size = pending_bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to the my question above. I'm confused why we need this in the first place. If the old impl set read to ready, why wouldn't the data be sucked out the next time read is ready?

@stale
Copy link

stale bot commented Oct 11, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 11, 2020
@antoniovicente
Copy link
Contributor Author

Back from vacation, will continue working on this PR soon.

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Oct 12, 2020
@antoniovicente
Copy link
Contributor Author

Abandoning this change in favor of #13772

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shouldDrainReadBuffer followed up by readDisable(true) and readDisable(false) strange behaviour
6 participants