Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't call user callbacks on MsQuic worker thread. #98361

Conversation

rzikm
Copy link
Member

@rzikm rzikm commented Feb 13, 2024

Closes #98039.

This should also help with #55979, as previously the callbacks were delaying MsQuic threads which led to MsQuic thinking the workers were overloaded.

@ghost
Copy link

ghost commented Feb 13, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #98039.

Author: rzikm
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@rzikm
Copy link
Member Author

rzikm commented Feb 14, 2024

For now, one test keeps failing due to microsoft/msquic#4132. We'll have to wait until that issue is fixed and new MsQuic bits flow to all CI images (or disable the test meanwhile)

@wfurt
Copy link
Member

wfurt commented Feb 15, 2024

For now, one test keeps failing due to microsoft/msquic#4132. We'll have to wait until that issue is fixed and new MsQuic bits flow to all CI images (or disable the test meanwhile)

Would we need to bump minimal expected msquic version? If not how would we expect it to work with current versions?

@rzikm
Copy link
Member Author

rzikm commented Feb 15, 2024

Would we need to bump minimal expected msquic version? If not how would we expect it to work with current versions?

I would prefer to bump the minimum required version. But we can alternatively do a version check and do the work inline as we did until now.

@CarnaViire
Copy link
Member

CarnaViire commented Feb 19, 2024

I would prefer to bump the minimum required version. But we can alternatively do a version check and do the work inline as we did until now.

Just a reminder, that Alpine Arm32 is pinned on an older msquic version due to a regression microsoft/msquic#3958

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, small nits and one last question.

@rzikm rzikm requested a review from ManickaP February 20, 2024 19:16
@rzikm rzikm added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 20, 2024
@rzikm
Copy link
Member Author

rzikm commented Feb 20, 2024

I think the PR is ready for the final round of review. However, we need to figure out how to deal with microsoft/msquic#4132. Given that some images are pinned on an older version due to microsoft/msquic#3958, I think we should add a version check and do the async cert validation inline on older version (with minimal code changes), and later bump minimum required MsQuic version and remove the version check.

@ManickaP
Copy link
Member

I think we should add a version check and do the async cert validation inline on older version (with minimal code changes), and later bump minimum required MsQuic version and remove the version check.

👍

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@ManickaP
Copy link
Member

There are some CI failures that might be relevant though.

@rzikm
Copy link
Member Author

rzikm commented Feb 21, 2024

There are some CI failures that might be relevant though.

This is the failure caused by microsoft/msquic#4132. Once microsoft/msquic#4145 is merged I can add a version check which will make the test pass again.

System.Net.Quic.Tests.MsQuicTests.CertificateCallbackThrowPropagates [FAIL]
Assert.Throws() Failure: Exception type was not an exact match
Expected: typeof(System.Security.Authentication.AuthenticationException)
Actual:   typeof(System.Net.Quic.QuicException)
---- System.Net.Quic.QuicException : Connection handshake was canceled due to the configured timeout of 00:00:10 seconds elapsing.
-------- System.OperationCanceledException : The operation was canceled.

@CarnaViire
Copy link
Member

CarnaViire commented Feb 21, 2024

This is the failure caused by microsoft/msquic#4132. Once microsoft/msquic#4145 is merged I can add a version check which will make the test pass again.

Then we need to either create an issue on updating the version check after MsQuic is updated (I think it's not just a matter of merging the PR, we need a new release, right?), and disable the test; or we need to put this PR into draft and "do-not-merge" to block until we can address it here.

UPD: Ah, I see that "no merge" is already here; but the label is not checked as part of CI, and it's not easy to notice it 🙈

@CarnaViire CarnaViire changed the title Don't call user callbacks on MsQuic worker thread. [NO-MERGE] Don't call user callbacks on MsQuic worker thread. Feb 21, 2024
@CarnaViire
Copy link
Member

@rzikm I'll go ahead and mark it as draft while it's blocked -- so that it will not appear in the stale PRs list (and will not get accidentally merged)

@rzikm rzikm force-pushed the 98039-QuicConnection-Validate-certificates-asynchronously branch from 52cc33b to 9fdb790 Compare February 26, 2024 15:03
@rzikm
Copy link
Member Author

rzikm commented Feb 26, 2024

microsoft/msquic#4132 was fixed, targeting 2.4 release, so I added version check to do the asynchronous validation only on 2.4 and higher. We can remove the version check once we update minimum required MsQuic version.

@rzikm rzikm marked this pull request as ready for review February 26, 2024 15:05
@rzikm rzikm removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 26, 2024
@wfurt
Copy link
Member

wfurt commented Feb 26, 2024

MsQuicTests.CertificateCallbackThrowPropagates failure seems related.

@rzikm rzikm changed the title [NO-MERGE] Don't call user callbacks on MsQuic worker thread. Don't call user callbacks on MsQuic worker thread. Feb 27, 2024
@rzikm
Copy link
Member Author

rzikm commented Feb 27, 2024

Interesting, the test should not be failing because it should be still running inline. I can't even reproduce the failures on my machine with the same msquic versions as in the test runs

@rzikm
Copy link
Member Author

rzikm commented Feb 27, 2024

Looks like the failure condition is still there even if we invoke the callback synchronously, so we need to return the result directly from the handler function.

Comment on lines +581 to 585
var task = _sslConnectionOptions.StartAsyncCertificateValidation((IntPtr)data.Certificate, (IntPtr)data.Chain);
if (task.IsCompletedSuccessfully)
{
return _sslConnectionOptions.ValidateCertificate((QUIC_BUFFER*)data.Certificate, (QUIC_BUFFER*)data.Chain, out _remoteCertificate);
}
catch (Exception ex)
{
_connectedTcs.TrySetException(ex);
return QUIC_STATUS_HANDSHAKE_FAILURE;
return task.Result ? QUIC_STATUS_SUCCESS : QUIC_STATUS_BAD_CERTIFICATE;
}
Copy link
Member Author

@rzikm rzikm Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we update to 2.4, we can remove the if and change the StartAsyncCertificateValidation to async void.

{
wrapException = false;
if (_isClient)
{
throw new AuthenticationException(SR.net_quic_cert_custom_validation);
}

status = QUIC_STATUS_USER_CANCELED;
result = QUIC_TLS_ALERT_CODES.BAD_CERTIFICATE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we losing now the specific errors we were returning? Like discerning user_cancelled? Does it matter (affects the other side and what info they get), or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MsQuic does not use the specific error code value, it only checks for PENDING for async validation, SUCCESS for accept, and any error simply means reject.

With the async validation, we will actually be able to select the right TLS alert code which will go out over the wire, If the user validation callback throws, this will default to USER_CANCELLED as before, but if the callback returns false then it becomes BAD_CERTIFICATE. We can perhaps be more specific when no custom validation is present (there are alerts for stuff like UNTRUSTED_CERTIFICATE and similar) but I don't think we do that even in SslStream on some platforms.

@rzikm
Copy link
Member Author

rzikm commented Feb 28, 2024

/azp run runtime-libraries-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm
Copy link
Member Author

rzikm commented Feb 28, 2024

CI Failures are known and unrelated

@rzikm rzikm merged commit 56af107 into dotnet:main Feb 28, 2024
105 of 119 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2024
@karelz karelz added this to the 9.0.0 milestone May 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QuicConnection: Validate certificates asynchronously.
6 participants