Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QUIC: AcceptStream_ConnectionAborted_ByClient_Throw is failing randomly #55242

Closed
wfurt opened this issue Jul 7, 2021 · 6 comments · Fixed by #55291 or #56590
Closed

QUIC: AcceptStream_ConnectionAborted_ByClient_Throw is failing randomly #55242

wfurt opened this issue Jul 7, 2021 · 6 comments · Fixed by #55291 or #56590
Assignees
Labels
area-System.Net.Quic disabled-test The test is disabled in source code against the issue os-linux Linux OS (any supported distro) tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Milestone

Comments

@wfurt
Copy link
Member

wfurt commented Jul 7, 2021

This is one of disabled tests mentioned in #49157.
The test is pretty simple as it closed since with particular error and it expects peer to see it.
Note that the RunClientServer has synchronization so the connection should not be touched/Disposed until both test sides are done.

        [Fact]
        public async Task AcceptStream_ConnectionAborted_ByClient_Throws()
        {
            const int ExpectedErrorCode = 1234;

            using var sync = new SemaphoreSlim(0);

            await RunClientServer(
                async clientConnection =>
                {
                    await clientConnection.CloseAsync(ExpectedErrorCode);
                    sync.Release();
                },
                async serverConnection =>
                {
                    await sync.WaitAsync();
                    QuicConnectionAbortedException ex = await Assert.ThrowsAsync<QuicConnectionAbortedException>(() => serverConnection.AcceptStreamAsync().AsTask());
                    Assert.Equal(ExpectedErrorCode, ex.ErrorCode);
                });
        }

In good case the event sequence is:

  [Connection#56140151] received event IDEAL_PROCESSOR_CHANGED
  [Connection#42931033] received event PEER_CERTIFICATE_RECEIVED
  [Connection#42931033] received event PEER_CERTIFICATE_RECEIVED
  [Connection#42931033] received event PEER_CERTIFICATE_RECEIVED
  [Connection#42931033] received event STREAMS_AVAILABLE
  [Connection#42931033] received event DATAGRAM_STATE_CHANGED
  [Connection#42931033] received event CONNECTED
  [Connection#42931033] received event IDEAL_PROCESSOR_CHANGED
  [Connection#56140151] received event CONNECTED
  [Connection#56140151] received event SHUTDOWN_INITIATED_BY_PEER
  [Connection#42931033] received event SHUTDOWN_COMPLETE
  [Connection#56140151] received event SHUTDOWN_COMPLETE

for the failure we get SHUTDOWN_INITIATED_BY_TRANSPORT instead:

  [Connection#56140151] received event IDEAL_PROCESSOR_CHANGED
  [Connection#1997173] received event PEER_CERTIFICATE_RECEIVED
  [Connection#1997173] received event PEER_CERTIFICATE_RECEIVED
  [Connection#1997173] received event PEER_CERTIFICATE_RECEIVED
  [Connection#1997173] received event STREAMS_AVAILABLE
  [Connection#1997173] received event DATAGRAM_STATE_CHANGED
  [Connection#1997173] received event CONNECTED
  [Connection#56140151] received event CONNECTED
  [Connection#56140151] received event SHUTDOWN_INITIATED_BY_TRANSPORT
  [Connection#56140151] received event IDEAL_PROCESSOR_CHANGED
  [Connection#1997173] received event SHUTDOWN_COMPLETE
  [Connection#56140151] received event [SHUTDOWN_COMPLETE]

Could you please take a look at attached logs @nibanks and @ThadHouse ???
Note from the code claims:

this started to fail after updating msquic from cc104e836a5d4a5e0d324bc08b42136d2acac997 to 8e21db733f22533a613d404d2a86e64588019132

quic.log.OK.txt
quic.log.fail.txt

@ghost
Copy link

ghost commented Jul 7, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

This is one of disabled tests mentioned in #49157.
The test is pretty simple as it closed since with particular error and it expects peer to see it.
Note that the RunClientServer has synchronization so the connection should not be touched/Disposed until both test sides are done.

        [Fact]
        public async Task AcceptStream_ConnectionAborted_ByClient_Throws()
        {
            const int ExpectedErrorCode = 1234;

            using var sync = new SemaphoreSlim(0);

            await RunClientServer(
                async clientConnection =>
                {
                    await clientConnection.CloseAsync(ExpectedErrorCode);
                    sync.Release();
                },
                async serverConnection =>
                {
                    await sync.WaitAsync();
                    QuicConnectionAbortedException ex = await Assert.ThrowsAsync<QuicConnectionAbortedException>(() => serverConnection.AcceptStreamAsync().AsTask());
                    Assert.Equal(ExpectedErrorCode, ex.ErrorCode);
                });
        }

In good case the event sequence is:

  [Connection#56140151] received event IDEAL_PROCESSOR_CHANGED
  [Connection#42931033] received event PEER_CERTIFICATE_RECEIVED
  [Connection#42931033] received event PEER_CERTIFICATE_RECEIVED
  [Connection#42931033] received event PEER_CERTIFICATE_RECEIVED
  [Connection#42931033] received event STREAMS_AVAILABLE
  [Connection#42931033] received event DATAGRAM_STATE_CHANGED
  [Connection#42931033] received event CONNECTED
  [Connection#42931033] received event IDEAL_PROCESSOR_CHANGED
  [Connection#56140151] received event CONNECTED
  [Connection#56140151] received event SHUTDOWN_INITIATED_BY_PEER
  [Connection#42931033] received event SHUTDOWN_COMPLETE
  [Connection#56140151] received event SHUTDOWN_COMPLETE

for the failure we get SHUTDOWN_INITIATED_BY_TRANSPORT instead:

  [Connection#56140151] received event IDEAL_PROCESSOR_CHANGED
  [Connection#1997173] received event PEER_CERTIFICATE_RECEIVED
  [Connection#1997173] received event PEER_CERTIFICATE_RECEIVED
  [Connection#1997173] received event PEER_CERTIFICATE_RECEIVED
  [Connection#1997173] received event STREAMS_AVAILABLE
  [Connection#1997173] received event DATAGRAM_STATE_CHANGED
  [Connection#1997173] received event CONNECTED
  [Connection#56140151] received event CONNECTED
  [Connection#56140151] received event SHUTDOWN_INITIATED_BY_TRANSPORT
  [Connection#56140151] received event IDEAL_PROCESSOR_CHANGED
  [Connection#1997173] received event SHUTDOWN_COMPLETE
  [Connection#56140151] received event [SHUTDOWN_COMPLETE]

Could you please take a look at attached logs @nibanks and @ThadHouse ???
Note from the code claims:

this started to fail after updating msquic from cc104e836a5d4a5e0d324bc08b42136d2acac997 to 8e21db733f22533a613d404d2a86e64588019132

quic.log.OK.txt
quic.log.fail.txt

Author: wfurt
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jul 7, 2021
@nibanks
Copy link

nibanks commented Jul 7, 2021

Just from the issue description, it seems like you are racing the shutdown with something. If you sometimes get app shutdown and sometimes transport shutdown that means you are likely shutting down right around handshake complete.

@wfurt
Copy link
Member Author

wfurt commented Jul 7, 2021

I assume the handshake complete is the CONNECTED event? And yes, the test will come connection after it is established using CloseAsync e.g. ShutdownAsync(QUIC_CONNECTION_SHUTDOWN_FLAGS.NONE, errorCode) .
The question is why the behavior is not predictable?

@nibanks
Copy link

nibanks commented Jul 7, 2021

Before I look at logs, would you mind trying the absolute latest (from today)? I actually just made some changes in that area.

@wfurt
Copy link
Member Author

wfurt commented Jul 7, 2021

looks promising @nibanks. I did ~ 300 test iterations and I did not see any failure.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 7, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 8, 2021
@wfurt
Copy link
Member Author

wfurt commented Jul 11, 2021

It is failing CI on Linux https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-55424-merge-d316cd096c8e4b02b0/System.Net.Quic.Functional.Tests/console.c3c15e89.log?sv=2019-07-07&se=2021-07-31T02%3A18%3A16Z&sr=c&sp=rl&sig=7jvKN%2B7OoTI22YqQVhZ47HBJqzyoLWWCSoH4%2FqOUfpc%3D

This is because we do not have good mechanism in place to update MsQuic on Linux machines.
I will disable the test on Linux and we can re-enable when we have automation for updates or when we have another package drop.

@wfurt wfurt added the disabled-test The test is disabled in source code against the issue label Jul 11, 2021
@wfurt wfurt reopened this Jul 11, 2021
@ManickaP ManickaP removed the untriaged New issue has not been triaged by the area owner label Jul 12, 2021
@ManickaP ManickaP added this to the 6.0.0 milestone Jul 12, 2021
@wfurt wfurt added os-linux Linux OS (any supported distro) tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly labels Jul 18, 2021
@wfurt wfurt self-assigned this Jul 29, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 29, 2021
karelz pushed a commit that referenced this issue Jul 30, 2021
With update updated MsQuic on Linux in place (PR #56429) we should pass few more disabled tests.
I did ~200 local runs and I did not see any problems

Fixes #55242
Fixes #55746
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 30, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Quic disabled-test The test is disabled in source code against the issue os-linux Linux OS (any supported distro) tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants