Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dns.GetHostAddressesAsync gets stuck on Windows #63552

Closed
MartyIX opened this issue Jan 9, 2022 · 12 comments · Fixed by #63904
Closed

Dns.GetHostAddressesAsync gets stuck on Windows #63552

MartyIX opened this issue Jan 9, 2022 · 12 comments · Fixed by #63904
Assignees
Labels
area-System.Net bug tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Milestone

Comments

@MartyIX
Copy link
Contributor

MartyIX commented Jan 9, 2022

Description

The following program is supposed to resolve multiple hostNameOrAddresses by Dns.GetHostAddressesAsync in parallel. However, the program gets stuck when run on Windows 10 (or 11) but it works as expected on the latest Ubuntu.

Reproduction Steps

App.csproj

<Project Sdk="Microsoft.NET.Sdk">
	<PropertyGroup>
		<OutputType>Exe</OutputType>
		<TargetFramework>net6.0</TargetFramework>
		<ImplicitUsings>enable</ImplicitUsings>
		<Nullable>enable</Nullable>
	</PropertyGroup>
</Project>

Program.cs

using System.Net;

namespace DnsCancelling;

public class Program
{
    public static async Task Main(string[] args)
    {
        string invalidAddress = "59FB::1005:CC57::";
        if (args.Length == 0)
        {
            bool result = await ResolveManyAsync(invalidAddress).ConfigureAwait(false);
            Console.WriteLine("Result 1 is {0}.", result);
            Console.WriteLine();
        }

        string[] addresses = new[]
        {
            invalidAddress,
            "localhost",
        };

        bool result2 = await ResolveManyAsync(addresses).ConfigureAwait(false);
        Console.WriteLine("Result 2 is {0}.", result2);
    }

    private static async Task<bool> ResolveManyAsync(params string[] addresses)
    {
        using CancellationTokenSource cts = new();

        Console.WriteLine("Going to resolve {0} addresses.", addresses.Length);
        List<Task<bool>> resolveTasks = new(capacity: addresses.Length);
        foreach (string address in addresses)
        {
            Task<bool> resolveTask = ResolveOneAsync(address, cts);
            resolveTasks.Add(resolveTask);
        }

        bool[] results = await Task.WhenAll(resolveTasks).ConfigureAwait(false);
        return results.All(r => r);
    }

    private static async Task<bool> ResolveOneAsync(string address, CancellationTokenSource cancellationTokenSource)
    {
        bool result = true;

        IPAddress[] ipAddresses;
        try
        {
            Console.WriteLine("Resolving address '{0}'.", address);
            ipAddresses = await Dns.GetHostAddressesAsync(address, cancellationTokenSource.Token).ConfigureAwait(false);
            if (ipAddresses.Length > 0) Console.WriteLine("'{0}' resolved to {1}.", address, ipAddresses[0]);
            else Console.WriteLine("'{0}' can not be resolved.", address);
        }
        catch (Exception e)
        {
            Console.WriteLine("Canceling all due to an exception when resolving '{0}': {1}", address, e.Message);
            cancellationTokenSource.Cancel();
            Console.WriteLine("Cancellation complete.");
            result = false;
        }

        return result;
    }
}

Run the sample using dotnet build && dotnet run using CLI.

Expected behavior

The program should terminate. I verified that the program terminates on the latest Ubuntu version.

Actual behavior

The program gets stuck and never finishes when it is run on Windows.

Regression?

No response

Known Workarounds

No response

Configuration

  • .NET 6
  • Windows 10, 11.

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jan 9, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Jan 9, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

The following program is supposed to resolve multiple hostNameOrAddresses by Dns.GetHostAddressesAsync in parallel. However, the program gets stuck when run on Windows 10 (or 11) but it works as expected on the latest Ubuntu.

Reproduction Steps

App.csproj

<Project Sdk="Microsoft.NET.Sdk">
	<PropertyGroup>
		<OutputType>Exe</OutputType>
		<TargetFramework>net6.0</TargetFramework>
		<ImplicitUsings>enable</ImplicitUsings>
		<Nullable>enable</Nullable>
	</PropertyGroup>
</Project>

Program.cs

using System.Net;

namespace DnsCancelling;

public class Program
{
    public static async Task Main(string[] args)
    {
        string invalidAddress = "59FB::1005:CC57::";
        if (args.Length == 0)
        {
            bool result = await ResolveManyAsync(invalidAddress).ConfigureAwait(false);
            Console.WriteLine("Result 1 is {0}.", result);
            Console.WriteLine();
        }

        string[] addresses = new[]
        {
            invalidAddress,
            "localhost",
        };

        bool result2 = await ResolveManyAsync(addresses).ConfigureAwait(false);
        Console.WriteLine("Result 2 is {0}.", result2);
    }

    private static async Task<bool> ResolveManyAsync(params string[] addresses)
    {
        using CancellationTokenSource cts = new();

        Console.WriteLine("Going to resolve {0} addresses.", addresses.Length);
        List<Task<bool>> resolveTasks = new(capacity: addresses.Length);
        foreach (string address in addresses)
        {
            Task<bool> resolveTask = ResolveOneAsync(address, cts);
            resolveTasks.Add(resolveTask);
        }

        bool[] results = await Task.WhenAll(resolveTasks).ConfigureAwait(false);
        return results.All(r => r);
    }

    private static async Task<bool> ResolveOneAsync(string address, CancellationTokenSource cancellationTokenSource)
    {
        bool result = true;

        IPAddress[] ipAddresses;
        try
        {
            Console.WriteLine("Resolving address '{0}'.", address);
            ipAddresses = await Dns.GetHostAddressesAsync(address, cancellationTokenSource.Token).ConfigureAwait(false);
            if (ipAddresses.Length > 0) Console.WriteLine("'{0}' resolved to {1}.", address, ipAddresses[0]);
            else Console.WriteLine("'{0}' can not be resolved.", address);
        }
        catch (Exception e)
        {
            Console.WriteLine("Canceling all due to an exception when resolving '{0}': {1}", address, e.Message);
            cancellationTokenSource.Cancel();
            Console.WriteLine("Cancellation complete.");
            result = false;
        }

        return result;
    }
}

Expected behavior

The program gets stuck and never finishes when it is run on Windows.

Actual behavior

The program should terminate. I verified that the program terminates on the latest Ubuntu version.

Regression?

No response

Known Workarounds

No response

Configuration

Tested on:

  • .NET 6
  • Windows 10, 11.

Other information

No response

Author: MartyIX
Assignees: -
Labels:

area-System.Net, untriaged

Milestone: -

@tomrus88
Copy link

tomrus88 commented Jan 9, 2022

The program gets stuck and never finishes when it is run on Windows.

I can't repro, program finishes nearly instantly for me on Win11.

@MartyIX
Copy link
Contributor Author

MartyIX commented Jan 9, 2022

@tomrus88 Ah, I forgot to add that I run the program using dotnet build && dotnet run in cmd / Terminal (maybe ideally using Release configuration). In VS2022, I cannot reproduce it either.

@MartyIX
Copy link
Contributor Author

MartyIX commented Jan 10, 2022

Also it's good idea to run the sample 10 times in a row.

@stephentoub
Copy link
Member

stephentoub commented Jan 10, 2022

@dotnet/ncl, took a quick look, and it seems likely this is a deadlock caused by lock inversion between a lock we take in managed and a critical section used in the Windows implementation:
image

@stephentoub stephentoub added bug tenet-reliability Reliability/stability related issue (stress, load problems, etc.) labels Jan 10, 2022
@geoffkizer
Copy link
Contributor

It certainly does look like that. That said, I'm surprised that the GetAddrInfoExCancel call is blocking waiting on the callback to complete.

Regardless, it seems like we need to avoid taking our managed lock during the callback.

@antonfirsov
Copy link
Member

antonfirsov commented Jan 10, 2022

Would it remove the need for the lock, if we used _cancellationRegistration.Dispose() instead of _cancellationRegistration.Unregister()? Dispose should wait for outstanding cancellation callbacks.

_cancellationRegistration = cancellationToken.UnsafeRegister(o =>
{
var @this = (GetAddrInfoExState)o!;
int cancelResult = 0;
lock (@this)
{
GetAddrInfoExContext* context = @this._cancellationContext;
if (context != null)
{
// An outstanding operation will be completed with WSA_E_CANCELLED, and GetAddrInfoExCancel will return NO_ERROR.
// If this thread has lost the race between cancellation and completion, this will be a NOP
// with GetAddrInfoExCancel returning WSA_INVALID_HANDLE.
cancelResult = Interop.Winsock.GetAddrInfoExCancel(&context->CancelHandle);
}
}
if (cancelResult != 0 && cancelResult != Interop.Winsock.WSA_INVALID_HANDLE && NetEventSource.Log.IsEnabled())
{
NetEventSource.Info(@this, $"GetAddrInfoExCancel returned error {cancelResult}");
}
}, this);

public CancellationToken UnregisterAndGetCancellationToken()
{
lock (this)
{
_cancellationContext = null;
_cancellationRegistration.Unregister();
}
return _cancellationRegistration.Token;
}

/cc @scalablecory as author of #33420.

@karelz
Copy link
Member

karelz commented Jan 11, 2022

Triage: Deadlock in new API in .NET 6.0. We should fix it and backport it to .NET 6.0 servicing if we have customers hitting it.
@MartyIX is it affecting your production in any way?

@karelz karelz added this to the 7.0.0 milestone Jan 11, 2022
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Jan 11, 2022
@MartyIX
Copy link
Contributor Author

MartyIX commented Jan 11, 2022

@karelz Our project is not yet released but we would love to use the code snippet in the original post once we are production ready.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 17, 2022
@karelz karelz modified the milestones: 7.0.0, 6.0.x Jan 18, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 21, 2022
@antonfirsov
Copy link
Member

Reopening to track 6.0.* backport. We need a validation with the original console app.

@antonfirsov antonfirsov reopened this Mar 21, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 31, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 13, 2022
@karelz
Copy link
Member

karelz commented Apr 19, 2022

Fixed in 7.0 (main) in PR #63904 and in 6.0.5 in PR #67291.

@karelz karelz closed this as completed Apr 19, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net bug tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants