Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock during Resolve #391

Closed
Artemigos opened this issue Apr 8, 2021 · 19 comments
Closed

Deadlock during Resolve #391

Artemigos opened this issue Apr 8, 2021 · 19 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@Artemigos
Copy link

I've run into a weird deadlock when resolving one of the services manually.

First short background - we're using DryIoc in a WPF app, and it's working very nicely for us right now. As part of some concept work I'm trying to run the code as a netcoreapp3.1 console app. All of our registration code is being reused for this, so the container contents are very similar to what we have in WPF. The app resolves and configures some things and then one of the resolves deadlocks. I updated to 4.7.5 and it still happens, the call stack looks like this:

System.Private.CoreLib.dll!System.Threading.SpinWait.SpinOnceCore(int sleep1Threshold) Line 197	C#
DryIoc.dll!DryIoc.Scope.WaitForItemIsSet(ImTools.ImMapEntry<object> itemRef) Line 12019	C#
DryIoc.dll!DryIoc.Scope.TryGetOrAddWithoutClosure(int id, DryIoc.IResolverContext resolveContext, FastExpressionCompiler.LightExpression.Expression expr, bool useFec, System.Func<DryIoc.IResolverContext, FastExpressionCompiler.LightExpression.Expression, bool, object> createValue, int disposalOrder) Line 12109	C#
DryIoc.dll!DryIoc.Interpreter.TryInterpretMethodCall(DryIoc.IResolverContext r, FastExpressionCompiler.LightExpression.Expression expr, object paramExprs, object paramValues, DryIoc.Interpreter.ParentLambdaArgs parentArgs, bool useFec, ref object result) Line 3622	C#
DryIoc.dll!DryIoc.Interpreter.TryInterpret(DryIoc.IResolverContext r, FastExpressionCompiler.LightExpression.Expression expr, object paramExprs, object paramValues, DryIoc.Interpreter.ParentLambdaArgs parentArgs, bool useFec, out object result) Line 3023	C#
DryIoc.dll!DryIoc.Interpreter.TryInterpretAndUnwrapContainerException(DryIoc.IResolverContext r, FastExpressionCompiler.LightExpression.Expression expr, bool useFec, out object result) Line 2864	C#
DryIoc.dll!DryIoc.Container.ResolveAndCache(int serviceTypeHash, System.Type serviceType, DryIoc.IfUnresolved ifUnresolved) Line 409	C#
DryIoc.dll!DryIoc.Container.DryIoc.IResolver.Resolve(System.Type serviceType, DryIoc.IfUnresolved ifUnresolved) Line 356	C#
DryIoc.dll!DryIoc.Resolver.Resolve(DryIoc.IResolver resolver, System.Type serviceType) Line 8008	C#

I was trying to figure out what's causing this so I started reading DryIoc code and one method in particular looks suspicious:

image

The deadlock happens inside the WaitForItemIsSet method and I don't understand how it's ever going to get set. Is this maybe a bug?

@dadhi
Copy link
Owner

dadhi commented Apr 8, 2021

@Artemigos Got it. I will check.

@dadhi
Copy link
Owner

dadhi commented Apr 8, 2021

@Artemigos How many threads are accessing this part?
Is it just the UI and background thread?

Do you have some recursive dependencies, I mean by design?

The only thing I can imagine (now) to deadlock on this wait is the one thread failing to resolve scoped dependecy for any reason, so not setting the otherItemRef into any value. Fixing the reason for this fail should fix the deadlock.

@Artemigos
Copy link
Author

This is happening in a console app but with async Task Main() - the Resolve runs on a different thread already, because it's a continuation after a few awaits. The main thread is blocked waiting for the tasks to finish. I'm not actually doing anything in parallel, so exactly one thing is hitting DryIoc. Also the deadlock is consistent - happens every time.

There are no recursive dependencies, we had one case of that, but we solved it long time ago by doing a lazy resolve (manually).

I downgraded do 4.1.4 and with this version I'm getting an exception - one of the dependencies can't be resolved. I'm not sure why this is happening yet, I'll let you know what I find.

Also, I'm curious - where is the Value of the otherItemRef being set?

@dadhi
Copy link
Owner

dadhi commented Apr 8, 2021

I downgraded do 4.1.4 and with this version I'm getting an exception

This is already a good finding.

where is the Value of the otherItemRef being set?

By other thread. But you're saying you don't have one so it is suspicious. Will be staring more.. and I need to introduce some diagnostics for such cases.

@Artemigos
Copy link
Author

I was missing a project reference, so the dependency was actually missing. After upgrading to 4.7.5 again it works, so I can solve my problem now. The only thing left is the deadlock when things fail. I'll take a look at the threads a little more, until now I was looking at the "Parallel Stacks" window in VS, which seems to be a simplified view with some special Task handling.

@Artemigos
Copy link
Author

There are no other threads:

image

Main Thread - Main() waiting for tasks
GC Finalizer Thread - not relevant for us
worker thread in SpinWait - this one is deadlocked

@dadhi dadhi self-assigned this Apr 8, 2021
@dadhi
Copy link
Owner

dadhi commented Apr 8, 2021

worker thread in SpinWait - this one is deadlocked

Ok, thanks for the research. As I did not have any other reports on this (besides having the resolution issue as the root cause going in parallel), I am assuming you are doing something complex or unusual. Maybe you can provide more info on the failing dependency configuration or on this "manual lazy resolve" of the recursive dependency. Maybe some code to illustrate?

@Artemigos
Copy link
Author

Tomorrow I might try to reproduce this on some small sample, but it's hard when I don't know what the actual cause is. The weirdest thing we do that I can think of is registering a lot of singletons with asResolutionCall set to true and then the interfaces are a mapping to that singleton.

What is spawning the thread that should be filling the Value that we talked about before?

@dadhi
Copy link
Owner

dadhi commented Apr 8, 2021

What is spawning the thread that should be filling the Value that we talked about before?

Nothing from the DryIoc.
asResolutionCall is the interesting case. Are you using many reconfigurable dependencies?

@dadhi
Copy link
Owner

dadhi commented Apr 8, 2021

@Artemigos

Here is how the single-threaded recursive dependrency may cause the deadlock, but it should be prevented by the DryIoc check prior to the Resolve. So the cause maybe the check is not working in certain conditions.

P10408-205919

@Artemigos
Copy link
Author

By "manual lazy resolve" I just mean that we break the circular chain by doing a Resolve when the service is actually needed instead of injecting into the constructor.

As for the asResolutionCall - we use it almost everywhere. This codebase was using a different container before and to get DryIoc to work with existing code I had to think a little out of the box.

I don't think that a circular dependency is causing this here. After I added the missing project reference and everything got registered, the resolve works as intended, so the error was a missing registration (we have this automated based on a convention, that's why a missing reference can fail silently). I can actually prove this - when I move the Resolve call a few lines earlier in the program I get the exception saying that one of the dependencies cannot be resolved.

I'm trying to narrow this down, but I still can't reproduce this in an isolated console project. For now I have this, but this still works "as intended" (throws an exception):

using System;
using System.Threading.Tasks;
using DryIoc;

namespace ConsoleApp19
{
    public interface IA {}
    public interface IB {}
    public interface IC {}

    public class A : IA
    {
        private IB b;
        public A(IB b) => this.b = b;
    }

    public class C : IC {}

    class Program
    {
        static async Task Main(string[] args)
        {
            var container = new Container(rules => rules
                .With(FactoryMethod.ConstructorWithResolvableArguments)
                .WithoutEagerCachingSingletonForFasterAccess()
                .WithoutThrowOnRegisteringDisposableTransient()
                .WithDefaultIfAlreadyRegistered(IfAlreadyRegistered.Replace));

            container.Register(typeof(A), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.RegisterMany(new [] { typeof(A), typeof(IA) }, typeof(A), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.Register(typeof(C), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.RegisterMany(new [] { typeof(C), typeof(IC) }, typeof(C), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));

            var c = container.Resolve<IC>();

            await Task.Yield();

            var a = container.Resolve<IA>();
        }
    }
}

Maybe this is already a hint for you.

BTW I know it's not necessary to do a registration before the RegisterMany - I'm just trying to mimick what our automated registrations end up doing.

@dadhi
Copy link
Owner

dadhi commented Apr 9, 2021

@Artemigos Thanks for the code. From the first glance, it is nothing suspicious but will be digging further.

Did I understand correctly that you no longer have a deadlock?

@Artemigos
Copy link
Author

Yup, the deadlock is solved for me - providing all the dependencies got rid of the problem.

@Artemigos
Copy link
Author

Found it:

using System;
using DryIoc;

namespace ConsoleApp19
{
    // A -> B -> C -> D(missing)
    //   \----->
    public interface IA {}
    public interface IB {}
    public interface IC {}
    public interface ID {}

    public class A : IA
    {
        private IB b;
        private IC c;

        public A(IC c,IB b)
        {
            this.b = b;
            this.c = c;
        }
    }

    public class B : IB
    {
        private IC c;
        public B(IC c) => this.c = c;
    }

    public class C : IC
    {
        private ID d;
        public C(ID d) => this.d = d;
    }

    class Program
    {
        static void Main(string[] args)
        {
            var container = new Container(rules => rules
                .With(FactoryMethod.ConstructorWithResolvableArguments)
                .WithoutEagerCachingSingletonForFasterAccess()
                .WithoutThrowOnRegisteringDisposableTransient()
                .WithDefaultIfAlreadyRegistered(IfAlreadyRegistered.Replace));

            container.Register(typeof(A), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.RegisterMany(new [] { typeof(A), typeof(IA) }, typeof(A), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.Register(typeof(B), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.RegisterMany(new [] { typeof(B), typeof(IB) }, typeof(B), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.Register(typeof(C), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));
            container.RegisterMany(new [] { typeof(C), typeof(IC) }, typeof(C), Reuse.Singleton, setup: Setup.With(asResolutionCall: true));

            Resolve(container);
            Resolve(container);
        }

        static void Resolve(IContainer container)
        {
            try
            {
                container.Resolve<IA>();
            }
            catch (ContainerException e)
            {
                Console.WriteLine(e);
            }
        }
    }
}
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="DryIoc.dll" Version="4.7.5" />
  </ItemGroup>

</Project>

It deadlocks on the second resolve. The dependency structure is also important.

@dadhi
Copy link
Owner

dadhi commented Apr 9, 2021

@Artemigos Great, I will get it from here.

@dadhi dadhi added the bug Something isn't working label Apr 9, 2021
@dadhi dadhi added this to the 4.7.6 milestone Apr 9, 2021
dadhi added a commit that referenced this issue Apr 9, 2021
@dadhi dadhi closed this as completed in c1c89b6 Apr 12, 2021
@dadhi dadhi reopened this Apr 12, 2021
@dadhi
Copy link
Owner

dadhi commented Apr 12, 2021

@Artemigos

I have provided the fix introducing the waiting timeout because at the moment it is "the better than nothing" solution and maybe actually enough for everyone. Hope that the error message below will provide the clues for the problem reasons

WaitForScopedServiceIsCreatedTimeoutExpired = Of(
    "DryIoc have waited for the creation of the scoped service by 'other party' for the {0} ticks without the success." + NewLine +
    "It means that either the 'other party' is the parallel thread which has started! but unable to finish the creation of the service in the provided amount of time." + NewLine +
    "Or more likely there is an undetected recursive dependency and the 'other party' is the same thread." + NewLine +
    "Another reason may be that the previous scoped service resolution is failed with the exception and the exception was CATCHED " + NewLine +
    "but you are trying to resolve the failed service again." + NewLine + 
    "That's why for all these reasons we have a timeout to prevent the hanging due to the infinite waiting.");

I am open to ideas of error handling and its message improvements.

dadhi added a commit that referenced this issue Apr 12, 2021
dadhi added a commit that referenced this issue Apr 12, 2021
@Artemigos
Copy link
Author

The message looks fine - it's much more informative than a deadlock :)

When doing a "proper" fix I'd probably take inspiration from here: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Lazy.cs

It seems that exception is also considered a "finished" state for a resolution and I'd suspect that all potential threads will get the same exception. This would probably require some changes deep in DryIoc's thread-safety code, so I'm not sure if that's something you want to do, though.

dadhi added a commit that referenced this issue Apr 12, 2021
@dadhi
Copy link
Owner

dadhi commented Apr 12, 2021

When doing a "proper" fix I'd probably take inspiration from here: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Lazy.cs

Thanks for the idea. I will probably put it on hold until I get more evidence of the timeout inefficiency.

@dadhi
Copy link
Owner

dadhi commented Apr 21, 2021

@Artemigos I will close it for now but will link the issue from the discussion to not forget.

yallie pushed a commit to yallie/DryIoc that referenced this issue Apr 30, 2021
yallie pushed a commit to yallie/DryIoc that referenced this issue Apr 30, 2021
yallie pushed a commit to yallie/DryIoc that referenced this issue Apr 30, 2021
yallie pushed a commit to yallie/DryIoc that referenced this issue Apr 30, 2021
yallie pushed a commit to yallie/DryIoc that referenced this issue Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants