Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AsyncLocal to keep diagnostics instead of ThreadStatic #16602

Closed
wants to merge 5 commits into from

Conversation

majocha
Copy link
Contributor

@majocha majocha commented Jan 29, 2024

Introduction of Transparent Compiler revealed some instability w.r.t. DiagnosticsThreadStatics:
#16576 (comment)
#16576
#16589

Turns out we used [<ThreadStatic>] to keep a compilation-global DiagnosticsLogger, and a custom NodeCode CE to flow it across threads during execution.
AsyncLocal is a BCL class that provides exactly that functionality: flowing state along the async execution path.

This replaces the ThreadStatic diagnostics state for AsyncLocal.
NodeCode<'T> is no longer needed. For now we make it into an alias for Async<'T>, and the node CE is an alias for async builder. To be decided if we want to remove it in this or a separate PR.

Copy link
Contributor

github-actions bot commented Jan 29, 2024

❗ Release notes required


✅ Found changes and release notes in following paths:

Change path Release notes path Description
src/Compiler docs/release-notes/.FSharp.Compiler.Service/8.0.300.md

@majocha majocha mentioned this pull request Jan 29, 2024
@majocha
Copy link
Contributor Author

majocha commented Jan 29, 2024

Ok, I replaced it with AsyncLocal and for now tried to noop NodeCode. Let's see what breaks.
I have no idea if AsyncLocal works with F# async tbh.

@vzarytovskii
Copy link
Member

Ok, I replaced it with AsyncLocal and for now tried to noop NodeCode. Let's see what breaks. I have no idea if AsyncLocal works with F# async tbh.

I am not sure tbh. AsyncLocal manages locals via ExecutionContext, which is somewhat tied to the thread. I would expect it not to rely in the IAsyncStateMachine or anything task-related.

References: https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Threading/AsyncLocal.cs,ef9ce034697240ba
https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs,33788b02dd361e94

@majocha
Copy link
Contributor Author

majocha commented Jan 29, 2024

This looks promising. Only one test fails.

@0101
Copy link
Contributor

0101 commented Jan 29, 2024

I will be amazed if this works.

🤞

@majocha
Copy link
Contributor Author

majocha commented Jan 29, 2024

That one error indicates a case where the AsyncLocal fails us where the normal ThreadStatic worked. It's worth it to dive to the bottom of this:

let ``In-memory cross-project references to projects using generative type provides should fallback to on-disk references`` () =

@vzarytovskii
Copy link
Member

@majocha getting rid of additional builder abstractions which are essentially just async is a good call I think. Wanted to do it for some time. Wondering if @0101 has any objections to that?

It will make it even easier to move to some state-machines based ones in future (or at least experiment with those).

@0101
Copy link
Contributor

0101 commented Jan 29, 2024

Definitely no objections. The whole point of it was just managing the thread statics. If we can solve it without the extra builder it will be a great improvement.

@majocha
Copy link
Contributor Author

majocha commented Jan 29, 2024

That one test fail is possibly something related to type providers in the checker.

@majocha
Copy link
Contributor Author

majocha commented Jan 29, 2024

Comparing the test in question with main, there are two diagnostics missing that should be there:

Error opening binary file 'E:\repos\fsharp\tests\service/DUMMY/TestProject.dll': Could not find a part of the path 'E:\repos\fsharp\tests\service\DUMMY\TestProject.dll'.

and

Problem reading assembly 'E:\repos\fsharp\tests\service/DUMMY/TestProject.dll': The exception has been reported. This internal exception should now be caught at an error recovery point on the stack. Original message: Error opening binary file 'E:\repos\fsharp\tests\service/DUMMY/TestProject.dll': Could not find a part of the path 'E:\repos\fsharp\tests\service\DUMMY\TestProject.dll'.)

That's some starting point for figuring this out.

@0101
Copy link
Contributor

0101 commented Jan 29, 2024

Looks like after the change the errors go to FinalizeTypeCheckTask logger instead of CombineImportedAssembliesTask one.

Before (works):
image

After (diagnostics lost):
image

@vzarytovskii
Copy link
Member

@majocha If you know which test are failing (or were failing rather) specifically, I can run them tomorrow on my Mac in the loop, they're quite fast here.

@majocha
Copy link
Contributor Author

majocha commented Jan 29, 2024

@vzarytovskii the original test is fixed now, with the holder. A few others popped up. I'll take a closer look later.

@vzarytovskii
Copy link
Member

Currently failing tests indicate that we are not (re)storing logger correctly somewhere and it hides issues.

@majocha
Copy link
Contributor Author

majocha commented Jan 30, 2024

Actually it looks worse.:
image

Why does the ExecutionContext set null there?

I'm also wondering, the ExecutionContext implementation changed quite a bit over the time:
https://referencesource.microsoft.com/#mscorlib/system/threading/executioncontext.cs,33788b02dd361e94
Which version do we actually have when we target netstandard2.0?

@vzarytovskii
Copy link
Member

Which version do we actually have when we target netstandard2.0?

The one from the target runtime. In VS it would be the one from .NET 4.7.2, in tests - either the same or 8.0.101

@majocha
Copy link
Contributor Author

majocha commented Jan 30, 2024

OK, now I'm stumped.
In a code like this, I would expect the static do and static let to run before the getter executes:

/// Type holds thread-static globals for use by the compiler.
type DiagnosticsThreadStatics =
    static let diagnosticsLogger = new AsyncLocal<DiagnosticsLogger option>()
    static let buildPhase = new AsyncLocal<BuildPhase option>()

    static do
        diagnosticsLogger.Value <- Some AssertFalseDiagnosticsLogger
        buildPhase.Value <- Some BuildPhase.DefaultPhase

    static member BuildPhase
        with get () = buildPhase.Value |> Option.get
        and set v = buildPhase.Value <- Some v

    static member DiagnosticsLogger
        with get () = diagnosticsLogger.Value |> Option.get
        and set v = diagnosticsLogger.Value <- Some v

But it doesn't happen.😶

@vzarytovskii
Copy link
Member

vzarytovskii commented Jan 30, 2024

OK, now I'm stumped. In a code like this, I would expect the static do and static let to run before the getter executes:

/// Type holds thread-static globals for use by the compiler.
type DiagnosticsThreadStatics =
    static let diagnosticsLogger = new AsyncLocal<DiagnosticsLogger option>()
    static let buildPhase = new AsyncLocal<BuildPhase option>()

    static do
        diagnosticsLogger.Value <- Some AssertFalseDiagnosticsLogger
        buildPhase.Value <- Some BuildPhase.DefaultPhase

    static member BuildPhase
        with get () = buildPhase.Value |> Option.get
        and set v = buildPhase.Value <- Some v

    static member DiagnosticsLogger
        with get () = diagnosticsLogger.Value |> Option.get
        and set v = diagnosticsLogger.Value <- Some v

But it doesn't happen.😶

What happens instead?

In the following code:

open System.Threading
type DiagnosticsThreadStatics() =
    static let diagnosticsLogger = new AsyncLocal<int option>()
    static let buildPhase = new AsyncLocal<int option>()

    static do
        diagnosticsLogger.Value <- Some (1)
        buildPhase.Value <- Some (2)

    static member BuildPhase
        with get () = buildPhase.Value |> Option.get
        and set v = buildPhase.Value <- Some v

    static member DiagnosticsLogger
        with get () = diagnosticsLogger.Value |> Option.get
        and set v = diagnosticsLogger.Value <- Some v

Initialisation will roughly happens in this order (via external (likely) static initializer):

DiagnosticsThreadStatics.diagnosticsLogger = new AsyncLocal<FSharpOption<int>>();
DiagnosticsThreadStatics.buildPhase = new AsyncLocal<FSharpOption<int>>();
DiagnosticsThreadStatics.diagnosticsLogger.Value = FSharpOption<int>.Some(1);
DiagnosticsThreadStatics.buildPhase.Value = FSharpOption<int>.Some(2);

Both getter and setter for properties will also have an intrinsic initialisation check, which will throw in case if initialization hasn't been done.

Keep in mind, that it will be executed once in the async context, static constructor is "running". So if you first accessed DiagnosticsThreadStatics on the AsyncContext-1, and then calling getter on AsyncContext-2, it will be a default value (None/null).

@majocha
Copy link
Contributor Author

majocha commented Jan 30, 2024

What happens instead?

Hm, just the debugger somehow does not break. I'm getting nulls just like you predicted so I was wondering what is going on.

@vzarytovskii
Copy link
Member

What happens instead?

Hm, just the debugger somehow does not break. I'm getting nulls just like you predicted so I was wondering what is going on.

Since it's kinda "external" init code, I'm not sure we emit any debug points for it. Can be verified by looking at the PDB info, I think.

@majocha
Copy link
Contributor Author

majocha commented Jan 30, 2024

Thanks @vzarytovskii, I think you clarified it to me now. I've been running in circles about it. This is now green locally, let's see what the CI says. 🤞🤞

@majocha
Copy link
Contributor Author

majocha commented Jan 30, 2024

Nope, still bad.

@majocha majocha changed the title Attempt to fix thread static woes Use AsyncLocal to keep diagnostics compile globals instead of ThreadStatic. Removes NodeCode Jan 31, 2024
@majocha majocha changed the title Use AsyncLocal to keep diagnostics compile globals instead of ThreadStatic. Removes NodeCode Use AsyncLocal to keep diagnostics compile globals instead of ThreadStatic Jan 31, 2024
@majocha majocha changed the title Use AsyncLocal to keep diagnostics compile globals instead of ThreadStatic Use AsyncLocal to keep diagnostics instead of ThreadStatic Jan 31, 2024
@majocha majocha marked this pull request as ready for review January 31, 2024 10:34
@majocha majocha requested a review from a team as a code owner January 31, 2024 10:34
@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

Sigh, something is not right. The tests pass locally, but the CI stalls.

Is there a difference in tests run time before and after change locally?

Hard to tell, not noticeable, but I admit I run just the FCS usually, or a subset of failing ones, not the full suite. I'll try to run everything later when I'm on a desktop PC.

@dotnet dotnet deleted a comment from github-actions bot Jan 31, 2024
@vzarytovskii
Copy link
Member

vzarytovskii commented Jan 31, 2024

Changes look sane, but I have few concerns/questions.

  1. It touches not only new compiler pipeline, but modifies all of them by changing how wrapping works (well, non-wrapping now).
  2. Because of 1., I am a bit concerned about how will that work out in fully-async (as in non-task async) context, like FSAC, whether AsyncLocal will still work correctly there. Concern here is that in both tests and VS we launch things from within Task most of the times, whereas FSAC has fully Control.Async based stack.
  3. A wild idea (maybe not for this PR though?) - we should probably do the OTEL logging for the build phase and diagnostic loggers, including what do we set and get. By default its noop, but can be helpful when diagnosing some issues.

@vzarytovskii
Copy link
Member

vzarytovskii commented Jan 31, 2024

To illustrate, in "pure" async stack, when we run async (RunSycnronously and co) we pretty much queue computation via trampoline (one way or another):
image

However, from my understanding, if we are launched by tasks scheduler (via StartAsTask, as we do in VS and tests), we do use tasks directly, which will make AsyncLocal work as expected.

@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

I did a quick experiment console app:

open System.Threading
open System.Collections.Concurrent

type Holder() =
    let data = ConcurrentQueue()
    member _.Add v = data.Enqueue v
    member _.GetAll = data |> List.ofSeq

module Locals =
    let a = new AsyncLocal<_>()
    let init () = a.Value <- Holder()

Locals.init()

let work tag =
    async {
        do! Async.Sleep 10
        let a = Locals.a.Value
        for i in 1 .. 5 do
            a.Add $"{tag} {i}"
            do! Async.Sleep 10
        return a.GetAll
    }

let result = ["A"; "B"; "C"] |> Seq.map work |> Async.Parallel |> Async.RunSynchronously

printfn "%A" Locals.a.Value.GetAll

It seems to work correctly:

["A 1"; "B 1"; "C 1"; "B 2"; "A 2"; "C 2"; "C 3"; "B 3"; "A 3"; "A 4"; "B 4";
 "C 4"; "C 5"; "B 5"; "A 5"]

@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

My understanding it that ExecutionContext operates on ThreadPool level, and does not depend on tasks in any way.

@vzarytovskii
Copy link
Member

vzarytovskii commented Jan 31, 2024

My understanding it that ExecutionContext operates on ThreadPool level, and does not depend on tasks in any way.

Yeah, that's what I was trying to understand. Regarding the example above, what if you will mix in do! Async.SwitchToNewThread() and some |> AwaitTask

We use the former a lot in stack guards, which can lead to interesting consequences.

@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

OK, my plan to deal with this is to reset everything, leave NodeCodeBuilder in place and only do the minimal changes required, i.e. DiagnosticsLogger.fs, the test fix in CompilerImports.fs.
main testFSharpQA took 14m 38s on my PC. I'll build and run this branch.

@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

open System.Threading
open System.Threading.Tasks
open System.Collections.Concurrent

type Holder(name) =
    let data = ConcurrentQueue()
    member _.Add v = data.Enqueue v
    member _.GetAll = name, data |> List.ofSeq

module Locals =
    let a = new AsyncLocal<_>()
    let init () = a.Value <- Holder("init")

Locals.init()

let t tag i = 
    task {
        do! Task.Delay 10
        Locals.a.Value.Add $"{tag} {i}"
    }

let work tag =
    async {
        do! Async.Sleep 10
        Locals.a.Value <- Holder(tag)
        let a = Locals.a.Value
        for i in 1 .. 5 do
            do! Async.SwitchToNewThread()
            do! t tag i |> Async.AwaitTask
            do! Async.Sleep 10
        return a.GetAll
    }

let result = ["A"; "B"; "C"] |> Seq.map work |> Async.Parallel |> Async.RunSynchronously

printfn "%A" result

printfn "%A" Locals.a.Value.GetAll

still works fine, even in FSI.

@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

I think I got it to pass CI: #16634

Is it ok to force push here?

@vzarytovskii
Copy link
Member

I think I got it to pass CI: #16634

Is it ok to force push here?

In the branch? Sure thing.

@majocha
Copy link
Contributor Author

majocha commented Jan 31, 2024

OK, diff looks reasonably small, fingers crossed.

@0101
Copy link
Contributor

0101 commented Feb 1, 2024

This looks really good! I installed it now and will be trying it out.

Wonder if we could come up with some meaningful tests for it. The ProjectWorkflowBuilder produces an Async workflow which can be executed with RunSynchronously (which happens by default) or any other means (specifying autoStart=false). And it can be run either with TransparentCompiler or IncrementalBuilder. We'd just have to come up with a scenario that would stress the diagnostic system somewhat.

@majocha
Copy link
Contributor Author

majocha commented Feb 1, 2024

NodeCode test we have does it to some extent. There are also some AsyncMemoize tests and BuildGraph tests using DiagnosticsThreadStatics, but a dedicated test would be good.

@majocha
Copy link
Contributor Author

majocha commented Feb 1, 2024

I'm also working on removal of node from the code base, but I'm hitting the same problem that showed up in this PR:
CI grinds to a halt while locally all tests are fine.
I have a suspicion that I'm trying to test now.

@vzarytovskii
Copy link
Member

@0101 does this test failure ring a bell by any chance?

https://dev.azure.com/dnceng-public/public/_build/results?buildId=550637&view=ms.vss-test-web.build-test-results-tab&runId=13090916&resultId=100659&paneView=debug

I recall seeing something similar. If it's just flaky, then we have to either figure out what's going on with it or disable it.

@0101
Copy link
Contributor

0101 commented Feb 2, 2024

@0101 does this test failure ring a bell by any chance?

https://dev.azure.com/dnceng-public/public/_build/results?buildId=550637&view=ms.vss-test-web.build-test-results-tab&runId=13090916&resultId=100659&paneView=debug

I recall seeing something similar. If it's just flaky, then we have to either figure out what's going on with it or disable it.

Yeah it does. If we had merged this #16010 then we'd see what's missing there.

@majocha
Copy link
Contributor Author

majocha commented Feb 2, 2024

I think before we use this, we must carefully think through and test how this behaves in parallel execution. The one initial test failure in MultiProjectAnalysisTests is indicative of different behavior and although I fixed it in-place, it doesn't fix that behavior in general.

It's possible we need to add something like ExecutionContext.SuppressFlow to any Async.Parallel invocations.

@majocha
Copy link
Contributor Author

majocha commented Feb 4, 2024

The problems are sorted out in #16645. We can either take this one or just go with #16645.

@T-Gro
Copy link
Member

T-Gro commented Feb 8, 2024

I propose to close this PR and continue with #16645 only.

@T-Gro T-Gro closed this Feb 8, 2024
@majocha majocha mentioned this pull request May 9, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants