
Remove typeinfer lock altogether #46825

Merged (6 commits) on Nov 23, 2022

Conversation

pchintalapudi (Member)

This is the last section of code that refers to the old type inference lock, so now it gets its own mutex to itself.

maleadt (Member) commented Sep 18, 2022

GPUCompiler was using these. Can you explain if and how we need to do locking from there now?

pchintalapudi (Member, Author)

If GPUCompiler is using the typeinf lock for type inference alone, it should no longer need to do that. If it was using it as a proxy for the codegen lock, there currently isn't a replacement, so I can just add the definitions back in (perhaps with a rename to make it more obvious?)
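For context, here is a minimal sketch of how an external client such as GPUCompiler.jl could bracket work with the jl_typeinf_lock_begin/jl_typeinf_lock_end shims shown later in this thread. This is hypothetical embedding usage, not code from this PR; the entry points are forward-declared because their exact header location may vary across Julia versions.

#include <julia.h>

// Forward declarations of the exported shims discussed in this PR; their
// header location may differ across Julia versions, so declare them here.
extern void jl_typeinf_lock_begin(void);
extern void jl_typeinf_lock_end(void);

int main(void)
{
    jl_init();
    jl_typeinf_lock_begin();
    // Inference- or codegen-adjacent queries would go here; after this PR,
    // pure type-inference queries no longer need this bracket at all.
    jl_typeinf_lock_end();
    jl_atexit_hook(0);
    return 0;
}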

maleadt (Member) commented Sep 19, 2022

We were also using this to lock codegen. But aren't the thread-safe context/modules enough for that?

pchintalapudi (Member, Author)

Once the LLVM IR has been generated, there should be no need for locks besides the context lock itself. I can't guarantee that all accesses to codeinst fields are safe outside a lock, though.
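For concreteness, the thread-safe context/modules mentioned above are LLVM Orc's ThreadSafeContext/ThreadSafeModule. Below is a minimal standalone sketch of that locking discipline using LLVM's C API; it is illustrative only (not code from this PR) and assumes the LLVMOrcThreadSafeModuleWithModuleDo entry point from recent LLVM releases, which runs a callback with the underlying context lock held.

#include <llvm-c/Core.h>
#include <llvm-c/Error.h>
#include <llvm-c/Orc.h>
#include <stddef.h>

// Callback invoked by LLVMOrcThreadSafeModuleWithModuleDo; the context
// lock is held for its duration, so touching the IR here is safe.
static LLVMErrorRef dump_module(void *Ctx, LLVMModuleRef M)
{
    (void)Ctx;
    LLVMDumpModule(M);
    return LLVMErrorSuccess;
}

int main(void)
{
    LLVMOrcThreadSafeContextRef TSCtx = LLVMOrcCreateNewThreadSafeContext();
    LLVMContextRef Ctx = LLVMOrcThreadSafeContextGetContext(TSCtx);
    LLVMModuleRef M = LLVMModuleCreateWithNameInContext("demo", Ctx);
    // The ThreadSafeModule takes ownership of M and shares TSCtx.
    LLVMOrcThreadSafeModuleRef TSM = LLVMOrcCreateNewThreadSafeModule(M, TSCtx);

    // All IR access goes through the callback, under the context lock.
    LLVMErrorRef Err = LLVMOrcThreadSafeModuleWithModuleDo(TSM, dump_module, NULL);
    if (Err)
        LLVMConsumeError(Err);

    LLVMOrcDisposeThreadSafeModule(TSM);
    LLVMOrcDisposeThreadSafeContext(TSCtx);
    return 0;
}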

vtjnash (Sponsor, Member) left a comment

Yep, it is a simple change needed here now, almost NFC. It does make the code look cleaner, though.

pchintalapudi (Member, Author)

@maleadt Do you still want the lock methods present, or should we merge this as-is? Hopefully the codegen lock will go away entirely within a couple of PRs, so it would be good to identify any external dependencies on it.

maleadt (Member) commented Sep 22, 2022

> Do you still want the lock methods present, or should we merge this as-is?

You tell me; at this point I really don't know anymore what we should and shouldn't lock in GPUCompiler.jl. FWIW, we used to use the typeinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native). IIUC the former isn't needed anymore, and the latter takes an OrcThreadSafeModule.

> Once the LLVM IR has been generated

But that's jl_create_native, which takes an OrcThreadSafeModule, so isn't it using that lock?

pchintalapudi (Member, Author) commented Oct 4, 2022

> we used to use the typeinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native)

I also don't think this is necessary anymore, but it would be safer to drop the lock in sync with Julia dropping the lock around these functions in codegen (#46836), rather than doing it as part of this PR. So I've added the typeinf lock functions back to avoid breaking GPUCompiler here, though they are no longer called by base.

> But that's jl_create_native, which takes an OrcThreadSafeModule, so isn't it using that lock?

Internally, jl_create_native will take the lock, but the lock doesn't need to be held while calling the function. The lock is also released before the function returns, and from that point on the context lock is sufficient to protect the LLVM IR.
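To illustrate that scoping, here is a standalone sketch using plain pthreads and hypothetical names (not Julia's actual implementation): the callee takes and releases its own lock, so the caller never holds it across the call, and only the context lock matters after the function returns.

#include <pthread.h>

static pthread_mutex_t codegen_lock = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical stand-in for a jl_create_native-style entry point: the
// lock is acquired and released entirely inside the callee.
void *create_native_like(void *ir_request)
{
    pthread_mutex_lock(&codegen_lock);   /* taken internally */
    void *ir = ir_request;               /* stand-in for emitting IR */
    pthread_mutex_unlock(&codegen_lock); /* released before returning */
    return ir;                           /* only the context lock matters now */
}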

@brenhinkeller brenhinkeller added the compiler:inference Type inference label Nov 17, 2022
@pchintalapudi pchintalapudi added the backport 1.9 Change should be backported to release-1.9 label Nov 18, 2022
pchintalapudi (Member, Author)

@vtjnash Could you review this PR again? I've added some code to handle type inference recursion, which should probably be backported to 1.9.

Comment on lines 3438 to 3452
JL_DLLEXPORT void jl_typeinf_lock_begin(void)
{
JL_LOCK(&typeinf_lock);
//Although this is claiming to be a typeinfer lock, it is actually
//affecting the codegen lock count, not type inference's inferencing count
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen++;
}

JL_DLLEXPORT void jl_typeinf_lock_end(void)
{
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen--;
JL_UNLOCK(&typeinf_lock);
}
vtjnash (Sponsor, Member) commented Nov 18, 2022

Can we instead just remove this now?

Suggested change
JL_DLLEXPORT void jl_typeinf_lock_begin(void)
{
JL_LOCK(&typeinf_lock);
//Although this is claiming to be a typeinfer lock, it is actually
//affecting the codegen lock count, not type inference's inferencing count
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen++;
}
JL_DLLEXPORT void jl_typeinf_lock_end(void)
{
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen--;
JL_UNLOCK(&typeinf_lock);
}

(and related exports in jl_exported_funcs.inc)

pchintalapudi (Member, Author)

Since GPUCompiler is depending on those for type inference, I'd rather keep them around until we completely drop locking around our type inference, just in case we locate any hidden bugs along the way.

vtjnash (Sponsor, Member) left a comment

SGTM

vtjnash (Sponsor, Member) commented Nov 18, 2022

But apparently this broke CI in many "interesting" ways?

pchintalapudi (Member, Author)

> But apparently this broke CI in many "interesting" ways?

I think most of them are now fixed, but why is the analyzegc check newly failing? I don't see any particular reason the changes would cause a value to become unrooted.

vtjnash (Sponsor, Member) commented Nov 20, 2022

It is a heuristic-driven pass, so unrelated changes can result in it exploring a portion of the execution graph that it previously skipped (or exploring it differently).

@pchintalapudi pchintalapudi deleted the pc/typeinf-remove branch November 23, 2022 22:11
DilumAluthge (Member)

It looks like this PR might have broken CI on master due to a "semantic conflict"? CI was all green on this PR, but when I look at 113efb6 on master, I see failures on i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-apple-darwin. On each platform, the failure is in the misc testset and looks something like this:

Error in testset misc:
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-x64-4.0/build/default-macmini-x64-4-0/julialang/julia-master/julia-113efb6e0a/share/julia/test/misc.jl:355
  Expression: after_comp >= before_comp
  Evaluated: 0x6e2cbf1afdb64277 >= 0x9e9cb5ddee20d1af
ERROR: LoadError: Test run finished with errors

I have triggered a re-run of each of the failed jobs; let's see if the test failures persist on the re-run.

brenhinkeller (Sponsor, Contributor)

Seems like PkgEval should have been run on this before merging... and we probably shouldn't be breaking GPUCompiler willy-nilly these days, even if this didn't also break CI.

jpsamaroo (Member)

> and we probably shouldn't be breaking GPUCompiler willy-nilly these days

If type inference is now thread-safe, then we shouldn't need to be accessing this lock; we only need to hold the codegen lock or module lock as needed.

@pchintalapudi does that sound correct?

pchintalapudi (Member, Author)

> It looks like this PR might have broken CI on master due to a "semantic conflict"?

That looks like a real intermittent failure (I probably got the timer-reporting logic wrong), which should be a quick fix.

> Seems like PkgEval should have been run on this before merging... and we probably shouldn't be breaking GPUCompiler willy-nilly these days, even if this didn't also break CI.

At least for me, the GPUCompiler tests seem to be passing on master. Also, none of the changes that I expected to affect GPUCompiler were included in the final PR.

> If type inference is now thread-safe, then we shouldn't need to be accessing this lock; we only need to hold the codegen lock or module lock as needed.

As per #46825 (comment), if you're just accessing the LLVM IR, then yes, the context lock is sufficient; but if you're also accessing the codeinst, some of its fields might be modified unsafely. Incidentally, the context lock is probably also necessary, not just sufficient, if the context is being provided by the Julia compiler.

DilumAluthge (Member)

I've opened an issue for the CI failure: #47710

KristofferC pushed a commit that referenced this pull request Nov 28, 2022
* Remove typeinfer lock altogether

* Don't remove the typeinf lock functions

* Track reentrancy in current task state

* Fix up some git status

* Initialize task variables

* Promise that jl_typeinf_func is rooted somewhere

(cherry picked from commit 113efb6)
@KristofferC KristofferC mentioned this pull request Dec 14, 2022
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Dec 27, 2022