
Remove typeinfer lock altogether #46825

Merged (6 commits) on Nov 23, 2022

Conversation

pchintalapudi (Member)

This is the last section of code that refers to the old type inference lock, so now it gets its own mutex to itself.

maleadt (Member) commented Sep 18, 2022

GPUCompiler was using these. Can you explain if and how we need to do locking from there now?

pchintalapudi (Member, Author)

If GPUCompiler is using the typeinf lock for type inference alone, it should no longer need to do that. If it was using it as a proxy for the codegen lock, there currently isn't a replacement, so I can just add the definitions back in (perhaps with a rename to make it more obvious?)
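For context, here is a minimal sketch of how an external client such as GPUCompiler.jl could bracket work with the jl_typeinf_lock_begin/jl_typeinf_lock_end shims shown later in this thread. This is hypothetical embedding usage, not code from this PR; the entry points are forward-declared because their exact header location may vary across Julia versions.

#include <julia.h>

// Forward declarations of the exported shims discussed in this PR; their
// header location may differ across Julia versions, so declare them here.
extern void jl_typeinf_lock_begin(void);
extern void jl_typeinf_lock_end(void);

int main(void)
{
    jl_init();
    jl_typeinf_lock_begin();
    // Inference- or codegen-adjacent queries would go here; after this PR,
    // pure type-inference queries no longer need this bracket at all.
    jl_typeinf_lock_end();
    jl_atexit_hook(0);
    return 0;
}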

maleadt (Member) commented Sep 19, 2022

We were also using this to lock codegen. But aren't the thread-safe context/modules enough for that?

pchintalapudi (Member, Author)

Once the LLVM IR has been generated, there should be no need for locks besides the context lock itself. I can't guarantee that all accesses to codeinst fields are safe outside a lock, though.
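For concreteness, the thread-safe context/modules mentioned above are LLVM Orc's ThreadSafeContext/ThreadSafeModule. Below is a minimal standalone sketch of that locking discipline using LLVM's C API; it is illustrative only (not code from this PR) and assumes the LLVMOrcThreadSafeModuleWithModuleDo entry point from recent LLVM releases, which runs a callback with the underlying context lock held.

#include <llvm-c/Core.h>
#include <llvm-c/Error.h>
#include <llvm-c/Orc.h>
#include <stddef.h>

// Callback invoked by LLVMOrcThreadSafeModuleWithModuleDo; the context
// lock is held for its duration, so touching the IR here is safe.
static LLVMErrorRef dump_module(void *Ctx, LLVMModuleRef M)
{
    (void)Ctx;
    LLVMDumpModule(M);
    return LLVMErrorSuccess;
}

int main(void)
{
    LLVMOrcThreadSafeContextRef TSCtx = LLVMOrcCreateNewThreadSafeContext();
    LLVMContextRef Ctx = LLVMOrcThreadSafeContextGetContext(TSCtx);
    LLVMModuleRef M = LLVMModuleCreateWithNameInContext("demo", Ctx);
    // The ThreadSafeModule takes ownership of M and shares TSCtx.
    LLVMOrcThreadSafeModuleRef TSM = LLVMOrcCreateNewThreadSafeModule(M, TSCtx);

    // All IR access goes through the callback, under the context lock.
    LLVMErrorRef Err = LLVMOrcThreadSafeModuleWithModuleDo(TSM, dump_module, NULL);
    if (Err)
        LLVMConsumeError(Err);

    LLVMOrcDisposeThreadSafeModule(TSM);
    LLVMOrcDisposeThreadSafeContext(TSCtx);
    return 0;
}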

vtjnash (Sponsor, Member) left a comment

Yep, it is a simple change needed here now, almost NFC. It does make the code look cleaner, though.

pchintalapudi (Member, Author)

@maleadt Do you still want the lock methods present, or should we merge this as-is? Hopefully the codegen lock will go away entirely within a couple of PRs, so it would be good to identify any external dependencies on it.

maleadt (Member) commented Sep 22, 2022

> Do you still want the lock methods present, or should we merge this as-is?

You tell me; at this point I really don't know anymore what we should and shouldn't lock in GPUCompiler.jl. FWIW, we used to use the typeinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native). IIUC the former isn't needed anymore, and the latter takes an OrcThreadSafeModule.

> Once the LLVM IR has been generated

But that's jl_create_native, which takes an OrcThreadSafeModule, so isn't it using that lock?

pchintalapudi (Member, Author) commented Oct 4, 2022

> we used to use the typeinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native)

I also don't think this is necessary anymore, but it would be safer to drop the lock in sync with Julia dropping the lock around these functions in codegen (#46836), rather than doing it as part of this PR. So I've added the typeinf lock functions back to avoid breaking GPUCompiler here, though they are no longer called by base.

> But that's jl_create_native, which takes an OrcThreadSafeModule, so isn't it using that lock?

Internally, jl_create_native will take the lock, but the lock doesn't need to be held while calling the function. The lock is also released before the function returns, and from that point on the context lock is sufficient to protect the LLVM IR.
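To illustrate that scoping, here is a standalone sketch using plain pthreads and hypothetical names (not Julia's actual implementation): the callee takes and releases its own lock, so the caller never holds it across the call, and only the context lock matters after the function returns.

#include <pthread.h>

static pthread_mutex_t codegen_lock = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical stand-in for a jl_create_native-style entry point: the
// lock is acquired and released entirely inside the callee.
void *create_native_like(void *ir_request)
{
    pthread_mutex_lock(&codegen_lock);   /* taken internally */
    void *ir = ir_request;               /* stand-in for emitting IR */
    pthread_mutex_unlock(&codegen_lock); /* released before returning */
    return ir;                           /* only the context lock matters now */
}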

@brenhinkeller brenhinkeller added the compiler:inference Type inference label Nov 17, 2022
@pchintalapudi pchintalapudi added the backport 1.9 Change should be backported to release-1.9 label Nov 18, 2022
pchintalapudi (Member, Author)

@vtjnash Could you review this PR again? I've added some code to handle type inference recursion, which should probably be backported to 1.9.

Comment on lines 3438 to 3452
JL_DLLEXPORT void jl_typeinf_lock_begin(void)
{
JL_LOCK(&typeinf_lock);
//Although this is claiming to be a typeinfer lock, it is actually
//affecting the codegen lock count, not type inference's inferencing count
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen++;
}

JL_DLLEXPORT void jl_typeinf_lock_end(void)
{
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen--;
JL_UNLOCK(&typeinf_lock);
}
vtjnash (Sponsor, Member) commented Nov 18, 2022

Can we instead just remove this now?

Suggested change
JL_DLLEXPORT void jl_typeinf_lock_begin(void)
{
JL_LOCK(&typeinf_lock);
//Although this is claiming to be a typeinfer lock, it is actually
//affecting the codegen lock count, not type inference's inferencing count
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen++;
}
JL_DLLEXPORT void jl_typeinf_lock_end(void)
{
jl_task_t *ct = jl_current_task;
ct->reentrant_codegen--;
JL_UNLOCK(&typeinf_lock);
}

(and related exports in jl_exported_funcs.inc)

pchintalapudi (Member, Author)

Since GPUCompiler is depending on those for type inference, I'd rather keep them around until we completely drop locking around our type inference, just in case we locate any hidden bugs along the way.

vtjnash (Sponsor, Member) left a comment

SGTM

vtjnash (Sponsor, Member) commented Nov 18, 2022

But apparently this broke CI in many "interesting" ways?

pchintalapudi (Member, Author)

> But apparently this broke CI in many "interesting" ways?

I think most of them are now fixed, but why is the analyzegc check newly failing? I don't see any particular reason the changes would cause a value to become unrooted.

vtjnash (Sponsor, Member) commented Nov 20, 2022

It is a heuristic-driven pass, so unrelated changes can result in it exploring a portion of the execution graph that it previously skipped (or exploring it differently).

@pchintalapudi pchintalapudi deleted the pc/typeinf-remove branch November 23, 2022 22:11
DilumAluthge (Member)

It looks like this PR might have broken CI on master due to a "semantic conflict"? CI was all green on this PR, but when I look at 113efb6 on master, I see failures on i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-apple-darwin. On each platform, the failure is in the misc testset and looks something like this:

Error in testset misc:
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-x64-4.0/build/default-macmini-x64-4-0/julialang/julia-master/julia-113efb6e0a/share/julia/test/misc.jl:355
  Expression: after_comp >= before_comp
  Evaluated: 0x6e2cbf1afdb64277 >= 0x9e9cb5ddee20d1af
ERROR: LoadError: Test run finished with errors

I have triggered a re-run of each of the failed jobs; let's see if the test failures persist on the re-run.

brenhinkeller (Sponsor, Contributor)

Seems like PkgEval should have been run on this before merging... and we probably shouldn't be breaking GPUCompiler willy-nilly these days, even if this didn't also break CI.

jpsamaroo (Member)

> and we probably shouldn't be breaking GPUCompiler willy-nilly these days

If type inference is now thread-safe, then we shouldn't need to be accessing this lock; we only need to hold the codegen lock or module lock as needed.

@pchintalapudi does that sound correct?

pchintalapudi (Member, Author)

> It looks like this PR might have broken CI on master due to a "semantic conflict"?

That looks like a real intermittent failure (I probably got the timer-reporting logic wrong), which should be a quick fix.

> Seems like PkgEval should have been run on this before merging... and we probably shouldn't be breaking GPUCompiler willy-nilly these days, even if this didn't also break CI.

At least for me, the GPUCompiler tests seem to be passing on master. Also, none of the changes that I expected to affect GPUCompiler were included in the final PR.

> If type inference is now thread-safe, then we shouldn't need to be accessing this lock; we only need to hold the codegen lock or module lock as needed.

As per #46825 (comment), if you're just accessing the LLVM IR, then yes, the context lock is sufficient; but if you're also accessing the codeinst, some of its fields might be modified unsafely. Incidentally, the context lock is probably also necessary, not just sufficient, if the context is being provided by the Julia compiler.

DilumAluthge (Member)

I've opened an issue for the CI failure: #47710

KristofferC pushed a commit that referenced this pull request Nov 28, 2022
* Remove typeinfer lock altogether

* Don't remove the typeinf lock functions

* Track reentrancy in current task state

* Fix up some git status

* Initialize task variables

* Promise that jl_typeinf_func is rooted somewhere

(cherry picked from commit 113efb6)
@KristofferC KristofferC mentioned this pull request Dec 14, 2022
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Dec 27, 2022