
Don't perform extra inference during incremental image creation #48054

Merged: 1 commit into master on Jan 5, 2023

Conversation

@Keno (Member) commented Dec 30, 2022

As noted in #48047, we're currently attempting to infer extra methods during incremental image saving, which causes us to miss edges in the image. In particular, in the case of #48047, Cthulhu had the `compile=min` option set, which caused the code instance for `do_typeinf!` to not be inferred. However, it was nevertheless queued for precompilation later, causing inference to occur at an inopportune time.

This PR simply prevents code instances that don't explicitly have the `->precompile` flag set (e.g. the guard instance created for the interpreter) from being enqueued for precompilation. It is not clear that this is necessarily the correct behavior; we may in fact want to infer these method instances just before we set up the serializer state, but for now this fixes #48047 for me.

I also included an appropriate test and a warning message if we attempt to enter inference when this is not legal, so any revisit of what should be happening here can hopefully make use of those.
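For context, here is a minimal sketch of what such an entry guard could look like. The flag and function names below are illustrative assumptions, not the identifiers used in the actual patch:

```c
#include <stdio.h>

/* Hypothetical flag; the real patch tracks this state in the serializer,
 * not in a global like this. */
static int incremental_serialization_in_progress = 0;

/* Called on entry to inference: entering inference while the incremental
 * image is being written is a bug, because any code instance created now
 * would be missing its edges in the image. */
static void warn_if_inference_illegal(void)
{
    if (incremental_serialization_in_progress)
        fprintf(stderr, "Attempted to enter inference while writing out image.\n");
}
```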

@Keno added the `backport 1.9` (Change should be backported to release-1.9) label on Dec 30, 2022
@@ -188,7 +188,7 @@ static int precompile_enq_specialization_(jl_method_instance_t *mi, void *closure)
                 (jl_ir_inlining_cost((jl_array_t*)inferred) == UINT16_MAX)) {
                 do_compile = 1;
             }
-            else if (jl_atomic_load_relaxed(&codeinst->invoke) != NULL || jl_atomic_load_relaxed(&codeinst->precompile)) {
+            else if (jl_atomic_load_relaxed(&codeinst->precompile)) {
@vtjnash (Sponsor Member) commented:
Maybe instead we can check that `invoke` is not set to `jl_fptr_interpret_call`?
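A sketch of what that suggestion would look like inside `precompile_enq_specialization_`, assuming the surrounding context shown in the diff above (this is not the code that was merged):

```c
else {
    /* Treat the code instance as compilable only if its entry point is
     * real compiled code: not absent, and not the interpreter trampoline. */
    jl_callptr_t invoke = jl_atomic_load_relaxed(&codeinst->invoke);
    if (invoke != NULL && invoke != jl_fptr_interpret_call)
        do_compile = 1;
}
```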

@Keno (Member Author) replied:

Yes, that would work for this case, but then I don't really understand the purpose and semantics of the `->precompile` field if we just basically ignore it here.

@vtjnash (Sponsor Member) replied:

@timholy wrote this particular section of code.

@Keno (Member Author) commented Jan 2, 2023

I'm planning to merge this today so Revise starts working again. Obviously, if @timholy has alternative ideas about the correct criterion here, we can do that as a followup.

@Keno force-pushed the kf/48047 branch 2 times, most recently from 8c71279 to 756db23, on January 2, 2023
@Keno (Member Author) commented Jan 2, 2023

Looks like the condition here isn't quite enough. Revise triggers the error message I added:

    julia> using Revise
    [ Info: Precompiling Revise [295af30f-e4ad-537b-8983-00126c2a3abe]
    ERROR: Attempted to enter inference while writing out image.

@Keno (Member Author) commented Jan 2, 2023

Looks like @vtjnash was right and we just can't have that case at all (if we want it, we need to have it do re-inference earlier).

@Keno (Member Author) commented Jan 4, 2023

Sigh, we have a test that fails when some of these get removed from the sysimage. I'm just gonna turn off the warning I added here, put back the logic that adds them to the list and then rely on the check I added at the inference entry to avoid having it actually do the inference. Not pretty, but should fix the bug and hopefully @vtjnash or @timholy will fix this properly soon.
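A sketch of that arrangement, reusing the hypothetical flag from the earlier sketch (`jl_type_infer` is the real entry point; the guard shown here is illustrative, not the merged code):

```c
jl_code_info_t *maybe_type_infer(jl_method_instance_t *mi, size_t world)
{
    /* Refuse (rather than warn/error) while the image is being written,
     * so enqueued-but-uninferred instances are simply skipped. */
    if (incremental_serialization_in_progress)
        return NULL;
    return jl_type_infer(mi, world, 0);
}
```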

Commit message (80aeebe):

As noted in #48047, we're currently attempting to infer extra
methods during incremental image saving, which causes us to miss
edges in the image. In particular, in the case of #48047, Cthulhu
had the `compile=min` option set, which caused the code instance
for `do_typeinf!` to not be inferred. However, later it was
nevertheless queued for precompilation, causing inference to
occur at an inopportune time.

This PR simply prevents code instances that don't explicitly have
the `->precompile` flag set (e.g. the guard instance created for
the interpreter) from being enqueued for precompilation. It is
not clear that this is necessarily the correct behavior - we may
in fact want to infer these method instances, just before we set
up the serializer state, but for now this fixes #48047 for me.

I also included an appropriate test and a warning message if
we attempt to enter inference when this is not legal, so any
revisit of what should be happening here can hopefully make
use of those.
@Keno merged commit 80aeebe into master on Jan 5, 2023
@Keno deleted the kf/48047 branch on January 5, 2023
@timholy (Sponsor Member) commented Jan 5, 2023

Sorry about this. Until today, for the last few months I've been ignoring my gmail (GitHub notifications), and only just saw this.

I think the code in question is old; for example, here it is in 1.8 (julia/src/precompile.c, lines 288–311 at 0efc765):

    static int precompile_enq_specialization_(jl_method_instance_t *mi, void *closure)
    {
        assert(jl_is_method_instance(mi));
        jl_code_instance_t *codeinst = mi->cache;
        while (codeinst) {
            int do_compile = 0;
            if (codeinst->invoke != jl_fptr_const_return) {
                if (codeinst->inferred && codeinst->inferred != jl_nothing &&
                    jl_ir_flag_inferred((jl_array_t*)codeinst->inferred) &&
                    !jl_ir_flag_inlineable((jl_array_t*)codeinst->inferred)) {
                    do_compile = 1;
                }
                else if (codeinst->invoke != NULL || codeinst->precompile) {
                    do_compile = 1;
                }
            }
            if (do_compile) {
                jl_array_ptr_1d_push((jl_array_t*)closure, (jl_value_t*)mi);
                return 1;
            }
            codeinst = jl_atomic_load_relaxed(&codeinst->next);
        }
        return 1;
    }

It seems to be identical to what you started with before this PR.

I think the big difference is that we're calling it from more paths now than we used to be. And probably at a later stage: I moved the time-of-call to after we'd done the system traversal to make sure that the native code for external methods (methods we add to pre-existing modules) and external specializations (new specializations of methods we don't own) also got written, so I was just trying to re-use the work we had already done on system traversal. In rough terms, that moved the call location from somewhere around here:

julia/src/precompile.c, lines 108–116 at dc2b4d9:

    void *native_code = NULL;
    bool_t emit_native = jl_options.outputo || jl_options.outputbc || jl_options.outputunoptbc || jl_options.outputasm;
    bool_t emit_split = jl_options.outputji && emit_native;
    ios_t *s = NULL;
    ios_t *z = NULL;
    int64_t srctextpos = 0;

to now being called from here:

    *_native_data = jl_precompile_worklist(worklist, extext_methods, new_specializations);

The intent is simply to cache "what we've got." I think anything that has `precompile` set (EDIT: see important clarification in #48054 (comment)) but actually isn't precompiled already is just stuck; in my view, module-level compilation settings have priority over tagging (the tag is set when precompilation occurs, so we expect all the inference to be done by then if it's ever going to be done). The flag is a `cache_me` rather than a "please compile me" tag. It is intended to make sure we have a way to say "cache me even if we don't have backedges to the worklist we're caching now."
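A hedged illustration of that reading, with field names following the snippets quoted in this thread (the surrounding logic is illustrative, not a quote from the source):

```c
/* At the time a code instance is actually compiled, record that it should
 * be kept in the cache even without backedges to the current worklist. */
if (jl_atomic_load_relaxed(&codeinst->invoke) != NULL)
    jl_atomic_store_relaxed(&codeinst->precompile, 1);

/* Conversely, precompile == 1 on an uncompiled instance is not a request
 * to infer or compile it now. */
```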

Does that provide enough of a roadmap to know how to fix this?

@vtjnash (Sponsor Member) commented Jan 5, 2023

After that system traversal, any code execution will invalidate all of that result, since the system has changed.

@timholy (Sponsor Member) commented Jan 5, 2023

I think we're agreeing, but I'm not certain I know what you mean. What execution are you referring to?

To clarify, my point is that we shouldn't do any additional inference; if we discover anything that needs inference before we can pass it to LLVM, we shouldn't infer and we shouldn't pass to LLVM. The only reason to pass anything through LLVM (again) is because we need to generate a clean LLVM module for caching, but if we didn't need to do that then I'd say "no more compilation at all."

The flag on the codeinst is misleading (EDIT: see #48054 (comment)) because it should probably be named `cacheme` rather than `precompile`. It gets set under certain circumstances to ensure that we cache what we already precompiled, even if there aren't any backedges; it's not prospectively saying "you should precompile this."

@vtjnash (Sponsor Member) commented Jan 5, 2023

That is fine, but you have to re-run `jl_prepare_serialization_data` afterward, since the previously prepared data is no longer valid.
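Roughly, the ordering being proposed (argument lists elided; a sketch of the proposal, not existing code):

```c
/* 1. Traverse the system and collect the data to serialize. */
jl_prepare_serialization_data(/* ... */);

/* 2. Compile the worklist. This can execute code and thereby change the
 *    system, invalidating the data prepared in step 1. */
*_native_data = jl_precompile_worklist(worklist, extext_methods, new_specializations);

/* 3. Traverse again so the serialized data reflects the post-compilation
 *    state of the system, then write out the image. */
jl_prepare_serialization_data(/* ... */);
```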

@timholy (Sponsor Member) commented Jan 5, 2023

So are you proposing, basically, that we call `jl_prepare_serialization_data` again after `jl_precompile_worklist`? That sounds straightforward. But I thought it was also bad to infer things that the module settings said shouldn't be aggressively compiled? Thus I'm wondering whether just deleting #48054 (comment) is indeed the right answer.

Understand that because of my LLVM ignorance, I am a little vague about what the various code/opaque-pointer fields of a codeinst actually mean and what their settings can be. So my vague, general statement is: "if we already emitted LLVM code for this thing, we need to re-do that in the clean module; if we didn't, then perhaps we shouldn't now." I trust others know more than I do about what to actually change to make it happen that way, and I'll be grateful if someone besides me implements the actual fix. Not because I'm lazy.
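One possible reading of that rule, under the assumption (an editorial guess, not from the thread) that "already emitted" can be approximated by having a compiled, non-interpreter entry point:

```c
/* Re-emit into the clean LLVM module only what was already compiled;
 * skip anything that would require fresh inference or codegen. */
static int should_reemit_in_clean_module(jl_code_instance_t *codeinst)
{
    jl_callptr_t invoke = jl_atomic_load_relaxed(&codeinst->invoke);
    return invoke != NULL && invoke != jl_fptr_interpret_call;
}
```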

@timholy (Sponsor Member) commented Jan 6, 2023

Ah shoot, I need to correct one thing:

https://github.com/JuliaLang/julia/blame/db00cc1a8455ceb4a8dc3cb8dd6ded3d62e46dcb/src/julia.h#L372

vs

https://github.com/JuliaLang/julia/blame/f8d3bd22f68f10bd2dc3f1be3b74bfa7cb125bc5/src/julia.h#L387

The one I added (to `jl_method_instance_t`) means "cache me"; I can't speak for the one Jeff added. And I think the one causing trouble here is the one Jeff added, but only because of the new way it's being used. So I'm less sure of the proper solution here; maybe @vtjnash's hint that we just scoop up the whole system a second time is what's required? Not sure.

KristofferC pushed a commit that referenced this pull request Jan 10, 2023
As noted in #48047, we're currently attempting to infer extra
methods during incremental image saving, which causes us to miss
edges in the image. In particular, in the case of #48047, Cthulhu
had the `compile=min` option set, which caused the code instance
for `do_typeinf!` to not be inferred. However, later it was
nevertheless queued for precompilation, causing inference to
occur at an inopportune time.

This PR simply prevents code instances that don't explicitly have
the `->precompile` flag set (e.g. the guard instance created for
the interpreter) from being enqueued for precompilation. It is
not clear that this is necessarily the correct behavior - we may
in fact want to infer these method instances, just before we set
up the serializer state, but for now this fixes #48047 for me.

I also included an appropriate test and a warning message if
we attempt to enter inference when this is not legal, so any
revisit of what should be happening here can hopefully make
use of those.

(cherry picked from commit 80aeebe)
@KristofferC removed the `backport 1.9` (Change should be backported to release-1.9) label on Jan 17, 2023

Successfully merging this pull request may close these issues:

Missing backedge somewhere on master? (#48047)