Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for permanent tracking of compilation times #41762

Closed

Conversation

janrous-rai
Copy link

@janrous-rai janrous-rai commented Aug 2, 2021

In situations where we run long-running server that spends non-trivial amount of time in the compilation we are happy to pay the cost to have this instrumentation always on.

This PR allows us to enable permanent tracking (effectively switching the before/after switches moot) by calling jl_track_compile_time_permanently() method, and then providing jl_cumulative_compile_time_ns_total() to return the total time.

@janrous-rai
Copy link
Author

fyi @vchuravy, @vtjnash and @IanButterworth who were reviewing Nathan's original re-entrancy PR

@NHDaly
Copy link
Member

NHDaly commented Aug 26, 2021

Okay, now that #41733 is merged, i think we can come back to this one. I'll put a bit of work into rebasing this over #41733, since i did touch a bunch of related places.

@NHDaly
Copy link
Member

NHDaly commented Aug 26, 2021

Okay, I think i've merged in latest master.

@janrous-rai: I was thinking more about this, and after the changes in #41733, I think this PR now provides two separate things:

  1. Always-on compile time tracking + the ability to fetch the total cumulative time
  2. Restoring the ability to also track per-thread compilation time in addition to the global time we created in Make jl_cumulative_compile_time_ns global (and reentrant). #41733.

For 1., my feeling is: 💯% a great idea, golden, ship it
For 2.: I think i'm also supportive of this? I know that it's probably a bit less useful since 1.7 will be doing task migration, but I also feel like MOAR METRICS, you know? :) I can't think of immediately obvious uses for it, but if it's easy, it seems worth adding.

BUT the main thing i realized is that those two things can be merged separately, since feature 1. is both easier and more excellent. So i'm going to rebase this PR on master, and strip everything out except "Always-on compile time tracking + the ability to fetch the total cumulative time," and then push up a new PR based on this one for feature 2., and we can see how hard that is and whether we want to push on it.

EDIT: I've opened #42018 for 2.

NHDaly and others added 2 commits August 26, 2021 17:41
In situations where we run long-running server that spends non-trivial amount of time in the compilation we are happy to pay the cost to have this instrumentation always on.

This PR allows us to enable permanent tracking (effectively switching the before/after switches moot) by calling `jl_track_compile_time_permanently()` method, and then providing `jl_cumulative_compile_time_ns_total()` to return the total time.

Co-Authored-By: janrous-rai <jan.rous@relational.ai>
Copy link
Member

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it! 😁

At this point, I've written most of it, so probably good to have another set of eyes on it, but yeah, this LGTM!

Thanks for the initial idea, @janrous-rai - i think this is a solid improvement. 👍


Permanently enable tracking of time spent in compilation by Julia.

Julia has the ability to measure the amount of time spent during compilation (including
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during -> on maybe? 🤔

Permanently enable tracking of time spent in compilation by Julia.

Julia has the ability to measure the amount of time spent during compilation (including
type-inference, optimization, llvm optimizaiton, codegen, etc). However, on some systems
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type-inference -> type inference I think, and optimizaiton -> optimization? :)


Julia has the ability to measure the amount of time spent during compilation (including
type-inference, optimization, llvm optimizaiton, codegen, etc). However, on some systems
(FreeBSD-based systems are the known problems), this measurment can be too expensive, so
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problems -> problem I think? :)


Calling this function will globally enable tracking the cumulative compilation time.

You can fetch the current total cumulative time spent in compilation by calling:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total cumulative -> cumulative? 🤔

"""
Base.cumulative_compile_time_ns_total()

The current total cumulative time Julia has spent in compilation, in nanoseconds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total cumulative -> cumulative? 🤔

Comment on lines +85 to +86
Otherwise, this function will only return the total time spent in compilation during calls
to [`Base.@time`](@ref).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps

Suggested change
Otherwise, this function will only return the total time spent in compilation during calls
to [`Base.@time`](@ref).
Otherwise, this function will return only the time spent in compilation accumulated during all calls
to [`Base.@time`](@ref).

? :)

return jl_atomic_load_relaxed(&jl_cumulative_compile_time);
}

extern "C" JL_DLLEXPORT
uint64_t jl_cumulative_compile_time_ns_total()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe drop the total, redundant with the cumulative? :)

@@ -562,10 +562,14 @@ static void JL_NORETURN throw_internal(jl_task_t *ct, jl_value_t *exception JL_M
ptls->io_wait = 0;
// @time needs its compile timer disabled on error,
// and cannot use a try-finally as it would break scope for assignments
// We blindly disable compilation time tracking here, for all running Tasks, even though
// it may cause some incorrect measurements. This is a known bug, and is being tracked
// here: https://github.com/JuliaLang/julia/pull/39138
Copy link
Member

@Sacha0 Sacha0 Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess till #39138 lands (edit: or rather its solution, i.e. #41923), the separate always_measure_compile_time flag has utility in that it ends up being more robust? 🤔

{
// Increment the flag to allow reentrant callers to `@time`.
jl_always_measure_compile_time = 1;
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a different flag? My intuition would be that you only need to increment jl_measure_compile_time_enabled and never decrement it.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see due to the jl_throw behaviour...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my understanding as well. Perhaps advancing #41923 is in order? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.... i've marked #41923 for #triage.

Now that we've talked about it, i think that would definitely be the better approach than this one, yeah.

@janrous-rai
Copy link
Author

Looks good, thanks @NHDaly for working through this. Let me know if you would like me to resolve the feedback on this or you consider yourself the new "owner" of this until approved?

Regarding the suggestions about 1. total compile time tracking and 2. per-thread compile time tracking -- I think that (1) is definitely useful and I'm not really sure if (2) has much value. Yes, generally speaking, more metrics is better, but with task migration, threads become even more "opaque" and not really interesting so I would be completely fine with having just the per-process total.

It might still make sense to measure per-thread (or total) compilation contention (waiting on the lock held by other threads) as this likely may result in observable performance loss, but this is for another PR altogether.

@NHDaly
Copy link
Member

NHDaly commented Sep 2, 2021

@janrous-rai - yeah, i'll take it over. sorry for the delay. I think waiting for #41923 is definitely better, but it'll have to wait until triage next week. Thanks and sorry

@NHDaly
Copy link
Member

NHDaly commented Sep 26, 2021

OKAY, so, now that #41923 is merged, i think we can actually close this PR! 😮

I think users can globabally and permanently enable compilation time tracking by calling Base.cumulative_compile_time_ns_before(), and that's it! :)

You don't need to worry about it getting turned off on an exception, because we've handled that with the try-catch, so if you manually bump the start count with cumulative_compile_time_ns_before(), it'll never turn off. 😊

This was quite a journey, but I'm glad it's settled.

So I'm going to close this PR now! @janrous-rai let me know if I'm missing anything else!
Thanks @Sacha0 and @vchuravy for the help to get here. 💪

@NHDaly NHDaly closed this Sep 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants