Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Count invalidations in profiling reports #49535

Merged
merged 9 commits into from
May 15, 2023
Merged

Conversation

pchintalapudi
Copy link
Member

VTune has functionality to track counters as a function of time, so we can track invalidation count over a program's lifetime directly inside the profiler report. There are probably other cases where this is useful.

image

Copy link
Member

@topolarity topolarity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Tracy supports similar functionality but as a float/double value that can be plotted in the visualizer.

We currently have it hooked up to track heap size without a generic timing API, but this could replace that if it can do more than inc/dec

Does ITTAPI support that more general API, or is it limited to increment/decrement?

@KristofferC
Copy link
Member

KristofferC commented Apr 27, 2023

Does ITTAPI support that more general API, or is it limited to increment/decrement?

Looks like __itt_counter_set_value(__itt_counter id, void *value_ptr) can do this (https://www.intel.com/content/www/us/en/docs/gpa/user-guide/2022-4/counter-api.html) but it seems to only be for integer values.

I agree it would be good to base the API on this more general style.

@KristofferC
Copy link
Member

KristofferC commented Apr 27, 2023

This is nice, from loading Makie we can see with SnoopCompile and @snoopr:

 inserting convert(::Type{IT}, x::GeometryBasics.OffsetInteger) where IT<:Integer @ GeometryBasics ~/.julia/packages/GeometryBasics/kt2Em/src/offsetintegers.jl:40 invalidated:
   backedges:  1: superseding convert(::Type{T}, x::Number) where T<:Number @ Base number.jl:7 with MethodInstance for 
...
               6: superseding convert(::Type{T}, x::Number) where T<:Number @ Base number.jl:7 with MethodInstance for convert(::Type{Int64}, ::Integer) (18 children)
               7: superseding convert(::Type{T}, x::Number) where T<:Number @ Base number.jl:7 with MethodInstance for convert(::Type{T} where T<:Integer, ::Integer) (22 children)
               8: superseding convert(::Type{T}, x::Number) where T<:Number @ Base number.jl:7 with MethodInstance for convert(::Type{UInt64}, ::Integer) (89 children)
...

and indeed, in Tracy (with this plot), we see the number of invalidations sharply rise when that method is added:

image

@topolarity
Copy link
Member

but it seems to only be for integer values.

I think we might be in luck actually. Looks like it's polymorphic:

typedef enum {
    __itt_metadata_unknown = 0,
    __itt_metadata_u64,     /**< Unsigned 64-bit integer */
    __itt_metadata_s64,     /**< Signed 64-bit integer */
    __itt_metadata_u32,     /**< Unsigned 32-bit integer */
    __itt_metadata_s32,     /**< Signed 32-bit integer */
    __itt_metadata_u16,     /**< Unsigned 16-bit integer */
    __itt_metadata_s16,     /**< Signed 16-bit integer */
    __itt_metadata_float,   /**< Signed 32-bit floating-point */
    __itt_metadata_double   /**< SIgned 64-bit floating-point */
} __itt_metadata_type;

https://github.com/intel/ittapi/blob/4a3762fbc517475bfb0586efc513e51c164108c9/include/ittnotify.h#L2436-L2446

@KristofferC
Copy link
Member

Ah, I guess you use __itt_counter_create_typed to create the counter then.

@pchintalapudi
Copy link
Member Author

What do we need a float counter for? Looks to me like heap counter is also integral?

@KristofferC
Copy link
Member

There might be other plots in the future I am thinking. But if it is messy, should be fine to restrict it to integers for now. But maybe using the API for the absolute value over the dec/inc is still a good idea?

@pchintalapudi
Copy link
Member Author

It doesn't really matter for the heap plot since we will only call it from one thread, but I would be wary in general of calling the set_value api from anywhere else since it's highly unlikely to be done in a thread safe manner.

@pchintalapudi
Copy link
Member Author

@KristofferC did you have a patch to add that Tracy support in? I think if that's already available we should add it to this PR.

@pchintalapudi pchintalapudi force-pushed the pc/ittapi-invalidations branch from a22a5b5 to 167aba4 Compare May 3, 2023 05:30
@pchintalapudi pchintalapudi changed the base branch from pc/cond-event to master May 3, 2023 05:31
@pchintalapudi pchintalapudi marked this pull request as ready for review May 3, 2023 05:31
@KristofferC
Copy link
Member

@pchintalapudi, please see 4608aba and give your thoughts.

image

@pchintalapudi
Copy link
Member Author

Unfortunately this is not thread safe. As an example, two threads could both add to the atomic variable, but then the thread with the higher value calls TracyCPlot before the thread with the lower value is able to. If I'm understanding correctly, this would cause the graph to show a negative instead of a positive slope. Ideally the library itself needs to provide inc/dec functions to avoid this kind of issue.

@KristofferC
Copy link
Member

Unfortunately this is not thread safe

Yeah, I was kind of winging it. I'll look into adding it to Tracy but that might take a while. What's best to do in the meantime?

@vchuravy
Copy link
Member

vchuravy commented May 3, 2023

I think your commit could be okay for now, but with explicit documentation that it may be imprecise.
The inaccuracy is self-healing. Julia will maintain the counter correctly so overtime tracy will see an approximation of the correct value :)

@KristofferC
Copy link
Member

Woukd be good with input from @topolarity on the tracy implementation above.

@topolarity
Copy link
Member

Left one small note that probably needs attention

Otherwise I think that change is good to kick things off 👍 We should see about fixing the "tearing" when we decide to improve Tracy's fiber lock.

@KristofferC
Copy link
Member

I pushed the tracy code to this branch.

@KristofferC
Copy link
Member

There are existing Tracy plots (for the GC) at

julia/src/gc.c

Lines 3097 to 3099 in 827d34a

#ifdef USE_TRACY
TracyCPlot("Heap size", live_bytes + gc_num.allocd);
#endif

and

julia/src/gc.c

Lines 3492 to 3494 in 827d34a

#ifdef USE_TRACY
TracyCPlot("Heap size", jl_gc_live_bytes());
#endif

Presumably these should be deleted, or moved to the generic counting system?

@pchintalapudi
Copy link
Member Author

I can delete them, I just don't know how to apply the memory units to the counter's graph of heap size, which is why I left them in. I also don't know if the generic stubs I added are exactly equivalent to what that plot is supposed to show.

@topolarity
Copy link
Member

We can probably leave making the heap size plots generic for later, but fwiw it's valid to call TracyCPlotConfig multiple times for the same plot, so that might make integrating with the generic plot init a bit more straight-forward.

In the mean time, I'd like not to delete the heap size plot or lose the TracyPlotFormatMemory config.

@pchintalapudi pchintalapudi force-pushed the pc/ittapi-invalidations branch from 80950dc to 0e8adb7 Compare May 8, 2023 02:25
@pchintalapudi
Copy link
Member Author

@topolarity I wired the memory plot configuration into timing.c instead and deleted the old plotting code, could you check if the memory plot still exists and is correct with Tracy?

Copy link
Member

@topolarity topolarity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new plot seems generally synchronized, but compared to before it's missing the peak value from the beginning of the sweep phase:
image

Is the old value a bug?

It might also be worth experimenting with moving the updates to where the per-thread allocations actually happen. That would provide a richer allocation view, assuming that the traffic isn't too excessive.

@pchintalapudi
Copy link
Member Author

I think per-thread allocations would be too expensive, a short loop can quickly blow up the number of counter events. I don't know about the divergence from the old plot, but I'm plotting the precise value of live_bytes whenever we update it. If there's something else that's added to live_bytes to get the total heap size, then we could look at adding that too.

@topolarity
Copy link
Member

I think per-thread allocations would be too expensive, a short loop can quickly blow up the number of counter events.

Agreed, let's not do that 👍

I don't know about the divergence from the old plot, but I'm plotting the precise value of live_bytes whenever we update it.

The desired update pre-sweep is the bytes allocated by the GC just before reclaiming in the sweep phase (so that the downward slope happens over the sweep phase only, the peak is representative of the max heap size since the last collection, and the delta is the bytes reclaimed).

@pchintalapudi
Copy link
Member Author

I think the tracy plot should now return to its original high and negative slope appearance with this change, and I've added a few more memory plots for JIT memory and image memory. If all of these look good I think we should be good to merge @topolarity ?

Copy link
Member

@topolarity topolarity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heap size is behaving like before, and the new plots are looking lovely:
image

I wish we could group the JIT data/code/total size together into one plot, but that'll need a Tracy enhancement before we can make that happen.

@pchintalapudi pchintalapudi merged commit 909c57f into master May 15, 2023
@pchintalapudi pchintalapudi deleted the pc/ittapi-invalidations branch May 15, 2023 21:46
Comment on lines +854 to +855
size_t code_size = 0;
size_t data_size = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On aarch64 darwin I'm getting

/Users/mose/repo/julia/src/jitlayers.cpp:858:20: warning: variable 'data_size' set but not used [-Wunused-but-set-variable]
            size_t data_size = 0;
                   ^
/Users/mose/repo/julia/src/jitlayers.cpp:857:20: warning: variable 'code_size' set but not used [-Wunused-but-set-variable]
            size_t code_size = 0;
                   ^

(and currently we aren't using -Werror on CI on macOS)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants