-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Count invalidations in profiling reports #49535
Conversation
586c590
to
a22a5b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Tracy supports similar functionality but as a float/double value that can be plotted in the visualizer.
We currently have it hooked up to track heap size without a generic timing API, but this could replace that if it can do more than inc/dec
Does ITTAPI support that more general API, or is it limited to increment/decrement?
Looks like I agree it would be good to base the API on this more general style. |
This is nice, from loading Makie we can see with SnoopCompile and
and indeed, in Tracy (with this plot), we see the number of invalidations sharply rise when that method is added: |
I think we might be in luck actually. Looks like it's polymorphic:
|
Ah, I guess you use |
What do we need a float counter for? Looks to me like heap counter is also integral? |
There might be other plots in the future I am thinking. But if it is messy, should be fine to restrict it to integers for now. But maybe using the API for the absolute value over the dec/inc is still a good idea? |
It doesn't really matter for the heap plot since we will only call it from one thread, but I would be wary in general of calling the set_value api from anywhere else since it's highly unlikely to be done in a thread safe manner. |
@KristofferC did you have a patch to add that Tracy support in? I think if that's already available we should add it to this PR. |
a22a5b5
to
167aba4
Compare
@pchintalapudi, please see 4608aba and give your thoughts. |
Unfortunately this is not thread safe. As an example, two threads could both add to the atomic variable, but then the thread with the higher value calls TracyCPlot before the thread with the lower value is able to. If I'm understanding correctly, this would cause the graph to show a negative instead of a positive slope. Ideally the library itself needs to provide inc/dec functions to avoid this kind of issue. |
Yeah, I was kind of winging it. I'll look into adding it to Tracy but that might take a while. What's best to do in the meantime? |
I think your commit could be okay for now, but with explicit documentation that it may be imprecise. |
Woukd be good with input from @topolarity on the tracy implementation above. |
Left one small note that probably needs attention Otherwise I think that change is good to kick things off 👍 We should see about fixing the "tearing" when we decide to improve Tracy's fiber lock. |
I pushed the tracy code to this branch. |
There are existing Tracy plots (for the GC) at Lines 3097 to 3099 in 827d34a
and Lines 3492 to 3494 in 827d34a
Presumably these should be deleted, or moved to the generic counting system? |
I can delete them, I just don't know how to apply the memory units to the counter's graph of heap size, which is why I left them in. I also don't know if the generic stubs I added are exactly equivalent to what that plot is supposed to show. |
We can probably leave making the heap size plots generic for later, but fwiw it's valid to call In the mean time, I'd like not to delete the heap size plot or lose the |
80950dc
to
0e8adb7
Compare
@topolarity I wired the memory plot configuration into timing.c instead and deleted the old plotting code, could you check if the memory plot still exists and is correct with Tracy? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new plot seems generally synchronized, but compared to before it's missing the peak value from the beginning of the sweep phase:
Is the old value a bug?
It might also be worth experimenting with moving the updates to where the per-thread allocations actually happen. That would provide a richer allocation view, assuming that the traffic isn't too excessive.
I think per-thread allocations would be too expensive, a short loop can quickly blow up the number of counter events. I don't know about the divergence from the old plot, but I'm plotting the precise value of live_bytes whenever we update it. If there's something else that's added to live_bytes to get the total heap size, then we could look at adding that too. |
Agreed, let's not do that 👍
The desired update pre-sweep is the bytes allocated by the GC just before reclaiming in the sweep phase (so that the downward slope happens over the sweep phase only, the peak is representative of the max heap size since the last collection, and the delta is the bytes reclaimed). |
I think the tracy plot should now return to its original high and negative slope appearance with this change, and I've added a few more memory plots for JIT memory and image memory. If all of these look good I think we should be good to merge @topolarity ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t code_size = 0; | ||
size_t data_size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On aarch64 darwin I'm getting
/Users/mose/repo/julia/src/jitlayers.cpp:858:20: warning: variable 'data_size' set but not used [-Wunused-but-set-variable]
size_t data_size = 0;
^
/Users/mose/repo/julia/src/jitlayers.cpp:857:20: warning: variable 'code_size' set but not used [-Wunused-but-set-variable]
size_t code_size = 0;
^
(and currently we aren't using -Werror
on CI on macOS)
VTune has functionality to track counters as a function of time, so we can track invalidation count over a program's lifetime directly inside the profiler report. There are probably other cases where this is useful.