engine.report_pmu() [WIP] #616
Closed
Keep the arrays containing machine code alive by storing references to them in a Lua table. Otherwise they will be garbage collected. dynasm uses a GC callback to unmap memory that was used for generated code, so the most likely consequence is a segfault. Here is how it looks in dmesg:

    segfault at 7fe0d50de000 ip 00007fe0d50de000 sp 00007ffcd2c89cb8 error 14

where "error 14" means an error during instruction fetch. This problem triggered immediately when using the pmu library with non-trivial code under test (running an app network).
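The lifetime rule described in this commit can be sketched in plain Lua. This is an illustration only, not the pmu library's code: new_code_buffer, anchors and observed are hypothetical names, and an ordinary table stands in for a dynasm code array. The weak-valued table exists purely to observe what the collector does.

```lua
-- Weak-valued table: lets us observe objects without keeping them alive.
local observed = setmetatable({}, { __mode = "v" })

-- Strong references: anything stored here cannot be garbage collected.
local anchors = {}

-- Hypothetical stand-in for allocating an array of generated machine code.
local function new_code_buffer(name, keep_alive)
  local buf = { name = name }
  observed[name] = buf
  if keep_alive then
    anchors[#anchors + 1] = buf
  end
end

new_code_buffer("anchored", true)
new_code_buffer("unanchored", false)

-- Force full collection cycles.
collectgarbage()
collectgarbage()

print("anchored survives:  ", observed.anchored ~= nil)
print("unanchored survives:", observed.unanchored ~= nil)
```

With real generated code the collected buffer would also be unmapped by the GC callback, which is why a later jump into it faults instead of merely returning stale data.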
Renamed 'ref-cycles' to 'ref_cycles' both for consistency with other counters and to make it easier to use as a Lua table key.

report() now takes a table for input rather than a counterset. This makes it easy to write Lua code that manipulates data (e.g. taking deltas from separate runs) and then call report() to format it.

report() now lexically sorts the counters based on their names, with the exception of the fixed-purpose counters (cycles, instructions, ref_cycles) that are printed first in a fixed order. This is intended to increase consistency and make it slightly easier to compare results by eyeball.
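A minimal sketch of the two ideas in this commit message: taking a delta between two plain counter tables, and ordering names with the fixed-purpose counters first. The helper names (ordered_names, delta) and the sample values are made up for illustration and are not the library's API. The fixed order used is the one named in the text.

```lua
-- Fixed-purpose counters are listed first, in this fixed order.
local fixed = { "cycles", "instructions", "ref_cycles" }

-- Return counter names: fixed-purpose ones first, then the rest sorted.
local function ordered_names(counters)
  local is_fixed, names, rest = {}, {}, {}
  for _, name in ipairs(fixed) do
    is_fixed[name] = true
    if counters[name] ~= nil then names[#names + 1] = name end
  end
  for name in pairs(counters) do
    if not is_fixed[name] then rest[#rest + 1] = name end
  end
  table.sort(rest)
  for _, name in ipairs(rest) do names[#names + 1] = name end
  return names
end

-- Subtract one run's counters from another's, key by key.
local function delta(after, before)
  local d = {}
  for name, value in pairs(after) do
    d[name] = value - (before[name] or 0)
  end
  return d
end

-- Made-up sample values for two runs.
local run1 = { cycles = 100, instructions = 250, ref_cycles = 80, packet = 10 }
local run2 = { cycles = 180, instructions = 430, ref_cycles = 150, packet = 18 }

local d = delta(run2, run1)
for _, name in ipairs(ordered_names(d)) do
  print(name, d[name])
end
```

Because the input is an ordinary table, a key such as ref_cycles can be written as d.ref_cycles, which would not be possible with the hyphenated name.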
This is a quick implementation that is hard-coded to be enabled. Prints a report like this:

*** Tee
EVENT                                      TOTAL    /packet
cycles                            14,872,691,360     14.868
ref_cycles                        11,169,642,000     11.166
instructions                      36,740,872,664     36.728
br_misp_retired.all_branches           3,976,251      0.004
mem_load_uops_retired.l1_hit      12,770,250,074     12.766
mem_load_uops_retired.l2_hit       1,022,848,683      1.022
mem_load_uops_retired.l3_hit             130,238      0.000
mem_load_uops_retired.l3_miss                  0      0.000
packet                             1,000,347,915      1.000

*** Source
EVENT                                      TOTAL    /packet
cycles                            31,175,791,454     31.165
ref_cycles                        23,407,559,568     23.399
instructions                      60,442,148,650     60.421
br_misp_retired.all_branches          10,879,179      0.011
mem_load_uops_retired.l1_hit      23,403,995,255     23.396
mem_load_uops_retired.l2_hit         107,486,748      0.107
mem_load_uops_retired.l3_hit             281,230      0.000
mem_load_uops_retired.l3_miss                  0      0.000
packet                             1,000,347,915      1.000

*** Sink
EVENT                                      TOTAL    /packet
cycles                            19,251,176,856     19.244
ref_cycles                        14,454,619,392     14.450
instructions                      45,548,677,978     45.533
br_misp_retired.all_branches           3,945,214      0.004
mem_load_uops_retired.l1_hit       8,737,428,004      8.734
mem_load_uops_retired.l2_hit       1,016,554,792      1.016
mem_load_uops_retired.l3_hit             121,803      0.000
mem_load_uops_retired.l3_miss                  0      0.000
packet                             1,000,347,915      1.000
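The /packet column is each event total divided by the packet count. A small sketch of how one such row could be formatted follows; comma and format_row are hypothetical helpers written for this illustration (they assume non-negative totals) and are not taken from the branch. The input numbers are the Tee cycles and packet totals from the report.

```lua
-- Insert thousands separators into a non-negative integer-valued number.
local function comma(n)
  local s = string.format("%.0f", n)
  -- Work on the reversed digits so groups of three are counted from the right.
  local out = s:reverse():gsub("(%d%d%d)", "%1,"):reverse()
  -- Drop the leading comma that appears when the length is a multiple of 3.
  return (out:gsub("^,", ""))
end

-- Format one report row: event name, total, and total per packet.
local function format_row(name, total, packets)
  return string.format("%-30s %18s %10.3f", name, comma(total), total / packets)
end

-- Values taken from the Tee report.
print(format_row("cycles", 14872691360, 1000347915))
```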
This is a quick change to make the 'snabbmark basic1' benchmark print per-app PMU counters. It also comments out one of the links on the 'Tee' app to make the basic1 benchmark more directly comparable with the benchmark in snabbco#615.
Closing PR: I am not actively developing this branch.
dpino pushed a commit to dpino/snabb that referenced this pull request on Dec 6, 2016: Switch to 1-based indexing in snabb-softwire-v1
This PR is an alternative to #615 for tracking performance counter values per app.
This branch counts the performance events during each individual push() and pull() call and accumulates a total for each app. That is to say that it directly counts the number of cycles, cache misses, etc., that occur during each callback.

The function engine.report_pmu() prints the values for each app and also calculates per-packet values. (The number of packets an app has processed is determined by its links.)

The event counting seems to work surprisingly well. I was concerned that the individual callbacks would be too short and so there would be too much noise when trying to count their PMU events. However, the initial results are very consistent with the numbers reported in #615 based on long running averages.
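The accumulate-per-callback approach described above can be sketched as follows. This is not the code from this branch: read_counters is a hypothetical stand-in that returns a simulated cycle count instead of reading the PMU, and measured_call, totals and the toy apps are names invented for the illustration.

```lua
-- Hypothetical counter source: a real implementation would read the PMU.
local fake = { cycles = 0 }
local function read_counters()
  return { cycles = fake.cycles }
end

-- Accumulated totals per app, keyed by app name.
local totals = {}

-- Run one callback (push or pull) and add the counter deltas
-- observed during the call to the app's running totals.
local function measured_call(app_name, callback, ...)
  local before = read_counters()
  callback(...)
  local after = read_counters()
  local t = totals[app_name] or {}
  totals[app_name] = t
  for name, value in pairs(after) do
    t[name] = (t[name] or 0) + (value - before[name])
  end
end

-- Toy apps whose callbacks "cost" a fixed number of simulated cycles.
local source = { pull = function () fake.cycles = fake.cycles + 31 end }
local tee    = { push = function () fake.cycles = fake.cycles + 15 end }

-- Three simulated engine iterations.
for _ = 1, 3 do
  measured_call("Source", source.pull)
  measured_call("Tee", tee.push)
end

print("Source cycles:", totals.Source.cycles)
print("Tee cycles:   ", totals.Tee.cycles)
```

Reading the counters twice around every callback is where the measurement cost comes from, since the reads happen on every push() and pull() rather than once per run.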
Results for a Tee app in a Source->Tee->Sink network from this branch:

and from #615 with comparable /packet column:

The overhead is significant, however: the basic1 benchmark loses around 1/3 of its throughput when sampling the PMU counters. So you would only enable this feature when you are willing to sacrifice overall performance to see per-app performance.
Could be a good plan to tidy up this code and merge it in preference to #615.
I am still interested in having an appbench style program that can generate a "datasheet" for an app that estimates how it will perform in different situations (packet size, traffic mix, data in L1/L2/L3/DRAM, etc). However, it should be possible to build that on this code anyway.