Errors with CLI analysis #103
Comments
We saw this error with another workload today. Any insight would be good to have.
Hi Gina. Thank you for reporting this issue. Further investigation of your workload (specifically workloads/current/mi200/pmc_perf.csv) has uncovered multiple dispatches where This issue arises because evaluating the Python expression attempts a division by NaN.
$ ./src/omniperf analyze -p workloads/current/mi200/ -b 2.1.8 -g
--------
Analyze
--------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
raw pmc df info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17845 entries, 0 to 17844
Columns: 1128 entries, ('SQ_IFETCH_LEVEL', 'Index') to ('pmc_perf', 'CompleteNs')
dtypes: float64(104), int64(1005), object(18), uint64(1)
memory usage: 153.6+ MB
None
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
filtered pmc df info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17845 entries, 0 to 17844
Columns: 1128 entries, ('SQ_IFETCH_LEVEL', 'Index') to ('pmc_perf', 'CompleteNs')
dtypes: float64(104), int64(1005), object(18), uint64(1)
memory usage: 153.6+ MB
None
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expression:
Value =
to_avg(((100 * raw_pmc_df.get('pmc_perf').get("SQ_ACTIVE_INST_SCA")) / (raw_pmc_df.get('pmc_perf').get("GRBM_GUI_ACTIVE") * ammolite__numCU)))
Inputs:
Var ammolite__numCU : 104
Output:
inf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expression:
Peak =
100
Inputs:
Output:
100
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expression:
PoP =
to_avg(((100 * raw_pmc_df.get('pmc_perf').get("SQ_ACTIVE_INST_SCA")) / (raw_pmc_df.get('pmc_perf').get("GRBM_GUI_ACTIVE") * ammolite__numCU)))
Inputs:
Var ammolite__numCU : 104
Output:
inf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------------------------------------
2. System Speed-of-Light
╒═════════╤═══════════╤═════════╤════════╤════════╤═══════╕
│ Index │ Metric │ Value │ Unit │ Peak │ PoP │
╞═════════╪═══════════╪═════════╪════════╪════════╪═══════╡
│ 2.1.8 │ SALU Util │ inf │ Pct │ 100 │ inf │
╘═════════╧═══════════╧═════════╧════════╧════════╧═══════╛
Before implementing a full-fledged patch I'd like to understand why rocprof is reporting these numbers. At the very least we will update code to throw a warning if illogical
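To make the debug output above easier to follow, here is a minimal sketch of how a single bad dispatch can turn the averaged SALU Util value into `inf`. The counter values are made up for illustration, and `Series.mean()` is only a stand-in for Omniperf's `to_avg`, whose implementation is not shown in this thread. Only the column names and the CU count of 104 come from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical counter values for four dispatches (not taken from the real
# pmc_perf.csv). Dispatch 2 has a zero GRBM_GUI_ACTIVE, dispatch 3 a NaN.
pmc_perf = pd.DataFrame({
    "SQ_ACTIVE_INST_SCA": [120.0, 80.0, 50.0, 40.0],
    "GRBM_GUI_ACTIVE": [1000.0, 800.0, 0.0, np.nan],
})
num_cu = 104  # same value as ammolite__numCU in the debug output

# Same arithmetic as the Value expression, evaluated per dispatch.
per_dispatch = (100 * pmc_perf["SQ_ACTIVE_INST_SCA"]) / (
    pmc_perf["GRBM_GUI_ACTIVE"] * num_cu
)

# Assumption: a plain mean stands in for to_avg.
salu_util = per_dispatch.mean()

# Flag the dispatches whose result is inf or NaN so they can be inspected.
suspect = pmc_perf[~np.isfinite(per_dispatch)]

print(per_dispatch)
print("averaged value:", salu_util)
print(suspect)
```

In plain pandas arithmetic a zero denominator yields `inf`, while a NaN denominator yields NaN, which `mean()` skips by default. Which of the two occurs in the reported workload is not stated in the thread, so the mask above checks for both.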
I am seeing this when analyzing the top N kernels one by one. Only a small subset of the kernels shows this error. Why wouldn't I see the error for all kernels?
@PaulMullowney we know this error is triggered when arithmetic encounters a dispatch where ( As discussed in Teams chat we have a few tests planned to help clarify why ( I'll follow up in the next few days after running these tests |
Update: |
Signed-off-by: coleramos425 <colramos@amd.com>
We've updated the Omniperf code so that anytime this Our custom merge utility didn't fix the original issue, so we've passed the issue to the rocprof team. Awaiting response... Pushing issue to a future milestone.
Signed-off-by: coleramos425 <colramos@amd.com> Signed-off-by: fei.zheng <fei.zheng@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com> Signed-off-by: fei.zheng <fei.zheng@amd.com>
Signed-off-by: colramos-amd <colramos@amd.com>
While the underlying issue still seems to be present in rocprofiler, Omniperf will now catch the bug and throw a warning via the above commits. Closing issue.
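For readers who want a similar guard in their own post-processing, the following is a rough sketch of the "catch and warn" idea. It is not the code from the commits above: the function name `avg_with_warning` and the warning text are hypothetical, and the sketch only reports the problem and returns the mean unchanged.

```python
import warnings

import numpy as np
import pandas as pd


def avg_with_warning(values: pd.Series, metric: str) -> float:
    """Average per-dispatch values, warning if any of them is inf or NaN.

    Hypothetical helper for illustration; not Omniperf's implementation.
    """
    non_finite = int((~np.isfinite(values)).sum())
    if non_finite:
        warnings.warn(
            f"{metric}: {non_finite} of {len(values)} dispatches produced "
            "a non-finite value; check the underlying counters"
        )
    return float(values.mean())


# Usage with made-up per-dispatch values, one of which is infinite.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = avg_with_warning(pd.Series([0.12, np.inf, 0.09]), "SALU Util")

print(result)
for w in caught:
    print("warning:", w.message)
```

The result is still `inf` in this sketch. The warning only makes the cause visible instead of leaving an unexplained value in the table.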
Can this error be worked around?
Omniperf version I am using: