Add full documentation for updated metrics (#224)
Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>
Signed-off-by: colramos-amd <colramos@amd.com>
coleramos425 and skyreflectedinmirrors committed Feb 6, 2024
1 parent b0cf695 commit 966e50b
Showing 100 changed files with 10,914 additions and 319 deletions.
433 changes: 337 additions & 96 deletions src/docs/analysis.md

Large diffs are not rendered by default.

18 changes: 13 additions & 5 deletions src/docs/conf.py
@@ -32,13 +32,13 @@ def install(package):
# -- Project information -----------------------------------------------------

project = "Omniperf"
copyright = "2022, Audacious Software Group"
copyright = "2023-2024, Audacious Software Group"
author = "Audacious Software Group"

# The short X.Y version
version = repo_version
# The full version, including alpha/beta/rc tags
release = ""
release = repo_version

# -- General configuration ---------------------------------------------------

@@ -52,9 +52,12 @@ def install(package):
"myst_parser",
]

myst_heading_anchors = 2
show_authors = True

myst_heading_anchors = 4
# enable replacement of (tm) & friends
myst_enable_extensions = ["replacements"]
myst_enable_extensions = ["replacements", "dollarmath"]


# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
@@ -112,6 +115,10 @@ def install(package):
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

latex_elements = {
"sphinxsetup": 'verbatimwrapslines=true, verbatimforcewraps=true',
}


# -- Options for HTMLHelp output ---------------------------------------------

@@ -130,7 +137,7 @@ def install(package):
# Toc options
"collapse_navigation": True,
"sticky_navigation": True,
"navigation_depth": 4,
"navigation_depth": 5,
"includehidden": True,
"titles_only": False,
}
@@ -162,6 +169,7 @@ def setup(app):
app.add_transform(AutoStructify)
app.add_config_value("docstring_replacements", {}, True)
app.connect("source-read", replaceString)
app.add_css_file("css/custom.css")


# function to replace version string througout documentation
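For reference, the settings touched by the conf.py hunks above end up roughly as follows. This is a condensed sketch rather than the full file: `repo_version` is computed elsewhere in the real conf.py, so the placeholder value below is an assumption made only to keep the fragment runnable.

```python
# Hypothetical stand-in: the real conf.py derives repo_version elsewhere.
repo_version = "0.0.0"

# Project information after this commit.
project = "Omniperf"
copyright = "2023-2024, Audacious Software Group"
author = "Audacious Software Group"

# The short X.Y version and the full release string now share one source.
version = repo_version
release = repo_version

# MyST settings: deeper heading anchors, plus "$...$" math via dollarmath.
show_authors = True
myst_heading_anchors = 4
myst_enable_extensions = ["replacements", "dollarmath"]

# Wrap long verbatim lines in the LaTeX/PDF build.
latex_elements = {
    "sphinxsetup": "verbatimwrapslines=true, verbatimforcewraps=true",
}
```

Setting `release` from `repo_version` means the full release string no longer renders as empty in the built documentation.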
22 changes: 17 additions & 5 deletions src/docs/faq.md
@@ -6,7 +6,7 @@
:maxdepth: 4
```

**1. How do I export profiling data I've already generated using Omniperf?**
**1. How do I export profiling data I have already generated using Omniperf?**

In order to interact with the Grafana GUI you must sync data with the MongoDB backend. This interaction is done through ***database*** mode.

@@ -35,11 +35,23 @@ $ export LANG=C.UTF-8

1. Open MobaXterm
2. In the top ribbon, select `Tunneling`
![Tunnel Button](images/tunnel_demo1.png)
``` {image} images/tunnel_demo1.png
:alt: MobaXterm Tunnel Button
:class: bg-primary
:align: center
```
This pop up will appear
![Pop up](images/tunnel_demo2.png)
``` {image} images/tunnel_demo2.png
:alt: MobaXterm Pop Up
:class: bg-primary
:align: center
```
3. Press `New SSH tunnel`
![Pop up](images/tunnel_demo3.png)
``` {image} images/tunnel_demo3.png
:alt: MobaXterm Pop Up
:class: bg-primary
:align: center
```
4. Configure tunnel accordingly

Local clients
@@ -52,4 +64,4 @@ This pop up will appear
SSH Server
- SSH server: Name of the server one is connecting to
- SSH login: Username to login to the server
- SSH port: 22
- SSH port: 22
23 changes: 12 additions & 11 deletions src/docs/getting_started.md
@@ -10,13 +10,13 @@

1. **Launch & Profile the target application with the command line profiler**

The command line profiler launches the target application, calls the rocProfiler API, and collects profile results for the specified kernels, dispatches, and/or IP blocks. If not specified, Omniperf will default to collecting all available counters for all kernels/dispatches launched by the user's executable.
The command line profiler launches the target application, calls the rocProfiler API via the rocProf binary, and collects profile results for the specified kernels, dispatches, and/or hardware components. If not specified, Omniperf will default to collecting all available counters for all kernels/dispatches launched by the user's executable.

To collect the default set of data for all kernels in the target application, launch, e.g.:
```shell
$ omniperf profile -n vcopy_data -- ./vcopy -n 1048576 -b 256
$ omniperf profile -n vcopy_data -- ./vcopy 1048576 256
```
The app runs, each kernel is launched, and profiling results are generated. By default, results are written to (e.g.,) ./workloads/vcopy_data (configurable via the `-n` argument). To collect all requested profile information, it may be required to replay kernels multiple times.
The app runs, each kernel is launched, and profiling results are generated. By default, results are written to e.g., ./workloads/vcopy_data (configurable via the `-n` argument). To collect all requested profile information, it may be required to replay kernels multiple times.

2. **Customize data collection**

@@ -25,19 +25,20 @@

Some common filters include:

- `-k`/`--kernel` enables filtering kernels by name. `-d`/`--dispatch` enables filtering based on dispatch ID
- `-b`/`--ipblocks` enables collects metrics for only the specified (one or more) IP Blocks.
- `-k`/`--kernel` enables filtering kernels by name.
- `-d`/`--dispatch` enables filtering based on dispatch ID.
- `-b`/`--ipblocks` enables collects metrics for only the specified (one or more) hardware component blocks.

To view available metrics by IP Block you can use the `--list-metrics` argument to view a list of all available metrics organized by IP Block.
To view available metrics by IP Block you can use the `--list-metrics` argument:
```shell
$ omniperf analyze --list-metrics <sys_arch>
```

3. **Analyze at the command line**

After generating a local output folder (./workloads/\<name>), the command line tool can also be used to quickly interface with profiling results. View different metrics derived from your profiled results and get immediate access all metrics organized by IP block.
After generating a local output folder (./workloads/\<name>), the command line tool can also be used to quickly interface with profiling results. View different metrics derived from your profiled results and get immediate access all metrics organized by IP blocks.

If no kernel, dispatch, or ipblock filters are applied at this stage, analysis will be reflective of the entirety of the profiling data.
If no kernel, dispatch, or hardware block filters are applied at this stage, analysis will be reflective of the entirety of the profiling data.

To interact with profiling results from a different session, users just provide the workload path. `-p`/`--path` enables users to analyze existing profiling data in the Omniperf CLI.
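The profile and analyze steps described in this hunk can be sketched as small argument-list builders. This is an illustration only, under stated assumptions: the helper names (`profile_argv`, `workload_dir`, `analyze_argv`) are hypothetical, the filter flags are assumed to accept one or more space-separated values, and the output location is assumed to be exactly `workloads/<name>` as the text states (the real tool may add further subdirectories). Only flags that appear in the documentation above are used.

```python
from __future__ import annotations

from pathlib import Path
from typing import Sequence


def profile_argv(
    name: str,
    app: Sequence[str],
    kernels: Sequence[str] = (),
    dispatches: Sequence[int] = (),
    ipblocks: Sequence[str] = (),
) -> list[str]:
    """Build an `omniperf profile` command line (hypothetical helper)."""
    argv = ["omniperf", "profile", "-n", name]
    # Assumption: each filter flag takes one or more space-separated values.
    if kernels:
        argv += ["-k", *kernels]
    if dispatches:
        argv += ["-d", *(str(d) for d in dispatches)]
    if ipblocks:
        argv += ["-b", *ipblocks]
    # Everything after "--" is the target application and its own arguments.
    return argv + ["--", *app]


def workload_dir(name: str, root: Path = Path("workloads")) -> Path:
    """Default output folder for a workload name, per the text above."""
    return root / name


def analyze_argv(path: Path, gui: bool = False) -> list[str]:
    """Build an `omniperf analyze` command line for existing results."""
    argv = ["omniperf", "analyze", "-p", str(path)]
    if gui:
        argv.append("--gui")
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them, so no GPU is required.
    print(" ".join(profile_argv("vcopy_data", ["./vcopy", "1048576", "256"])))
    print(" ".join(analyze_argv(workload_dir("vcopy_data"))))
```

Passing either list to `subprocess.run(argv, check=True)` would launch the tool on a system where Omniperf is installed; building the list separately keeps the flag handling easy to inspect and test.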

@@ -55,7 +56,7 @@
### Modes
Modes change the fundamental behavior of the Omniperf command line tool. Depending on which mode is chosen, different command line options become available.

- **Profile**: Target application is launched on the local system utilizing AMD’s [ROC Profiler](https://github.com/ROCm-Developer-Tools/rocprofiler). Depending on the profiling options chosen, selected kernels, dispatches, and/or IP Blocks in the application are profiled and results are stored locally in an output folder (./workloads/\<name>).
- **Profile**: Target application is launched on the local system using AMD’s [ROC Profiler](https://github.com/ROCm-Developer-Tools/rocprofiler). Depending on the profiling options chosen, selected kernels, dispatches, and/or hardware components in the application are profiled and results are stored locally in an output folder (./workloads/\<name>).

```shell
$ omniperf profile --help
@@ -65,7 +66,7 @@ Modes change the fundamental behavior of the Omniperf command line tool. Dependi

To gererate a lightweight GUI interface users can add the `--gui` flag to their analysis command.

This mode is designed to be a middle ground to the highly detailed Omniperf Grafana GUI and is great for users who want immediate access to an IP Block they’re already familiar with.
This mode is designed to be a middle ground to the highly detailed Omniperf Grafana GUI and is great for users who want immediate access to a hardware component they’re already familiar with.

```shell
$ omniperf analyze --help
@@ -90,4 +91,4 @@ Standalone roofline analysis | profile | `--name`, `--roof-only`, `-- <profile_c
Import a workload to database | database | `--import`, `--host`, `--username`, `--workload`, `--team`
Remove a workload from database | database | `--remove`, `--host`, `--username`, `--workload`, `--team`
Launch standalone GUI from CLI | analyze | `--path`, `--gui`
Interact with profiling results from CLI | analyze | `--path`
Interact with profiling results from CLI | analyze | `--path`
4 changes: 2 additions & 2 deletions src/docs/high_level_design.md
@@ -8,10 +8,10 @@

The [Omniperf](https://github.com/AMDResearch/omniperf) Tool is architecturally composed of three major components, as shown in the following figure.

- **Omniperf Profiling**: Acquire raw performance counters via application replay based on the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in a comma-seperated value, for further analyis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators.
- **Omniperf Profiling**: Acquire raw performance counters via application replay based on [rocProf](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in a comma-seperated value, for further analysis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators.

- **Omniperf Grafana Analyzer**:
- *Grafana database import*: All raw performance counters are imported into the backend MongoDB database for Grafana GUI analysis and visualization. Compatibility of previously generated data between Omniperf versions is not necessarily guarenteed.
- *Grafana database import*: All raw performance counters are imported into the backend MongoDB database for Grafana GUI analysis and visualization. Compatibility of previously generated data between Omniperf versions is not necessarily guaranteed.
- *Grafana GUI Analyzer*: A Grafana dashboard is designed to retrieve the raw counters info from the backend database. It also creates the relevant performance metrics and visualization.
- **Omniperf Standalone GUI Analyzer**: A standalone GUI is provided to enable performance analysis without importing data into the backend database.

Binary file removed src/docs/images/Arithmetic_operations.png
Binary file removed src/docs/images/Command_processor.png
Binary file removed src/docs/images/Comp_pipe_sol.png
Binary file removed src/docs/images/Compute_pipeline_stats.png
Binary file removed src/docs/images/Constant_cache_l2_interface.png
Binary file removed src/docs/images/Constant_cache_stats.png
Binary file removed src/docs/images/Instruc_cache_sol.png
Binary file removed src/docs/images/Instruction_cache_stats.png
Binary file removed src/docs/images/Instruction_mix.png
Binary file removed src/docs/images/L1D_sol.png
Binary file removed src/docs/images/L1_cache_stalls.png
Binary file removed src/docs/images/L1_l2_transactions.png
Binary file removed src/docs/images/L1_utcl1_transactions.png
Binary file removed src/docs/images/L2_cache_accesses.png
Binary file removed src/docs/images/L2_cache_sol.png
Binary file removed src/docs/images/L2_ea_stalls.png
Binary file removed src/docs/images/L2_ea_transactions.png
Binary file removed src/docs/images/L2_ea_transactions_per_channel.png
Binary file removed src/docs/images/LDS_sol.png
Binary file removed src/docs/images/LDS_stats.png
Binary file removed src/docs/images/MFMA_arithmetic_instruction_mix.png
Binary file removed src/docs/images/Memory_chart_analysis.png
Binary file removed src/docs/images/Shader_processing_input.png
Binary file removed src/docs/images/System_info_panel.png
Binary file removed src/docs/images/System_speed_of_light.png
Binary file removed src/docs/images/Texture_address.png
Binary file removed src/docs/images/Texture_data.png
Binary file removed src/docs/images/VALU_arithmetic_instruction_mix.png
Binary file removed src/docs/images/VMEM_arithmetic_intensity_mix.png
Binary file removed src/docs/images/Vec_L1D_cache_accesses.png
Binary file removed src/docs/images/Vec_L1D_cache_sol.png
Binary file removed src/docs/images/Wavefront_launch.png
Binary file added src/docs/images/cpc_panel.png
Binary file added src/docs/images/cpf_panel.png
Binary file added src/docs/images/cu-arith-ops_panel.png
Binary file added src/docs/images/cu-inst-mix_panel.png
Binary file added src/docs/images/cu-pipeline-stats_panel.png
Binary file added src/docs/images/cu-sol_panel.png
Binary file added src/docs/images/cu-vmem-instr-mix_panel.png
Binary file added src/docs/images/fabric.png