From f5712875aa8ca491c71ada2537bfe8ee29f98005 Mon Sep 17 00:00:00 2001 From: David Galiffi Date: Thu, 21 Mar 2024 11:14:37 -0400 Subject: [PATCH] Updated links in documentation. (#328) Updated to reflect new GitHub organization. Fixed broken links to GitHub pages. Signed-off-by: David Galiffi --- CMakeLists.txt | 2 +- CONTRIBUTING.md | 12 ++++++------ README.md | 10 +++++----- cmake/omniperf.lua.in | 2 +- src/docs-1.x/analysis.md | 12 ++++++------ src/docs-1.x/getting_started.md | 2 +- src/docs-1.x/high_level_design.md | 2 +- src/docs-1.x/index.md | 2 +- src/docs-1.x/installation.md | 2 +- src/docs-1.x/introduction.md | 4 ++-- src/docs-1.x/profiling.md | 4 ++-- src/docs-2.x/analysis.md | 12 ++++++------ src/docs-2.x/getting_started.md | 2 +- src/docs-2.x/high_level_design.md | 2 +- src/docs-2.x/index.md | 2 +- src/docs-2.x/installation.md | 2 +- src/docs-2.x/introduction.md | 2 +- src/docs-2.x/performance_model.md | 16 ++++++++-------- src/docs-2.x/profiling.md | 2 +- 19 files changed, 47 insertions(+), 47 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 6838c882b..3448ee326 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -29,7 +29,7 @@ project( VERSION ${OMNIPERF_VERSION} LANGUAGES C DESCRIPTION "OmniPerf" - HOMEPAGE_URL "https://github.com/AMDResearch/omniperf") + HOMEPAGE_URL "https://github.com/ROCm/omniperf") include(ExternalProject) include(GNUInstallDirs) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index aeeea7f0b..a806fa68c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,11 +1,11 @@ ## How to fork from us -To keep our development fast and conflict free, we recommend you to [fork](https://github.com/AMDResearch/omniperf/fork) our repository and start your work from our `dev` branch in your private repository. +To keep our development fast and conflict free, we recommend you to [fork](https://github.com/ROCm/omniperf/fork) our repository and start your work from our `dev` branch in your private repository. 
Afterwards, git clone your repository to your local machine. But that is not it! To keep track of the original develop repository, add it as another remote. ``` -git remote add mainline https://github.com/AMDResearch/omniperf.git +git remote add mainline https://github.com/ROCm/omniperf.git git checkout dev ``` @@ -21,19 +21,19 @@ and apply your changes there. ### Did you find a bug? -- Ensure the bug was not already reported by searching on GitHub under [Issues](https://github.com/AMDResearch/omniperf/issues). +- Ensure the bug was not already reported by searching on GitHub under [Issues](https://github.com/ROCm/omniperf/issues). -- If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/AMDResearch/omniperf/issues/new). +- If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/ROCm/omniperf/issues/new). ### Did you write a patch that fixes a bug? -- Open a new GitHub [pull request](https://github.com/AMDResearch/omniperf/compare) with the patch. +- Open a new GitHub [pull request](https://github.com/ROCm/omniperf/compare) with the patch. - Ensure the PR description clearly describes the problem and solution. If there is an existing GitHub issue open describing this bug, please include it in the description so we can close it. - Ensure the PR is based on the `dev` branch of the Omniperf GitHub repository. 
-- Omniperf requires new commits to include a "Signed-off-by" token in the commit message (typically enabled via the `git commit -s` option), indicating your agreement to the projects's [Developer's Certificate of Origin](https://developercertificate.org/) and compatability with the project [LICENSE](https://github.com/AMDResearch/omniperf/blob/main/LICENSE): +- Omniperf requires new commits to include a "Signed-off-by" token in the commit message (typically enabled via the `git commit -s` option), indicating your agreement to the project's [Developer's Certificate of Origin](https://developercertificate.org/) and compatibility with the project [LICENSE](https://github.com/ROCm/omniperf/blob/main/LICENSE): > (a) The contribution was created in whole or in part by me and I diff --git a/README.md b/README.md index a6f27ab1b..f9e0345e9 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ -[![Ubuntu 20.04](https://github.com/AMDResearch/omniperf/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/AMDResearch/omniperf/actions/workflows/ubuntu-focal.yml) -[![RHEL 8](https://github.com/AMDResearch/omniperf/actions/workflows/opensuse.yml/badge.svg)](https://github.com/AMDResearch/omniperf/actions/workflows/opensuse.yml) -[![MI100](https://github.com/AMDResearch/omniperf/actions/workflows/mi100.yml/badge.svg?branch=2.x)](https://github.com/AMDResearch/omniperf/actions/workflows/mi100.yml) -[![Docs](https://github.com/AMDResearch/omniperf/actions/workflows/docs.yml/badge.svg)](https://amdresearch.github.io/omniperf/) +[![Ubuntu 20.04](https://github.com/ROCm/omniperf/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/omniperf/actions/workflows/ubuntu-focal.yml) +[![RHEL 8](https://github.com/ROCm/omniperf/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/omniperf/actions/workflows/opensuse.yml) 
+[![MI100](https://github.com/ROCm/omniperf/actions/workflows/mi100.yml/badge.svg?branch=2.x)](https://github.com/ROCm/omniperf/actions/workflows/mi100.yml) +[![Docs](https://github.com/ROCm/omniperf/actions/workflows/docs.yml/badge.svg)](https://rocm.github.io/omniperf/) [![DOI](https://zenodo.org/badge/561919887.svg)](https://zenodo.org/badge/latestdoi/561919887) @@ -14,7 +14,7 @@ targets usage on MI100 and MI200 accelerators. * For more information on available features, installation steps, and workload profiling and analysis, please refer to the online -[documentation](https://amdresearch.github.io/omniperf). +[documentation](https://rocm.github.io/omniperf). * Omniperf is an AMD open source research project and is not supported as part of the ROCm software stack. We welcome contributions and diff --git a/cmake/omniperf.lua.in b/cmake/omniperf.lua.in index bc2182302..ca55ab6f6 100644 --- a/cmake/omniperf.lua.in +++ b/cmake/omniperf.lua.in @@ -12,7 +12,7 @@ whatis("Name: omniperf") whatis("Version: @OMNIPERF_FULL_VERSION@") whatis("Keywords: Profiling, Performance, GPU") whatis("Description: tool for GPU performance profiling") -whatis("URL: https://github.com/AMDResearch/omniperf") +whatis("URL: https://github.com/ROCm/omniperf") -- Export environmental variables local topDir="@CMAKE_INSTALL_PREFIX@" diff --git a/src/docs-1.x/analysis.md b/src/docs-1.x/analysis.md index 9b68249c4..e8e18fe90 100644 --- a/src/docs-1.x/analysis.md +++ b/src/docs-1.x/analysis.md @@ -12,7 +12,7 @@ While analyzing with the CLI offers quick and straightforward access to Omniperf See sections below for more information on each. ## CLI Analysis -> Profiling results from the [aforementioned vcopy workload](https://amdresearch.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform. 
+> Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform. ### Features @@ -171,7 +171,7 @@ $ omniperf analyze -p workloads/vcopy/mi200/ --list-metrics gfx90a ├─────────┼─────────────────────────────┤ ... ``` - 2. Choose your own customized subset of metrics with `-b` (a.k.a. `--metric`), or build your own config following [config_template](https://github.com/AMDResearch/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light). + 2. Choose your own customized subset of metrics with `-b` (a.k.a. `--metric`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light). ```shell-session $ omniperf analyze -p workloads/vcopy/mi200/ -b 2 -------- @@ -351,7 +351,7 @@ be generated directly from the command-line. This option is provided as an alternative for users wanting to explore profiling results graphically, but without the additional setup requirements or server-side overhead of Omniperf's detailed [Grafana -interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui) +interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui) option. The standalone GUI analyzer is provided as simple [Flask](https://flask.palletsprojects.com/en/2.2.x/) application allowing users to view results from within a web browser. @@ -365,7 +365,7 @@ between the desired web browser host (e.g. login node or remote workstation) and required. 
Alternatively, users may find it more convenient to download profiled workloads to perform analysis on their local system. -See [FAQ](https://amdresearch.github.io/omniperf/faq.html) for more details on SSH tunneling. +See [FAQ](https://rocm.github.io/omniperf/faq.html) for more details on SSH tunneling. ``` #### Usage @@ -420,7 +420,7 @@ Once you have applied a filter, you will also see several additional sections become available with detailed metrics specific to that area of AMD hardware. These detailed sections mirror the data displayed in Omniperf's [Grafana -interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui). +interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui). ### Grafana-based GUI @@ -470,7 +470,7 @@ For example, if one wants to inspect Dispatch Range from 17 to 48, inclusive, th ##### Incremental Profiling Omniperf supports incremental profiling to significantly speed up performance analysis. -> Refer to [*IP Block profiling*](https://amdresearch.github.io/omniperf/profiling.html#ip-block-profiling) section for this command. +> Refer to [*IP Block profiling*](https://rocm.github.io/omniperf/profiling.html#ip-block-profiling) section for this command. By default, the entire application is profiled to collect perfmon counter for all IP blocks, giving a system level view of where the workload stands in terms of performance optimization opportunities and bottlenecks. diff --git a/src/docs-1.x/getting_started.md b/src/docs-1.x/getting_started.md index 80ae888f0..03fef23aa 100644 --- a/src/docs-1.x/getting_started.md +++ b/src/docs-1.x/getting_started.md @@ -74,7 +74,7 @@ Modes change the fundamental behavior of the Omniperf command line tool. Dependi - **Database**: Our detailed Grafana GUI is built on a MongoDB database. `--import` profiling results to the DB to interact with the workload in Grafana or `--remove` the workload from the DB. Connection options will need to be specified. 
See the [*Grafana - Analysis*](https://amdresearch.github.io/omniperf/analysis.html#grafana-gui-import) import section + Analysis*](https://rocm.github.io/omniperf/analysis.html#grafana-gui-import) import section for more details on this. ```shell diff --git a/src/docs-1.x/high_level_design.md b/src/docs-1.x/high_level_design.md index fbe33ca3b..1fd0c5559 100644 --- a/src/docs-1.x/high_level_design.md +++ b/src/docs-1.x/high_level_design.md @@ -6,7 +6,7 @@ :maxdepth: 4 ``` -The [Omniperf](https://github.com/AMDResearch/omniperf) Tool is architecturally composed of three major components, as shown in the following figure. +The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally composed of three major components, as shown in the following figure. - **Omniperf Profiling**: Acquire raw performance counters via application replay based on the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in a comma-seperated value, for further analyis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators. diff --git a/src/docs-1.x/index.md b/src/docs-1.x/index.md index 931718107..f3569389e 100644 --- a/src/docs-1.x/index.md +++ b/src/docs-1.x/index.md @@ -1,4 +1,4 @@ -# Welcome to the [Omniperf](https://github.com/AMDResearch/omniperf) Documentation! +# Welcome to the [Omniperf](https://github.com/ROCm/omniperf) Documentation! ```eval_rst .. 
toctree:: diff --git a/src/docs-1.x/installation.md b/src/docs-1.x/installation.md index af8c21bac..acb2b2ba2 100644 --- a/src/docs-1.x/installation.md +++ b/src/docs-1.x/installation.md @@ -51,7 +51,7 @@ defined as follows: A typical install will begin by downloading the latest release tarball available from the -[Releases](https://github.com/AMDResearch/omniperf/releases) section +[Releases](https://github.com/ROCm/omniperf/releases) section of the Omniperf development site. From there, untar and descend into the top-level directory as follows: diff --git a/src/docs-1.x/introduction.md b/src/docs-1.x/introduction.md index f0e3864d1..0d21d7d64 100644 --- a/src/docs-1.x/introduction.md +++ b/src/docs-1.x/introduction.md @@ -6,11 +6,11 @@ :maxdepth: 4 ``` -[Browse Omniperf source code on Github](https://github.com/AMDResearch/omniperf) +[Browse Omniperf source code on Github](https://github.com/ROCm/omniperf) ## Scope -MI Performance Profiler ([Omniperf](https://github.com/AMDResearch/omniperf)) is a system performance profiling tool for Machine Learning/HPC workloads running on AMD Instinct (tm) Accelerators. It is currently built on top of the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html) to monitor hardware performance counters. The Omniperf tool primarily targets accelerators in the MI100 and MI200 families. Development is in progress to support MI300 and Radeon (tm) RDNA (tm) GPUs. +MI Performance Profiler ([Omniperf](https://github.com/ROCm/omniperf)) is a system performance profiling tool for Machine Learning/HPC workloads running on AMD Instinct (tm) Accelerators. It is currently built on top of the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html) to monitor hardware performance counters. The Omniperf tool primarily targets accelerators in the MI100 and MI200 families. Development is in progress to support MI300 and Radeon (tm) RDNA (tm) GPUs. 
## Features diff --git a/src/docs-1.x/profiling.md b/src/docs-1.x/profiling.md index 56c234604..59ecb9ca1 100644 --- a/src/docs-1.x/profiling.md +++ b/src/docs-1.x/profiling.md @@ -6,7 +6,7 @@ :maxdepth: 5 ``` -The [Omniperf](https://github.com/AMDResearch/omniperf) repository +The [Omniperf](https://github.com/ROCm/omniperf) repository includes source code for a sample GPU compute workload, __vcopy.cpp__. A copy of this file is available in the `share/sample` subdirectory after a normal Omniperf installation, or via the @@ -37,7 +37,7 @@ Releasing CPU memory ``` ## Omniperf Profiling -The *omniperf* script, availible through the [Omniperf](https://github.com/AMDResearch/omniperf) repository, is used to aquire all necessary perfmon data through analysis of compute workloads. +The *omniperf* script, available through the [Omniperf](https://github.com/ROCm/omniperf) repository, is used to acquire all necessary perfmon data through analysis of compute workloads. **omniperf help:** ```shell-session diff --git a/src/docs-2.x/analysis.md b/src/docs-2.x/analysis.md index 5ba55fbfa..184434f9f 100644 --- a/src/docs-2.x/analysis.md +++ b/src/docs-2.x/analysis.md @@ -12,7 +12,7 @@ While analyzing with the CLI offers quick and straightforward access to Omniperf See sections below for more information on each. ## CLI Analysis -> Profiling results from the [aforementioned vcopy workload](https://amdresearch.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform. +> Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform. 
### Features @@ -266,7 +266,7 @@ Analysis mode = cli 2.1.30 -> L1I Fetch Latency ... ``` -3. Choose your own customized subset of metrics with `-b` (a.k.a. `--block`), or build your own config following [config_template](https://github.com/AMDResearch/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light). +3. Choose your own customized subset of metrics with `-b` (a.k.a. `--block`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light). ```shell-session $ omniperf analyze -p workloads/vcopy/MI200/ -b 2 -------- @@ -363,7 +363,7 @@ be generated directly from the command-line. This option is provided as an alternative for users wanting to explore profiling results graphically, but without the additional setup requirements or server-side overhead of Omniperf's detailed [Grafana -interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui) +interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui) option. The standalone GUI analyzer is provided as simple [Flask](https://flask.palletsprojects.com/en/2.2.x/) application allowing users to view results from within a web browser. @@ -377,7 +377,7 @@ between the desired web browser host (e.g. login node or remote workstation) and required. Alternatively, users may find it more convenient to download profiled workloads to perform analysis on their local system. -See [FAQ](https://amdresearch.github.io/omniperf/faq.html) for more details on SSH tunneling. +See [FAQ](https://rocm.github.io/omniperf/faq.html) for more details on SSH tunneling. 
``` #### Usage @@ -437,7 +437,7 @@ Once you have applied a filter, you will also see several additional sections become available with detailed metrics specific to that area of AMD hardware. These detailed sections mirror the data displayed in Omniperf's [Grafana -interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui). +interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui). ### Grafana-based GUI @@ -487,7 +487,7 @@ For example, if one wants to inspect Dispatch Range from 17 to 48, inclusive, th ##### Incremental Profiling Omniperf supports incremental profiling to significantly speed up performance analysis. -> Refer to [*Hardware Component Filtering*](https://amdresearch.github.io/omniperf/profiling.html#hardware-component-filtering) section for this command. +> Refer to [*Hardware Component Filtering*](https://rocm.github.io/omniperf/profiling.html#hardware-component-filtering) section for this command. By default, the entire application is profiled to collect performance counters for all hardware blocks, giving a complete view of where the workload stands in terms of performance optimization opportunities and bottlenecks. diff --git a/src/docs-2.x/getting_started.md b/src/docs-2.x/getting_started.md index 23cfc9b65..dc4d5e9eb 100644 --- a/src/docs-2.x/getting_started.md +++ b/src/docs-2.x/getting_started.md @@ -75,7 +75,7 @@ Modes change the fundamental behavior of the Omniperf command line tool. Dependi - **Database**: Our detailed Grafana GUI is built on a MongoDB database. `--import` profiling results to the DB to interact with the workload in Grafana or `--remove` the workload from the DB. Connection options will need to be specified. See the [*Grafana - Analysis*](https://amdresearch.github.io/omniperf/analysis.html#grafana-gui-import) import section + Analysis*](https://rocm.github.io/omniperf/analysis.html#grafana-gui-import) import section for more details on this. 
```shell diff --git a/src/docs-2.x/high_level_design.md b/src/docs-2.x/high_level_design.md index 1e156296b..436efe7db 100644 --- a/src/docs-2.x/high_level_design.md +++ b/src/docs-2.x/high_level_design.md @@ -6,7 +6,7 @@ :maxdepth: 4 ``` -The [Omniperf](https://github.com/AMDResearch/omniperf) Tool is architecturally composed of three major components, as shown in the following figure. +The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally composed of three major components, as shown in the following figure. - **Omniperf Profiling**: Acquire raw performance counters via application replay based on [rocProf](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in a comma-seperated value, for further analysis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators. diff --git a/src/docs-2.x/index.md b/src/docs-2.x/index.md index 4f3f7c107..3b375be0b 100644 --- a/src/docs-2.x/index.md +++ b/src/docs-2.x/index.md @@ -1,4 +1,4 @@ -# Welcome to the [Omniperf](https://github.com/AMDResearch/omniperf) Documentation! +# Welcome to the [Omniperf](https://github.com/ROCm/omniperf) Documentation! ```eval_rst .. toctree:: diff --git a/src/docs-2.x/installation.md b/src/docs-2.x/installation.md index 4c8bb2365..a7fce3199 100644 --- a/src/docs-2.x/installation.md +++ b/src/docs-2.x/installation.md @@ -51,7 +51,7 @@ defined as follows: A typical install will begin by downloading the latest release tarball available from the -[Releases](https://github.com/AMDResearch/omniperf/releases) section +[Releases](https://github.com/ROCm/omniperf/releases) section of the Omniperf development site. 
From there, untar and descend into the top-level directory as follows: diff --git a/src/docs-2.x/introduction.md b/src/docs-2.x/introduction.md index aa4e0f903..79f18a6aa 100644 --- a/src/docs-2.x/introduction.md +++ b/src/docs-2.x/introduction.md @@ -10,7 +10,7 @@ This documentation was created to provide a detailed breakdown of all facets of This project is proudly open source, and we welcome all feedback! For more details on how to contribute, please see our Contribution Guide. -[Browse Omniperf source code on Github](https://github.com/AMDResearch/omniperf) +[Browse Omniperf source code on Github](https://github.com/ROCm/omniperf) ## What is Omniperf diff --git a/src/docs-2.x/performance_model.md b/src/docs-2.x/performance_model.md index ecdb04ffd..1f564084f 100644 --- a/src/docs-2.x/performance_model.md +++ b/src/docs-2.x/performance_model.md @@ -2178,7 +2178,7 @@ A good discussion of coarse and fine grained memory allocations and what type of (VALU_inst_mix_example)= ## VALU Arithmetic Instruction Mix -For this example, we consider the [instruction mix sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/instmix.hip) distributed as a part of Omniperf. +For this example, we consider the [instruction mix sample](https://github.com/ROCm/omniperf/blob/dev/sample/instmix.hip) distributed as a part of Omniperf. ```{note} This example is expected to work on all CDNA accelerators, however the results in this section were collected on an [MI2XX](2xxnote) accelerator @@ -2269,7 +2269,7 @@ shows that we have exactly one of each type of VALU arithmetic instruction, by c (Fabric_transactions_example)= ## Infinity-Fabric(tm) transactions -For this example, we consider the [Infinity Fabric(tm) sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/fabric.hip) distributed as a part of Omniperf. +For this example, we consider the [Infinity Fabric(tm) sample](https://github.com/ROCm/omniperf/blob/dev/sample/fabric.hip) distributed as a part of Omniperf. 
This code launches a simple read-only kernel, e.g.: ```c++ @@ -2826,7 +2826,7 @@ On an AMD [MI2XX](2xxnote) accelerator, for FP32 values this will generate a `gl (flatmembench)= ### Global / Generic (FLAT) -For this example, we consider the [vector-memory sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/vmem.hip) distributed as a part of Omniperf. +For this example, we consider the [vector-memory sample](https://github.com/ROCm/omniperf/blob/dev/sample/vmem.hip) distributed as a part of Omniperf. This code launches many different versions of a simple read/write/atomic-only kernels targeting various address spaces, e.g. below is our simple `global_write` kernel: ```c++ @@ -2976,7 +2976,7 @@ The assembly in these experiments were generated for an [MI2XX](2xxnote) acceler Next, we examine a generic write. As discussed [previously](Flat_design), our `generic_write` kernel uses an address space cast to _force_ the compiler to choose our desired address space, regardless of other optimizations that may be possible. -We also note that the `filter` parameter passed in as a kernel argument (see [example](https://github.com/AMDResearch/omniperf/blob/dev/sample/vmem.hip), or [design note](Flat_design)) is set to zero on the host, such that we always write to the 'local' (LDS) memory allocation `lds`. +We also note that the `filter` parameter passed in as a kernel argument (see [example](https://github.com/ROCm/omniperf/blob/dev/sample/vmem.hip), or [design note](Flat_design)) is set to zero on the host, such that we always write to the 'local' (LDS) memory allocation `lds`. Examining this kernel in the VMEM Instruction Mix table yields: @@ -3339,7 +3339,7 @@ Next we examine the use of 'Spill/Scratch' memory. On current CDNA accelerators such as the [MI2XX](2xxnote), this is implemented using the [private](mspace) memory space, which maps to ['scratch' memory](https://llvm.org/docs/AMDGPUUsage.html#amdgpu-address-spaces) in AMDGPU hardware terminology. 
This type of memory can be accessed via different instructions depending on the specific architecture targeted. However, current CDNA accelerators such as the [MI2XX](2xxnote) use so called `buffer` instructions to access private memory in a simple (and typically) coalesced manner. See [Sec. 9.1, 'Vector Memory Buffer Instructions' of the CDNA2 ISA guide](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf) for further reading on this instruction type. -We develop a [simple kernel](https://github.com/AMDResearch/omniperf/blob/dev/sample/stack.hip) that uses stack memory: +We develop a [simple kernel](https://github.com/ROCm/omniperf/blob/dev/sample/stack.hip) that uses stack memory: ```c++ #include __global__ void knl(int* out, int filter) { @@ -3404,7 +3404,7 @@ Here we see a single write to the stack (10.3.6), which corresponds to an L1-L2 (IPC_example)= ## Instructions-per-cycle and Utilizations example -For this section, we use the instructions-per-cycle (IPC) [example](https://github.com/AMDResearch/omniperf/blob/dev/sample/ipc.hip) included with Omniperf. +For this section, we use the instructions-per-cycle (IPC) [example](https://github.com/ROCm/omniperf/blob/dev/sample/ipc.hip) included with Omniperf. This example is compiled using `c++17` support: @@ -3824,7 +3824,7 @@ Finally, we note that our branch utilization (11.2.5) has increased slightly fro ## LDS Examples -For this example, we consider the [LDS sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/lds.hip) distributed as a part of Omniperf. +For this example, we consider the [LDS sample](https://github.com/ROCm/omniperf/blob/dev/sample/lds.hip) distributed as a part of Omniperf. This code contains two kernels to explore how both [LDS](lds) bandwidth and bank conflicts are calculated in Omniperf. This example was compiled and run on an MI250 accelerator using ROCm v5.6.0, and Omniperf v2.0.0. 
@@ -4037,7 +4037,7 @@ The bank conflict rate linearly increases with the number of work-items within a ## Occupancy Limiters Example -In this [example](https://github.com/AMDResearch/omniperf/blob/dev/sample/occupancy.hip), we will investigate the use of the resource allocation panel in the [Workgroup Manager](SPI)'s metrics section to determine occupancy limiters. +In this [example](https://github.com/ROCm/omniperf/blob/dev/sample/occupancy.hip), we will investigate the use of the resource allocation panel in the [Workgroup Manager](SPI)'s metrics section to determine occupancy limiters. This code contains several kernels to explore how both various kernel resources impact achieved occupancy, and how this is reported in Omniperf. This example was compiled and run on a MI250 accelerator using ROCm v5.6.0, and Omniperf v2.0.0: diff --git a/src/docs-2.x/profiling.md b/src/docs-2.x/profiling.md index 230cda3e4..bd06bf95b 100644 --- a/src/docs-2.x/profiling.md +++ b/src/docs-2.x/profiling.md @@ -6,7 +6,7 @@ :maxdepth: 5 ``` -The [Omniperf](https://github.com/AMDResearch/omniperf) repository +The [Omniperf](https://github.com/ROCm/omniperf) repository includes source code for a sample GPU compute workload, __vcopy.cpp__. A copy of this file is available in the `share/sample` subdirectory after a normal Omniperf installation, or via the