[ROOFLINE] Add CUDA support to roofline analysis #12205

Merged: 6 commits merged into apache:main on Jul 30, 2022


tkonolige (Contributor):

Add functions to estimate peak FLOPS and peak bandwidth for CUDA. Add a new registration mechanism to the roofline analysis so that support for any target can be added. The mechanism uses generic functions with per-target overrides: a new target only needs to provide `estimate_peak_bandwidth` and `estimate_peak_flops` functions.
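The override idea can be pictured with a small pure-Python sketch. It does not reproduce TVM's generic-function machinery: `make_generic`, dispatching on a plain target-kind string, and the returned numbers are all hypothetical and only illustrate how a default implementation is overridden per target, so that adding a target means registering its own `estimate_peak_flops` and `estimate_peak_bandwidth`.

```python
from typing import Callable, Dict


def make_generic(default: Callable) -> Callable:
    """Hypothetical stand-in for a generic function: dispatches on a target
    kind string and falls back to `default` when no override is registered."""
    overrides: Dict[str, Callable] = {}

    def dispatch(target_kind: str, *args, **kwargs):
        impl = overrides.get(target_kind, default)
        return impl(target_kind, *args, **kwargs)

    def register(target_kind: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            overrides[target_kind] = func
            return func
        return decorator

    dispatch.register = register  # expose registration on the dispatcher
    return dispatch


@make_generic
def estimate_peak_flops(target_kind: str) -> float:
    raise NotImplementedError(f"no peak-FLOPS estimator for {target_kind!r}")


@make_generic
def estimate_peak_bandwidth(target_kind: str) -> float:
    raise NotImplementedError(f"no peak-bandwidth estimator for {target_kind!r}")


# A new target registers only these two overrides. The values are
# placeholders (hypothetical); a real estimator would benchmark the device.
@estimate_peak_flops.register("cuda")
def _cuda_peak_flops(target_kind: str) -> float:
    return 40.6e12  # placeholder, FLOP/s


@estimate_peak_bandwidth.register("cuda")
def _cuda_peak_bandwidth(target_kind: str) -> float:
    return 448e9  # placeholder, bytes/s
```

With this sketch, `estimate_peak_flops("cuda")` returns the CUDA override's value, while an unregistered kind such as `"llvm"` falls through to the default and raises `NotImplementedError`. The appeal of the design is that the roofline code calls one entry point and never needs a per-target `if` chain.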

Also fix the CUDA codegen and `tensorcore_infer_fragment.cc` to support filling `matrix_a` and `matrix_b` fragments.

@AndrewZhaoLuo

@AndrewZhaoLuo AndrewZhaoLuo self-requested a review July 27, 2022 20:20
AndrewZhaoLuo (Contributor):

Will take a look tomorrow.

AndrewZhaoLuo (Contributor) left a review comment:

I need to grok the tensor core changes a bit more, but this seems good so far. On my 3070 I get 420 GB/s of bandwidth vs. the 448 GB/s advertised. For FLOPS I actually get slightly more than the advertised 40.6 TFLOPS (41.2 TFLOPS, which seems close enough).
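To put these figures in roofline terms, they can be plugged into the standard bound: attainable FLOP/s = min(peak FLOP/s, arithmetic intensity × peak bandwidth). The helper names below are illustrative and not part of TVM. The inputs are the reviewer's measured numbers (420 GB/s, 41.2 TFLOPS) and the advertised ones (448 GB/s, 40.6 TFLOPS), with GB taken as 1e9 bytes.

```python
def roofline_bound(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Attainable FLOP/s for a kernel performing `intensity` FLOPs per byte moved."""
    return min(peak_flops, intensity * peak_bandwidth)


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute bound."""
    return peak_flops / peak_bandwidth


measured_bw, advertised_bw = 420e9, 448e9              # bytes/s
measured_flops, advertised_flops = 41.2e12, 40.6e12    # FLOP/s

print(f"bandwidth vs. advertised: {measured_bw / advertised_bw:.1%}")        # 93.8%
print(f"FLOPS vs. advertised:     {measured_flops / advertised_flops:.1%}")  # 101.5%
print(f"ridge point: {ridge_point(measured_flops, measured_bw):.1f} FLOP/byte")  # 98.1
# A kernel at 10 FLOP/byte sits left of the ridge, so bandwidth limits it.
bound = roofline_bound(10, measured_flops, measured_bw)
print(f"bound at 10 FLOP/byte: {bound / 1e12:.1f} TFLOP/s")  # 4.2
```

On these numbers a kernel needs roughly 98 FLOPs per byte of memory traffic before the measured compute peak, rather than bandwidth, becomes the limit.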

python/tvm/utils/roofline/cuda.py (outdated, resolved)
python/tvm/utils/roofline/cuda.py (outdated, resolved)
python/tvm/utils/roofline/cuda.py (outdated, resolved)
tests/python/unittest/test_roofline.py (resolved)
python/tvm/utils/roofline/cuda.py (outdated, resolved)
python/tvm/utils/roofline/cuda.py (resolved)
Tristan Konolige added 6 commits July 29, 2022 08:56
@AndrewZhaoLuo AndrewZhaoLuo merged commit 961a7c7 into apache:main Jul 30, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [ROOFLINE] Add CUDA support to roofline analysis

* formatting

* move statement back inside loops

* print out report for debugging

* default to avx2

* review comments