[ROOFLINE] Add CUDA support to roofline analysis #12205

Merged: 6 commits, merged on Jul 30, 2022.

Commits on Jul 29, 2022

  1. [ROOFLINE] Add CUDA support to roofline analysis

    Add functions to estimate peak FLOP/s and memory bandwidth for CUDA.
    Add a new registration mechanism to the roofline analysis so that any
    target can be supported. The mechanism uses generic functions with
    per-target overrides: a new target only needs to provide
    `estimate_peak_bandwidth` and `estimate_peak_flops` functions.
    
    Also fix CUDA codegen and `tensorcore_infer_fragment.cc` to support
    filling `matrix_a` and `matrix_b` fragments.
    Tristan Konolige committed Jul 29, 2022 (b2e23ab)
  2. formatting

    Tristan Konolige committed Jul 29, 2022 (f853173)
  3. move statement back inside loops

    Tristan Konolige committed Jul 29, 2022 (68d0a92)
  4. print out report for debugging

    Tristan Konolige committed Jul 29, 2022 (08e4cc6)
  5. default to avx2

    Tristan Konolige committed Jul 29, 2022 (42ebae0)
  6. review comments

    Tristan Konolige committed Jul 29, 2022 (8da98e9)
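
The registration mechanism described in the first commit (generic functions with per-target overrides that supply peak-FLOP/s and bandwidth estimates to the roofline analysis) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the PR's code: `GenericFunc`, `roofline_bound`, the plain-string target key, and the peak numbers are all hypothetical, and TVM's own generic-function machinery differs in its details. It shows that adding a target amounts to registering the two estimators, and how a roofline bound combines them as min(peak FLOP/s, arithmetic intensity × peak bandwidth).

```python
from typing import Callable, Dict


class GenericFunc:
    """A function with a default implementation plus per-target overrides.

    Hypothetical stand-in for a generic-function mechanism; not TVM's API.
    Dispatch is keyed on a plain target-kind string such as "cuda".
    """

    def __init__(self, default: Callable[[str], float]):
        self._default = default
        self._overrides: Dict[str, Callable[[str], float]] = {}

    def register(self, target_kind: str):
        """Decorator that installs an override for one target kind."""
        def decorator(fn: Callable[[str], float]) -> Callable[[str], float]:
            self._overrides[target_kind] = fn
            return fn
        return decorator

    def __call__(self, target_kind: str) -> float:
        # Use the target-specific override if one exists, else the default.
        impl = self._overrides.get(target_kind, self._default)
        return impl(target_kind)


@GenericFunc
def estimate_peak_flops(target_kind: str) -> float:
    raise NotImplementedError(f"no peak FLOP/s estimate for {target_kind!r}")


@GenericFunc
def estimate_peak_bandwidth(target_kind: str) -> float:
    raise NotImplementedError(f"no peak bandwidth estimate for {target_kind!r}")


# Adding a target means registering both estimators. The numbers below are
# made-up placeholders; a real estimator would measure the device.
@estimate_peak_flops.register("cuda")
def _cuda_peak_flops(target_kind: str) -> float:
    return 10e12  # FLOP/s (placeholder)


@estimate_peak_bandwidth.register("cuda")
def _cuda_peak_bandwidth(target_kind: str) -> float:
    return 500e9  # bytes/s (placeholder)


def roofline_bound(target_kind: str, flops: float, bytes_moved: float) -> float:
    """Attainable FLOP/s under the roofline model: min(peak, AI * bandwidth)."""
    arithmetic_intensity = flops / bytes_moved  # FLOP per byte
    return min(
        estimate_peak_flops(target_kind),
        arithmetic_intensity * estimate_peak_bandwidth(target_kind),
    )


if __name__ == "__main__":
    # Memory-bound case: 2 FLOP/byte * 500 GB/s = 1 TFLOP/s, below the 10 TFLOP/s peak.
    print(roofline_bound("cuda", flops=2e9, bytes_moved=1e9))
```

A target with no registered override falls through to the default, which raises `NotImplementedError`, so a missing estimator fails loudly rather than silently producing a guessed bound.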