Contraction f16, bf16, f32_f16, f32_bf16, f64_f32 #158

CongMa13 · 2023-11-24T02:42:32Z

No description provided.

- Support ABCD data type f32 and compute type f16, bf16
- Support ABCD data type f64 and compute type f32
- Fix bug: alpha and beta were passed in as the wrong data type in the contraction unit test
- Create sample template of contraction

When the logger level is set to HIPTENSOR_LOG_LEVEL_PERF_TRACE, we make the CK instances measure the running time. The problem is that CK internally runs the contraction 10 times by default. This leads to an issue: it returns the wrong result for C = alpha A x B + beta C, because every repeat reads the C written by the previous run. Setting StreamConfig.nrepeat_ = 1 makes the contraction run only once.
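To illustrate why repeated timing runs change the result, here is a minimal scalar sketch. It is not CK code: `runInPlace` is a hypothetical stand-in that models the in-place update C = alpha A x B + beta C with single numbers, and `nrepeat` only mimics the role of StreamConfig.nrepeat_.

```cpp
// Hypothetical scalar model of an in-place bilinear contraction.
// Each repeat overwrites c and the next repeat reads that new value.
double runInPlace(double a, double b, double c, double alpha, double beta, int nrepeat)
{
    for(int i = 0; i < nrepeat; ++i)
    {
        c = alpha * (a * b) + beta * c;
    }
    return c;
}
```

With a = 2, b = 3, c = 1 and alpha = beta = 1, a single run gives 7, while ten runs give 61, so the timed path and the untimed path disagree unless the repeat count is 1 (or beta is 0).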

cgmillette

Looks good to me - let's wait until CK merges and then we will have to update CI to the latest

cgmillette

LGTM

CongMa13 changed the title ~~Contraction f16 bf16~~ Contraction f16, bf16, f32_f16, f32_bf16, f64_f32 Nov 30, 2023

CongMa13 force-pushed the contraction_f16_bf16 branch 3 times, most recently from 259d0ae to c58123a December 6, 2023 01:45

CongMa13 marked this pull request as ready for review December 6, 2023 01:47

CongMa13 requested review from cgmillette, bragadeesh, mkarunan, dlangbe and afanfa as code owners December 6, 2023 01:47

CongMa13 commented Dec 6, 2023

test/01_contraction/configs/bilinear_test_params.yaml (resolved)

CongMa13 commented Dec 6, 2023

test/utils.hpp (outdated, resolved)

CongMa13 requested review from saadrahim and LisaDelaney as code owners December 8, 2023 17:59

CongMa13 added 12 commits December 8, 2023 18:01

Add f16 and bf16 support to contraction

c5fbcec

- Support _Float16
- Support hip_bfloat16
- Add unit test of _Float16 and hip_bfloat16
- Add sample of _Float16 and hip_bfloat16

Add placeholder for solution unique_id

ab8d557

The solution unique_ids for Actor Critic are not ready yet, so we put placeholders in the new Actor Critic so that the unit tests can pass.

Update contraction device instances

df27e32

Update contraction device instances since CK has updated them.

Print C in sample output

f85df83

1. Initialize the data with 0.01, 0.02, ... by default
2. Print C

Fixed a bug in CPU reference

f631818

1. ck::bhalf_t cannot be cast to float or double by static_cast. Use ck::type_convert() to fix it.
2. epsilon() is not a good value for measuring the relative difference of data. It is too small for double (eps < 10e-13).
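As a rough illustration of the first point, the sketch below assumes the bf16 value is held as raw 16-bit storage. Under that assumption, a plain static_cast converts the integer bit pattern rather than the number it encodes. `bf16BitsToFloat` is a hypothetical helper written for this example and is not CK's implementation of ck::type_convert(); it relies only on the fact that bfloat16 is the upper 16 bits of an IEEE-754 float32.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen raw bfloat16 bits to float by placing them
// in the upper half of a 32-bit word and reinterpreting the bytes.
float bf16BitsToFloat(std::uint16_t bits)
{
    std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));
    return out;
}

// What a naive static_cast does to the same storage: it converts the
// integer value of the bit pattern, not the encoded number.
float naiveCast(std::uint16_t bits)
{
    return static_cast<float>(bits);
}
```

For the bit pattern 0x3F80, which encodes 1.0 in bfloat16, the helper returns 1.0f while the naive cast returns 16256.0f.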

Add comments

e5cefe7

Rename contraction sample files

4345a1c

The naming pattern for contraction sample files is:
- bilinear: simple_bilinear_contraction_<A>_<B>_<C>_<D>_compute_<compute>.cpp
- scale: simple_scale_contraction_<A>_<B>_<C>_compute_<compute>.cpp
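The naming pattern can be spelled out as a pair of small string builders. This is only a sketch: the function names are hypothetical, and the type tokens used in the usage note (such as "f32" and "bf16") are an assumption based on the PR title, not taken from the renamed files.

```cpp
#include <string>

// Hypothetical builder for the bilinear sample file name.
std::string bilinearSampleName(const std::string& a,
                               const std::string& b,
                               const std::string& c,
                               const std::string& d,
                               const std::string& compute)
{
    return "simple_bilinear_contraction_" + a + "_" + b + "_" + c + "_" + d
           + "_compute_" + compute + ".cpp";
}

// Hypothetical builder for the scale sample file name (no D tensor type).
std::string scaleSampleName(const std::string& a,
                            const std::string& b,
                            const std::string& c,
                            const std::string& compute)
{
    return "simple_scale_contraction_" + a + "_" + b + "_" + c
           + "_compute_" + compute + ".cpp";
}
```

For example, `bilinearSampleName("f32", "f32", "f32", "f32", "bf16")` yields `simple_bilinear_contraction_f32_f32_f32_f32_compute_bf16.cpp`.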

Improve CPU reference accuracy

43f33ee

The relative difference between contraction result and CPU reference is less than 0.1% after the improvement.

Add comments to explain how to pass alpha value

fec9065

Update CPU reference

b21fe0b

1. Revert the default threshold of relative difference to (100 * std::numeric_limits<T>::epsilon()).
2. Update the CPU reference so that the difference between the CPU reference and the output of the contraction instance is less than (100 * std::numeric_limits<T>::epsilon()).
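A minimal sketch of a check against that threshold is shown below. The commit message does not spell out how the tests compute the relative difference, so the formula here (absolute difference divided by the larger magnitude) is an assumption, and `withinRelativeTolerance` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical check: is `result` within `threshold` of `reference`,
// measured relative to the larger of the two magnitudes?
template <typename T>
bool withinRelativeTolerance(T reference,
                             T result,
                             T threshold = T(100) * std::numeric_limits<T>::epsilon())
{
    T diff  = std::abs(reference - result);
    T scale = std::max(std::abs(reference), std::abs(result));
    if(scale == T(0))
    {
        return true; // both values are exactly zero
    }
    return diff / scale <= threshold;
}
```

Because the default scales with the type's epsilon, the same call is far stricter for double than for float: a deviation of 1e-10 around 1.0 fails for double, while a deviation of about 1e-6 around 1.0f passes for float.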

CongMa13 force-pushed the contraction_f16_bf16 branch from 67fbad0 to 67aebeb December 8, 2023 20:53

cgmillette reviewed Dec 8, 2023

README.md (outdated, resolved)

cgmillette previously approved these changes Dec 8, 2023

CongMa13 dismissed cgmillette’s stale review via ba2fecf December 8, 2023 22:33

CongMa13 force-pushed the contraction_f16_bf16 branch 3 times, most recently from c1e92bf to 5ba83c1 December 11, 2023 16:25

cgmillette previously approved these changes Dec 11, 2023

CongMa13 dismissed cgmillette’s stale review via b21fe0b December 11, 2023 17:35

CongMa13 force-pushed the contraction_f16_bf16 branch from 5ba83c1 to b21fe0b December 11, 2023 17:35

cgmillette approved these changes Dec 11, 2023

CongMa13 merged commit 8c11d59 into ROCm:develop Dec 11, 2023
2 of 6 checks passed

CongMa13 deleted the contraction_f16_bf16 branch December 11, 2023 17:46
