Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contraction f16, bf16, f32_f16, f32_bf16, f64_f32 #158

Merged
merged 12 commits into from
Dec 11, 2023

Conversation

CongMa13
Copy link
Collaborator

No description provided.

@CongMa13 CongMa13 changed the title Contraction f16 bf16 Contraction f16, bf16, f32_f16, f32_bf16, f64_f32 Nov 30, 2023
@CongMa13 CongMa13 force-pushed the contraction_f16_bf16 branch 3 times, most recently from 259d0ae to c58123a Compare December 6, 2023 01:45
@CongMa13 CongMa13 marked this pull request as ready for review December 6, 2023 01:47
test/utils.hpp Outdated Show resolved Hide resolved
- Support _Float16
- Support hip_bfloat16
- Add unit test of _Float16 and hip_bfloat16
- Add sample of _Float16 and hip_bfloat16
- Support ABCD data type f32 and compute type f16, bf16
- Support ABCD data type f64 and compute type f32
- Fixed bug: alpha, beta were passed in as wrong data type in unit test
of contraction
- Create sample template of contraction
Solution unique_ids of Actor Critic are have not been ready yet, but we
put some placeholders in the new Actor Critic to make the unit tests be
able to pass.
Update contraction device instances since CK has updated them.
1. Initiate the data with 0.01, 0.02, ... by default
2. Print C
When logger level is set to HIPTENSOR_LOG_LEVEL_PERF_TRACE, we make CK
instances measure the running time. The problem is that CK internally
will run the contraction 10 times by default. This leads to an issues:

1. It returns wrong result for C = alpha A x B + beta C

Set StreamConfig.nrepeat_ = 1, the contraction will be run once
1. ck::bhalf_t cannot cast to float or double by static_cast.
Use ck::type_convert() to fix it.

2. epsilon() is not good value to measure the relative difference of
data. It is too small for double (eps < 10e-13).
The pattern of contraction sameple file is

- bilinear: simple_bilinear_contraction_<A>_<B>_<C>_<D>_compute_<compute>.cpp
- scale   : simple_scale_contraction_<A>_<B>_<C>_compute_<compute>.cpp
The relative difference between contraction result and CPU reference is
less than 0.1% after the improvement.
1. Revert the default threshold of relative difference to (100 * std::numeric_limits<T>::epsilon())
2. Update CPU reference to make the difference between CPU reference and output of contraction instance
is less than (100 * std::numeric_limits<T>::epsilon()).
README.md Outdated Show resolved Hide resolved
cgmillette
cgmillette previously approved these changes Dec 8, 2023
Copy link
Collaborator

@cgmillette cgmillette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - let's wait until CK merges and then we will have to update CI to the latest

@CongMa13 CongMa13 force-pushed the contraction_f16_bf16 branch 3 times, most recently from c1e92bf to 5ba83c1 Compare December 11, 2023 16:25
cgmillette
cgmillette previously approved these changes Dec 11, 2023
Copy link
Collaborator

@cgmillette cgmillette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CongMa13 CongMa13 merged commit 8c11d59 into ROCm:develop Dec 11, 2023
2 of 6 checks passed
@CongMa13 CongMa13 deleted the contraction_f16_bf16 branch December 11, 2023 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants