
Fix cublas NaNs in Falcon #2765

Closed · wants to merge 3 commits
Conversation

klosax (Collaborator) commented Aug 24, 2023

To better avoid getting NaNs, this changes the single-precision tanhf in the GELU approximation to double-precision tanh.
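
For reference, a minimal sketch of the change (the constants match the usual tanh-based GELU approximation; the function names here are illustrative, not the exact ggml.c symbols):

```c
#include <math.h>

#define GELU_COEF_A    0.044715f
#define SQRT_2_OVER_PI 0.79788456080286535587989211986876f

// before: the intermediate tanh is evaluated in single precision
static inline float gelu_tanhf(float x) {
    return 0.5f*x*(1.0f + tanhf(SQRT_2_OVER_PI*x*(1.0f + GELU_COEF_A*x*x)));
}

// after: the intermediate is evaluated in double precision, then rounded back to float
static inline float gelu_tanh(float x) {
    const double xd = (double) x;
    return (float)(0.5*xd*(1.0 + tanh(SQRT_2_OVER_PI*xd*(1.0 + GELU_COEF_A*xd*xd))));
}
```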

klosax (Collaborator, Author) commented Aug 24, 2023

Don't know how to solve this. Any suggestions?

Without blas: all seems to work fine.

cublas:
With master: the F32 output tensor works fine, but not Q4_0.
With this PR: the Q4_0 output tensor works fine, but not F32.
All other tensors are Q4_0.

klosax (Collaborator, Author) commented Aug 25, 2023

These changes do not avoid NaNs completely when using cublas, but they decrease the probability of getting them.

klosax requested a review from ggerganov August 25, 2023 09:38
ggerganov (Owner) commented:

We should understand fully where the NaNs come from.
If you keep //#define GGML_GELU_FP16 as you have proposed and also apply @ikawrakow's suggestions, do you observe NaNs in any case?

klosax (Collaborator, Author) commented Aug 25, 2023

The NaNs come from the input to the gelu function. I tried limiting the input to the suggested range of -10 to 10, and even lower, but there were still NaNs. My guess is that this is due to compounding precision errors somewhere else; it is hard to debug. The original Falcon is BF16, and when converted to F32 all the outliers are preserved, but they get cut off if converted to F16. If converted to F16 instead of F32 prior to quantization, the probability of NaNs seems to be lower.

The NaN problem here seems related to cublas. Without blas it seems to work.
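
For context, the clamping experiment mentioned above amounts to roughly the following sketch (illustrative only, not the exact patch):

```c
#include <math.h>

// Clamp the GELU input to [-10, 10] before evaluating the tanh approximation.
// Even with this limit in place, NaNs were still observed with cublas + MMQ.
static inline float gelu_clamped(float x) {
    if (x < -10.0f) x = -10.0f;
    if (x >  10.0f) x =  10.0f;
    return 0.5f*x*(1.0f + tanhf(0.79788456f*x*(1.0f + 0.044715f*x*x)));
}
```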

klosax (Collaborator, Author) commented Aug 25, 2023

The only difference between the runs below is the LLAMA_CUBLAS compile parameter.

Falcon-7b
HF model converted to F32
All 2D tensors quantized to q4_0

HellaSwag output:

cublas with mmq:

309	72.49190939
310	72.25806452
311	72.34726688
312	72.43589744
NAN: task 313 ending 0 logprob nan

cublas without mmq:

309	73.46278317
310	73.22580645
311	73.31189711
312	73.3974359
313	73.48242812
314	73.56687898
315	73.65079365
..
400	74.5

without blas:

309	73.78640777
310	73.54838710
311	73.63344051
312	73.71794872
313	73.80191693
314	73.88535032
315	73.96825397
..
400	74.5

klosax (Collaborator, Author) commented Aug 25, 2023

Tested using -nommq and it seems to work as well as without blas. Are there bugs in the custom cuda mul mat q kernels?

slaren (Collaborator) commented Aug 25, 2023

Are there bugs in the custom cuda mul mat q kernels?

This may be because with mmq src1 is quantized to q8_1, so there is some loss of precision. With CPU it is also quantized to q8_0 or q8_1, so the result should be similar.
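
To illustrate the kind of precision loss being described, here is a standalone sketch of a simplified q8_0-style block quantization round trip (32 values sharing one scale); it is not the actual ggml/CUDA code, but it shows how a single outlier in a block inflates the scale and wipes out the small values around it:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32

// Quantize a block of QK floats to 8-bit integers with one shared scale,
// then immediately dequantize, writing the round-tripped values to y.
static void quantize_dequantize_q8(const float * x, float * y, int n) {
    for (int i = 0; i < n; i += QK) {
        float amax = 0.0f;
        for (int j = 0; j < QK; j++) {
            const float ax = fabsf(x[i + j]);
            if (ax > amax) amax = ax;
        }
        const float d = amax / 127.0f;                         // per-block scale
        for (int j = 0; j < QK; j++) {
            const int8_t q = (int8_t) roundf(d != 0.0f ? x[i + j] / d : 0.0f);
            y[i + j] = q * d;                                  // dequantized value
        }
    }
}

int main(void) {
    float x[QK], y[QK];
    for (int j = 0; j < QK; j++) x[j] = 0.01f * j;             // small activations
    x[0] = 50.0f;                                              // one large outlier
    quantize_dequantize_q8(x, y, QK);
    for (int j = 1; j < 5; j++) {
        printf("x = %8.4f   round-trip = %8.4f   error = %8.4f\n", x[j], y[j], y[j] - x[j]);
    }
    return 0;
}
```

With -nommq the activations presumably stay in floating point for the cublas GEMM, so this particular rounding step is avoided.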

ggerganov (Owner) commented:

@klosax Are you offloading tensors? We know that offloading KV tensors currently does not work with Falcon

If you are not offloading, then I don't think -nommq should make any difference, no?

slaren (Collaborator) commented Aug 25, 2023

If you are not offloading, then I don't think -nommq should make any difference, no?

The CUDA backend will automatically copy the weights when processing large prompts, even if not offloaded, just like the original cuBLAS implementation did.

klosax (Collaborator, Author) commented Aug 25, 2023

No, I do not offload any tensors.

I will do more tests comparing cublas with and without -nommq. Maybe it is just me, but when testing things out it would have helped a lot to have a reference mode that uses only CPU full-float precision, even if the source tensors are quantized.
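
As a rough sketch of what such a reference mode could look like (hypothetical helper, not an existing llama.cpp option): dequantize the weights to F32 once, run a plain CPU matrix multiplication, and compare its output against the quantized/MMQ path.

```c
#include <stddef.h>

// Plain F32 reference matmul: y[r] = sum_c w[r*cols + c] * x[c].
// w would be obtained by dequantizing the model weights to F32 up front.
static void matmul_f32_ref(const float * w, const float * x, float * y,
                           size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++) {
        double acc = 0.0;                  // accumulate in double for a tighter reference
        for (size_t c = 0; c < cols; c++) {
            acc += (double) w[r*cols + c] * (double) x[c];
        }
        y[r] = (float) acc;
    }
}
```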

klosax (Collaborator, Author) commented Aug 25, 2023

Falcon-7b
HF model converted to F32
Output tensor F32
All other tensors quantized to q4_0

HellaSwag output:

Using cublas with mmq

118	75.42372881
119	75.63025210
120	75.00000000
121	75.20661157
122	75.40983607
123	75.60975610
124	75.00000000
125	74.40000000
126	74.60317460
NAN: task 127 ending 3 logprob nan

Using cublas without mmq (-nommq)

121	74.38016529
122	74.59016393
123	74.79674797
124	74.19354839
125	73.60000000
126	73.80952381
127	74.01574803
128	73.43750000
..
400	77.0

klosax (Collaborator, Author) commented Aug 26, 2023

NaNs seem to appear with MMQ only when Q4_0 tensors are involved.
If there is no bug in the Q4_0 MMQ kernel, then maybe Q4_0 MMQ should be disabled when using Falcon?

klosax changed the title from "Fix gelu NaNs" to "Fix cublas NaNs in Falcon" Aug 26, 2023
klosax (Collaborator, Author) commented Aug 27, 2023

I will close this.
The NaN problems disappear when disabling cuda MMQ using -nommq, so they are not directly related to the precision of tanh. MMQ is too inaccurate for the Falcon model, which leads to accumulating precision errors that eventually result in NaNs. My suggestion is to disable MMQ when a Falcon model is loaded.
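
A hypothetical sketch of that suggestion (the enum and function are illustrative, not actual llama.cpp code): force the -nommq behavior whenever the loaded model is Falcon.

```c
#include <stdbool.h>

enum model_arch { ARCH_LLAMA, ARCH_FALCON };

// Decide whether to use the custom MMQ kernels or fall back to cublas.
static bool should_use_mul_mat_q(enum model_arch arch, bool requested) {
    if (arch == ARCH_FALCON) {
        return false;   // MMQ accumulates too much error for Falcon; always use cublas
    }
    return requested;   // otherwise honor the user's setting (-nommq disables MMQ)
}
```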

klosax closed this Aug 27, 2023
klosax deleted the fix-gelu-nans branch August 27, 2023 12:13