Pre allocate KV tensors and use inference mode #13

jlamypoirier · 2023-01-30T23:41:43Z

Fixes: #12 (Use bigcode-project/transformers#5)

Add a pre_allocate_cache option. Big speedup on GPU, but it doesn't help with the CPU bottleneck.
Use torch inference mode. Speeds up CPU calls, with a marginal GPU speedup.
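
For readers unfamiliar with the two techniques, here is a minimal sketch of the general idea. It is not the code from this PR or from bigcode-project/transformers#5. The class `PreAllocatedKVCache`, the helper `grow_by_concat`, and the tensor layout `(batch, heads, positions, head_dim)` are all hypothetical names and assumptions made for illustration. The sketch contrasts writing new keys into a buffer allocated once with growing the cache through `torch.cat` on every step, and runs the loop under `torch.inference_mode()`.

```python
import torch


class PreAllocatedKVCache:
    """Hypothetical fixed-size key/value cache for one attention layer."""

    def __init__(self, batch_size, n_head, max_len, head_dim,
                 dtype=torch.float32, device="cpu"):
        shape = (batch_size, n_head, max_len, head_dim)
        # Allocate once, up front, for the longest sequence to be generated.
        self.key = torch.empty(shape, dtype=dtype, device=device)
        self.value = torch.empty(shape, dtype=dtype, device=device)
        self.length = 0  # number of positions filled so far

    def append(self, new_key, new_value):
        """Write new entries in place and return views of the filled part."""
        end = self.length + new_key.size(2)
        if end > self.key.size(2):
            raise ValueError("cache capacity exceeded")
        self.key[:, :, self.length:end] = new_key
        self.value[:, :, self.length:end] = new_value
        self.length = end
        # Slicing returns views, so no new cache-sized allocation happens here.
        return self.key[:, :, :end], self.value[:, :, :end]


def grow_by_concat(past_key, new_key):
    """Baseline: allocate a new tensor and copy the whole cache every step."""
    return torch.cat((past_key, new_key), dim=2)


def demo(steps=4):
    batch, heads, dim = 2, 3, 8
    # Inference mode disables autograd tracking for everything created inside.
    with torch.inference_mode():
        cache = PreAllocatedKVCache(batch, heads, max_len=steps, head_dim=dim)
        concat_key = torch.empty(batch, heads, 0, dim)
        for _ in range(steps):
            k = torch.randn(batch, heads, 1, dim)  # stand-in for a new key
            v = torch.randn(batch, heads, 1, dim)  # stand-in for a new value
            key_view, _ = cache.append(k, v)
            concat_key = grow_by_concat(concat_key, k)
    return key_view, concat_key
```

Both paths hold the same keys after the loop. The difference is that the concatenation baseline copies the full cache on each generated token, while the pre-allocated version only writes the new slice.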

python3 src/main.py --hidden_size=2048 --n_head=16 --n_layer=24 --pipeline_class=HF_Pipeline --model_class=GPT2 --dtype=float16 --device=cuda --cycles=5 --batch_size=256 --max_new_tokens=100 --n_positions=512 --attention_type=[1/2] --max_log_outputs=1 --activation_function=gelu_new_python [--pre_allocate_cache] [--profile]

MQA, before: e2e = 1242 ms, cuda = 762 ms
MQA, pre-allocate: e2e = 1248 ms, cuda = 693 ms (no e2e speedup in this case, but there will be for cases that aren't CPU-bottlenecked)
MQA, inference mode: e2e = 1130 ms, cuda = 690 ms
MQA, inference mode, no pre-allocate: e2e = 1121 ms, cuda = 760 ms

MHA, before: e2e = 2241 ms, cuda = 2201 ms
MHA, pre-allocate: e2e = 1442 ms, cuda = 1048 ms
MHA, inference mode: e2e = 1249 ms, cuda = 1046 ms
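
To make the relative gains easier to read, the timings above can be turned into speedup factors against the "before" baseline. This is only arithmetic on the numbers listed in this description; the `timings` dictionary and the `speedup` helper are illustrative names, not part of the benchmark code.

```python
# Timings in milliseconds, copied from the measurements above.
timings = {
    ("MQA", "before"): {"e2e": 1242, "cuda": 762},
    ("MQA", "pre-allocate"): {"e2e": 1248, "cuda": 693},
    ("MQA", "inference mode"): {"e2e": 1130, "cuda": 690},
    ("MHA", "before"): {"e2e": 2241, "cuda": 2201},
    ("MHA", "pre-allocate"): {"e2e": 1442, "cuda": 1048},
    ("MHA", "inference mode"): {"e2e": 1249, "cuda": 1046},
}


def speedup(attention, variant, metric):
    """Return baseline time divided by variant time (greater than 1 is faster)."""
    baseline = timings[(attention, "before")][metric]
    return baseline / timings[(attention, variant)][metric]


if __name__ == "__main__":
    for (attention, variant) in timings:
        if variant == "before":
            continue
        e2e = speedup(attention, variant, "e2e")
        cuda = speedup(attention, variant, "cuda")
        print(f"{attention}, {variant}: e2e x{e2e:.2f}, cuda x{cuda:.2f}")
```

Read this way, MHA gains roughly 1.8x end to end and 2.1x in CUDA time, while MQA gains about 1.1x on both metrics.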

jlamypoirier and others added 29 commits December 20, 2022 13:28

dockerfile

e7473b9

Refactor and profiling

f101f76

style

1c95f1f

formatting and fixes

068430d

cleanup

33e8e11

style

e742a53

Merge branch 'dockerfile' into profiling

ee237d2

cleanup

633619e

improvements

008a2d4

misc

06797e7

cleanup

06a803e

fix_import

916d01a

fixes

c1b4e4a

Merge branch 'main' into profiling

9cf847f

Update transformers

c1efe53

Update transformers, improve profiling output, configurable activation function, editable transformers in docker image, cleanup

4c77db1

Update makefile

64ff77d

--no-cache-dir

03115bc

move constants

d7fe3dd

call as a module

ec62e17

stuff

d8e22d8

cleanup

b04d9bc

Merge branch 'profiling' into pre_allocate

c288f0a

fix

21e9a6a

Inference mode

7e397bc

Revert "call as a module"

c9681ae

This reverts commit ec62e17.

Revert "move constants"

13561bb

This reverts commit d7fe3dd.

Merge branch 'profiling' into pre_allocate

3a6524c

Merge branch 'main' into pre_allocate

8c5925e

jlamypoirier closed this Feb 8, 2023
