Make prefilling return first token for loadgen integration #143
Conversation
R: @vipannalla
LGTM, I'll let others review too
```diff
@@ -278,7 +278,7 @@ def test_llama_e2e_two_addtional_tokens(self):
     slot = 0

     # pylint: disable-next=all
-    prefill_result = engine.prefill(
+    prefill_result, _ = engine.prefill(
```
Please add the `first_token` to the `out_tokens` array (Line 288), otherwise `out_tokens` will not equal `expected_output_tokens`. Same for all other calls.
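A minimal sketch of the requested test change (the names `first_token` and `out_tokens` follow the review comment; the helper and the toy stand-ins for the engine are hypothetical): the token returned by prefill must seed `out_tokens` before the decode loop appends the rest, or the collected output will be one token short of `expected_output_tokens`.

```python
# Hypothetical sketch: seed out_tokens with the token returned by prefill,
# then append the tokens produced by each decode step.
def collect_tokens(prefill_fn, decode_fn, num_decode_steps):
    """Run one prefill followed by decode steps; return all output tokens."""
    state, first_token = prefill_fn()   # prefill now returns the first token
    out_tokens = [first_token]          # seed with the prefill token
    for _ in range(num_decode_steps):
        state, token = decode_fn(state)
        out_tokens.append(token)
    return out_tokens

# Toy stand-ins for engine.prefill / engine.generate:
tokens = [101, 7, 8, 9]
prefill = lambda: ("state0", tokens[0])
decode = lambda s, _i=iter(tokens[1:]): (s, next(_i))
print(collect_tokens(prefill, decode, 3))  # [101, 7, 8, 9]
```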
+1
jetstream_pt/engine.py (Outdated)
```python
tokens_idx=(0, 1),
valid_idx=(1, 2),
length_idx=(2, 3),
samples_per_slot=1,
```
Suggestion: instead of magic numbers, use the same logic as in `generate` so it's clearer to the reader:

```python
length = token_out.shape[1]
result_tokens = engine_api.ResultTokens(
    data=data,
    tokens_idx=(0, length),
    valid_idx=(length, 2 * length),
    length_idx=(2 * length, 2 * length + 1),
    samples_per_slot=1,
)
```
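For intuition (a hypothetical check, not code from the PR): with the suggested formulas, a single-token prefill (`length == 1`) reduces exactly to the literal indices being replaced, so the change is behavior-preserving for the current case while generalizing to longer outputs.

```python
# Sketch: the index arithmetic from the suggestion, isolated for inspection.
def result_token_indices(length):
    return {
        "tokens_idx": (0, length),
        "valid_idx": (length, 2 * length),
        "length_idx": (2 * length, 2 * length + 1),
    }

print(result_token_indices(1))
# {'tokens_idx': (0, 1), 'valid_idx': (1, 2), 'length_idx': (2, 3)}
```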
Can you run an E2E test and share the output token result?
We should also change the decode return-token logic. My concern is that with the current implementation, the first decode token is the same as the prefill token.
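The concern can be illustrated with a toy model (all names hypothetical, not the engine API): if the decode loop starts from the same position prefill already sampled, the first token appears twice in the combined output.

```python
def emitted_tokens(sampled, prefill_consumes_first):
    """Toy model: prefill emits sampled[0]; the decode loop emits from
    index 1 if it accounts for the prefill token, else from index 0
    (which duplicates the first token in the combined output)."""
    out = [sampled[0]]                       # token returned by prefill
    start = 1 if prefill_consumes_first else 0
    out.extend(sampled[start:])              # tokens from the decode loop
    return out

print(emitted_tokens([5, 6, 7], prefill_consumes_first=False))  # [5, 5, 6, 7]
print(emitted_tokens([5, 6, 7], prefill_consumes_first=True))   # [5, 6, 7]
```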
* Make prefilling return first token for loadgen integration
* minor fix and lint
* enable passing of max_decode_length as a flag
No description provided.