
[Bugfix] add truncate_prompt_tokens to work offline, directly from LLM class. #4598

Closed
yecohn wants to merge 3 commits from the truncate_usage_pr branch

Conversation

@yecohn (Contributor) commented May 4, 2024

Fixes #4507.

  • Added a `_validate_prompt` function to make sure the prompt is in the right format and to run some validations.
  • `truncate_prompt_tokens` should truncate the prompt from the left, as is done in vllm/entrypoints/openai/serving_engine.py line 182 (see the sketch below).
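
A minimal sketch of what such a helper could look like, written here as a standalone function (the name `_validate_prompt` comes from the PR description; the tokenizer is passed in explicitly rather than taken from the engine, and the exact signature is illustrative, not the merged code):

```python
from typing import List, Optional, Tuple


def _validate_prompt(
    prompt: Optional[str] = None,
    prompt_token_ids: Optional[List[int]] = None,
    truncate_prompt_tokens: Optional[int] = None,
    tokenizer=None,
) -> Tuple[Optional[str], Optional[List[int]]]:
    """Check the prompt inputs and optionally left-truncate the token ids."""
    if prompt is None and prompt_token_ids is None:
        raise ValueError("Either prompt or prompt_token_ids must be provided.")
    if truncate_prompt_tokens is not None:
        if truncate_prompt_tokens < 1:
            raise ValueError("truncate_prompt_tokens must be >= 1.")
        if prompt_token_ids is None:
            # Tokenize the string prompt first, then truncate.
            prompt_token_ids = tokenizer(prompt).input_ids
        # Keep only the last `truncate_prompt_tokens` tokens (left truncation),
        # mirroring the behaviour of the OpenAI-compatible server.
        prompt_token_ids = prompt_token_ids[-truncate_prompt_tokens:]
    return prompt, prompt_token_ids
```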

@simon-mo (Collaborator) commented May 4, 2024

@tdoublep can you help review this?

@tdoublep (Member) commented May 6, 2024

@simon-mo sure, will try to get to that later today

Comment on lines +261 to +266

```python
    def _validate_prompt(
        self,
        prompt: Optional[str] = None,
        prompt_token_ids: Optional[List[int]] = None,
        truncate_prompt_tokens: Optional[Annotated[int, Field(ge=1)]] = None
    ) -> Tuple[Optional[str], Optional[List[int]]]:
```
Member: This function re-implements the same logic as the existing one in the OpenAI serving code. It's worth thinking about whether we should have it implemented in a single place and re-used accordingly. I think that would be preferable, unless there is some strong reason to have it duplicated.

Member: I'm also studying #3512 to understand if that needs to be merged before this one.

yecohn (Author): Yes, this is a good point. Let me know what you think is best and I'll implement it.

Member: For now let's just make sure that the behaviour is equivalent (see the discussion below re: using truncation from the tokenizer). I think it would make more sense to think about refactoring after #3512 is merged, since after that point the completions API will be using the tokenizer from the engine anyway.

Comment on lines +273 to +277

```python
if prompt_token_ids is None:
    prompt_token_ids = self.llm_engine.tokenizer.tokenizer(
        prompt).input_ids
prompt_token_ids = prompt_token_ids[-truncate_prompt_tokens:]
```
Member: This isn't exactly the same behaviour as the equivalent code in the completions API. In that case, we use the tokenizer's built-in ability to perform truncation when the user provides a prompt, and only implement the truncation ourselves when the user provides prompt ids. I think it would be better to do the same here, for consistency.
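
For reference, a rough sketch of the two paths described above, using a Hugging Face tokenizer directly (the model name is illustrative; the serving code obtains the tokenizer from the engine, configured for left truncation):

```python
from transformers import AutoTokenizer

# Illustrative model; the real code uses the engine's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2", truncation_side="left")
truncate_prompt_tokens = 8

# Case 1: the user supplies a string prompt -> let the tokenizer truncate.
input_ids = tokenizer(
    "a very long prompt " * 20,
    truncation=True,
    max_length=truncate_prompt_tokens,
).input_ids

# Case 2: the user supplies token ids -> truncate manually from the left.
user_token_ids = list(range(100))
truncated_ids = user_token_ids[-truncate_prompt_tokens:]
```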

yecohn (Author) commented May 6, 2024: Yes indeed. The thing I noticed, though, is that using the tokenizer with kwargs does a right truncation rather than a left one, basically doing `prompt_token_ids[:truncate_prompt_tokens]`. I'll recheck to make sure I'm not mistaken.

Here is the CI log I got:

```
Preparing working directory 2s
  # Creating "/workspace/build/buildkite/vllm/ci"
  $ cd /workspace/build/buildkite/vllm/ci
  $ git clone -v -- https://github.com/vllm-project/vllm.git .
  Cloning into '.'...
  POST git-upload-pack (175 bytes)
  POST git-upload-pack (gzip 2402 to 1225 bytes)
  remote: Enumerating objects: 16905, done.
  remote: Counting objects: 100% (3599/3599), done.
  remote: Compressing objects: 100% (711/711), done.
  remote: Total 16905 (delta 3181), reused 2987 (delta 2887), pack-reused 13306
  Receiving objects: 100% (16905/16905), 8.71 MiB | 20.47 MiB/s, done.
  Resolving deltas: 100% (12703/12703), done.
  $ git clean -ffxdq
  # Fetch and checkout pull request head from GitHub
  $ git fetch -- origin refs/pull/4598/head
  remote: Enumerating objects: 15, done.
  remote: Counting objects: 100% (15/15), done.
  remote: Compressing objects: 100% (10/10), done.
  remote: Total 15 (delta 7), reused 13 (delta 5), pack-reused 0
  Unpacking objects: 100% (15/15), 3.66 KiB | 749.00 KiB/s, done.
  From https://github.com/vllm-project/vllm
  * branch refs/pull/4598/head -> FETCH_HEAD
  # FETCH_HEAD is now 968d5e9651ca28b90498e282e3095f9654ceebe0
  $ git checkout -f 968d5e9
  Note: switching to '968d5e9651ca28b90498e282e3095f9654ceebe0'.
   
  You are in 'detached HEAD' state. You can look around, make experimental
  changes and commit them, and you can discard any commits you make in this
  state without impacting any branches by switching back to a branch.
   
  If you want to create a new branch to retain commits you create, you may
  do so (now or later) by using -c with the switch command. Example:
   
  git switch -c
   
  Or undo this operation with:
   
  git switch -
   
  Turn off this advice by setting config variable advice.detachedHead to false
   
  HEAD is now at 968d5e9 fix dependency injection
  # Cleaning again to catch any post-checkout changes
  $ git clean -ffxdq
  # Checking to see if Git data needs to be sent to Buildkite
  $ buildkite-agent meta-data exists buildkite:git:commit
  Running commands
  $ trap 'kill -- $$' INT TERM QUIT; cp -r ~/.ssh /workspace/.ssh && chmod -R 777 /workspace
  Some containers had unknown exit statuses. Perhaps they were in ImagePullBackOff.

```

yecohn (Author): Regarding the build, I got an error: `buildkite/ci/pr/distributed-tests-multiple-groups — Failed (exit status -10)`. Do you have any idea what it could be, and how I could debug it?

Member: @yecohn Regarding the truncation-side issue: you are right that, by default, the tokenizers truncate from the right. That's why we added this argument when calling get_tokenizer, to set the truncation side to the left.
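
To make the difference concrete, a small sketch with a Hugging Face tokenizer (assuming, as described above, that the relevant kwarg is forwarded to the underlying tokenizer; the model name is illustrative):

```python
from transformers import AutoTokenizer

text = "one two three four five six seven eight nine ten"

# Default behaviour: truncation from the right (keeps the start of the prompt).
right_tok = AutoTokenizer.from_pretrained("gpt2")
kept_right = right_tok(text, truncation=True, max_length=3).input_ids

# With truncation_side="left": keeps the end of the prompt instead.
left_tok = AutoTokenizer.from_pretrained("gpt2", truncation_side="left")
kept_left = left_tok(text, truncation=True, max_length=3).input_ids

print(right_tok.decode(kept_right))  # beginning of the text
print(left_tok.decode(kept_left))    # end of the text
```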

Re: the build error, I don't think it looks related to your changes.

@yecohn yecohn closed this Jul 23, 2024
@yecohn yecohn deleted the truncate_usage_pr branch July 23, 2024 16:16
@JackChuang

Hi @tdoublep, @yecohn, and @simon-mo,
It looks like this feature has not been merged yet. According to my tests, truncate_prompt_tokens still does not work for offline inference.

I'd like to see this feature implemented. In my case, I need the truncation to occur in vLLM, and I would also like to experiment with the generated KV caches.

Are there any plans to support this feature? Can I help in any way?
Or, alternatively, are there any other methods to perform prompt truncation while running offline inference in vLLM?

Thanks,
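
Until this lands, one possible workaround is to truncate the prompt yourself before calling the offline API; a hedged sketch, assuming a vLLM version whose `LLM.generate` accepts a `TokensPrompt` (the model name and truncation length are illustrative):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

model = "facebook/opt-125m"    # illustrative model
truncate_prompt_tokens = 128   # illustrative limit

# Left-truncate with the HF tokenizer, matching the serving behaviour.
tokenizer = AutoTokenizer.from_pretrained(model, truncation_side="left")
token_ids = tokenizer(
    "a very long prompt ...",
    truncation=True,
    max_length=truncate_prompt_tokens,
).input_ids

llm = LLM(model=model)
outputs = llm.generate(
    TokensPrompt(prompt_token_ids=token_ids),
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```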
