
Clarification needed: Token indices sequence length is longer ... during inference #20

Closed
detkov opened this issue Apr 11, 2023 · 11 comments

Comments

@detkov

detkov commented Apr 11, 2023

Versions

compel==1.1.0
diffusers==0.15.0.dev0

Reproduction code sample:

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

prompt = 25 * "a cat playing with a ball in the forest"
negative_prompt = 25 * "ugly, blurry, out of focus "

prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)

prompt_embeds, negative_prompt_embeds = compel.pad_conditioning_tensors_to_same_length(
    conditionings=[prompt_embeds, negative_prompt_embeds]
)

images = pipeline(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds).images

During this, I encounter the following message:

Token indices sequence length is longer than the specified maximum sequence length for this model (227 > 77). Running this sequence through the model will result in indexing errors

So I have these questions:

  1. Do I understand correctly that I can just ignore this?
  2. How can I stop this message from being printed?
  3. Where does it come from?

Thanks in advance

@damian0815
Owner

Interesting, I haven't seen that before. Is it happening on one of the calls to compel.build_conditioning_tensor(), or does it only appear at pipeline()? In any case, because Compel was not initialized with truncate_long_prompts=False, the token sequence should already be truncated to the correct length before it gets passed to pipeline() (which means the call to pad_conditioning_tensors_to_same_length() is unnecessary).
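
In other words, with the default settings something like the following should be enough (a rough, untested sketch based on the reproduction code above):

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# default behaviour: long prompts are truncated to the model's 77-token limit
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

prompt_embeds = compel.build_conditioning_tensor("a cat playing with a ball in the forest")
negative_prompt_embeds = compel.build_conditioning_tensor("ugly, blurry, out of focus")

# both tensors already have the same (truncated) length, so no padding call is needed
images = pipeline(prompt_embeds=prompt_embeds,
                  negative_prompt_embeds=negative_prompt_embeds).images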

@detkov
Author

detkov commented Apr 12, 2023

I'm sorry, I just forgot to paste truncate_long_prompts=False into the initialization of compel.

Dependencies:

diffusers==0.14.0
compel==1.1.0

Code, which can be run in Colab with a GPU:

from diffusers import StableDiffusionPipeline
from compel import Compel
import torch

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", 
                                                   safety_checker=None,
                                                   requires_safety_checker=False,
                                                   feature_extractor=None,
                                                   torch_dtype=torch.float16).to('cuda')

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder, 
                truncate_long_prompts=False)

prompt = 25 * "a cat playing with a ball in the forest"
negative_prompt = 25 * "ugly, blurry, out of focus "

prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)

prompt_embeds, negative_prompt_embeds = compel.pad_conditioning_tensors_to_same_length(
    conditionings=[prompt_embeds, negative_prompt_embeds]
)

images = pipeline(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds).images

Similarly, I get Token indices sequence length is longer than the specified maximum sequence length for this model (227 > 77). Running this sequence through the model will result in indexing errors in the output. I don't understand this, because it seems that if I use truncate_long_prompts=False, I shouldn't see messages like this.

@damian0815
Owner

Is the error happening on one of the calls to compel.build_conditioning_tensor(), or does it only appear at pipeline()? If it's only happening at pipeline(), then this is entirely expected - the long prompt support is a hack and strictly speaking it shouldn't work (I don't use it, fwiw; it's provided based on user demand).

@detkov
Author

detkov commented Apr 12, 2023

This happens on compel.build_conditioning_tensor()

@damian0815
Owner

damian0815 commented Apr 14, 2023

I spent some time looking into this; it is in fact an expected warning that I don't have any control over. Token sequences longer than 77 tokens are not a supported usage of the Stable Diffusion model (I assume you're using SD) - that it works at all is a quirk.
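
For what it's worth, the message comes straight from the Hugging Face tokenizer rather than from compel, so it can be reproduced (and, if it's just noise, silenced) without compel at all. A rough sketch, assuming the standard transformers logging helpers behave as documented:

from transformers import CLIPTokenizer, logging

tokenizer = CLIPTokenizer.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="tokenizer")

long_prompt = 25 * "a cat playing with a ball in the forest "
# tokenizing anything longer than 77 tokens prints the warning, compel or not:
# "Token indices sequence length is longer than the specified maximum sequence length for this model (... > 77)"
tokenizer(long_prompt)

# lowering the transformers log level hides it (along with other transformers warnings)
logging.set_verbosity_error()
tokenizer(long_prompt)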

@BEpresent

Is the sequence truncated at 77 tokens in Compel? Would a method such as this make sense? @damian0815

@damian0815
Owner

@BEpresent you can pass an argument, I think it's truncate=False, to Compel's __init__ and it will turn off truncation

@BEpresent

@BEpresent you can pass an argument, I think it's truncate=False, to Compel's __init__ and it will turn off truncation

Could it be this?

compel = Compel(..., truncate_long_prompts=False)

Even when initializing it like this, I still get

Token indices sequence length is longer than the specified maximum sequence length for this model (85 > 77). Running this sequence through the model will result in indexing errors.

@damian0815
Owner

I wonder if this means I'm supposed to slice the prompt into segments before sending it to the text encoder?

Can you try slicing your prompt in half somewhere and using the new .and() syntax, something like this: if your old prompt was a b c, d, e f, change it to ("a b c", "d, e f").and(). It doesn't matter exactly where you slice it; the two parts should each be <75 tokens. It will produce different output, but please check whether tokens after the 75-token cutoff are paid attention to better this way.

@BEpresent

BEpresent commented Jun 19, 2023

Thanks. Just for clarification, would the .and() syntax also work with the conditioning tensors, so that they would be passed in chunks?

So instead of this

        conditioning = self.compel.build_conditioning_tensor(long_prompt)
        negative_conditioning = self.compel.build_conditioning_tensor(long_negative_prompt)
        conditioning, negative_conditioning = self.compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning]) 

Something like this?

        conditioning = self.compel.build_conditioning_tensor(("a b c", "d, e f").and())
        negative_conditioning = self.compel.build_conditioning_tensor(("a b c", "d, e f").and())
        conditioning, negative_conditioning = self.compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning]) 

Could I use tiktoken, or some other method, to count tokens?

@damian0815
Owner

Close, but you need to put the .and() inside the prompt string: self.compel.build_conditioning_tensor('("a b c", "d, e f").and()')

Compel has methods to count tokens: you can call compel.describe_tokenization() or compel.get_tokens() and check the length of the returned arrays. Note though that this does not account for syntax - if you call compel.get_tokens() on a prompt like a cat++ playing with a ball, the ++ will be included as two extra tokens, whereas when you run compel "for real" they will be applied and removed from the token sequence.
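
Put together, something like the following (a rough sketch; the split points are arbitrary, and it assumes get_tokens() takes a prompt string and returns a list of tokens):

# the .and() lives inside the prompt string itself
prompt = '("a cat playing with a ball", "in a dark forest, highly detailed").and()'
negative_prompt = '("ugly, blurry", "out of focus").and()'

conditioning = compel.build_conditioning_tensor(prompt)
negative_conditioning = compel.build_conditioning_tensor(negative_prompt)
conditioning, negative_conditioning = compel.pad_conditioning_tensors_to_same_length(
    [conditioning, negative_conditioning])

# rough per-segment token counts; remember that syntax characters like ++ are counted here too
print(len(compel.get_tokens("a cat playing with a ball")))
print(len(compel.get_tokens("in a dark forest, highly detailed")))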
