Some Extra Comments #1

Closed
nablabits opened this issue Jul 19, 2023 · 6 comments
Labels
good first issue Good for newcomers

Comments

@nablabits

I really liked the care you put into defining the problem (and the picture 🀩), thanks for sharing ✨

Sadly, I didn't manage to get the example working on my side. Adding num_beams=6 like gante did in his example seems to remove the words; not really sure if it's because of the bad_words_ids or because of the beam search sampling, though πŸ€”

(I decided to open this thread here so that, if we keep discussing/learning about this topic, we don't flood the main thread on HF; hope it makes sense)

@SoyGema
Owner

SoyGema commented Jul 19, 2023

Hey @nablabits, thanks for reaching out! And thanks for trying ... I'm going to dig into the two mental models (num_beams and the search method) and get back to you.
Sure! πŸ’« Keeping the discussion here makes total sense.

Thanks for the contrib πŸ˜€

@SoyGema added the good first issue label on Jul 19, 2023
@SoyGema
Owner

SoyGema commented Jul 19, 2023

Hello there @nablabits.
I can confirm that the reproduction of the issue is correct. I reproduced the example without num_beams and it didn't eliminate the word. When I introduced the parameter, the generation changed.

SOLVED
EDITED: it was a matter of taking the whole sequence as part of the forbidden sequence.

If we go to the generate num_beams argument definition, not declaring it as a parameter means there is no beam search. From the developer guide, what I'm taking away - please correct me if I'm wrong - is that beam search reduces the risk of missing high-probability word sequences and eventually chooses the one with the highest overall probability. The figure displayed at the end of that section shows that the result is "less surprising"; with another method the model may select a different word, not the most probable one.
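
A minimal sketch of the two behaviours, assuming the GPT-2 setup and the cake prompt from the docs example (the model and prompt here are an assumption, just for illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer(["In a word, the cake is a"], return_tensors="pt")

# num_beams not declared -> no beam search; greedy decoding picks the single most likely next token
greedy_ids = model.generate(inputs["input_ids"], max_new_tokens=3)

# num_beams=6 -> beam search keeps 6 candidate sequences and returns the most probable one overall
beam_ids = model.generate(inputs["input_ids"], max_new_tokens=3, num_beams=6)

print(tokenizer.batch_decode(greedy_ids, skip_special_tokens=True)[0])
print(tokenizer.batch_decode(beam_ids, skip_special_tokens=True)[0])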

Then we go to the SequenceBiasLogitsProcessor documentation:

The bias is applied to the last token of a sequence when the next generated token can complete it. Consequently, to take the most of biasing sequences with more than one token, consider using beam methods (to gracefully work around partially completed sequences that have a negative bias) and applying the bias to their prefixes (to ensure the bias is applied earlier)

Also, at a high level, what I'm understanding is that the class we are exploring, NoBadWordsLogitsProcessor, is a subtype of SequenceBiasLogitsProcessor in which the same bias is applied, but with the bias for each banned sequence set to -∞.
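
A minimal sketch of that relationship (the token ids below are made up purely for illustration; 50256 is GPT-2's eos id):

from transformers import NoBadWordsLogitsProcessor, SequenceBiasLogitsProcessor

bad_words_ids = [[1234], [5678, 91]]  # hypothetical token ids for the banned words

# the processor that the bad_words_ids argument of generate relies on
no_bad_words = NoBadWordsLogitsProcessor(bad_words_ids=bad_words_ids, eos_token_id=50256)

# roughly the same thing expressed as a sequence bias: every banned sequence gets -inf
sequence_bias = {tuple(ids): float("-inf") for ids in bad_words_ids}
biased = SequenceBiasLogitsProcessor(sequence_bias=sequence_bias)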

--- not really sure if it's because of the bad_words_ids or because of the beam search sampling, though ---

So what I think is happening is that beam search selects the "less surprising and most probable" generation, and then the bad_words_ids argument prevents it from being generated. If we use no beams or another search method, e.g. greedy, the generation might not land on the bad word, so it continues as before because nothing matches.

@nablabits
Author

Wow, thanks for sharing those resources. My understanding is that the beam method computes the joint probability of all the words in the beam, letting some relevant tokens surface even if they are followed by banned ones (which fits your "less surprising and most probable generation" to some extent), whereas on a greedy search, once your sequence hits a sink you are lost. Does this make sense to you?
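
A toy illustration, with made-up numbers, of why the joint probability matters:

# greedy commits to the locally best first token; beam search scores whole sequences
p_greedy_path = 0.5 * 0.1   # strongest first token, but its best allowed continuation is weak
p_beam_path = 0.3 * 0.6     # weaker first token with a much stronger continuation
print(p_greedy_path, p_beam_path)  # 0.05 vs 0.18 -> beam search keeps the second path alive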

I tried your new example and it's still not working on my end, but I think I found out why: it has to do with get_tokens_as_list(), where word_list gets split by character instead of by word. Just iterating over word_list.split(" ") does the trick, making bad_words_ids behave the same as sequence_bias:

# Setup assumed from the Transformers docs example (GPT-2 is an assumption; it wasn't shown in this comment)
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer_with_prefix_space = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)

inputs = tokenizer(["Margaret is outstanding, well known as a"], return_tensors="pt")

# SequenceBiasLogitsProcessor way
def get_tokens_as_tuple(word):
    return tuple(tokenizer_with_prefix_space([word], add_special_tokens=False).input_ids[0])

sequence_bias = {
    get_tokens_as_tuple("great"): float("-inf"),
    get_tokens_as_tuple("writer"): float("-inf")
}
biased_ids = model.generate(inputs["input_ids"], max_new_tokens=3, sequence_bias=sequence_bias)
print(tokenizer.batch_decode(biased_ids, skip_special_tokens=True)[0])

# NoBadWordsLogitsProcessor flavour
def get_tokens_as_list(word_list):
    """Converts a sequence of words into a list of tokens """
    tokens_list = []
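    # split on spaces so each *word* gets tokenized; iterating the raw string would yield characters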
    for word in word_list.split(" "):
        tokenized_word = tokenizer_with_prefix_space([word], add_special_tokens=False).input_ids[0]
        tokens_list.append(tokenized_word)
    return tokens_list

word_list = "great writer"
bad_words_ids = get_tokens_as_list(word_list=word_list)

badwords_ids2 = model.generate(inputs["input_ids"], max_new_tokens=3, bad_words_ids=bad_words_ids, eos_token_id=tokenizer_with_prefix_space.eos_token_id)

print(tokenizer.batch_decode(badwords_ids2, skip_special_tokens=True)[0])
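
Just to make the split difference explicit (plain Python, nothing HF-specific):

word_list = "great writer"
print([w for w in word_list])  # ['g', 'r', 'e', 'a', 't', ' ', 'w', ...]  <- what the old loop iterated over
print(word_list.split(" "))    # ['great', 'writer']                      <- what we actually want to ban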

You're welcome πŸ˜‰

@SoyGema
Owner

SoyGema commented Jul 20, 2023

Hey πŸ™‚ Thanks! Now it works.

How would you feel if I added you to the PR as a contributor somehow, so we present this jointly? You solved the one-word sequence in the function (I think the cake example solves the multi-word sequence) and provided useful feedback, plus kindly tried everything out. LMK if that works for you πŸ™‚

If that works, let me find a way (a joint PR, or adding you as a reviewer; I have to check how to set it up in the PR πŸ₯°)

@nablabits
Author

Ahh, glad to hear that. I know that you knew the split bit all along and were testing me πŸ˜‰ (I hope I passed the test)

Greatly appreciated, but really, don't bother with that; I'm more than happy with the learning experience and the collaboration. That said, I rush to add that some day, and that day may never come, I might call this collaboration back mwahahahah
[image]

@SoyGema
Owner

SoyGema commented Jul 21, 2023

" You come into my repo , the day we close the first good issue and you ask me to do a future collaboration - in the Open , hopefully " Count on it :) @nablabits

[image: GoodFirstIssue]
