Some Extra Comments #1
Comments
Hey @nablabits, thanks for reaching out! And thanks for trying it out. I'm going to dig into the two mental models (num_beams and the search) and get back to you. Thanks for the contrib!
Hello there @nablabits. SOLVED.
If we go to the generate num_beams arg definition: if it is not declared as a parameter, there is no beam search. From the developer guide, what I'm taking away - please correct me if I'm wrong - is that beam search reduces the risk of missing high-probability word sequences and eventually chooses the one with the highest probability. The figure displayed at the end of the section shows that it is "less surprising"; the other methods may select another word, not the highest-probability one.
Then we go to the SequenceBiasLogitsProcessor documentation: the bias is applied to the last token of a sequence when the next generated token can complete it. Consequently, to get the most out of biasing sequences with more than one token, consider using beam methods (to gracefully work around partially completed sequences that have a negative bias) and applying the bias to their prefixes (to ensure the bias is applied earlier).
Also, at a high level, what I'm understanding is that the class we are exploring, NoBadWordsLogitsProcessor, is a subtype of SequenceBiasLogitsProcessor in which that bias is applied with the sequence bias set to -inf --- not really sure if it's because of the bad_words_ids or because of the beam search sampling, though. So what I think is happening is that beam search is selecting the "less surprising and most probable generation" and then the bad_words_ids argument removes the banned words from it.
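If it helps, here is a minimal sketch of how I picture that relationship (this assumes a recent transformers version where NoBadWordsLogitsProcessor is built on top of SequenceBiasLogitsProcessor; the token ids below are made up):
from transformers import NoBadWordsLogitsProcessor, SequenceBiasLogitsProcessor

bad_words_ids = [[2726], [6260]]  # hypothetical token ids for the banned words
eos_token_id = 50256              # GPT-2's eos token id, just as an example

# Banning words this way...
ban_processor = NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id=eos_token_id)

# ...is conceptually the same as biasing every banned sequence to -inf
sequence_bias = {tuple(ids): float("-inf") for ids in bad_words_ids}
bias_processor = SequenceBiasLogitsProcessor(sequence_bias)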
Wow, thanks for sharing those resources. My understanding is that the beam method computes the joint probability for all the words in the beam, making some relevant tokens appear even if they are followed by banned ones (which fits your "less surprising and most probable generation" to some extent), whereas on a greedy search, once you have a sink in your sequence you are lost. Does this make sense to you? I tried your new example and it's still not working on my end, but I think I found why: it has to do with inputs = tokenizer(["Margaret is outstanding, well known as a"], return_tensors="pt")
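For completeness, this is roughly the setup the snippets below assume (the gpt2 checkpoint and the prefix-space tokenizer are my assumptions, following the HF docs example):
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# A second tokenizer that prepends a space, so single words tokenize
# the way they appear mid-sentence
tokenizer_with_prefix_space = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)

inputs = tokenizer(["Margaret is outstanding, well known as a"], return_tensors="pt")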
# SequenceBiasLogitsProcessor way
def get_tokens_as_tuple(word):
    """Return the token ids of `word` (tokenized with a leading space) as a tuple."""
    return tuple(tokenizer_with_prefix_space([word], add_special_tokens=False).input_ids[0])

# Biasing both words to -inf means they can never be generated
sequence_bias = {
    get_tokens_as_tuple("great"): float("-inf"),
    get_tokens_as_tuple("writer"): float("-inf"),
}
biased_ids = model.generate(inputs["input_ids"], max_new_tokens=3, sequence_bias=sequence_bias)
print(tokenizer.batch_decode(biased_ids, skip_special_tokens=True)[0])
# NoBadWordsLogitsProcessor flavour
def get_tokens_as_list(word_list):
    """Convert a string of space-separated words into a list of token id lists."""
    tokens_list = []
    for word in word_list.split(" "):
        tokenized_word = tokenizer_with_prefix_space([word], add_special_tokens=False).input_ids[0]
        tokens_list.append(tokenized_word)
    return tokens_list

word_list = "great writer"
bad_words_ids = get_tokens_as_list(word_list=word_list)
badwords_ids2 = model.generate(inputs["input_ids"], max_new_tokens=3, bad_words_ids=bad_words_ids, eos_token_id=tokenizer_with_prefix_space.eos_token_id)
print(tokenizer.batch_decode(badwords_ids2, skip_special_tokens=True)[0])
You're welcome!
Hey, thanks! Now it works. How would you feel if I add you to the PR as a contributor somehow, so we present this jointly? You solved the one-word sequence in the function (I think the cake example solves the multiple-word sequence), provided useful feedback, and kindly tried all the stuff. LMK if that works for you. If it does, let me find a way (a joint PR, or adding you as a reviewer; I have to check how to set it up in the PR).
" You come into my repo , the day we close the first good issue and you ask me to do a future collaboration - in the Open , hopefully " Count on it :) @nablabits |
I really liked the care you put into defining the problem (and the picture!), thanks for sharing.
Sadly, I didn't manage to get the example working on my side. Adding num_beams=6 like gante did in his example (see the sketch below) seems to remove the words; not really sure if it's because of the bad_words_ids or because of the beam search sampling, though. (I decided to open this thread here so that, if we further discuss/learn on this topic, we won't flood the main thread on HF; hope it makes sense.)
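For reference, by "adding num_beams=6" I mean something along these lines (reusing the setup and bad_words_ids from the snippets above; the exact values are my assumption, not gante's original example):
# Beam search keeps several candidate continuations alive, so it can route
# around the banned sequences instead of getting stuck on the single greedy path
beam_ids = model.generate(
    inputs["input_ids"],
    max_new_tokens=3,
    num_beams=6,
    bad_words_ids=bad_words_ids,
    eos_token_id=tokenizer_with_prefix_space.eos_token_id,
)
print(tokenizer.batch_decode(beam_ids, skip_special_tokens=True)[0])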