[FIX] Add _encode_pair() to vllm_causallms.py #1108
Closed
Most models using the Llama tokenizer currently show drastically different evaluation results when tested with the new vLLM implementation. The culprit is identical to a previously solved Llama tokenization issue with the HF implementation in this PR.
The `_encode_pair()` method is used for the HuggingFace model, where `whole_enc` and `context_enc` are encoded first before setting `continuation_enc` to be `whole_enc[len(context_enc):]`, while basic batch encoding is used for vLLM. The current vLLM implementation encodes the pair by simply running both inputs through the tokenizer, as such:
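A rough sketch of that pattern (illustrative only; the method name and `self.tokenizer` attribute are assumptions, not the exact lines from `vllm_causallms.py`):

```python
# Illustrative sketch of the separate-encoding path described above; the method
# name and `self.tokenizer` attribute are assumptions, not the actual code.
def _encode_pair_naive(self, context: str, continuation: str):
    # Context and continuation are tokenized independently of each other,
    # so the continuation never "sees" the context during tokenization.
    context_enc = self.tokenizer.encode(context, add_special_tokens=False)
    continuation_enc = self.tokenizer.encode(continuation, add_special_tokens=False)
    return context_enc, continuation_enc
```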
This inconsistency strongly affects the MMLU results: the current implementation's continuation encoding contains an extra empty-string token, while the `_encode_pair` approach returns only the token for the answer letter.
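A quick way to see the difference (a hedged illustration; the prompt strings are made up and exact token ids depend on the tokenizer version, but the length difference is the relevant part):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

context, continuation = "Answer:", " A"

# Separate encoding (current vLLM path): the leading space makes the Llama
# tokenizer emit an extra empty/underline piece before "A".
separate = tok.encode(continuation, add_special_tokens=False)

# _encode_pair-style encoding: encode the concatenation once, then slice off
# the context tokens, leaving only the token for the letter itself.
context_enc = tok.encode(context, add_special_tokens=False)
whole_enc = tok.encode(context + continuation, add_special_tokens=False)
sliced = whole_enc[len(context_enc):]

print(separate, sliced)  # lengths differ for LlamaTokenizer-based models
```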
The following are normalized accuracies for 0-shot evaluations on `meta-llama/Llama-2-7b-hf`:

This modification also addresses this issue, since `GeneZC/MiniMA-3B` uses the `LlamaTokenizer`.

From our testing, this discrepancy appears in most models that use `LlamaTokenizer`, such as `meta-llama/Llama-2-7b` and `openlm-research/open_llama_7b_v2`. However, some models, like `01-ai/Yi-6B`, do not show this issue. The only notable difference in their tokenizer config is that they set `add_bos_token=False`, which is not the default.
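For reference, a minimal sketch of an `_encode_pair()`-style helper in the spirit of the HF implementation (simplified, with names following the HF-class convention; see the PR diff for the exact code added):

```python
def _encode_pair(self, context: str, continuation: str):
    # Move any trailing whitespace from the context onto the continuation so
    # the split point falls on a clean token boundary.
    n_spaces = len(context) - len(context.rstrip())
    if n_spaces > 0:
        continuation = context[-n_spaces:] + continuation
        context = context[:-n_spaces]

    # Encode the concatenation once, then derive the continuation tokens by
    # slicing off the context tokens, mirroring the HuggingFace code path.
    whole_enc = self.tok_encode(context + continuation)
    context_enc = self.tok_encode(context)
    continuation_enc = whole_enc[len(context_enc):]
    return context_enc, continuation_enc
```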