| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| ChatGLMModel | ChatGLM | |
| GemmaForCausalLM | Gemma | |
| GPTNeoXForCausalLM | Dolly | |
| | RedPajama | |
| LlamaForCausalLM | Llama 3 | |
| | Llama 2 | |
| | OpenLLaMA | |
| | TinyLlama | |
| MistralForCausalLM | Mistral | |
| | Notus | |
| | Zephyr | |
| PhiForCausalLM | Phi | |
| QWenLMHeadModel | Qwen | |
The pipeline can work with other similar topologies produced by optimum-intel with the same model signature. After conversion, the model is required to have the following inputs and a single `logits` output (see the sketch after this list):

1. `input_ids` contains the tokens.
2. `attention_mask` is filled with `1`.
3. `beam_idx` selects beams.
4. `position_ids` (optional) encodes the position of the currently generated token in the sequence.
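
A minimal sketch of how such a check could be done with the OpenVINO Python API, assuming a model already converted by optimum-intel. The path `TinyLlama-1.1B-Chat-v1.0/openvino_model.xml` is only an example placeholder:

```python
import openvino as ov

core = ov.Core()
# Read a model previously exported by optimum-intel (example path).
model = core.read_model("TinyLlama-1.1B-Chat-v1.0/openvino_model.xml")

# Collect all tensor names exposed by the model's input and output ports.
input_names = {name for port in model.inputs for name in port.get_names()}
output_names = {name for port in model.outputs for name in port.get_names()}

print("inputs: ", sorted(input_names))
print("outputs:", sorted(output_names))

# The pipeline expects these inputs and a single `logits` output.
required = {"input_ids", "attention_mask", "beam_idx"}
missing = required - input_names
assert not missing, f"model is missing required inputs: {missing}"
assert "logits" in output_names, "model must expose a `logits` output"
```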
> **Note**
>
> Models should belong to the same family and use the same tokenizer.
> Some models may require you to submit an access request on their Hugging Face page before they can be downloaded.
> If https://huggingface.co/ is down, the conversion step won't be able to download the models.
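
As a reference, a minimal sketch of the conversion step using the optimum-intel Python API, assuming `optimum[openvino]` is installed; `TinyLlama/TinyLlama-1.1B-Chat-v1.0` is used here only as an example model ID, and the download requires huggingface.co to be reachable:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example model from the table above

# Download the checkpoint and export it to the OpenVINO IR format.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the converted model and its tokenizer side by side.
model.save_pretrained("TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("TinyLlama-1.1B-Chat-v1.0")
```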