FEAT: add codestral-v0.1 #1575

Merged: 3 commits, Jun 5, 2024
1 change: 1 addition & 0 deletions doc/source/getting_started/installation.rst
@@ -43,6 +43,7 @@ Currently, supported models include:
- ``baichuan``, ``baichuan-chat``, ``baichuan-2-chat``
- ``internlm-16k``, ``internlm-chat-7b``, ``internlm-chat-8k``, ``internlm-chat-20b``
- ``mistral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``, ``mistral-instruct-v0.3``
- ``codestral-v0.1``
- ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``
- ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``
47 changes: 47 additions & 0 deletions doc/source/models/builtin/llm/codestral-v0.1.rst
@@ -0,0 +1,47 @@
.. _models_llm_codestral-v0.1:

========================================
codestral-v0.1
========================================

- **Context Length:** 32768
- **Model Name:** codestral-v0.1
- **Languages:** en
- **Abilities:** generate
- **Description:** Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 22 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 22
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: vLLM, Transformers (vLLM is only available when the quantization is ``none``)
- **Model ID:** mistralai/Codestral-22B-v0.1
- **Model Hubs**: `Hugging Face <https://huggingface.co/mistralai/Codestral-22B-v0.1>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name codestral-v0.1 --size-in-billions 22 --model-format pytorch --quantization ${quantization}
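For readers scripting launches, the CLI flags above map one-to-one onto launch parameters. The helper below is a hypothetical illustration of assembling those parameters in Python (it only builds the argument set; actually launching requires a running xinference server, e.g. via the Python client's ``launch_model``):

```python
# Hypothetical sketch: collect the launch arguments corresponding to the
# CLI flags above. This does not contact a server; it only shows how the
# flags map onto named parameters.
def launch_params(engine: str, quantization: str) -> dict:
    return {
        "model_engine": engine,              # e.g. "transformers" or "vllm"
        "model_name": "codestral-v0.1",
        "model_size_in_billions": 22,
        "model_format": "pytorch",
        "quantization": quantization,        # "4-bit", "8-bit", or "none"
    }

params = launch_params("transformers", "none")
print(params["model_name"])  # codestral-v0.1
```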


Model Spec 2 (ggufv2, 22 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 22
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** bartowski/Codestral-22B-v0.1-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/bartowski/Codestral-22B-v0.1-GGUF>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name codestral-v0.1 --size-in-billions 22 --model-format ggufv2 --quantization ${quantization}

14 changes: 14 additions & 0 deletions doc/source/models/builtin/llm/index.rst
@@ -126,6 +126,11 @@ The following is a list of built-in LLM in Xinference:
- 8194
- CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.

* - :ref:`codestral-v0.1 <models_llm_codestral-v0.1>`
- generate
- 32768
- Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash.

* - :ref:`cogvlm2 <models_llm_cogvlm2>`
- chat, vision
- 8192
@@ -276,6 +281,11 @@ The following is a list of built-in LLM in Xinference:
- 4096
- MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

* - :ref:`minicpm-llama3-v-2_5 <models_llm_minicpm-llama3-v-2_5>`
- chat, vision
- 2048
- MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters.

* - :ref:`mistral-instruct-v0.1 <models_llm_mistral-instruct-v0.1>`
- chat
- 8192
@@ -570,6 +580,8 @@ The following is a list of built-in LLM in Xinference:

codeshell-chat

codestral-v0.1

cogvlm2

deepseek
@@ -630,6 +642,8 @@ The following is a list of built-in LLM in Xinference:

minicpm-2b-sft-fp32

minicpm-llama3-v-2_5

mistral-instruct-v0.1

mistral-instruct-v0.2
9 changes: 6 additions & 3 deletions doc/source/models/builtin/llm/internvl-chat.rst
@@ -20,13 +20,14 @@ Model Spec 1 (pytorch, 2 Billion)
- **Model Format:** pytorch
- **Model Size (in billions):** 2
- **Quantizations:** none
- **Engines**: Transformers
- **Model ID:** OpenGVLab/Mini-InternVL-Chat-2B-V1-5
- **Model Hubs**: `Hugging Face <https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name internvl-chat --size-in-billions 2 --model-format pytorch --quantization ${quantization}
xinference launch --model-engine ${engine} --model-name internvl-chat --size-in-billions 2 --model-format pytorch --quantization ${quantization}


Model Spec 2 (pytorch, 26 Billion)
@@ -35,13 +36,14 @@ Model Spec 2 (pytorch, 26 Billion)
- **Model Format:** pytorch
- **Model Size (in billions):** 26
- **Quantizations:** none
- **Engines**: Transformers
- **Model ID:** OpenGVLab/InternVL-Chat-V1-5
- **Model Hubs**: `Hugging Face <https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-{quantization}>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name internvl-chat --size-in-billions 26 --model-format pytorch --quantization ${quantization}
xinference launch --model-engine ${engine} --model-name internvl-chat --size-in-billions 26 --model-format pytorch --quantization ${quantization}


Model Spec 3 (pytorch, 26 Billion)
@@ -50,11 +52,12 @@ Model Spec 3 (pytorch, 26 Billion)
- **Model Format:** pytorch
- **Model Size (in billions):** 26
- **Quantizations:** Int8
- **Engines**: Transformers
- **Model ID:** OpenGVLab/InternVL-Chat-V1-5-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-{quantization}>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-{quantization}>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name internvl-chat --size-in-billions 26 --model-format pytorch --quantization ${quantization}
xinference launch --model-engine ${engine} --model-name internvl-chat --size-in-billions 26 --model-format pytorch --quantization ${quantization}

6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/telechat.rst
@@ -54,7 +54,7 @@ Model Spec 3 (pytorch, 12 Billion)
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** Tele-AI/TeleChat-12B
- **Model Hubs**: `Hugging Face <https://huggingface.co/Tele-AI/TeleChat-12B>`__, `ModelScope <https://modelscope.cn/models/Tele-AI/TeleChat-12B>`__
- **Model Hubs**: `Hugging Face <https://huggingface.co/Tele-AI/TeleChat-12B>`__, `ModelScope <https://modelscope.cn/models/TeleAI/TeleChat-12B>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::
@@ -70,7 +70,7 @@ Model Spec 4 (gptq, 12 Billion)
- **Quantizations:** int4, int8
- **Engines**: Transformers
- **Model ID:** Tele-AI/TeleChat-12B-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/Tele-AI/TeleChat-12B-{quantization}>`__, `ModelScope <https://modelscope.cn/models/Tele-AI/TeleChat-12B-{quantization}>`__
- **Model Hubs**: `Hugging Face <https://huggingface.co/Tele-AI/TeleChat-12B-{quantization}>`__, `ModelScope <https://modelscope.cn/models/TeleAI/TeleChat-12B-{quantization}>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::
@@ -86,7 +86,7 @@ Model Spec 5 (pytorch, 52 Billion)
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** Tele-AI/TeleChat-52B
- **Model Hubs**: `Hugging Face <https://huggingface.co/Tele-AI/TeleChat-52B>`__, `ModelScope <https://modelscope.cn/models/Tele-AI/TeleChat-52B>`__
- **Model Hubs**: `Hugging Face <https://huggingface.co/Tele-AI/TeleChat-52B>`__, `ModelScope <https://modelscope.cn/models/TeleAI/TeleChat-52B>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your chosen quantization method from the options listed above::
1 change: 1 addition & 0 deletions doc/source/user_guide/backends.rst
@@ -50,6 +50,7 @@ Currently, supported models include:
- ``baichuan``, ``baichuan-chat``, ``baichuan-2-chat``
- ``internlm-16k``, ``internlm-chat-7b``, ``internlm-chat-8k``, ``internlm-chat-20b``
- ``mistral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``, ``mistral-instruct-v0.3``
- ``codestral-v0.1``
- ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``
- ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``
43 changes: 43 additions & 0 deletions xinference/model/llm/llm_family.json
@@ -3417,6 +3417,49 @@
]
}
},
{
"version": 1,
"context_length": 32768,
"model_name": "codestral-v0.1",
"model_lang": [
"en"
],
"model_ability": [
"generate"
],
"model_description": "Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash",
"model_specs": [
{
"model_format": "pytorch",
"model_size_in_billions": 22,
"quantizations": [
"4-bit",
"8-bit",
"none"
],
"model_id": "mistralai/Codestral-22B-v0.1"
},
{
"model_format": "ggufv2",
"model_size_in_billions": 22,
"quantizations": [
"Q2_K",
"Q3_K_S",
"Q3_K_M",
"Q3_K_L",
"Q4_K_S",
"Q4_K_M",
"Q5_K_S",
"Q5_K_M",
"Q6_K",
"Q8_0"
],
"model_id": "bartowski/Codestral-22B-v0.1-GGUF",
"model_file_name_template": "Codestral-22B-v0.1-{quantization}.gguf"
}
]
},
{
"version": 1,
"context_length": 8192,
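The ``model_file_name_template`` in the GGUF spec above is a plain Python format string: the chosen quantization is substituted to produce the concrete file name to download. A minimal illustrative sketch (not xinference's internal resolution code):

```python
# Resolve a concrete GGUF file name from the spec's template and the
# chosen quantization, as the JSON above implies (illustrative sketch).
TEMPLATE = "Codestral-22B-v0.1-{quantization}.gguf"
QUANTIZATIONS = [
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "Q4_K_S",
    "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
]

def resolve_file_name(quantization: str) -> str:
    # Guard against quantizations the spec does not list.
    if quantization not in QUANTIZATIONS:
        raise ValueError(f"unsupported quantization: {quantization}")
    return TEMPLATE.format(quantization=quantization)

print(resolve_file_name("Q4_K_M"))  # Codestral-22B-v0.1-Q4_K_M.gguf
```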
2 changes: 1 addition & 1 deletion xinference/model/llm/pytorch/chatglm.py
@@ -83,7 +83,7 @@ def match(
if llm_spec.model_format != "pytorch":
return False
model_family = llm_family.model_family or llm_family.model_name
if "chatglm" not in model_family or "glm4" not in model_family:
if "chatglm" not in model_family and "glm4" not in model_family:
return False
if "chat" not in llm_family.model_ability:
return False
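The ``or`` to ``and`` change above is a real logic fix: with ``or``, any family lacking at least one of the two substrings was rejected, so even valid chatglm/glm4 families failed the check. A small sketch of the corrected predicate (hypothetical helper name, not the library's code):

```python
# The corrected condition rejects a family only when it matches neither
# "chatglm" nor "glm4" (De Morgan: not-A and not-B == not (A or B)).
def is_glm_family(model_family: str) -> bool:
    return "chatglm" in model_family or "glm4" in model_family

print(is_glm_family("chatglm3"))   # True
print(is_glm_family("glm4-chat"))  # True
print(is_glm_family("llama-2"))    # False
```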
5 changes: 5 additions & 0 deletions xinference/model/llm/pytorch/core.py
@@ -53,6 +53,11 @@
"chatglm2",
"chatglm2-32k",
"chatglm2-128k",
"chatglm3",
"chatglm3-32k",
"chatglm3-128k",
"glm4-chat",
"glm4-chat-1m",
"llama-2",
"llama-2-chat",
"internlm2-chat",
1 change: 1 addition & 0 deletions xinference/model/llm/vllm/core.py
@@ -93,6 +93,7 @@ class VLLMGenerateConfig(TypedDict, total=False):
"baichuan",
"internlm-16k",
"mistral-v0.1",
"codestral-v0.1",
"Yi",
"Yi-1.5",
"code-llama",