
[Model] Initialize deepseek-vl support #5817

Open — wants to merge 66 commits into base: main

Conversation

@liuyancong-enflame-tech commented Jun 25, 2024

Tested on NVIDIA L40S

  • Test Example
from PIL import Image

from vllm import LLM, SamplingParams

sample_params = SamplingParams(temperature=0, max_tokens=1024)

model_path = "/pretrained_models/deepseek-vl-7b-chat"

# Load the model with a reduced context length.
llm = LLM(
    model=model_path,
    max_model_len=3072,
)
print("model load finished")

prompt = "You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.\n User: <image_placeholder> Describe each stage of this image in detail.\nAssistant:"

# Pass the PIL image alongside the prompt as multi-modal data.
image = Image.open("/opt/TV/VLM01/tests/images/cherry_blossom.jpg")
image = image.convert("RGB")
outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    },
    sample_params,
)
for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

  • Output
The image captures a scene where a tall tower, which appears to be a communication or observation tower, is partially obscured by cherry blossom trees in full bloom. The tower is situated in the background, and the sky behind it is a clear blue.

The cherry blossom trees, which are in the foreground, are in various stages of bloom. Some branches are densely packed with pink blossoms, while others have fewer or no flowers at all. The blossoms are in various shades of pink, ranging from light to deep, indicating the progression of the bloom cycle.

The perspective of the image is from below, looking upwards towards the tower, which gives a sense of scale and grandeur to the tower. The branches of the cherry blossom trees frame the tower, creating a natural border that draws the viewer's eye towards the tower.

There are no discernible texts or other objects in the image. The focus is solely on the tower and the cherry blossom trees, with the blue sky providing a contrasting backdrop. The image does not contain any people or moving elements, suggesting a still, serene moment captured in time.

FIX #3356
FIX #4982

@liuyancong-enflame-tech (Author)

Contributed by enflame-tech

@DarkLight1337 (Member) left a comment


Thanks for the contribution! I have a few initial comments.

Apart from that, can you add a test case (similar to test_llava.py) to test the correctness of the model in CI?
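
As a starting point, here is a minimal sketch of such a test, loosely modeled on test_llava.py. It assumes vLLM's hf_runner and vllm_runner pytest fixtures can be used as context managers and accept vision inputs via an images argument; the prompt, image path, and exact output comparison are illustrative, not the final test:

import pytest
from PIL import Image

MODELS = ["deepseek-ai/deepseek-vl-7b-chat"]

# Same prompt format as the example above.
PROMPT = ("You are a helpful language and vision assistant.\n"
          " User: <image_placeholder> Describe the image.\nAssistant:")


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype", ["half"])
@pytest.mark.parametrize("max_tokens", [128])
def test_models(hf_runner, vllm_runner, model, dtype, max_tokens) -> None:
    # Hypothetical test image path, mirroring the example above.
    image = Image.open("tests/images/cherry_blossom.jpg").convert("RGB")

    # Greedy generation with the reference HuggingFace implementation.
    with hf_runner(model, dtype=dtype) as hf_model:
        hf_outputs = hf_model.generate_greedy([PROMPT], max_tokens,
                                              images=[image])

    # Greedy generation with the vLLM implementation under test.
    with vllm_runner(model, dtype=dtype) as vllm_model:
        vllm_outputs = vllm_model.generate_greedy([PROMPT], max_tokens,
                                                  images=[image])

    # Greedy outputs from the two implementations should match.
    assert hf_outputs == vllm_outputs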

Two review threads on vllm/model_executor/models/deepseek_vl.py (outdated, resolved).
@liuyancong-enflame-tech changed the title from "[Model] Initialize deepseek-vl-7b-chat support" to "[Model] Initialize deepseek-vl support" on Jun 26, 2024
@liuyancong-enflame-tech (Author)

Now supports deepseek-ai/deepseek-vl-7b-chat and deepseek-ai/deepseek-vl-1.3b-chat.

@liuyancong-enflame-tech (Author)

This model depends on timm>=0.9.16, which in turn depends on torch; this conflicts with the dependencies of other third-party components and causes the pipeline to fail. Running this model therefore requires an extra installation step, and I am not sure whether that is appropriate. In addition, the model depends on many timm modules, which are difficult to remove.

@DarkLight1337 (Member)

> This model depends on timm>=0.9.16, which in turn depends on torch; this conflicts with the dependencies of other third-party components and causes the pipeline to fail. Running this model therefore requires an extra installation step, and I am not sure whether that is appropriate. In addition, the model depends on many timm modules, which are difficult to remove.

Can you implement the individual timm modules inside vLLM? (where possible, you should use vLLM-specific layers to improve the performance anyway)
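
For illustration, porting one of the simpler timm blocks might look roughly like this. This is a minimal sketch, assuming vLLM's tensor-parallel linear layers from vllm.model_executor.layers.linear; the VisionMlp module and its dimensions are made up for the example, not taken from DeepSeek-VL:

import torch
import torch.nn as nn

from vllm.model_executor.layers.linear import (ColumnParallelLinear,
                                               RowParallelLinear)


class VisionMlp(nn.Module):
    """A timm-style MLP block rewritten with vLLM's parallel linear layers."""

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        # Column-parallel for the up-projection, row-parallel for the
        # down-projection, so the block shards cleanly across GPUs.
        self.fc1 = ColumnParallelLinear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = RowParallelLinear(hidden_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # vLLM's linear layers return (output, output_bias) tuples.
        x, _ = self.fc1(x)
        x = self.act(x)
        x, _ = self.fc2(x)
        return x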

@liuyancong-enflame-tech (Author)

> > This model depends on timm>=0.9.16, which in turn depends on torch; this conflicts with the dependencies of other third-party components and causes the pipeline to fail. Running this model therefore requires an extra installation step, and I am not sure whether that is appropriate. In addition, the model depends on many timm modules, which are difficult to remove.
>
> Can you implement the individual timm modules inside vLLM? (where possible, you should use vLLM-specific layers to improve the performance anyway)

OK, I will try to do this; I think it will take some time.

@DarkLight1337 (Member)

> > > This model depends on timm>=0.9.16, which in turn depends on torch; this conflicts with the dependencies of other third-party components and causes the pipeline to fail. Running this model therefore requires an extra installation step, and I am not sure whether that is appropriate. In addition, the model depends on many timm modules, which are difficult to remove.
> >
> > Can you implement the individual timm modules inside vLLM? (where possible, you should use vLLM-specific layers to improve the performance anyway)
>
> OK, I will try to do this; I think it will take some time.

You can make use of our implementation of CLIPVisionModel to save some effort.
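
A rough sketch of what reusing it could look like, assuming the DeepSeek-VL vision tower can be described by a transformers CLIPVisionConfig and that vLLM's implementation lives at vllm.model_executor.models.clip (the config values below are illustrative, not DeepSeek-VL's real ones):

from transformers import CLIPVisionConfig

from vllm.model_executor.models.clip import CLIPVisionModel

# Illustrative config values; the actual DeepSeek-VL vision tower differs.
config = CLIPVisionConfig(
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=24,
    num_attention_heads=16,
    image_size=384,
    patch_size=16,
)
vision_tower = CLIPVisionModel(config)

# pixel_values: float tensor of shape (batch, 3, image_size, image_size);
# the tower produces patch embeddings to feed the multi-modal projector.
# image_features = vision_tower(pixel_values)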

@DarkLight1337 (Member)

To speed up the CI queue for #5905, I've cancelled the distributed tests for the latest CI run in this PR since they won't pass anyway until #5905 has been merged. Please merge main into your branch after that happens so that the CI can pass once again.

@liuyancong-enflame-tech (Author) commented Jun 28, 2024

This test case [tests/models/test_deepseek_vl.py] depends on the project https://github.com/deepseek-ai/DeepSeek-VL, and it seems that pip installation fails when building the Docker image. I think it may be acceptable not to add this test case.

The example [examples/deepseek_vl_example.py] runs successfully.

@DarkLight1337 (Member) commented Jul 1, 2024

> This test case [tests/models/test_deepseek_vl.py] depends on the project https://github.com/deepseek-ai/DeepSeek-VL, and it seems that pip installation fails when building the Docker image. I think it may be acceptable not to add this test case.

In this case it won't function for users of vLLM either since they can't install it (so you should still keep the tests). Can you figure out which dependency is causing the issue?

@liuyancong-enflame-tech (Author)

> > This test case [tests/models/test_deepseek_vl.py] depends on the project https://github.com/deepseek-ai/DeepSeek-VL, and it seems that pip installation fails when building the Docker image. I think it may be acceptable not to add this test case.
>
> In this case it won't function for users of vLLM either since they can't install it (so you should still keep the tests). Can you figure out which dependency is causing the issue?
In the tests/models/test_deepseek_vl.py test, when the hf runner loads the model without deepseek_vl being imported, an error occurs:

ValueError: The checkpoint you are trying to load has model type multi_modality but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

This is because the model code is not in the HF repository but in the GitHub repository. In https://github.com/deepseek-ai/DeepSeek-VL/blob/main/deepseek_vl/models/modeling_vlm.py we can see the registration code:

AutoConfig.register("multi_modality", MultiModalityConfig)
AutoModelForCausalLM.register(MultiModalityConfig, MultiModalityCausalLM)

If I add the dependency to vllm/requirements-test.txt:

deepseek_vl@git+https://github.com/deepseek-ai/DeepSeek-VL.git@main

then Docker reports an error when building the image: the package metadata is None and cannot be recognized. The package can be built into a wheel, but the wheel is not found when installed.

@DarkLight1337 (Member) commented Jul 1, 2024

You can manually register the model to HuggingFace inside the test case.
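
Based on the registration code quoted above, a minimal sketch of such a test-side registration might be (this assumes the deepseek_vl package is importable in the test environment, with the import path following the repo's modeling_vlm.py layout):

from transformers import AutoConfig, AutoModelForCausalLM

# These classes live in the DeepSeek-VL GitHub repo, not on the HF Hub.
from deepseek_vl.models.modeling_vlm import (MultiModalityCausalLM,
                                             MultiModalityConfig)

# Register the custom "multi_modality" architecture so that the HF runner
# can resolve the checkpoint's model type.
AutoConfig.register("multi_modality", MultiModalityConfig)
AutoModelForCausalLM.register(MultiModalityConfig, MultiModalityCausalLM)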

@liuyancong-enflame-tech (Author)

> You can manually register the model to HuggingFace inside the test case.

OK, I'll try it.

@liuyancong-enflame-tech (Author)

The test models/test_deepseek_vl.py failed, but no exception stack trace was shown, so I don't know what happened; the program seems to have been terminated. Have you encountered similar problems?

@DarkLight1337 (Member)

The stack trace is shown near the end of the CI logs:

https://buildkite.com/vllm/ci-aws/builds/4404#0190956c-f526-40a4-b2af-232d40ffbd0c

@liuyancong-enflame-tech (Author)

buildkite/fastcheck/pr/tensorizer-metrics-tracing-test — Failed (exit status 1)
What functionality does this test case cover?

@DarkLight1337 (Member)

It's unrelated to this PR.

Successfully merging this pull request may close these issues.

  • [New Model]: DeepSeek VL
  • DeepSeek VL support