[New Model]: Florence-2 #5934

localbarrage · 2024-06-27T21:11:37Z

The model to consider.

https://huggingface.co/microsoft/Florence-2-base

The closest model vllm already supports.

phi-3v, it's a VLM

What's your difficulty of supporting the model you want?

No response

chandeldivyam · 2024-07-01T20:30:24Z

@DarkLight1337 Anyone working on this?

DarkLight1337 · 2024-07-02T00:51:21Z

No, but please wait for #5852 and #5276 to land first as they involve significant API changes for devs. In the meantime, you can take a look at this guide to get an idea of how to implement a new model.

chandeldivyam · 2024-07-02T03:30:48Z

Thanks, checking the guide and the previous PRs of adding phi3-vision, also #5276

fcakyon · 2024-08-16T21:06:30Z

Both #5852 and #5276 are merged. Do you still have plans to work on this PR @chandeldivyam ?

chandeldivyam · 2024-08-17T00:13:02Z

@fcakyon Thanks for the reminder, it actually slipped my mind. Yes, I need florence-2 for a project I was working on. So, as an alternative for quick prototyping, I created a flask server but it is not the ideal solution. I will pick it up in the next week. Thanks!

Are you working on something that would need it?

fcakyon · 2024-08-17T10:05:54Z

@chandeldivyam Yes, I also need such a solution for my work. I'm trying to utilize https://github.com/Lightning-AI/LitServe since I only have a little experience with the vllm-project.

chandeldivyam · 2024-08-19T04:24:38Z

@fcakyon have you looked into any benchmarking for litserve? Also, I think using vllm would make sense if there are a ton of parallel requests, right?

pseudotensor · 2024-08-24T23:15:40Z

@chandeldivyam Would be great to see florence-2 in vllm.

bhavnicksm · 2024-09-05T21:34:09Z

Hey @chandeldivyam,
Is there a PR already to track the progress on Florence-2?
Would be great to have Florence-2 with vllm 😀

SteveKo837 · 2024-09-06T06:16:03Z

Since there's been no update on this issue, this week I referred to the guide here and looked at how to add Phi3-vision to vLLM. I implemented the registry, but I ran into the following issue:

  File "/app/vllm/entrypoints/llm.py", line 177, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vllm/engine/llm_engine.py", line 541, in from_engine_args
    engine = cls(
             ^^^^
  File "/app/vllm/engine/llm_engine.py", line 302, in __init__
    self.model_executor = executor_class(
                          ^^^^^^^^^^^^^^^
  File "/app/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/app/vllm/executor/gpu_executor.py", line 38, in _init_executor
    self.driver_worker = self._create_worker()
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/app/vllm/executor/gpu_executor.py", line 105, in _create_worker
    return create_worker(**self._get_create_worker_kwargs(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vllm/executor/gpu_executor.py", line 24, in create_worker
    wrapper.init_worker(**kwargs)
  File "/app/vllm/worker/worker_base.py", line 449, in init_worker
    self.worker = worker_class(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vllm/worker/worker.py", line 101, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
                                            ^^^^^^^^^^^^^^^^^
  File "/app/vllm/worker/enc_dec_model_runner.py", line 115, in __init__
    assert_enc_dec_mr_supported_scenario(self)
  File "/app/vllm/worker/utils.py", line 43, in assert_enc_dec_mr_supported_scenario
    raise NotImplementedError(
NotImplementedError: Multimodal is not currently supported with encoder/decoder models.

This error indicates that the Florence2 configuration has is_encoder_decoder:true, but the current EncoderDecoderModelRunner does not support multimodal. I think finding a workaround will be difficult since we really need this support. Can anyone give advice or suggest what to do next?
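For readers following the traceback, the failing check can be pictured roughly as below. This is only a minimal sketch, not vLLM's actual code: `ModelFlags` and `assert_supported` are hypothetical names, and only the `is_encoder_decoder` flag and the error message are taken from the thread.

```python
from dataclasses import dataclass


@dataclass
class ModelFlags:
    """Hypothetical summary of the two properties the guard looks at."""
    is_encoder_decoder: bool  # set to true in the Florence2 configuration, per the thread
    is_multimodal: bool       # True when the model also takes image inputs


def assert_supported(flags: ModelFlags) -> None:
    """Reject the combination that the traceback reports as unsupported."""
    if flags.is_encoder_decoder and flags.is_multimodal:
        raise NotImplementedError(
            "Multimodal is not currently supported with encoder/decoder models."
        )


# Florence-2 as registered in the attempt above: both flags are set.
florence2 = ModelFlags(is_encoder_decoder=True, is_multimodal=True)
try:
    assert_supported(florence2)
except NotImplementedError as exc:
    print(f"rejected: {exc}")

# A text-only encoder-decoder model passes the same check.
assert_supported(ModelFlags(is_encoder_decoder=True, is_multimodal=False))
```

The point of the sketch is that the rejection depends on the combination of the two flags, which is why registering the model alone was not enough.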

DarkLight1337 · 2024-09-06T06:35:06Z

> This error indicates that the Florence2 configuration has is_encoder_decoder:true, but the current EncoderDecoderModelRunner does not support multimodal. I think finding a workaround will be difficult since we really need this support. Can anyone give advice or suggest what to do next?

If only the language part of the model is using encoder-decoder (i.e. there is no cross-attention between text and visual features), then you can try implementing only the language part in vLLM first.

SteveKo837 · 2024-09-06T07:31:45Z

> > This error indicates that the Florence2 configuration has is_encoder_decoder:true, but the current EncoderDecoderModelRunner does not support multimodal. I think finding a workaround will be difficult since we really need this support. Can anyone give advice or suggest what to do next?

> If only the language part of the model is using encoder-decoder (i.e. there is no cross-attention between text and visual features), then you can try implementing only the language part in vLLM first.

@DarkLight1337, thanks for your comment. I think I understand, and it seems feasible. Since Florence2 only uses the encoder-decoder for the language part, specifically in the Florence2LanguageModel class, I can implement the language part and the vision part (DaViT) separately, then combine them later. I just need to organize the massive 2800 lines in the original modeling_florence.py file properly.
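The split described above might be organized along these lines. It is a toy sketch in plain Python, not the Florence-2 or vLLM implementation: `vision_tower`, `project`, `embed_text`, and `build_encoder_input` are hypothetical stand-ins, and it assumes (as discussed in the thread) that image features are only merged into the encoder input sequence rather than attended to through a separate cross-attention path.

```python
from typing import List

Vector = List[float]


def vision_tower(image: List[List[float]]) -> List[Vector]:
    """Stand-in for the vision part (DaViT): one feature vector per image row."""
    return [[sum(row) / len(row), max(row)] for row in image]


def project(features: List[Vector], dim: int) -> List[Vector]:
    """Stand-in projection into the language model's embedding width."""
    return [[sum(f) / len(f)] * dim for f in features]


def embed_text(token_ids: List[int], dim: int) -> List[Vector]:
    """Stand-in token embedding for the language part."""
    return [[float(t)] * dim for t in token_ids]


def build_encoder_input(
    image: List[List[float]], token_ids: List[int], dim: int = 4
) -> List[Vector]:
    """Combine both parts: projected image features followed by text embeddings.

    The result is a single sequence that an ordinary text encoder-decoder
    could consume, so the two parts can be developed separately.
    """
    visual = project(vision_tower(image), dim)
    textual = embed_text(token_ids, dim)
    return visual + textual


image = [[0.0, 1.0], [2.0, 3.0]]
sequence = build_encoder_input(image, token_ids=[7, 8, 9])
print(len(sequence), len(sequence[0]))  # prints "5 4"
```

Keeping the vision and language functions independent, with one small function that joins their outputs, mirrors the plan of implementing the two parts separately and combining them afterwards.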

Akhilrajeevp · 2024-10-11T12:27:16Z

Hey, what's the update on this one? How do I run Florence-2 using vLLM?

joaomsimoes · 2024-10-15T04:59:48Z

+1

github-actions · 2025-01-14T01:57:25Z

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

localbarrage added the new model Requests to new models label Jun 27, 2024

localbarrage changed the title ~~[New Model]:~~ [New Model]: Florence-2 Jun 27, 2024

DarkLight1337 mentioned this issue Jun 28, 2024

[RFC]: Multi-modality Support on vLLM #4194

Open

86 tasks

DarkLight1337 mentioned this issue Sep 17, 2024

[RFC]: Encoder/decoder models & feature compatibility #7366

Open

rmccorm4 mentioned this issue Oct 29, 2024

Unrecognized configuration class to build an AutoTokenizer for microsoft/Florence-2-base-ft triton-inference-server/server#7726

Closed

github-actions bot added the stale label Jan 14, 2025
