This demo showcases how to use a pretrained model to generate Python code based on an image and a text prompt.
Here's a step-by-step explanation:
- **Imports and Setup**: The necessary libraries and modules are imported, including `requests`, `PIL` for image processing, and `transformers` for handling the model and processing.
- **Loading and Displaying the Image**: An image file (`demo.png`) is opened using the `PIL` library and displayed.
- **Defining the Prompt**: A message is created that includes the image and a request to generate Python code to process the image and save it using `plt` (matplotlib).
- **Loading the Processor**: The `AutoProcessor` is loaded from a pretrained model specified by the `out_dir` directory. This processor will handle the text and image inputs.
- **Creating the Prompt**: The `apply_chat_template` method is used to format the message into a prompt suitable for the model.
- **Processing the Inputs**: The prompt and image are processed into tensors that the model can understand.
- **Setting Generation Arguments**: Arguments for the model's generation process are defined, including the maximum number of new tokens to generate and whether to sample the output.
- **Generating the Code**: The model generates the Python code based on the inputs and generation arguments. The `TextStreamer` is used to handle the output, skipping the prompt and special tokens.
- **Output**: The generated code is printed, which should include Python code to process the image and save it as specified in the prompt. The sketch after this list shows how these steps fit together.
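The snippet below is a minimal sketch of how these steps could look end to end. It assumes the model has already been exported to OpenVINO format in an `out_dir` directory and is loaded with Optimum Intel's `OVModelForVisualCausalLM`; the exact model class, message schema, prompt wording, and generation parameter values (`max_new_tokens`, `do_sample`) are illustrative and may differ from the actual demo.

```python
from PIL import Image
from transformers import AutoProcessor, TextStreamer
from optimum.intel import OVModelForVisualCausalLM  # assumption: the demo may use a different model class

out_dir = "out_dir"  # placeholder: directory containing the exported model

# Load the processor and the OpenVINO model from the export directory.
processor = AutoProcessor.from_pretrained(out_dir)
model = OVModelForVisualCausalLM.from_pretrained(out_dir)

# Open the demo image with PIL (assumes demo.png is already on disk).
image = Image.open("demo.png")

# Build a chat-style message with the image and the text request.
# The exact content schema depends on the model's chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Write Python code that processes this image and saves the result with plt."},
        ],
    }
]

# Format the message into a prompt and convert prompt + image into tensors.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generation arguments: cap the number of new tokens and use greedy decoding.
generation_args = {"max_new_tokens": 256, "do_sample": False}

# Stream the generated text to stdout, skipping the prompt and special tokens.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
output_ids = model.generate(**inputs, **generation_args, streamer=streamer)

# Decode only the newly generated tokens if the text is needed as a string.
generated = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(generated)
```

Using `TextStreamer` prints tokens as they are produced, so the generated code appears incrementally rather than only after `generate` returns.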
This demo illustrates how to leverage a pretrained model with OpenVINO to generate code dynamically based on user input and images.