feat: replace clip with siglip (#304)
hugohonda authored Nov 21, 2024
1 parent e932bb2 commit cc29d80
Showing 3 changed files with 19 additions and 19 deletions.
36 changes: 18 additions & 18 deletions vision_agent/.sim_tools/df.csv
@@ -80,24 +80,6 @@ desc,doc,name
{'label': 'hello world', 'bbox': [0.1, 0.11, 0.35, 0.4], 'score': 0.99},
]
",ocr
-'clip' is a tool that can classify an image or a cropped detection given a list of input classes or tags. It returns the same list of the input classes along with their probability scores based on image content.,"clip(image: numpy.ndarray, classes: List[str]) -> Dict[str, Any]:
-'clip' is a tool that can classify an image or a cropped detection given a list
-of input classes or tags. It returns the same list of the input classes along with
-their probability scores based on image content.
-
-Parameters:
-image (np.ndarray): The image to classify or tag
-classes (List[str]): The list of classes or tags that is associated with the image
-
-Returns:
-Dict[str, Any]: A dictionary containing the labels and scores. One dictionary
-contains a list of given labels and other a list of scores.
-
-Example
--------
->>> clip(image, ['dog', 'cat', 'bird'])
-{""labels"": [""dog"", ""cat"", ""bird""], ""scores"": [0.68, 0.30, 0.02]},
-",clip
'vit_image_classification' is a tool that can classify an image. It returns a list of classes and their probability scores based on image content.,"vit_image_classification(image: numpy.ndarray) -> Dict[str, Any]:
'vit_image_classification' is a tool that can classify an image. It returns a
list of classes and their probability scores based on image content.
@@ -488,6 +470,24 @@ desc,doc,name
... )
>>> save_image(result, ""inpainted_room.png"")
",flux_image_inpainting
+'siglip_classification' is a tool that can classify an image or a cropped detection given a list of input labels or tags. It returns the same list of the input labels along with their probability scores based on image content.,"siglip_classification(image: numpy.ndarray, labels: List[str]) -> Dict[str, Any]:
+'siglip_classification' is a tool that can classify an image or a cropped detection given a list
+of input labels or tags. It returns the same list of the input labels along with
+their probability scores based on image content.
+
+Parameters:
+image (np.ndarray): The image to classify or tag
+labels (List[str]): The list of labels or tags that is associated with the image
+
+Returns:
+Dict[str, Any]: A dictionary containing the labels and scores. One dictionary
+contains a list of given labels and other a list of scores.
+
+Example
+-------
+>>> siglip_classification(image, ['dog', 'cat', 'bird'])
+{""labels"": [""dog"", ""cat"", ""bird""], ""scores"": [0.68, 0.30, 0.02]},
+",siglip_classification
"'extract_frames_and_timestamps' extracts frames and timestamps from a video which can be a file path, url or youtube link, returns a list of dictionaries with keys ""frame"" and ""timestamp"" where ""frame"" is a numpy array and ""timestamp"" is the relative time in seconds where the frame was captured. The frame is a numpy array.","extract_frames_and_timestamps(video_uri: Union[str, pathlib.Path], fps: float = 1) -> List[Dict[str, Union[numpy.ndarray, float]]]:
'extract_frames_and_timestamps' extracts frames and timestamps from a video
which can be a file path, url or youtube link, returns a list of dictionaries
Binary file modified vision_agent/.sim_tools/embs.npy
Binary file not shown.
2 changes: 1 addition & 1 deletion vision_agent/tools/tools.py
@@ -2453,7 +2453,6 @@ def _plot_counting(
owl_v2_image,
owl_v2_video,
ocr,
-clip,
vit_image_classification,
vit_nsfw_classification,
countgd_counting,
@@ -2471,6 +2470,7 @@ def _plot_counting(
qwen2_vl_video_vqa,
video_temporal_localization,
flux_image_inpainting,
+siglip_classification,
]

UTIL_TOOLS = [

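Migration note (a sketch, not part of this commit): the only signature difference visible in the diff is that the second parameter is renamed from `classes` to `labels`; the documented return shape, a dict with "labels" and "scores" lists, is unchanged. The helpers below are hypothetical and assume exactly that shape. `top_label` picks the best-scoring label from a result, and `make_clip_shim` wraps any classifier with the new signature so that callers still passing `classes` keep working. Neither helper exists in vision_agent, and the sample values are copied from the docstring example rather than from a real model run.

```python
from typing import Any, Callable, Dict, List, Tuple

# Return shape documented in the docstrings in the diff:
# {"labels": [...], "scores": [...]}
Result = Dict[str, Any]


def top_label(result: Result) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    labels: List[str] = result["labels"]
    scores: List[float] = result["scores"]
    if not labels or len(labels) != len(scores):
        raise ValueError("expected non-empty, equal-length 'labels' and 'scores'")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


def make_clip_shim(classify: Callable[..., Result]) -> Callable[..., Result]:
    """Hypothetical adapter: accept the old `classes` argument and
    forward it under the new `labels` name."""

    def clip(image: Any, classes: List[str]) -> Result:
        return classify(image, labels=classes)

    return clip


# Values copied from the docstring example, not produced by a model.
example = {"labels": ["dog", "cat", "bird"], "scores": [0.68, 0.30, 0.02]}
print(top_label(example))
```

In real code the shim would be built as `make_clip_shim(siglip_classification)`, assuming the function is imported from `vision_agent.tools`. Whether SigLIP scores are numerically comparable to the old CLIP scores is not stated in the commit, so any thresholds tuned against `clip` output should be rechecked.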