docs: Image support for Prompts (#6811)

Co-authored-by: caitlinwheeless <caitlin@humansignal.com>
HumanSignal · Dec 20, 2024 · 058279d
1 parent 6da49c1

Showing 3 changed files with 82 additions and 59 deletions.
diff --git a/docs/source/guide/prompts_create.md b/docs/source/guide/prompts_create.md
@@ -114,49 +114,7 @@ From the Prompts page, click **Create Prompt** in the upper right and then compl
 
 !!! note Eligible projects
     Target projects must meet the following criteria:
-    * The project must include text data. While it can include other data types such as images or video, it must include `<Text>`.
     * You must have access to the project. If you are in the Manager role, you need to be added to the project to have access. 
     * The project cannot be located in your Personal Sandbox workspace. 
     * While projects connected to an ML backend will still appear in the list of eligible projects, we do not recommend using Prompts with an ML backend as this can interfere with how accuracy and score are calculated when evaluating the prompt. 
-
-## Example project types
-
-### Text classification  
-
-Text classification is the process of assigning predefined categories or labels to segments of text based on their content. This involves analyzing the text and determining which category or label best describes its subject, sentiment, or purpose. The goal is to organize and categorize textual data in a way that makes it easier to analyze, search, and utilize. 
-
-Text classification labeling tasks are fundamental in many applications, enabling efficient data organization, improving searchability, and providing valuable insights through data analysis. Some examples include:
-
-* **Spam Detection**: Classifying emails as "spam" or "ham" (not spam). 
-* **Sentiment Analysis**: Categorizing user reviews as "positive," "negative," or "neutral."
-* **Topic Categorization**: Assigning articles to categories like "politics," "sports," "technology," etc.
-* **Support Ticket Classification**: Labeling customer support tickets based on the issue type, such as "billing," "technical support," or "account management."
-* **Content Moderation**: Identifying and labeling inappropriate content on social media platforms, such as "offensive language," "hate speech," or "harassment."
-
-### Named entity recognition (NER)
-
-A Named Entity Recognition (NER) labeling task involves identifying and classifying named entities within text. For example, people, organizations, locations, dates, and other proper nouns. The goal is to label these entities with predefined categories that make the text easier to analyze and understand. NER is commonly used in tasks like information extraction, text summarization, and content classification.
-
-For example, in the sentence "Heidi Opossum goes grocery shopping at Aldi in Miami" the NER task would involve identifying "Aldi" as a place or organization, "Heidi Opossum" as a person (even though, to be precise, she is an iconic opossum), and "Miami" as a location. Once labeled, this structured data can be used for various purposes such as improving search functionality, organizing information, or training machine learning models for more complex natural language processing tasks.
-
-NER labeling is crucial for industries such as finance, healthcare, and legal services, where accurate entity identification helps in extracting key information from large amounts of text, improving decision-making, and automating workflows.
-
-Some examples include:
-
-* **News and Media Monitoring**: Media organizations use NER to automatically tag and categorize entities such as people, organizations, and locations in news articles. This helps in organizing news content, enabling efficient search and retrieval, and generating summaries or reports. 
-* **Intelligence and Risk Analysis**: By extracting entities such as personal names, organizations, IP addresses, and financial transactions from suspicious activity reports or communications, organizations can better assess risks and detect fraud or criminal activity.
-* **Specialized Document Review**: Once trained, NER can help extract industry-specific key entities for better document review, searching, and classification. 
-* **Customer Feedback and Product Review**: Extract named entities like product names, companies, or services from customer feedback or reviews. This allows businesses to categorize and analyze feedback based on specific products, people, or regions, helping them make data-driven improvements.
-
-### Text summarization
-
-Text summarization involves condensing large amounts of information into concise, meaningful summaries. 
-
-Models can be trained or fine-tuned to recognize essential information within a document and generate summaries that retain the core ideas while omitting less critical details. This capability is especially valuable in today’s information-heavy landscape, where professionals across various fields are often overwhelmed by the sheer volume of text data.
-
-Some examples include:
-
-* **Customer Support and Feedback Analysis**: Companies receive vast volumes of customer support tickets, reviews, and feedback that are often repetitive or lengthy. Auto-labeling can help summarize these inputs, focusing on core issues or themes, such as “billing issues” or “technical support.” 
-* **News Aggregation and Media Monitoring**: News organizations and media monitoring platforms need to process and distribute news stories efficiently. Auto-labeling can summarize articles while tagging them with labels like “politics,” “economy,” or “health,” making it easier for users to find relevant stories.
-* **Document Summarization**: Professionals often need to quickly understand the key points in lengthy contracts, research papers, and files.
-* **Educational Content Summarization**: EEducators and e-learning platforms need to distill complex material into accessible summaries for students. Auto-labeling can summarize key topics and categorize them under labels like “concept,” “example,” or “important fact.”
+
diff --git a/docs/source/guide/prompts_draft.md b/docs/source/guide/prompts_draft.md
@@ -166,6 +166,15 @@ The cost to run the prompt based on the number of tokens required.
 </tr>
 </table>
 
+### Classification reports
+
+Click **Expand** to view classification reports for the Prompt. These reports show how many times each class was identified. Classification reports are available for the following tags:
+
+`Choices`  
+`Labels`  
+`Pairwise`  
+`Rating`
+
 ## Enhance prompt
 
 You can use **Enhance Prompt** to help you construct and auto-refine your prompts. 
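A classification report counts how many times each class was identified. The following Python sketch shows that tally in a simplified form. It is not how Label Studio actually computes the report. It assumes each prediction follows Label Studio's JSON result format: a `choices` result stores its classes under `value.choices`, and a `labels` result stores them under `value.labels`. The `rating` key, the `class_counts` function, and the `VALUE_KEYS` mapping are illustrative assumptions. `Pairwise` results are not handled.

```python
from collections import Counter

# Assumed mapping from a result's "type" to the key under "value" that holds
# its class(es). Verify these keys against your own exported predictions.
VALUE_KEYS = {"choices": "choices", "labels": "labels", "rating": "rating"}


def class_counts(predictions: list[dict]) -> Counter:
    """Count how often each (control tag name, class) pair appears in predictions."""
    counts = Counter()
    for prediction in predictions:
        for item in prediction.get("result", []):
            key = VALUE_KEYS.get(item.get("type"))
            if key is None:
                continue  # e.g. textarea or number results are not classes
            values = item.get("value", {}).get(key)
            if values is None:
                continue
            # Choices and Labels store a list; Rating stores a single number.
            if not isinstance(values, list):
                values = [values]
            for value in values:
                counts[(item.get("from_name"), value)] += 1
    return counts
```

Suppose you pass three predictions from a `Choices` tag named `sentiment`: two with `Positive` and one with `Negative`. The function returns a count of 2 for `("sentiment", "Positive")` and 1 for `("sentiment", "Negative")`. Keying on `from_name` keeps counts from different control tags separate.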

diff --git a/docs/source/guide/prompts_overview.md b/docs/source/guide/prompts_overview.md
@@ -23,6 +23,25 @@ With Prompts, you can:
 * Leverage subject matter expertise to rapidly bootstrap projects with labels, allowing you to decrease time to ML development. 
 * Allow your subject matter experts time focus on higher-level tasks rather than being bogged down by repetitive manual work.
 
+## Features, requirements, and constraints
+
+<div class="noheader rowheader">
+
+| Feature | Support |
+| --- | --- |
+| **Supported data types** | Text <br>Image<br><br>**Note:** Images are only supported when uploaded through cloud storage. |
+| **Supported object tags** | `Text` <br>`HyperText` <br>`Image` |
+| **Supported control tags** | `Choices` (Text and Image)<br>`Labels` (Text)<br>`TextArea` (Text and Image)<br>`Pairwise` (Text and Image)<br>`Number` (Text and Image)<br>`Rating` (Text and Image) |
+| **Class selection** | Multi-selection (the LLM can apply multiple labels per task)|
+| **Supported base models** | OpenAI gpt-3.5-turbo-16k* <br>OpenAI gpt-3.5-turbo* <br>OpenAI gpt-4 <br>OpenAI gpt-4-turbo <br>OpenAI gpt-4o <br>OpenAI gpt-4o-mini<br>[Azure OpenAI chat-based models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)<br>[Custom LLM](prompts_create#Add-OpenAI-Azure-OpenAI-or-a-custom-model)<br><br>**Note:** We recommend against using GPT 3.5 models (marked with *), as they are sometimes prone to rate limit errors and do not support image data. |
+| **Text compatibility** | Task text must be utf-8 compatible |
+| **Task size** | Total size of each task can be no more than 1MB (approximately 200-500 pages of text) |
+| **Network access** | If you are using a firewall or restricting network access to your OpenAI models, you will need to allow the following IPs: <br>3.219.3.197 <br>34.237.73.3 <br>4.216.17.242 |
+| **Required permissions** | **Owners, Administrators, Managers** -- Can create Prompt models and update projects with auto-annotations. Managers can only apply models to projects in which they are already a member. <br><br>**Reviewers and Annotators** -- No access to the Prompts tool, but can see the predictions generated by the prompts from within the project (depending on your [project settings](project_settings_lse)).  |
+| **ML backend support** | Prompts should not be used with a project that is connected to an ML backend, as this can affect how certain evaluation metrics are calculated. |
+| **Enterprise vs. Open Source** | Label Studio Enterprise (Cloud only)<br />Starter Cloud|
+
+</div>
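You can check the text-compatibility and task-size constraints in the table above before importing tasks. This Python sketch makes several assumptions. It assumes a task is measured as its JSON serialization. It treats "1MB" as 1,000,000 bytes, because the docs do not say whether 10^6 or 2^20 bytes is meant. The `check_task` helper is hypothetical: it is a pre-flight check you would run yourself, not a Label Studio API.

```python
import json

# Assumption: "1MB" read as 1,000,000 bytes; change to 1_048_576 if your
# deployment means 2**20 bytes.
MAX_TASK_BYTES = 1_000_000


def check_task(task: dict) -> list[str]:
    """Return a list of problems with a task; an empty list means it passed."""
    try:
        # ensure_ascii=False keeps non-ASCII characters as-is, so encoding to
        # UTF-8 fails on text that cannot be represented (e.g. lone surrogates).
        payload = json.dumps(task, ensure_ascii=False).encode("utf-8")
    except UnicodeEncodeError as exc:
        return [f"task is not UTF-8 compatible: {exc.reason}"]
    problems = []
    if len(payload) > MAX_TASK_BYTES:
        problems.append(f"task is {len(payload)} bytes; limit is {MAX_TASK_BYTES}")
    return problems
```

Ordinary non-ASCII text such as `"héllo"` passes, because it encodes cleanly to UTF-8. A string containing an unpaired surrogate is reported as incompatible.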
 
 ## Use cases
 
@@ -125,26 +144,63 @@ This feedback loop allows you to iteratively fine-tune your prompts, optimizing
 
 ![Diagram of fine-tuning workflow](/images/prompts/tuning-diagram.png)
 
-## Features, requirements, and constraints
 
-<div class="noheader rowheader">
+## Example project types
 
-| Feature | Support |
-| --- | --- |
-| **Supported data types** | Text |
-| **Supported object tags** | `Text` <br>`HyperText` |
-| **Supported control tags** | `Choices` <br>`Labels` <br>`TextArea` <br>`Pairwise` <br>`Number` <br>`Rating` |
-| **Class selection** | Multi-selection (the LLM can apply multiple labels per task)|
-| **Supported base models** | OpenAI gpt-3.5-turbo-16k* <br>OpenAI gpt-3.5-turbo* <br>OpenAI gpt-4 <br>OpenAI gpt-4-turbo <br>OpenAI gpt-4o <br>OpenAI gpt-4o-mini<br>[Azure OpenAI chat-based models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)<br>[Custom LLM](prompts_create#Add-OpenAI-Azure-OpenAI-or-a-custom-model)<br><br>**Note:** We recommend against using GPT 3.5 models, as these can sometimes be prone to rate limit errors.  |
-| **Text compatibility** | Task text must be utf-8 compatible |
-| **Task size** | Total size of each task can be no more than 1MB (approximately 200-500 pages of text) |
-| **Network access** | If you are using a firewall or restricting network access to your OpenAI models, you will need to allow the following IPs: <br>3.219.3.197 <br>34.237.73.3 <br>4.216.17.242 |
-| **Required permissions** | **Owners, Administrators, Managers** -- Can create Prompt models and update projects with auto-annotations. Managers can only apply models to projects in which they are already a member. <br><br>**Reviewers and Annotators** -- No access to the Prompts tool, but can see the predictions generated by the prompts from within the project (depending on your [project settings](project_settings_lse)).  |
-| **ML backend support** | Prompts should not be used with a project that is connected to an ML backend. |
-| **Enterprise vs. Open Source** | Label Studio Enterprise (Cloud only)<br />Starter Cloud|
+### Text classification  
 
-</div>
+Text classification is the process of assigning predefined categories or labels to segments of text based on their content. This involves analyzing the text and determining which category or label best describes its subject, sentiment, or purpose. The goal is to organize and categorize textual data in a way that makes it easier to analyze, search, and utilize. 
+
+Text classification labeling tasks are fundamental in many applications, enabling efficient data organization, improving searchability, and providing valuable insights through data analysis. Some examples include:
+
+* **Spam Detection**: Classifying emails as "spam" or "ham" (not spam). 
+* **Sentiment Analysis**: Categorizing user reviews as "positive," "negative," or "neutral."
+* **Topic Categorization**: Assigning articles to categories like "politics," "sports," "technology," etc.
+* **Support Ticket Classification**: Labeling customer support tickets based on the issue type, such as "billing," "technical support," or "account management."
+* **Content Moderation**: Identifying and labeling inappropriate content on social media platforms, such as "offensive language," "hate speech," or "harassment."
+
+### Named entity recognition (NER)
+
+A Named Entity Recognition (NER) labeling task involves identifying and classifying named entities within text, such as people, organizations, locations, dates, and other proper nouns. The goal is to label these entities with predefined categories that make the text easier to analyze and understand. NER is commonly used in tasks like information extraction, text summarization, and content classification.
+
+For example, in the sentence "Heidi Opossum goes grocery shopping at Aldi in Miami," the NER task would involve identifying "Heidi Opossum" as a person (even though, to be precise, she is an iconic opossum), "Aldi" as an organization or place, and "Miami" as a location. Once labeled, this structured data can be used for various purposes such as improving search functionality, organizing information, or training machine learning models for more complex natural language processing tasks.
+
+NER labeling is crucial for industries such as finance, healthcare, and legal services, where accurate entity identification helps in extracting key information from large amounts of text, improving decision-making, and automating workflows.
+
+Some examples include:
+
+* **News and Media Monitoring**: Media organizations use NER to automatically tag and categorize entities such as people, organizations, and locations in news articles. This helps in organizing news content, enabling efficient search and retrieval, and generating summaries or reports. 
+* **Intelligence and Risk Analysis**: By extracting entities such as personal names, organizations, IP addresses, and financial transactions from suspicious activity reports or communications, organizations can better assess risks and detect fraud or criminal activity.
+* **Specialized Document Review**: Once trained, an NER model can help extract industry-specific key entities for better document review, searching, and classification.
+* **Customer Feedback and Product Review**: Extract named entities like product names, companies, or services from customer feedback or reviews. This allows businesses to categorize and analyze feedback based on specific products, people, or regions, helping them make data-driven improvements.
+
+### Text summarization
+
+Text summarization involves condensing large amounts of information into concise, meaningful summaries. 
+
+Models can be trained or fine-tuned to recognize essential information within a document and generate summaries that retain the core ideas while omitting less critical details. This capability is especially valuable in today’s information-heavy landscape, where professionals across various fields are often overwhelmed by the sheer volume of text data.
+
+Some examples include:
+
+* **Customer Support and Feedback Analysis**: Companies receive vast volumes of customer support tickets, reviews, and feedback that are often repetitive or lengthy. Auto-labeling can help summarize these inputs, focusing on core issues or themes, such as “billing issues” or “technical support.” 
+* **News Aggregation and Media Monitoring**: News organizations and media monitoring platforms need to process and distribute news stories efficiently. Auto-labeling can summarize articles while tagging them with labels like “politics,” “economy,” or “health,” making it easier for users to find relevant stories.
+* **Document Summarization**: Professionals often need to quickly understand the key points in lengthy contracts, research papers, and files. Auto-labeling can condense these documents into short summaries that surface those points.
+* **Educational Content Summarization**: Educators and e-learning platforms need to distill complex material into accessible summaries for students. Auto-labeling can summarize key topics and categorize them under labels like “concept,” “example,” or “important fact.”
+
+
+### Image captioning and classification
+
+Image captioning involves generating descriptive text for images. This has valuable applications across industries, particularly where visual content needs to be systematically organized, analyzed, or made accessible.
+
+You can also use Prompts to automatically categorize images into predefined classes or categories, which helps keep labeling consistent across large image datasets.
+
+Some examples include:
+
+* **E-commerce Product Cataloging**: Online retailers often deal with thousands of product images that require captions describing their appearance, features, or categories.
 
+* **Digital Asset Management (DAM)**: Companies managing large libraries of images, such as marketing teams, media organizations, or creative agencies, can use auto-labeling to caption, tag, and classify their assets.
 
+* **Content Moderation and Analysis**: Platforms that host user-generated content can employ image captioning to analyze and describe uploaded visuals. This helps detect inappropriate content, categorize posts (e.g., "Outdoor landscape with a sunset"), and surface relevant content to users. You may also want to train a model to classify image uploads into categories such as “safe,” “explicit,” or “spam.”
 
+* **Accessibility for Visually Impaired Users**: Image captioning is essential for making digital content more accessible to visually impaired users by providing descriptive alt-text for images on websites, apps, or documents. For instance, for an image of a cat playing with yarn, a model might generate the caption "A fluffy orange cat playing with a ball of blue yarn."
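The image captioning and classification use case needs an `Image` object tag plus control tags such as `Choices` and `TextArea`, which the features table lists as supported for images. The following Python sketch builds such a labeling config with the standard library. The tag names and attributes (`name`, `toName`, `value`, `choice="multiple"`) follow Label Studio's labeling config syntax. The `image_config` function, the `$image` data field, and the `category` and `caption` tag names are hypothetical. Images are only supported through cloud storage, so each task's `image` field would hold a storage URL.

```python
import xml.etree.ElementTree as ET


def image_config(categories: list[str]) -> str:
    """Build a hypothetical labeling config for image classification plus captioning."""
    view = ET.Element("View")
    # Object tag: reads the image URL from the task's (assumed) "image" data field.
    ET.SubElement(view, "Image", name="image", value="$image")
    # Classification control; choice="multiple" lets several classes apply per image.
    choices = ET.SubElement(view, "Choices", name="category", toName="image", choice="multiple")
    for category in categories:
        ET.SubElement(choices, "Choice", value=category)
    # Free-text control for the generated caption.
    ET.SubElement(view, "TextArea", name="caption", toName="image")
    return ET.tostring(view, encoding="unicode")
```

Calling `image_config(["safe", "explicit", "spam"])` produces a `<View>` with one `Image`, one multi-select `Choices` with three `Choice` options, and one `TextArea` for the caption. Building the XML with `ElementTree` rather than string formatting also escapes category names that contain characters such as `&` or `"`.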