update custom selection prompt #799

Merged: 1 commit, Sep 10, 2024
32 changes: 22 additions & 10 deletions skyvern/forge/prompts/skyvern/auto-completion-choose-option.j2
@@ -1,23 +1,25 @@
There is an input element on a HTML page. Based on the context and information you're provided, you have two goals:
- Confirm if there is an auto completion attempt showing up after the user input the current value.
- If available auto completion suggestions show up, help user choose the element that's the most relevant to the input value.
There is an input element on an HTML page. Based on the context and information provided, you have two goals:
- Confirm if an auto-completion attempt appears after the user inputs the current value.
- If auto-completion suggestions appear, assist the user in selecting the most appropriate element based on the user’s goal, details, and the context.

You can confirm auto completion attempt based on the following rules:
- Several auto completion suggestions show up for the input value.
- Some messages, like "No results", "No match", also indicate an attempt to give auto completion suggestions.
You can confirm an auto-completion attempt based on the following rules:
- Several auto-completion suggestions appear for the input value.
- Although messages like No results” and “No match” mean no option was matched, they still indicate an attempt to generate auto-completion suggestions.

Potential auto completion suggesstion could only be:
- Element with ID from "HTML elements". Don't hallucinate any potential option outside "HTML elements".
You must identify a potential auto-completion suggestion based on the following rules:
- The option must be an element with an ID from the provided “HTML elements”. Do not create or assume options outside of these elements.
- The content of the option must be meaningful. Do not consider non-message indicators like “No results” or “No match” as valid options.

MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
Each interactable element is tagged with an ID.

Reply in JSON format with the following keys:
{
"auto_completion_attempt": bool, // True if there's any auto completion attempt based on the rules. Otherwise, it should be False.
"reasoning": str, // The reasoning behind the decision. Be specific, referencing input value and element ids in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
"reasoning": str, // The reasoning behind the decision. Be specific, referencing the value and the element id in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
"confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence.
"relevance_float": float, // The relative between the input value and the element. Pick a number between 0.00 and 1.00. 0.00 means no relevance, 1.00 means full relevance, the precision is 0.01.
"relevance_float": float, // The relative between the selected element and the provided information. You should consider how much the selected option is related to the user goal, the user details and the context. Pick a number between 0.00 and 1.00. 0.00 means no relevance, 1.00 means full relevance, the precision is 0.01.
"value": str, // The value to select.
"id": str, // The id of the most relevant and interactable element to take the action. The id must be from "HTML elements". It should be null if no element is relative or there's no auto completion suggestion.
}
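The reply contract above can be checked mechanically before the handler uses it. The sketch below is hypothetical: `parse_auto_completion_reply` and `EXPECTED_KEYS` are illustrative names, not part of Skyvern, and it assumes the ids from "HTML elements" are available as a set of strings.

```python
import json
from typing import Any, Dict, Set

# Keys requested by the updated auto-completion-choose-option.j2 prompt.
EXPECTED_KEYS = {
    "auto_completion_attempt",
    "reasoning",
    "confidence_float",
    "relevance_float",
    "value",
    "id",
}


def parse_auto_completion_reply(raw: str, element_ids: Set[str]) -> Dict[str, Any]:
    """Parse an LLM reply and check it against the prompt's schema (hypothetical helper)."""
    reply = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(reply, dict):
        raise ValueError("reply must be a JSON object")
    missing = EXPECTED_KEYS - set(reply)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(reply["auto_completion_attempt"], bool):
        raise ValueError("auto_completion_attempt must be a boolean")
    for key in ("confidence_float", "relevance_float"):
        score = reply[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ValueError(f"{key} must be a number")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{key} must be between 0.0 and 1.0")
    # "id" must come from "HTML elements", or be null when nothing is relevant.
    element_id = reply["id"]
    if element_id is not None and element_id not in element_ids:
        raise ValueError(f"id {element_id!r} is not one of the provided elements")
    return reply
```

Because `json.loads` rejects trailing commas and `//` comments, the strict parse also enforces the prompt's "MAKE SURE YOU OUTPUT VALID JSON" instruction.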

@@ -31,6 +33,16 @@ Input value:
{{ filled_value }}
```

User goal:
```
{{ navigation_goal }}
```

User details:
```
{{ navigation_payload_str }}
```

HTML elements:
```
{{ elements }}
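The auto-completion rules in auto-completion-choose-option.j2 separate an attempt from a usable option: "No results" and "No match" messages count as an attempt but never as a selectable suggestion. A minimal deterministic sketch of that distinction, assuming the candidate elements are available as an id-to-text mapping (`classify_auto_completion` and `NO_MATCH_MESSAGES` are hypothetical names; in Skyvern the decision is made by the LLM, not by code like this):

```python
from typing import Dict, Tuple

# Message-only entries that signal an attempt but are never valid options.
NO_MATCH_MESSAGES = ("no results", "no match")


def classify_auto_completion(option_texts: Dict[str, str]) -> Tuple[bool, Dict[str, str]]:
    """Return (auto_completion_attempt, valid_options) for an element-id -> text mapping."""
    valid: Dict[str, str] = {}
    saw_message = False
    for element_id, text in option_texts.items():
        normalized = text.strip().lower()
        if not normalized:
            continue  # empty text is not meaningful content
        if any(message in normalized for message in NO_MATCH_MESSAGES):
            saw_message = True  # an attempt, but not something to select
            continue
        valid[element_id] = text
    # Simplification: any meaningful suggestion counts as an attempt,
    # even though the prompt speaks of "several" suggestions.
    attempt = saw_message or bool(valid)
    return attempt, valid
```

For example, `{"A1": "No results"}` yields an attempt with no valid options, which corresponds to a reply with `"auto_completion_attempt": true` and `"id": null`.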
32 changes: 22 additions & 10 deletions skyvern/forge/prompts/skyvern/custom-select.j2
@@ -1,31 +1,43 @@
You are doing a select action on HTML page. Help to click the best match element for the target value among HTML elements based on the context.
You can find the match element based on the following attempts:
1. Find the semantically most similar element
2. Reconsider if target value is reasonable based on context and the options in the HTML elements. If it doesn't make sense, you can tweak the target value into a reasonable one.
3. Find the element, which semantically is the superset of target value. Like "Others", "None of them matched"
4. If the field is required, don't leave it blank and don't choose the semantical placeholder value, like "Please select", "-", "Select...".
You are performing a selection action on an HTML page. Assist the user in selecting the most appropriate option to advance toward their goal, considering the context, user details, and the DOM elements provided in the list.

You can identify the matching element based on the following guidelines:
1. Select the most suitable element based on the user goal, user details, and the context.
2. If no option is a perfect match, choose a fallback option such as “Others” or “None of the above”.
3. If a field is required, do not leave it blank.
4. If a field is required, do not select a placeholder value, such as “Please select”, “-”, or “Select…”.
5. Exclude loading indicators like “loading more results” as valid options.

MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
Each interactable element is tagged with an ID.

Reply in JSON format with the following keys:
{
"reasoning": str, // The reasoning behind the action. Be specific, referencing target value and element ids in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
"reasoning": str, // The reasoning behind the action. Be specific, referencing the value and the element id in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
"confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
"id": str, // The id of the element to take action on. The id has to be one from the elements list
"value": str, // The value to select.
"relevant": bool, // True if the value you select is relevant to the target value, otherwise False.
"value": str, // The value to select.{% if target_value %}
"relevant": bool, // True if the value you select is relevant to the target value, otherwise False.{% endif %}
}

Context:
```
{{ context_reasoning }}
```

{% if target_value %}
Target value:
```
{{ target_value }}
```
{% endif %}
User goal:
```
{{ navigation_goal }}
```

User details:
```
{{ navigation_payload_str }}
```

HTML elements:
```
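Guidelines 2, 4 and 5 of the new custom-select.j2 prompt describe a fallback choice and which options are off-limits. The sketch below approximates them with plain string matching, assuming options arrive as an id-to-text mapping; `selectable_options`, `fallback_option`, and the marker lists are hypothetical and far cruder than the LLM judgment the prompt actually asks for.

```python
from typing import Dict, Optional

# Hypothetical marker lists approximating guidelines 2, 4 and 5 of custom-select.j2.
PLACEHOLDERS = {"please select", "-", "select...", "select\u2026"}
LOADING_MARKER = "loading"
FALLBACKS = ("others", "other", "none of the above")


def selectable_options(options: Dict[str, str], required: bool) -> Dict[str, str]:
    """Drop options the prompt says should not be chosen."""
    kept: Dict[str, str] = {}
    for element_id, text in options.items():
        normalized = text.strip().lower()
        if LOADING_MARKER in normalized:
            continue  # guideline 5: loading indicators are not options (crude substring test)
        if required and normalized in PLACEHOLDERS:
            continue  # guideline 4: no placeholder values for required fields
        kept[element_id] = text
    return kept


def fallback_option(options: Dict[str, str]) -> Optional[str]:
    """Return the id of a catch-all option such as "Others" (guideline 2), if any."""
    for element_id, text in options.items():
        if text.strip().lower() in FALLBACKS:
            return element_id
    return None
```

A real selector would still rank the remaining options against the user goal and details; the fallback only applies when none of them fits.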
9 changes: 6 additions & 3 deletions skyvern/forge/prompts/skyvern/opened-dropdown-confirm.j2
@@ -1,6 +1,9 @@
There is a screenshot from part of Web HTML page. Help me confirm it if it's an opened dropdown menu.
An opened dropdown menu could be defined as:
- At least two options show on the screenshot.
There is a screenshot from a part of a web HTML page. Help me confirm if it is an open dropdown menu.

An open dropdown menu can be defined as:
- At least one option is visible in the screenshot.
- Do not consider it an open dropdown menu if the only visible option displays a message like “No results” or “No match”.

MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.

Reply in JSON format with the following keys:
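The revised definition lowers the bar from two visible options to one, while still rejecting a lone "No results" or "No match" message. Expressed as a predicate over the visible option texts, it might look like the sketch below (`looks_like_open_dropdown` is a hypothetical helper; in Skyvern the check is made by the LLM from a screenshot, not from extracted text):

```python
from typing import List

NO_MATCH_MESSAGES = ("no results", "no match")


def looks_like_open_dropdown(visible_options: List[str]) -> bool:
    """Apply the revised opened-dropdown-confirm.j2 definition to visible option texts."""
    options = [text.strip() for text in visible_options if text.strip()]
    if not options:
        return False  # at least one option must be visible
    if len(options) == 1 and any(m in options[0].lower() for m in NO_MATCH_MESSAGES):
        return False  # a lone "No results"/"No match" message does not count
    return True
```

Under the old rule, `["Paris"]` would not have qualified; under the new one it does.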
14 changes: 9 additions & 5 deletions skyvern/webeye/actions/handler.py
@@ -1120,6 +1120,8 @@ async def choose_auto_completion_dropdown(
"auto-completion-choose-option",
context_reasoning=action.reasoning,
filled_value=text,
navigation_goal=task.navigation_goal,
navigation_payload_str=json.dumps(task.navigation_payload),
elements=html,
)
LOG.info(
@@ -1462,27 +1464,29 @@ async def select_from_dropdown(
prompt = prompt_engine.load_prompt(
"custom-select",
context_reasoning=action.reasoning,
target_value=target_value,
target_value=target_value if not force_select and should_relevant else "",
navigation_goal=task.navigation_goal,
navigation_payload_str=json.dumps(task.navigation_payload),
elements=html,
)

LOG.info(
"Calling LLM to find the match element",
target_value=target_value,
step_id=step.step_id,
task_id=task.task_id,
)
json_response = await llm_handler(prompt=prompt, step=step)
value: str | None = json_response.get("value", None)
single_select_result.value = value

LOG.info(
"LLM response for the matched element",
target_value=target_value,
matched_value=value,
response=json_response,
step_id=step.step_id,
task_id=task.task_id,
)

value: str | None = json_response.get("value", None)
single_select_result.value = value
element_id: str | None = json_response.get("id", None)
if not element_id:
raise NoElementMatchedForTargetOption(target=target_value, reason=json_response.get("reasoning"))
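In select_from_dropdown, the diff passes an empty target_value to custom-select.j2 unless relevance should be checked (`not force_select and should_relevant`). Because the template wraps both the "Target value" block and the `relevant` key in `{% if target_value %}`, an empty string drops both from the rendered prompt. The response handling then records `value` and raises `NoElementMatchedForTargetOption` when `id` is missing. A self-contained sketch of that logic follows; `SingleSelectResult` and the exception class are stand-ins whose fields and message are assumptions (only the `target`/`reason` keyword names come from the diff), and `prompt_target_value`/`apply_llm_response` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class NoElementMatchedForTargetOption(Exception):
    """Stand-in for Skyvern's exception; only the keyword names come from the diff."""

    def __init__(self, target: str, reason: Optional[str]) -> None:
        super().__init__(f"no element matched target option {target!r}: {reason}")
        self.target = target
        self.reason = reason


@dataclass
class SingleSelectResult:
    """Hypothetical stand-in holding only the field the diff touches."""

    value: Optional[str] = None


def prompt_target_value(target_value: str, force_select: bool, should_relevant: bool) -> str:
    # Same expression as the diff: an empty string hides the "Target value"
    # block and the "relevant" key via {% if target_value %} in custom-select.j2.
    return target_value if not force_select and should_relevant else ""


def apply_llm_response(
    json_response: Dict[str, Any], target_value: str, result: SingleSelectResult
) -> str:
    """Record the selected value and return the chosen element id."""
    value: Optional[str] = json_response.get("value", None)
    result.value = value  # recorded even if no element id comes back
    element_id: Optional[str] = json_response.get("id", None)
    if not element_id:
        raise NoElementMatchedForTargetOption(
            target=target_value, reason=json_response.get("reasoning")
        )
    return element_id
```

Note that the value is stored on the result before the id check, so a failed match still leaves the LLM's proposed value visible for logging or retries.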