Improve documentation for Tasks, Save/Load, BEHAVIOR Tasks and Knowledgebase #795

Merged
merged 10 commits into from
Jul 12, 2024
8 changes: 5 additions & 3 deletions docs/modules/object_states.md
@@ -32,7 +32,7 @@ Object states have a unified API interface: a getter `state.get_value(...)`, and

Object states are intended to be added when an object is instantiated, during its constructor call via the `abilities` kwarg. This is expected to be a dictionary mapping ability name to a dictionary of keyword-arguments that dictate the instantiated object state's behavior. Normally, this is simply the keyword-arguments to pass to the specific `ObjectState` constructor, but this can be different. Concretely, the raw values in the `abilities` value dictionary are postprocessed via the specific object state's `postprocess_ability_params` classmethod. This is to allow `abilities` to be fully exportable in .json format, without requiring complex datatypes (which may be required as part of an object state's actual constructor) to be stored.

By default, `abilities=None` results in an object's abilities directly being inferred from its `category` kwarg. **`OmniGibson`** leverages a crowdsourced [knowledgebase](https://behavior.stanford.edu/knowledgebase/categories/index.html) to determine what abilities (or "properties" in the knowledgebase) a given entity (called "synset" in the knowledgebase) can have. Every category in **`OmniGibson`**'s asset dataset directly corresponds to a specific synset. By going to the knowledgebase and clicking on the corresponding synset, one can see the annotated abilities (properties) for that given synset, which will be applied to the object being created.
By default, `abilities=None` results in an object's abilities directly being inferred from its `category` kwarg. **`OmniGibson`** leverages the crowdsourced [BEHAVIOR Knowledgebase](https://behavior.stanford.edu/knowledgebase/categories/index.html) to determine what abilities (or "properties" in the knowledgebase) a given entity (called "synset" in the knowledgebase) can have. Every category in **`OmniGibson`**'s asset dataset directly corresponds to a specific synset. By going to the knowledgebase and clicking on the corresponding synset, one can see the annotated abilities (properties) for that given synset, which will be applied to the object being created.

Alternatively, you can programmatically observe which abilities, with the exact default kwargs, correspond to a given category via:

@@ -43,6 +43,9 @@ synset = OBJECT_TAXONOMY.get_synset_from_category(category)
```python
abilities = OBJECT_TAXONOMY.get_abilities(synset)
```

!!! info annotate "Follow our tutorial on BEHAVIOR knowledgebase!"
    To better understand how to use / visualize / modify the BEHAVIOR knowledgebase, please read our [tutorial](../tutorials/behavior_knowledgebase.html)!

??? warning annotate "Not all object states are guaranteed to be created!"

    Some object states (such as `ParticleApplier` or `ToggledOn`) potentially require specific metadata to be defined for a given object model before the object state can be created. For example, `ToggledOn` represents a pressable virtual button, and requires this button to be defined a priori in the raw object asset before it is imported. When parsing the `abilities` dictionary, each object state runs a compatibility check via `state.is_compatible(obj, **kwargs)` before it is created, where `**kwargs` defines any relevant keyword arguments that would be passed to the object state constructor. If the check fails, then the object state is **_not_** created!
@@ -51,8 +54,7 @@ abilities = OBJECT_TAXONOMY.get_abilities(synset)

As mentioned earlier, object states can potentially be read from via `get_value(...)` or written to via `set_value(...)`. Whether reading / writing is possible, as well as the expected arguments and return value, depends on the specific object state class type. For example, object states that inherit from the `BooleanStateMixin` class expect `get_value(...)` to return and `set_value(...)` to receive a boolean. `AbsoluteObjectState`s are agnostic to any other object in the scene, and so `get_value()` takes no arguments. In contrast, `RelativeObjectState`s are computed with respect to another object, and so require `other_obj` to be passed into the getter and setter, e.g., `get_value(other_obj)` and `set_value(other_obj, ...)`. A `ValueError` will be raised if `get_value(...)` or `set_value(...)` is called on an object state that does not support that functionality. If `set_value()` is called and succeeds, it returns `True`; otherwise, it returns `False`. For more information on specific object state types' behaviors, please see [Object State Types](#object-state-types).
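
The getter / setter contract described above can be illustrated with a small, self-contained sketch. It uses the `get_value` / `set_value` naming of the unified API interface; `ToggleSketch` and `IsAboveSketch` are hypothetical toy classes written for this example and are not **`OmniGibson`** classes, and plain floats stand in for objects.

```python
class ToggleSketch:
    """Hypothetical absolute boolean state: readable and writable, no other object needed."""

    def __init__(self):
        self._value = False

    def get_value(self):
        return self._value

    def set_value(self, new_value):
        # Report failure via the return value when the new value cannot be applied
        if not isinstance(new_value, bool):
            return False
        self._value = new_value
        return True


class IsAboveSketch:
    """Hypothetical relative state: computed against another object, read-only."""

    def __init__(self, height):
        self._height = height

    def get_value(self, other_height):
        # A relative state needs the other object (here, just its height)
        return self._height > other_height

    def set_value(self, other_height, new_value):
        # This toy state does not support writing, so the setter raises
        raise ValueError("IsAboveSketch does not support set_value()")


toggle = ToggleSketch()
set_ok = toggle.set_value(True)
above = IsAboveSketch(height=1.0).get_value(other_height=0.5)
```

Here `set_ok` reports whether the write succeeded, while calling `set_value` on the read-only relative state raises a `ValueError`.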

It is important to note that object states are usually queried / computed _on demand_ and immediately cached until its value becomes stale (usually the immediately proceeding simulation step). This is done for efficiency reasons, and also means that object states are usually not automatically updated per-step unless absolutely necessary (1). Calling `state.clear_cache()` forces a clearing of an object state's internal cache.
{ .annotate }
It is important to note that object state values are usually queried / computed _on demand_ and then cached until they become stale (usually at the immediately following simulation step). This is done for efficiency reasons, and also means that object states are usually not automatically updated per-step unless absolutely necessary. Calling `state.clear_cache()` forces a clearing of an object state's internal cache.
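
The on-demand caching behavior can be pictured with a minimal sketch. `CachedStateSketch`, its `compute_fn` argument, and the explicit `current_step` parameter are hypothetical and exist only for illustration; only the names `get_value` and `clear_cache` are taken from this page, and the real implementation may track staleness differently.

```python
class CachedStateSketch:
    """Toy state whose value is computed lazily and cached per simulation step."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn   # hypothetical expensive computation
        self._cache = None              # (step, value) pair, or None when empty
        self.n_computations = 0         # counts how often the value is recomputed

    def get_value(self, current_step):
        # Reuse the cached value if it was computed during this same step
        if self._cache is not None and self._cache[0] == current_step:
            return self._cache[1]
        value = self._compute_fn()
        self.n_computations += 1
        self._cache = (current_step, value)
        return value

    def clear_cache(self):
        # Force the next get_value() call to recompute
        self._cache = None


state = CachedStateSketch(compute_fn=lambda: 42.0)
state.get_value(current_step=0)   # computed
state.get_value(current_step=0)   # served from the cache
state.get_value(current_step=1)   # stale after a step, so recomputed
state.clear_cache()
state.get_value(current_step=1)   # recomputed because the cache was cleared
```

In this sketch the value is computed three times across the four calls: the second call is the only one answered from the cache.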


## Types
2 changes: 1 addition & 1 deletion docs/modules/scenes.md
@@ -36,7 +36,7 @@ Alternatively, a scene can be directly imported at runtime by first creating the

To import an object into a scene, call `scene.add_object(obj)`.

The scene keeps track of and organizes all imported objects via its owned `scene.object_registry`. Objects can quickly be queried by relevant property keys (1), such as `name`, `prim_path`, and `category`, from `env.scene.object_registry` as follows:
The scene keeps track of and organizes all imported objects via its owned `scene.object_registry`. Objects can quickly be queried by relevant property keys, such as `name`, `prim_path`, and `category`, from `env.scene.object_registry` as follows:
{ .annotate }

`scene.object_registry_unique_keys` and `scene.object_registry_group_keys` define the valid possible key queries
231 changes: 231 additions & 0 deletions docs/modules/tasks.md
@@ -0,0 +1,231 @@
---
icon: material/list-box
---

# 📑 **Tasks**

## Description

`Task`s define the high-level objectives that an agent must complete in a given `Environment`, subject to certain constraints (e.g. the robot must not flip over).

`Task`s have two important internal variables:

- `_termination_conditions`: a dict of {`str`: `TerminationCondition`} that defines when an episode should be terminated. For each of the termination conditions, `termination_condition.step(...)` returns a tuple of `(done [bool], success [bool])`. If any of the termination conditions returns `done = True`, the episode is terminated. If any returns `success = True`, the episode is considered successful.
- `_reward_functions`: a dict of {`str`: `RewardFunction`} that define how the agent is rewarded. Each reward function has a `reward_function.step(...)` method that returns a tuple of `(reward [float], info [dict])`. The `reward` is a scalar value that is added to the agent's total reward for the current step. The `info` is a dictionary that can contain additional information about the reward.
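
The aggregation rules in the two bullets above can be sketched as follows. This is a self-contained illustration under stated assumptions: `StepLimitSketch`, `GoalReachedSketch`, `DistancePenaltySketch`, `step_task`, and the simplified `step(n_steps, dist_to_goal)` signature are all hypothetical and do not reproduce the real `step(...)` arguments.

```python
class StepLimitSketch:
    """Hypothetical failure condition: done after max_steps, never a success."""

    def __init__(self, max_steps):
        self.max_steps = max_steps

    def step(self, n_steps, dist_to_goal):
        return n_steps >= self.max_steps, False


class GoalReachedSketch:
    """Hypothetical success condition: done and successful within a tolerance."""

    def __init__(self, tol):
        self.tol = tol

    def step(self, n_steps, dist_to_goal):
        reached = dist_to_goal <= self.tol
        return reached, reached


class DistancePenaltySketch:
    """Hypothetical reward function returning (reward, info)."""

    def step(self, n_steps, dist_to_goal):
        return -dist_to_goal, {"dist_to_goal": dist_to_goal}


def step_task(termination_conditions, reward_functions, n_steps, dist_to_goal):
    # The episode ends if ANY condition is done, and succeeds if ANY reports success
    done, success = False, False
    for cond in termination_conditions.values():
        cond_done, cond_success = cond.step(n_steps, dist_to_goal)
        done = done or cond_done
        success = success or cond_success

    # Rewards from all reward functions are summed for the current step
    total_reward, info = 0.0, {}
    for name, fn in reward_functions.items():
        reward, fn_info = fn.step(n_steps, dist_to_goal)
        total_reward += reward
        info[name] = fn_info
    return total_reward, done, success, info


conditions = {"timeout": StepLimitSketch(max_steps=500), "goal": GoalReachedSketch(tol=0.36)}
rewards = {"distance": DistancePenaltySketch()}
result_far = step_task(conditions, rewards, n_steps=3, dist_to_goal=2.0)
result_near = step_task(conditions, rewards, n_steps=4, dist_to_goal=0.2)
```

With these toy inputs, the first call leaves the episode running, while the second terminates it successfully because the goal condition reports `done` and `success` together.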

`Task`s usually specify task-relevant observations (e.g. goal location for a navigation task) via the `_get_obs` method, which returns a tuple of `(low_dim_obs [dict], obs [dict])`. The first element is a dict of low-dimensional observations that will be automatically flattened into a 1D array; the second element contains everything else that should not be flattened. Each task type should override the `_get_obs` method to return the appropriate observations.
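
As a rough illustration of the `(low_dim_obs, obs)` split, the sketch below builds both dicts from toy data and flattens the low-dimensional part into a single list. The function names, the observation keys, and the flattening order (dict insertion order) are assumptions made for this example, not the library's actual behavior.

```python
def get_obs_sketch(robot_xy, goal_xy, lin_vel, ang_vel):
    """Return (low_dim_obs, obs) in the shape described above, using toy data."""
    low_dim_obs = {
        "xy_pos_to_goal": [goal_xy[0] - robot_xy[0], goal_xy[1] - robot_xy[1]],
        "robot_lin_vel": list(lin_vel),
        "robot_ang_vel": list(ang_vel),
    }
    # Anything that should not be flattened goes into the second dict
    obs = {"goal_description": "reach the marked point"}
    return low_dim_obs, obs


def flatten_low_dim(low_dim_obs):
    """Concatenate all low-dimensional entries into one flat list of floats."""
    flat = []
    for key in low_dim_obs:  # dicts preserve insertion order
        flat.extend(float(v) for v in low_dim_obs[key])
    return flat


low_dim, other = get_obs_sketch(
    robot_xy=(0.0, 0.0),
    goal_xy=(1.0, 2.0),
    lin_vel=(0.1, 0.0, 0.0),
    ang_vel=(0.0, 0.0, 0.5),
)
flat = flatten_low_dim(low_dim)
```

Here `flat` holds the eight low-dimensional values in order, while `other` is passed along untouched.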

`Task`s also define the reset behavior (in-between episodes) of the environment via the `_reset_scene`, `_reset_agent`, and `_reset_variables` methods.

- `_reset_scene`: reset the scene for the next episode, default is `scene.reset()`.
- `_reset_agent`: reset the agent for the next episode, default is do nothing.
- `_reset_variables`: reset any internal variables as needed, default is do nothing.

Each task type should override these methods to implement the appropriate reset behavior, e.g. a navigation task might randomize the initial pose of the agent and the goal location.
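
A minimal sketch of this override pattern is shown below. `BaseTaskSketch` and `NavigationTaskSketch` are hypothetical classes, the environment is stood in for by a plain dict, and the order in which the three hooks are called is an assumption made for the example.

```python
import random


class BaseTaskSketch:
    """Hypothetical base class with the default reset hooks described above."""

    def reset(self, env):
        self._reset_scene(env)
        self._reset_agent(env)
        self._reset_variables(env)

    def _reset_scene(self, env):
        # Default: reset the scene (modeled here as a simple counter)
        env["scene_resets"] = env.get("scene_resets", 0) + 1

    def _reset_agent(self, env):
        pass  # default: do nothing

    def _reset_variables(self, env):
        pass  # default: do nothing


class NavigationTaskSketch(BaseTaskSketch):
    """Hypothetical navigation task that randomizes the start and goal on reset."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.goal_xy = None
        self.n_steps = 0

    def _sample_xy(self):
        return (self._rng.uniform(-5.0, 5.0), self._rng.uniform(-5.0, 5.0))

    def _reset_agent(self, env):
        # Randomize the agent's initial position and the goal location
        env["robot_xy"] = self._sample_xy()
        self.goal_xy = self._sample_xy()

    def _reset_variables(self, env):
        # Clear per-episode bookkeeping
        self.n_steps = 0


env = {}
task = NavigationTaskSketch(seed=0)
task.n_steps = 17   # pretend an episode has been running
task.reset(env)
```

The subclass only overrides the hooks it needs; `_reset_scene` keeps its default behavior.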

## Usage

### Specifying
Every `Environment` instance includes a task, which is defined by the `task` key of the config passed to the environment constructor.
This is expected to be a dictionary of relevant keyword arguments, specifying the desired task configuration to be created (e.g. reward type and weights, hyperparameters for reset behavior, etc).
The `type` key is required and specifies the desired task class. Additional keys can be specified and will be passed directly to the specific task class constructor.
An example of a task configuration is shown below in `.yaml` form:

??? code "point_nav_example.yaml"
    ``` yaml linenums="1"
    task:
      type: PointNavigationTask
      robot_idn: 0
      floor: 0
      initial_pos: null
      initial_quat: null
      goal_pos: null
      goal_tolerance: 0.36 # turtlebot bodywidth
      goal_in_polar: false
      path_range: [1.0, 10.0]
      visualize_goal: true
      visualize_path: false
      n_vis_waypoints: 25
      reward_type: geodesic
      termination_config:
        max_collisions: 500
        max_steps: 500
        fall_height: 0.03
      reward_config:
        r_potential: 1.0
        r_collision: 0.1
        r_pointgoal: 10.0
    ```
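
The role of the `type` key can be sketched in plain Python: the key selects a task class, and the remaining keys are forwarded as keyword arguments. `PointNavigationTaskSketch`, `TASK_CLASSES`, and `create_task` are hypothetical names for this illustration; the actual class lookup in **`OmniGibson`** may work differently.

```python
import copy


class PointNavigationTaskSketch:
    """Hypothetical stand-in for a task class that accepts config keys as kwargs."""

    def __init__(self, robot_idn=0, goal_tolerance=0.36, reward_type="geodesic", **kwargs):
        self.robot_idn = robot_idn
        self.goal_tolerance = goal_tolerance
        self.reward_type = reward_type
        self.extra_kwargs = kwargs  # any other keys from the config


# Hypothetical mapping from the `type` string to a class
TASK_CLASSES = {"PointNavigationTask": PointNavigationTaskSketch}


def create_task(task_config):
    # Copy so that the caller's config dict is not modified
    cfg = copy.deepcopy(task_config)
    task_type = cfg.pop("type", None)
    if task_type is None:
        raise ValueError("The task config requires a 'type' key")
    # All remaining keys are passed directly to the task constructor
    return TASK_CLASSES[task_type](**cfg)


task_config = {
    "type": "PointNavigationTask",
    "robot_idn": 0,
    "goal_tolerance": 0.36,
    "reward_type": "geodesic",
    "termination_config": {"max_collisions": 500, "max_steps": 500, "fall_height": 0.03},
}
task = create_task(task_config)
```

Popping `type` from a copy keeps the original config reusable, e.g. for creating a second environment with the same settings.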

### Runtime

Each `Environment` instance has a `task` attribute that is an instance of the specified task class.
Internally, the `Environment`'s `reset` method calls the task's `reset` method, its `step` method calls the task's `step` method, and its `get_obs` method calls the task's `get_obs` method.

## Types
**`OmniGibson`** currently supports 5 types of tasks, 7 types of termination conditions, and 5 types of reward functions.

### `Task`

<table markdown="span">
<tr>
<td valign="top">
[**`DummyTask`**](../reference/tasks/dummy_task.html)<br><br>
Dummy task with trivial implementations.
<ul>
<li>`termination_conditions`: empty dict.</li>
<li>`reward_functions`: empty dict.</li>
<li>`_get_obs()`: empty dict.</li>
<li>`_reset_scene()`: default.</li>
<li>`_reset_agent()`: default.</li>
</ul>
</td>
</tr>
<tr>
<td valign="top">
[**`PointNavigationTask`**](../reference/tasks/point_navigation_task.html)<br><br>
PointGoal navigation task with fixed / randomized initial pose and goal location.
<ul>
<li>`termination_conditions`: `MaxCollision`, `Timeout`, `PointGoal`.</li>
<li>`reward_functions`: `PotentialReward`, `CollisionReward`, `PointGoalReward`.</li>
<li>`_get_obs()`: returns relative xy position to the goal, and the agent's current linear and angular velocities.</li>
<li>`_reset_scene()`: default.</li>
<li>`_reset_agent()`: sample initial pose and goal location.</li>
</ul>
</td>
</tr>
<tr>
<td valign="top">
[**`PointReachingTask`**](../reference/tasks/point_reaching_task.html)<br><br>
Similar to PointNavigationTask, except the goal is specified with respect to the robot's end effector.
<ul>
<li>`termination_conditions`: `MaxCollision`, `Timeout`, `PointGoal`.</li>
<li>`reward_functions`: `PotentialReward`, `CollisionReward`, `PointGoalReward`.</li>
<li>`_get_obs()`: returns the goal position and the end effector's position in the robot's frame, and the agent's current linear and angular velocities.</li>
<li>`_reset_scene()`: default.</li>
<li>`_reset_agent()`: sample initial pose and goal location.</li>
</ul>
</td>
</tr>
<tr>
<td valign="top">
[**`GraspTask`**](../reference/tasks/grasp_task.html)<br><br>
Grasp task for a single object.
<ul>
<li>`termination_conditions`: `Timeout`.</li>
<li>`reward_functions`: `GraspReward`.</li>
<li>`_get_obs()`: returns the object's pose in the robot's frame.</li>
<li>`_reset_scene()`: reset pose for objects in `_objects_config`.</li>
<li>`_reset_agent()`: randomize the robot's pose and joint configurations.</li>
</ul>
</td>
</tr>
<tr>
<td valign="top">
[**`BehaviorTask`**](../reference/tasks/behavior_task.html)<br><br>
BEHAVIOR task of long-horizon household activity.
<ul>
<li>`termination_conditions`: `Timeout`, `PredicateGoal`.</li>
<li>`reward_functions`: `PotentialReward`.</li>
<li>`_get_obs()`: returns the existence, pose, and in-gripper information of all task-relevant objects.</li>
<li>`_reset_scene()`: default.</li>
<li>`_reset_agent()`: default.</li>
</ul>
</td>
</tr>
</table>

!!! info annotate "Follow our tutorial on BEHAVIOR tasks!"
    To better understand how to use / sample / load / customize BEHAVIOR tasks, please read our [tutorial](../tutorials/behavior_tasks.html)!

### `TerminationCondition`
<table markdown="span">
<tr>
<td valign="top">
[**`Timeout`**](../reference/termination_conditions/timeout.html)<br><br>
`FailureCondition`: episode terminates if `max_steps` steps have passed.
</td>
</tr>
<tr>
<td valign="top">
[**`Falling`**](../reference/termination_conditions/falling.html)<br><br>
`FailureCondition`: episode terminates if the robot can no longer function (i.e. it falls below the floor height by at least
`fall_height` or tilts by more than `tilt_tolerance`).
</td>
</tr>
<tr>
<td valign="top">
[**`MaxCollision`**](../reference/termination_conditions/max_collision.html)<br><br>
`FailureCondition`: episode terminates if the robot has collided more than `max_collisions` times.
</td>
</tr>
<tr>
<td valign="top">
[**`PointGoal`**](../reference/termination_conditions/point_goal.html)<br><br>
`SuccessCondition`: episode terminates if point goal is reached within `distance_tol` by the robot's base.
</td>
</tr>
<tr>
<td valign="top">
[**`ReachingGoal`**](../reference/termination_conditions/reaching_goal.html)<br><br>
`SuccessCondition`: episode terminates if the reaching goal is reached within `distance_tol` by the robot's end effector.
</td>
</tr>
<tr>
<td valign="top">
[**`GraspGoal`**](../reference/termination_conditions/grasp_goal.html)<br><br>
`SuccessCondition`: episode terminates if target object has been grasped (by assistive grasping).
</td>
</tr>
<tr>
<td valign="top">
[**`PredicateGoal`**](../reference/termination_conditions/predicate_goal.html)<br><br>
`SuccessCondition`: episode terminates if all the goal predicates of `BehaviorTask` are satisfied.
</td>
</tr>
</table>

### `RewardFunction`

<table markdown="span">
<tr>
<td valign="top">
[**`CollisionReward`**](../reference/reward_functions/collision_reward.html)<br><br>
Penalization of robot collision with non-floor objects, with a negative weight `r_collision`.
</td>
</tr>
<tr>
<td valign="top">
[**`PointGoalReward`**](../reference/reward_functions/point_goal_reward.html)<br><br>
Reward for reaching the goal with the robot's base, with a positive weight `r_pointgoal`.
</td>
</tr>
<tr>
<td valign="top">
[**`ReachingGoalReward`**](../reference/reward_functions/reaching_goal_reward.html)<br><br>
Reward for reaching the goal with the robot's end-effector, with a positive weight `r_reach`.
</td>
</tr>
<tr>
<td valign="top">
[**`PotentialReward`**](../reference/reward_functions/potential_reward.html)<br><br>
Reward for decreasing some arbitrary potential function value, with a positive weight `r_potential`.
It assumes the task already has `get_potential` implemented.
Generally, a lower potential is preferred (e.g. a common potential for goal-directed tasks is the distance to the goal).
</td>
</tr>
<tr>
<td valign="top">
[**`GraspReward`**](../reference/reward_functions/grasp_reward.html)<br><br>
Reward for grasping an object. It not only evaluates the success of object grasping but also considers various penalties and efficiencies.
The reward is calculated based on several factors:
<ul>
<li>Grasping reward: A positive reward is given if the robot is currently grasping the specified object.</li>
<li>Distance reward: A reward based on the inverse exponential distance between the end-effector and the object.</li>
<li>Regularization penalty: Penalizes large magnitude actions to encourage smoother and more energy-efficient movements.</li>
<li>Position and orientation penalties: Discourages excessive movement of the end-effector.</li>
<li>Collision penalty: Penalizes collisions with the environment or other objects.</li>
</ul>
</td>
</tr>
</table>
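
To make the idea behind `PotentialReward` concrete, the sketch below rewards the decrease in a potential between consecutive steps, using the distance to the goal as the potential. The exact formula (weight times the drop in potential), the class `PotentialRewardSketch`, and the `distance_to_goal` helper are assumptions made for illustration and are not taken from the library source.

```python
import math


class PotentialRewardSketch:
    """Rewards decreases in a potential between consecutive steps (assumed form)."""

    def __init__(self, potential_fn, r_potential=1.0):
        self._potential_fn = potential_fn
        self._r_potential = r_potential
        self._potential = None

    def reset(self):
        # Record the starting potential at the beginning of an episode
        self._potential = self._potential_fn()

    def step(self):
        new_potential = self._potential_fn()
        # Positive when the potential dropped, negative when it rose
        reward = self._r_potential * (self._potential - new_potential)
        self._potential = new_potential
        return reward, {}


robot_xy = [3.0, 4.0]
goal_xy = (0.0, 0.0)


def distance_to_goal():
    # A common potential for goal-directed tasks: Euclidean distance to the goal
    return math.hypot(robot_xy[0] - goal_xy[0], robot_xy[1] - goal_xy[1])


reward_fn = PotentialRewardSketch(distance_to_goal, r_potential=1.0)
reward_fn.reset()                      # starting distance is 5.0
robot_xy[0] = 0.0                      # move closer: distance becomes 4.0
closer_reward, _ = reward_fn.step()
robot_xy[1] = 6.0                      # move away: distance becomes 6.0
farther_reward, _ = reward_fn.step()
```

Moving toward the goal yields a positive reward and moving away yields a negative one, which is why a lower potential is preferred.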