Revise PriorBoxClustered Spec #6539

Merged 8 commits on Jul 19, 2021. 114 changes: 54 additions and 60 deletions in `docs/ops/detection/PriorBoxClustered_1.md`.

**Short description**: *PriorBoxClustered* operation generates prior boxes of specified sizes normalized to the input image size.

**Detailed description**

Let
\f[
W \equiv image\_width, \quad H \equiv image\_height.
\f]

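Let \f$output\_height\f$ and \f$output\_width\f$ be the elements of the first input `output_size`. For each grid cell \f$(h, w)\f$, the center of the prior boxes is calculated as
\f[
center_x=(w+offset) \cdot step_w
\f]
\f[
center_y=(h+offset) \cdot step_h
\f]
where \f$h = \overline{0, output\_height - 1}\f$, \f$w = \overline{0, output\_width - 1}\f$, and \f$step_w\f$, \f$step_h\f$ are defined by the *step* attribute.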
For each \f$s = \overline{0, N - 1}\f$, where \f$N\f$ is the number of elements in the *width* (*height*) attribute, the prior box coordinates are calculated as:
\f[
xmin = \frac{center_x - \frac{width_s}{2}}{W}
\f]
\f[
ymin = \frac{center_y - \frac{height_s}{2}}{H}
\f]
\f[
xmax = \frac{center_x + \frac{width_s}{2}}{W}
\f]
\f[
ymax = \frac{center_y + \frac{height_s}{2}}{H}
\f]
If *clip* is `true`, each coordinate of the prior boxes is clipped to the `[0, 1]` range:
\f$coordinate = \min(\max(coordinate,0), 1)\f$
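The calculation above can be sketched in Python. This is a minimal illustration under stated assumptions, not the OpenVINO reference implementation: the function `prior_box_coords` and its parameter names are hypothetical, `step_w` and `step_h` are assumed to be already resolved as described for the *step* attribute, and the boxes are assumed to be emitted in `(h, w, s)` order.

```python
from typing import List, Sequence


def prior_box_coords(
    out_h: int,
    out_w: int,
    img_h: float,
    img_w: float,
    widths: Sequence[float],
    heights: Sequence[float],
    step_w: float,
    step_h: float,
    offset: float,
    clip: bool = True,
) -> List[List[float]]:
    """Return [xmin, ymin, xmax, ymax] for every prior box (hypothetical helper)."""
    boxes: List[List[float]] = []
    for h in range(out_h):
        for w in range(out_w):
            # Center shared by all boxes of grid cell (h, w).
            center_x = (w + offset) * step_w
            center_y = (h + offset) * step_h
            for s in range(len(widths)):
                # Corners of box s, normalized by the image size.
                box = [
                    (center_x - widths[s] / 2) / img_w,   # xmin
                    (center_y - heights[s] / 2) / img_h,  # ymin
                    (center_x + widths[s] / 2) / img_w,   # xmax
                    (center_y + heights[s] / 2) / img_h,  # ymax
                ]
                if clip:
                    # coordinate = min(max(coordinate, 0), 1)
                    box = [min(max(c, 0.0), 1.0) for c in box]
                boxes.append(box)
    return boxes
```

For example, on a 2 x 2 grid over a 100 x 100 image with a step of 50, an offset of 0.5 and a single 20 x 40 box size, the first box is centered at (25, 25) and gets the normalized coordinates `[0.15, 0.05, 0.35, 0.45]`.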

**Attributes**

* *width (height)*

* **Description**: *width (height)* specifies the desired widths (heights) of the boxes in pixels.
* **Range of values**: floating-point positive numbers
* **Type**: `float[]`
* **Default value**: 1.0
* **Required**: *no*

* *clip*

* **Description**: *clip* is a flag that specifies whether each value in the output tensor is clipped to the `[0,1]` range.
* **Range of values**:
* false or 0 - clipping is not performed
* true or 1 - each value in the output tensor is within `[0,1]`
* **Type**: `boolean`
* **Default value**: true
* **Required**: *no*

* *step (step_w, step_h)*

* **Description**: *step (step_w, step_h)* is the distance between box centers. For example, *step* equal to 85 means that the distance between the centers of neighboring prior boxes is 85. If both *step_h* and *step_w* are 0, they are set to the value of *step*. If they are still 0 after that, they are calculated as the input image width (height) divided by the width (height) from the first input `output_size`.
* **Range of values**: non-negative floating-point number
* **Type**: `float`
* **Default value**: 0.0
* **Required**: *no*

* *offset*

* **Description**: *offset* is a shift of the box center relative to the top-left corner of its grid cell, measured in units of *step*. For example, *offset* equal to 0.5 places each box center in the middle of its grid cell.
* **Range of values**: floating-point positive number
* **Type**: `float`
* **Required**: *yes*

* *variance*

* **Description**: *variance* denotes the variance used to adjust bounding boxes. The attribute can contain 0, 1, or 4 elements.
* **Range of values**: floating-point positive numbers
* **Type**: `float[]`
* **Default value**: []
* **Required**: *no*

* *img_h (img_w)*

* **Description**: *img_h (img_w)* specifies the height (width) of the input image. Unless provided explicitly, these values are taken from the height (width) element of the second input `image_size`.
* **Range of values**: floating-point positive number
* **Type**: `float`
* **Default value**: 0
* **Required**: *no*
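The resolution rules for *step* and *img_h (img_w)* can be sketched as follows. The helper names `resolve_steps` and `resolve_image_size` are hypothetical, and treating a value of 0 as "not provided" for *img_h (img_w)* is an assumption based on their default value of 0.

```python
from typing import Optional, Sequence, Tuple


def resolve_steps(
    step: float,
    step_w: float,
    step_h: float,
    image_hw: Sequence[float],
    output_hw: Sequence[int],
) -> Tuple[float, float]:
    """Return (step_h, step_w) following the rules of the *step* attribute."""
    # If both step_h and step_w are 0, they take the value of step.
    if step_w == 0 and step_h == 0:
        step_w = step_h = step
    # If they are still 0, derive them from the image size and the grid size.
    if step_w == 0 and step_h == 0:
        step_h = image_hw[0] / output_hw[0]
        step_w = image_hw[1] / output_hw[1]
    return step_h, step_w


def resolve_image_size(
    img_h: float, img_w: float, image_size: Optional[Sequence[int]]
) -> Tuple[float, float]:
    """Return (img_h, img_w); 0 is assumed to mean "take it from image_size"."""
    if (img_h == 0 or img_w == 0) and image_size is None:
        raise ValueError("image_size input is required when img_h/img_w are 0")
    height = img_h if img_h != 0 else image_size[0]
    width = img_w if img_w != 0 else image_size[1]
    return height, width
```

For example, with *step*, *step_w* and *step_h* all left at 0, a 300 x 600 image and a 10 x 20 grid give `step_h = step_w = 30`.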

**Inputs**:

* **1**: `output_size` - 1D tensor of type *T_INT* with two elements `[height, width]`. Specifies the spatial size of the generated grid of boxes. Required.

* **2**: `image_size` - 1D tensor of type *T_INT* with two elements `[image_height, image_width]` that specifies the shape of the image for which the boxes are generated. Optional.

**Outputs**:

* **1**: 2D tensor of shape `[2, 4 * height * width * priors_per_point]` and type *T_OUT*, where `height` and `width` are the elements of `output_size`. The first row contains the box coordinates and the second row contains the variances. `priors_per_point` is the number of boxes generated per grid element and equals the number of elements in the *width* (*height*) attribute.
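The output shape and the position of each box in the first row can be sketched as below. The helpers `output_shape` and `box_offset` are hypothetical; `priors_per_point` is assumed to equal the number of elements in *width*, and boxes are assumed to be stored in `(h, w, s)` order with four coordinates each.

```python
from typing import Sequence, Tuple


def output_shape(output_size: Sequence[int], widths: Sequence[float]) -> Tuple[int, int]:
    """Shape [2, 4 * height * width * priors_per_point] of the output tensor."""
    height, width = output_size
    priors_per_point = len(widths)  # assumption: one prior per width value
    return (2, 4 * height * width * priors_per_point)


def box_offset(h: int, w: int, s: int, width: int, priors_per_point: int) -> int:
    """Index of xmin of box s in cell (h, w) within the first row (assumed layout)."""
    # Row-major over (h, w, s), with 4 coordinates per box.
    return 4 * ((h * width + w) * priors_per_point + s)
```

For a 19 x 19 grid with five box sizes, the output shape is `[2, 7220]` and the last box starts at index 7216 of the first row.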

**Types**

* *T_INT*: any supported integer type.
* *T_OUT*: any supported floating-point type.

**Example**
