Adds Point in Time documentation #1753

Merged
merged 14 commits into from
Nov 4, 2022
32 changes: 19 additions & 13 deletions _api-reference/nodes-apis/nodes-stats.md
@@ -158,19 +158,22 @@ GET _nodes/stats/
"current" : 0
},
"search" : {
-"open_contexts" : 0,
-"query_total" : 194,
-"query_time_in_millis" : 467,
-"query_current" : 0,
-"fetch_total" : 194,
-"fetch_time_in_millis" : 143,
-"fetch_current" : 0,
-"scroll_total" : 0,
-"scroll_time_in_millis" : 0,
-"scroll_current" : 0,
-"suggest_total" : 0,
-"suggest_time_in_millis" : 0,
-"suggest_current" : 0
+"open_contexts": 4,
+"query_total": 194,
+"query_time_in_millis": 467,
+"query_current": 0,
+"fetch_total": 194,
+"fetch_time_in_millis": 143,
+"fetch_current": 0,
+"scroll_total": 0,
+"scroll_time_in_millis": 0,
+"scroll_current": 0,
+"point_in_time_total": 0,
+"point_in_time_time_in_millis": 0,
+"point_in_time_current": 0,
+"suggest_total": 0,
+"suggest_time_in_millis": 0,
+"suggest_current": 0
},
"merges" : {
"current" : 0,
Expand Down Expand Up @@ -648,6 +651,9 @@ get.<br>&nbsp;&nbsp;&nbsp;&nbsp;missing_total | Integer | The number of failed g
get.<br>&nbsp;&nbsp;&nbsp;&nbsp;missing_time_in_millis | Integer | The total time for all failed get operations, in milliseconds.
get.<br>&nbsp;&nbsp;&nbsp;&nbsp;current | Integer | The number of get operations that are currently running.
search | Object | Statistics about the search operations for the node.
+search.<br>&nbsp;&nbsp;&nbsp;&nbsp;point_in_time_total | Integer | The total number of Point in Time contexts that have been created (completed and active) since the node last restarted.
+search.<br>&nbsp;&nbsp;&nbsp;&nbsp;point_in_time_time_in_millis | Integer | The amount of time that Point in Time contexts have been held open since the node last restarted, in milliseconds.
+search.<br>&nbsp;&nbsp;&nbsp;&nbsp;point_in_time_current | Integer | The number of Point in Time contexts currently open.
search.<br>&nbsp;&nbsp;&nbsp;&nbsp;open_contexts | Integer | The number of open search contexts.
search.<br>&nbsp;&nbsp;&nbsp;&nbsp;query_total | Integer | The total number of query operations.
search.<br>&nbsp;&nbsp;&nbsp;&nbsp;query_time_in_millis | Integer | The total time for all query operations, in milliseconds.
270 changes: 270 additions & 0 deletions _opensearch/point-in-time-api.md
@@ -0,0 +1,270 @@
---
layout: default
title: Point in Time API
nav_order: 58
has_children: false
parent: Point in Time
---

# Point in Time API

Use the [Point in Time (PIT)]({{site.url}}{{site.baseurl}}/opensearch/point-in-time/) API to manage PITs.

---

#### Table of contents
- TOC
{:toc}

---

## Create a PIT
Contributor

Can we also add information about these cluster settings?

  • We limit the maximum keep-alive of a PIT using the cluster-level setting point_in_time.max_keep_alive (defaults to 24h; users can change it as required).
  • We limit the number of open PIT contexts on each node using the node-level setting search.max_open_pit_context (defaults to 300).

Introduced 2.4
{: .label .label-purple }

Creates a PIT. The `keep_alive` query parameter is required; it specifies how long to keep a PIT.

### Path and HTTP methods

```json
POST /<target_indexes>/_search/point_in_time?keep_alive=1h&routing=&expand_wildcards=&preference=
```

### Path parameters

Parameter | Data Type | Description
:--- | :--- | :---
target_indexes | String | The name(s) of the target index(es) for the PIT. May contain a comma-separated list or a wildcard index pattern.

### Query parameters

Parameter | Data Type | Description
Contributor

Should we have a column to specify which parameters are optional or required, @bharath-techie @dhruv16dhr?

Contributor

Yeah this could be useful.

Collaborator Author

We are adhering to our API style guide, which does not have a "required/optional" column. We list the required/optional in the description. But if you feel that it's needed, I can add the column.

Let's go ahead with the API style guide, as we have been doing until now.

:--- | :--- | :---
keep_alive | Time | The amount of time to keep the PIT. Required.
preference | String | The node or the shard used to perform the search. Optional. Default is random.
Contributor

PITs do not support preference.

Contributor

This is for Create PIT, which does support preference. Please add it back, @kolchfa-aws.

routing | String | Routes search requests to a specific shard. Optional. Default is the document's `_id`.
expand_wildcards | String | The type of index that can match the wildcard pattern. Supports comma-separated values. Valid values are the following:<br>- `all`: Match any index or data stream, including hidden ones. <br>- `open`: Match open, non-hidden indexes or non-hidden data streams. <br>- `closed`: Match closed, non-hidden indexes or non-hidden data streams. <br>- `hidden`: Match hidden indexes or data streams. Must be combined with `open`, `closed` or both `open` and `closed`.<br>- `none`: No wildcard patterns are accepted.<br> Optional. Default is `open`.
allow_partial_pit_creation | Boolean | Specifies whether to create a PIT with partial failures. Optional. Default is `false`.

#### Sample request

```json
POST /my-index-1/_search/point_in_time?keep_alive=100m
```

#### Sample response

```json
{
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAIWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"creation_time": 1658146050064
}
```

### Response fields

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID.
creation_time | long | The time the PIT was created, in milliseconds since the epoch.
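
As a rough illustration of the request and response described above, the following Python sketch builds the Create PIT path with its query parameters and converts `creation_time` to a UTC timestamp. The helper names `create_pit_path` and `creation_datetime` are hypothetical and not part of any OpenSearch client. The sketch only constructs strings and does not send a request.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode


def create_pit_path(target_indexes, keep_alive, **optional_params):
    """Build the path and query string for a Create PIT request (hypothetical helper)."""
    if not keep_alive:
        raise ValueError("keep_alive is required")
    params = {"keep_alive": keep_alive}
    for name, value in optional_params.items():
        if value is None:
            continue  # skip optional parameters that were not set
        # Query strings carry booleans as lowercase text.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    # Keep commas and wildcards readable in the index list.
    index_part = quote(target_indexes, safe=",*")
    return f"/{index_part}/_search/point_in_time?{urlencode(params)}"


def creation_datetime(response):
    """Convert creation_time (milliseconds since the epoch) to a UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=response["creation_time"])


path = create_pit_path("my-index-1", "100m")
created = creation_datetime({"creation_time": 1658146050064})
```

Here `path` matches the sample request shown earlier, and `created` is the sample response's creation time expressed as a timezone-aware `datetime`.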

## Extend a PIT time

You can extend a PIT time by providing a `keep_alive` parameter in the `pit` object when you perform a search:

```json
GET /_search
{
"size": 10000,
"query": {
"match" : {
"user.id" : "elkbee"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "100m"
Contributor

Can we add a comment that this is optional in search, to emphasize it, and provide its default value as well?

Collaborator Author

I will add the note below the request that the parameter is optional. What is the default value?

Contributor

1m is the default value

Contributor

@bharath-techie Nov 1, 2022

In search, it should be completely optional. There is no default value as of now.

},
"sort": [
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
{"_shard_doc": "desc"}
],
"search_after": [
"2021-05-20T05:30:04.832Z"
]
}
```

The `keep_alive` parameter in a search request is optional. It specifies the amount by which to extend the time to keep a PIT.
{: .note}
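
The optional nature of `keep_alive` in a search request can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical helper named `search_body_with_pit` and a placeholder PIT ID. It only builds the request body and does not contact a cluster.

```python
def search_body_with_pit(pit_id, query, keep_alive=None, size=10000):
    """Build a search request body that runs against an existing PIT."""
    pit = {"id": pit_id}
    if keep_alive is not None:
        # Include keep_alive only when the caller wants to extend the PIT.
        pit["keep_alive"] = keep_alive
    return {"size": size, "query": query, "pit": pit}


query = {"match": {"user.id": "elkbee"}}
# "example-pit-id" is a placeholder, not a real PIT ID.
plain = search_body_with_pit("example-pit-id", query)
extended = search_body_with_pit("example-pit-id", query, keep_alive="100m")
```

The `plain` body searches the PIT without changing its lifetime, while `extended` carries the `keep_alive` value used to extend it.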

## List all PITs
Contributor

@bharath-techie Oct 29, 2022

Can we also add a snippet where the user can extend the keep-alive using the Search API?

```json
GET /_search
{
  "size": 10000,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  },
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "100m" // Optional; extends the PIT's keep-alive. The new expiry is the last accessed time + max(request keep-alive, current keep-alive).
  },
  "sort": [
    {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
    {"_shard_doc": "desc"}
  ]
}
```

Introduced 2.4
{: .label .label-purple }

Returns all PITs in the OpenSearch cluster.
Contributor

Can we give an example of the List All PITs API here?

Collaborator Author

The example is in the "sample request" section below.


### Cross-cluster behavior
Contributor

@bharath-techie @dhruv16dhr Should we have the cross-cluster behavior in a different section for the PIT APIs? Defining it like this can be confusing for a new user, since we are giving sample examples under cross-cluster behavior for Delete and List PITs.

Contributor

I feel like users might miss reading about cross-cluster behavior if it's down below.


The List All PITs API returns only local PITs or mixed PITs (PITs created in both local and remote clusters). It does not return fully remote PITs.

#### Sample request

```json
GET /_search/point_in_time/_all
```

#### Sample response

```json
{
"pits": [
{
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAEWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
Contributor

Should we mention in comments whether this one is a local PIT or a remote PIT? @bharath-techie

Contributor

Let's not do that for now; it might be confusing for the user.

"creation_time": 1658146048666,
"keep_alive": 6000000
},
{
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAIWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
"creation_time": 1658146050064,
"keep_alive": 6000000
}
]
}
```

### Response fields

Field | Data Type | Description
:--- | :--- | :---
pits | Array of JSON objects | The list of all PITs.

Each PIT object contains the following fields.

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID.
creation_time | long | The time the PIT was created, in milliseconds since the epoch.
keep_alive | long | The amount of time to keep the PIT, in milliseconds.
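
Because `keep_alive` is reported in milliseconds, a small conversion can make the List All PITs response easier to read. The Python sketch below assumes a response shaped like the sample above. The function name `pit_keep_alives` and the placeholder PIT ID are illustrative only.

```python
from datetime import timedelta


def pit_keep_alives(list_response):
    """Map each PIT ID to its keep_alive expressed as a timedelta."""
    return {
        pit["pit_id"]: timedelta(milliseconds=pit["keep_alive"])
        for pit in list_response["pits"]
    }


# Shortened stand-in for the sample response; the PIT ID is a placeholder.
sample = {
    "pits": [
        {"pit_id": "example-pit-id", "creation_time": 1658146048666, "keep_alive": 6000000}
    ]
}
keep_alives = pit_keep_alives(sample)
```

The value 6000000 milliseconds corresponds to 100 minutes, which matches the `keep_alive=100m` used in the Create PIT sample request.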

## Delete PITs
Introduced 2.4
{: .label .label-purple }

Deletes one, several, or all PITs. PITs are automatically deleted when the `keep_alive` time period elapses. However, to deallocate resources, you can delete a PIT using the Delete PIT API. The Delete PIT API supports deleting a list of PITs by ID or deleting all PITs at once.

### Cross-cluster behavior

The Delete PITs by ID API fully supports deleting cross-cluster PITs.

The Delete All PITs API deletes only local PITs or mixed PITs (PITs created in both local and remote clusters). It does not delete fully remote PITs.

#### Sample request: Delete all PITs

```json
DELETE /_search/point_in_time/_all
```

If you want to delete one or several PITs, specify their PIT IDs in the request body.

### Request fields

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) or an array of binaries | The PIT IDs of the PITs to be deleted. Required.

#### Sample request: Delete PITs by ID

```json
DELETE /_search/point_in_time

{
"pit_id": [
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA",
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
]
}
```

#### Sample response

For each PIT, the response contains a JSON object with a PIT ID and a `successful` field that specifies whether the deletion was successful. Partial failures are treated as failures.

```json
{
"pits": [
{
"successful": true,
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
},
{
"successful": false,
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
}
]
}
```

### Response fields

Field | Data Type | Description
:--- | :--- | :---
successful | Boolean | Whether the delete operation was successful.
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID of the PIT to be deleted.
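
Since a delete request can succeed for some PITs and fail for others, a client will usually want to separate the two. The following Python sketch assumes the request and response shapes shown above. The helper names `delete_pits_body` and `failed_deletions` and the placeholder PIT IDs are hypothetical.

```python
def delete_pits_body(pit_ids):
    """Build the request body for deleting PITs by ID."""
    return {"pit_id": list(pit_ids)}


def failed_deletions(delete_response):
    """Return the PIT IDs whose deletion was not successful."""
    return [
        pit["pit_id"]
        for pit in delete_response["pits"]
        if not pit["successful"]
    ]


body = delete_pits_body(["pit-a", "pit-b"])  # placeholder IDs
# Stand-in for a response in which one deletion failed.
response = {
    "pits": [
        {"successful": True, "pit_id": "pit-a"},
        {"successful": False, "pit_id": "pit-b"},
    ]
}
to_retry = failed_deletions(response)
```

A caller could pass `to_retry` back to `delete_pits_body` to retry only the PITs that were not deleted.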

## PIT segments
Introduced 2.4
{: .label .label-purple }

Similarly to the [CAT Segments API]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-segments), the PIT Segments API provides low-level information about the disk utilization of a PIT by describing its Lucene segments. The PIT Segments API supports listing segment information of a specific PIT by ID or of all PITs at once.

#### Sample request: PIT segments of all PITs

```json
GET /_cat/pit_segments/_all
```

If you want to list segments for one or several PITs, specify their PIT IDs in the request body.

### Request fields

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) or an array of binaries | The PIT IDs of the PITs whose segments are to be listed. Required.

#### Sample request: PIT segments of PITs by ID

```json
GET /_cat/pit_segments

{
"pit_id": [
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA",
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
]
}
```

#### Sample response

```json
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
index1 0 r 10.212.36.190 _0 0 4 0 3.8kb 1364 false true 8.8.2 true
index1 1 p 10.212.36.190 _0 0 3 0 3.7kb 1364 false true 8.8.2 true
index1 2 r 10.212.74.139 _0 0 2 0 3.6kb 1364 false true 8.8.2 true
```
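
The tabular response can be turned into structured records with a few lines of Python. This is a minimal sketch, assuming the header row is present as in the sample response and that no column value is empty or contains spaces. The function name `parse_cat_table` is hypothetical.

```python
def parse_cat_table(text):
    """Parse whitespace-separated tabular output into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each column name with the value in the same position.
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
index1 0 r 10.212.36.190 _0 0 4 0 3.8kb 1364 false true 8.8.2 true
index1 1 p 10.212.36.190 _0 0 3 0 3.7kb 1364 false true 8.8.2 true
index1 2 r 10.212.74.139 _0 0 2 0 3.6kb 1364 false true 8.8.2 true
"""
segments = parse_cat_table(sample)
total_docs = sum(int(row["docs.count"]) for row in segments)
```

All values remain strings after parsing, so numeric columns such as `docs.count` need an explicit conversion, as in the `total_docs` line.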

## PIT settings

You can specify the following settings for a PIT.

Setting | Description | Default
:--- | :--- | :---
point_in_time.max_keep_alive | A cluster-level setting that specifies the maximum value for the `keep_alive` parameter. | 24h
search.max_open_pit_context | A node-level setting that specifies the maximum number of open PIT contexts for the node. | 300
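
To illustrate how the `point_in_time.max_keep_alive` limit relates to a requested `keep_alive`, the Python sketch below compares two time values on the client side. It is an assumption-laden example: the helper names are hypothetical, the parser handles only the `ms`, `s`, `m`, `h`, and `d` suffixes, and the actual enforcement happens in OpenSearch, not in client code.

```python
import re

# Milliseconds per unit for the subset of units this sketch supports.
_UNIT_MILLIS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def to_millis(value):
    """Convert a time value such as '100m' or '24h' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MILLIS[unit]


def exceeds_max_keep_alive(keep_alive, max_keep_alive="24h"):
    """Check whether keep_alive is larger than the configured maximum."""
    return to_millis(keep_alive) > to_millis(max_keep_alive)


within_limit = not exceeds_max_keep_alive("100m")
too_long = exceeds_max_keep_alive("2d")
```

With the default maximum of 24h, a `keep_alive` of `100m` is within the limit and a value of `2d` is not.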