Adds Point in Time documentation #1753

Merged
merged 14 commits into from
Nov 4, 2022
8 changes: 7 additions & 1 deletion _api-reference/nodes-apis/nodes-stats.md
@@ -158,7 +158,10 @@ GET _nodes/stats/
"current" : 0
},
"search" : {
"open_contexts" : 0,
"search.point_in_time_total" : 9,
Contributor

The `search` attribute in the `_nodes/stats` result is:

"search": {
                    "open_contexts": 0,
                    "query_total": 0,
                    "query_time_in_millis": 0,
                    "query_current": 0,
                    "fetch_total": 0,
                    "fetch_time_in_millis": 0,
                    "fetch_current": 0,
                    "scroll_total": 0,
                    "scroll_time_in_millis": 0,
                    "scroll_current": 0,
                    "point_in_time_total": 0,
                    "point_in_time_time_in_millis": 0,
                    "point_in_time_current": 0,
                    "suggest_total": 0,
                    "suggest_time_in_millis": 0,
                    "suggest_current": 0
                }

Please remove the `search.` prefix.

"search.point_in_time_time" : 11451670,
"search.point_in_time_current" : 4,
"open_contexts" : 4,
"query_total" : 194,
"query_time_in_millis" : 467,
"query_current" : 0,
@@ -648,6 +651,9 @@ get.<br>&nbsp;&nbsp;&nbsp;&nbsp;missing_total | Integer | The number of failed g
get.<br>&nbsp;&nbsp;&nbsp;&nbsp;missing_time_in_millis | Integer | The total time for all failed get operations, in milliseconds.
get.<br>&nbsp;&nbsp;&nbsp;&nbsp;current | Integer | The number of get operations that are currently running.
search | Object | Statistics about the search operations for the node.
search.<br>&nbsp;&nbsp;&nbsp;&nbsp;search.point_in_time_total | Integer | The total number of Point in Time contexts that have been created (completed and active) since the node last restarted.
Contributor

use point_in_time_total instead of search.point_in_time_total

search.<br>&nbsp;&nbsp;&nbsp;&nbsp;search.point_in_time_time | Integer | The time that Point in Time contexts have been held open since the node last restarted.
Contributor

same as above

search.<br>&nbsp;&nbsp;&nbsp;&nbsp;search.point_in_time_current | Integer | The number of Point in Time contexts currently open.
Contributor

same as above

search.<br>&nbsp;&nbsp;&nbsp;&nbsp;open_contexts | Integer | The number of open search contexts.
search.<br>&nbsp;&nbsp;&nbsp;&nbsp;query_total | Integer | The total number of query operations.
search.<br>&nbsp;&nbsp;&nbsp;&nbsp;query_time_in_millis | Integer | The total time for all query operations, in milliseconds.
267 changes: 267 additions & 0 deletions _opensearch/point-in-time-api.md
@@ -0,0 +1,267 @@
---
layout: default
title: Point in Time API
nav_order: 58
has_children: false
parent: Point in Time
---

# Point in Time API

Use the [Point in Time (PIT)]({{site.url}}{{site.baseurl}}/opensearch/point-in-time/) API to manage Points in Time.
Collaborator

If we are defining Point in Time as PIT, we should define it on first appearance and then use the acronym thereafter. Please apply to all files.

Collaborator

Suggested change
Use the [Point in Time (PIT)]({{site.url}}{{site.baseurl}}/opensearch/point-in-time/) API to manage Points in Time.
Use the [Point in Time (PIT)]({{site.url}}{{site.baseurl}}/opensearch/point-in-time/) API to manage PITs.


---

#### Table of contents
- TOC
{:toc}

---

## Create a PIT
Contributor

Can we also add info about these cluster settings?

  • We limit the maximum keep-alive of a PIT using a cluster-level setting, `point_in_time.max_keep_alive` (defaults to 24h; users can change it as required).
  • We limit the number of open PIT contexts on each node using a node-level setting, `search.max_open_pit_context` (defaults to 300).

Introduced 2.4
{: .label .label-purple }

Creates a PIT. The `keep_alive` query parameter is required; it specifies how long to keep this PIT.
Collaborator

Suggested change
Creates a PIT. The `keep_alive` query parameter is required; it specifies how long to keep this PIT.
Creates a PIT. The `keep_alive` query parameter is required; it specifies how long to keep a PIT.


### Path and HTTP methods

```json
POST /<target_indexes>/_search/point_in_time?keep_alive=1h&routing=&expand_wildcards=&preference=
```

### Path parameters

Parameter | Data Type | Description
:--- | :--- | :---
target_indexes | String | The name(s) of the target index(es) for the PIT. May contain a comma-separated list or a wildcard index pattern.

### Query parameters

Parameter | Data Type | Description
Contributor

Should we have a column to specify which params are optional or required? @bharath-techie @dhruv16dhr

Contributor

Yeah this could be useful.

Collaborator Author

We are adhering to our API style guide, which does not have a "required/optional" column. We list the required/optional in the description. But if you feel that it's needed, I can add the column.

Let's go ahead with the API style guide, as we have been doing until now.

:--- | :--- | :---
keep_alive | Time | The amount of time to keep the PIT. Required.
preference | String | The node or the shard used to perform the search. Optional. Default is random.
Contributor

PITs do not support `preference`.

Contributor

This is for Create PIT, which does support `preference`. Please add it back, @kolchfa-aws.

routing | String | Specifies to route search requests to a specific shard. Optional. Default is the document's `_id`.
expand_wildcards | String | The type of index that can match the wildcard pattern. Supports comma-separated values. Valid values are the following:<br>- `all`: Match any index or data stream, including hidden ones. <br>- `open`: Match open, non-hidden indexes or non-hidden data streams. <br>- `closed`: Match closed, non-hidden indexes or non-hidden data streams. <br>- `hidden`: Match hidden indexes or data streams. Must be combined with `open`, `closed` or both `open` and `closed`.<br>- `none`: No wildcard patterns are accepted.<br> Optional. Default is `open`.
allow_partial_pit_creation | Boolean | Specifies whether to create a PIT with partial failures. Optional. Default is `false`.

#### Sample request

```json
POST /my-index-1/_search/point_in_time?keep_alive=100m
```

#### Sample response

```json
{
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAIWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"creation_time": 1658146050064
}
```
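
As an illustration of the request and response above, here is a minimal Python sketch that builds the Create PIT path and converts `creation_time` into a readable timestamp. The helper names `create_pit_path` and `creation_datetime` are hypothetical, the PIT ID is a placeholder, and nothing here contacts a cluster.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def create_pit_path(target_indexes, keep_alive, **optional_params):
    """Build the path and query string for a Create PIT request (hypothetical helper)."""
    # Multiple index names are joined with commas, as the path parameter allows.
    target = ",".join(target_indexes)
    # keep_alive is required; other query parameters are passed through as given.
    params = {"keep_alive": keep_alive, **optional_params}
    return f"/{quote(target, safe=',*')}/_search/point_in_time?{urlencode(params)}"


def creation_datetime(response):
    """Convert creation_time (milliseconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(response["creation_time"] / 1000, tz=timezone.utc)


sample_response = {
    "pit_id": "<pit-id>",  # placeholder, not a real PIT ID
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "creation_time": 1658146050064,
}

print(create_pit_path(["my-index-1"], "100m"))
print(creation_datetime(sample_response).isoformat())
```

The path would be sent with `POST` by whichever HTTP client you already use.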

### Response fields

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID.
creation_time | long | The time the PIT was created in milliseconds since the epoch.
Collaborator

Suggested change
creation_time | long | The time the PIT was created in milliseconds since the epoch.
creation_time | long | The time the PIT was created, in milliseconds since the epoch.


## Extend a PIT time

You can extend a PIT time by providing an optional `keep_alive` parameter in the `pit` object when you perform a search:

```json
GET /_search
{
"size": 10000,
"query": {
"match" : {
"user.id" : "elkbee"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "100m"
Contributor

Can we add a comment emphasizing that this is optional in search, and provide its default value as well?

Collaborator Author

I will add the note below the request that the parameter is optional. What is the default value?

Contributor

1m is the default value

Contributor (@bharath-techie, Nov 1, 2022)

In search, it should be completely optional. There is no default value as of now.

},
"sort": [
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
{"_shard_doc": "desc"}
],
"search_after": [
"2021-05-20T05:30:04.832Z"
]
}
```
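
A rough Python sketch of the same request body, showing that `keep_alive` inside the `pit` object is optional. The function name `pit_search_body` and the placeholder PIT ID are assumptions made for illustration.

```python
import json


def pit_search_body(pit_id, query, keep_alive=None, size=10000):
    """Build a search request body that runs against an existing PIT (hypothetical helper)."""
    pit = {"id": pit_id}
    if keep_alive is not None:
        # Optional: including keep_alive in the pit object asks for the PIT to be extended.
        pit["keep_alive"] = keep_alive
    return {"size": size, "query": query, "pit": pit}


body = pit_search_body(
    "<pit-id>",  # placeholder for the ID returned by Create PIT
    {"match": {"user.id": "elkbee"}},
    keep_alive="100m",
)
print(json.dumps(body, indent=2))
```

Omitting the `keep_alive` argument produces a body whose `pit` object contains only the ID.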

## List all PITs
Contributor (@bharath-techie, Oct 29, 2022)

Can we also add a snippet showing how a user can extend the keep-alive using the Search API?

```json
GET /_search
{
  "size": 10000,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "100m"
  },
  "sort": [
    {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
    {"_shard_doc": "desc"}
  ]
}
```

`keep_alive` is optional here and extends the PIT's keep-alive. The new expiry will be the last accessed time + max(request keep-alive, current keep-alive).
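
The expiry rule stated in this comment (last accessed time plus the larger of the requested and current keep-alive) can be worked through with a short Python sketch. It only restates the reviewer's description as arithmetic; the function name `new_expiry_ms` and the sample values are assumptions, and the server implementation has not been checked here.

```python
def new_expiry_ms(last_access_ms, current_keep_alive_ms, requested_keep_alive_ms=None):
    """Return the PIT expiry time in epoch milliseconds, per the rule described above."""
    if requested_keep_alive_ms is None:
        # No keep_alive in the search request: the current keep-alive applies.
        effective_ms = current_keep_alive_ms
    else:
        # A requested keep-alive shorter than the current one does not shrink it.
        effective_ms = max(requested_keep_alive_ms, current_keep_alive_ms)
    return last_access_ms + effective_ms


MINUTE_MS = 60_000
last_access = 1658146050064  # sample epoch milliseconds
current = 100 * MINUTE_MS    # PIT created with keep_alive=100m

print(new_expiry_ms(last_access, current, 60 * MINUTE_MS))   # shorter request
print(new_expiry_ms(last_access, current, 120 * MINUTE_MS))  # longer request
```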

Introduced 2.4
{: .label .label-purple }

Returns all PITs in the OpenSearch cluster.
Contributor

Can we give an example of the List All PITs API here?

Collaborator Author

The example is in the "sample request" section below.


### Cross-cluster behavior
Contributor

@bharath-techie @dhruv16dhr Should we put the cross-cluster behavior in a separate section for the PIT APIs? Defining it like this can confuse a new user because the sample requests for Delete and List PITs appear under the cross-cluster behavior heading.

Contributor

I feel that users might miss the cross-cluster behavior if it's further down.


The List All PITs API returns only local PITs or mixed PITs (PITs created in both local and remote clusters). It does not return fully remote PITs.

#### Sample request

```json
GET /_search/point_in_time/_all
```

#### Sample response

```json
{
"pits": [
{
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAEWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
Contributor

Should we mention in comments whether this is a local PIT or a remote PIT? @bharath-techie

Contributor

Let's not do that for now; it might confuse users.

"creation_time": 1658146048666,
"keep_alive": 6000000
},
{
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAIWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
"creation_time": 1658146050064,
"keep_alive": 6000000
}
]
}
```
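
To make the units in the sample response concrete, the following Python sketch summarizes a List All PITs response. The PIT IDs are placeholders and `summarize_pits` is a hypothetical helper; only the `creation_time` and `keep_alive` values are taken from the sample.

```python
from datetime import datetime, timezone

# Sample List All PITs response with placeholder PIT IDs (not real IDs).
list_response = {
    "pits": [
        {"pit_id": "<pit-id-2>", "creation_time": 1658146050064, "keep_alive": 6000000},
        {"pit_id": "<pit-id-1>", "creation_time": 1658146048666, "keep_alive": 6000000},
    ]
}


def summarize_pits(response):
    """Return one summary row per PIT, oldest first."""
    rows = []
    for pit in sorted(response["pits"], key=lambda p: p["creation_time"]):
        rows.append({
            "pit_id": pit["pit_id"],
            # creation_time is in milliseconds since the epoch.
            "created_utc": datetime.fromtimestamp(pit["creation_time"] / 1000, tz=timezone.utc),
            # keep_alive is reported in milliseconds; convert to minutes.
            "keep_alive_minutes": pit["keep_alive"] / 60_000,
        })
    return rows


for row in summarize_pits(list_response):
    print(row["pit_id"], row["created_utc"].isoformat(), row["keep_alive_minutes"])
```

A `keep_alive` of 6,000,000 milliseconds is 100 minutes, which matches the `keep_alive=100m` used in the Create PIT sample request.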

### Response fields

Field | Data Type | Description
:--- | :--- | :---
pits | Array of JSON objects | The list of all PITs.

Each PIT object contains the following fields:
Collaborator

Suggested change
Each PIT object contains the following fields:
Each PIT object contains the following fields.


Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID.
creation_time | long | The time the PIT was created in milliseconds since the epoch.
Collaborator

Suggested change
creation_time | long | The time the PIT was created in milliseconds since the epoch.
creation_time | long | The time the PIT was created, in milliseconds since the epoch.

keep_alive | long | The amount of time to keep the PIT in milliseconds.
Collaborator

Suggested change
keep_alive | long | The amount of time to keep the PIT in milliseconds.
keep_alive | long | The amount of time to keep the PIT, in milliseconds.


## Delete PITs
Introduced 2.4
{: .label .label-purple }

Deletes one, several, or all PITs. PITs are automatically deleted when the `keep_alive` time period elapses. However, to free resources, you can delete a PIT using the Delete PIT API. The Delete PIT API supports deleting a list of PITs by ID or deleting all PITs at once.
Collaborator

Instead of "to free resources", would "to make resources available" work?

Collaborator Author

It's not the same thing. Changed to "de-allocate".

Collaborator

"deallocate" (no hyphen)


### Cross-cluster behavior

The Delete PITs by ID API fully supports deleting cross-cluster PITs.

The Delete All PITs API deletes only local PITs or mixed PITs (PITs created in both local and remote clusters). It does not delete fully remote PITs.

#### Sample Request: Delete all PITs

```json
DELETE /_search/point_in_time/_all
```

If you want to delete one or several PITs, specify their PIT IDs in the request body.

### Request fields

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) or an array of binaries | The PIT IDs of the PITs to be deleted. Required.

#### Sample request: Delete PITs by ID

```json
DELETE /_search/point_in_time

{
"pit_id": [
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA",
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
]
}
```

#### Sample response

For each PIT, the response contains a JSON object with a PIT ID and a `successful` field that specifies if the deletion was successful. Partial failures are treated as failures.
Collaborator

Suggested change
For each PIT, the response contains a JSON object with a PIT ID and a `successful` field that specifies if the deletion was successful. Partial failures are treated as failures.
For each PIT, the response contains a JSON object with a PIT ID and a `successful` field that specifies whether the deletion was successful. Partial failures are treated as failures.


```json
{
"pits": [
{
"successful": true,
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
},
{
"successful": false,
"pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
}
]
}
```
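
Because each PIT in the response carries its own `successful` flag, a client may want to collect the IDs that were not deleted. A small Python sketch under that assumption follows; `failed_deletions` is a hypothetical helper and the PIT IDs are placeholders.

```python
# Sample Delete PITs response with placeholder PIT IDs (not real IDs).
delete_response = {
    "pits": [
        {"successful": True, "pit_id": "<pit-id-1>"},
        {"successful": False, "pit_id": "<pit-id-2>"},
    ]
}


def failed_deletions(response):
    """Return the IDs of PITs whose deletion was not reported as successful."""
    return [pit["pit_id"] for pit in response["pits"] if not pit["successful"]]


failed = failed_deletions(delete_response)
if failed:
    # Partial failures are reported as failures, so these IDs may be worth retrying.
    print("Not deleted:", failed)
```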

### Response fields

Field | Data Type | Description |
Contributor

Suggested change
Field | Data Type | Description |
Field | Data Type | Description

Collaborator Author

Implemented, although this does not make a difference in rendering :)

:--- | :--- | :---
successful | Boolean | Whether the delete operation was successful.
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID of the PIT to be deleted.

## PIT Segments
Contributor

should "Segments" be capitalized in this section?

Collaborator Author

That's the API name so should be capitalized.

Collaborator

I had the same question. There are previous headings, for example, "Delete all PITs", where we did not capitalize them as the name of the API. If the headings are intended to reflect the names of the APIs, let's capitalize them consistently, and would it make sense to include "API" in the headings for maximum clarity?

Collaborator Author

Good point. Since there is no "official" name, I will make these sentence case and assume they refer to the action.

Introduced 2.4
{: .label .label-purple }

Similarly to the [CAT Segments API]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-segments), the PIT Segments API provides low-level information about the disk utilization of a PIT by describing its Lucene segments. The PIT Segments API supports listing segment information of a specific PIT by ID or of all PITs at once.

#### Sample request: PIT Segments of all PITs

```json
GET /_cat/pit_segments/_all
```

If you want to list segments for one or several PITs, specify their PIT IDs in the request body.

### Request fields

Field | Data Type | Description
:--- | :--- | :---
pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) or an array of binaries | The PIT IDs of the PITs whose segments are to be listed. Required.

#### Sample request: Delete PITs by ID
Contributor

I think this should be "PITs by ID". @bharath-techie, please give your view.

Contributor

Yeah this should be "PIT segments for list of PITs"


```json
GET /_cat/pit_segments

{
"pit_id": [
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA",
"o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
]
}
```

#### Sample response

```json
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
index1 0 r 10.212.36.190 _0 0 4 0 3.8kb 1364 false true 8.8.2 true
index1 1 p 10.212.36.190 _0 0 3 0 3.7kb 1364 false true 8.8.2 true
index1 2 r 10.212.74.139 _0 0 2 0 3.6kb 1364 false true 8.8.2 true
```
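
The response above is plain text in columns rather than JSON. One way to work with it is sketched below in Python, assuming that columns are separated by whitespace and that no value contains a space; `parse_cat_table` is a hypothetical helper.

```python
# The sample PIT segments output shown above, as returned in text form.
cat_output = """\
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
index1 0 r 10.212.36.190 _0 0 4 0 3.8kb 1364 false true 8.8.2 true
index1 1 p 10.212.36.190 _0 0 3 0 3.7kb 1364 false true 8.8.2 true
index1 2 r 10.212.74.139 _0 0 2 0 3.6kb 1364 false true 8.8.2 true
"""


def parse_cat_table(text):
    """Parse whitespace-separated output with a header row into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Each data row is paired with the header names by position.
    return [dict(zip(header, line.split())) for line in lines[1:]]


segments = parse_cat_table(cat_output)
total_docs = sum(int(row["docs.count"]) for row in segments)
print(len(segments), "segments,", total_docs, "documents")
```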

## PIT Settings
Contributor

If "Settings" isn't a proper name, lowercase it.


You can specify the following settings for a PIT.

Setting | Description | Default
:--- | :--- | :---
point_in_time.max_keep_alive | A cluster level setting that specifies the maximum value for the `keep_alive` parameter. | 24h
Collaborator

Suggested change
point_in_time.max_keep_alive | A cluster level setting that specifies the maximum value for the `keep_alive` parameter. | 24h
point_in_time.max_keep_alive | A cluster-level setting that specifies the maximum value for the `keep_alive` parameter. | 24h

search.max_open_pit_context | A node level setting that specifies the maximum number of open PIT contexts for the node. | 300
Collaborator

Suggested change
search.max_open_pit_context | A node level setting that specifies the maximum number of open PIT contexts for the node. | 300
search.max_open_pit_context | A node-level setting that specifies the maximum number of open PIT contexts for the node. | 300
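
As a rough illustration of how `point_in_time.max_keep_alive` constrains the `keep_alive` parameter, the Python sketch below does a client-side pre-check before a request is sent. The helper names are hypothetical, only the `ms`, `s`, `m`, `h`, and `d` units are handled, and the cluster itself remains the authority on what it accepts.

```python
import re

# Milliseconds per supported time unit (a subset chosen for this sketch).
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def to_millis(value):
    """Convert a time value such as '100m' or '24h' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def within_max_keep_alive(keep_alive, max_keep_alive="24h"):
    """Check a requested keep_alive against the maximum (24h is the documented default)."""
    return to_millis(keep_alive) <= to_millis(max_keep_alive)


print(within_max_keep_alive("100m"))
print(within_max_keep_alive("25h"))
```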
