diff --git a/docs/images/cli.gif b/docs/images/cli.gif new file mode 100644 index 00000000..e55c47ba Binary files /dev/null and b/docs/images/cli.gif differ diff --git a/docs/images/expression.png b/docs/images/expression.png new file mode 100644 index 00000000..ab853f1d Binary files /dev/null and b/docs/images/expression.png differ diff --git a/docs/images/expressionAtom.png b/docs/images/expressionAtom.png new file mode 100644 index 00000000..9572c10e Binary files /dev/null and b/docs/images/expressionAtom.png differ diff --git a/docs/images/joinPart.png b/docs/images/joinPart.png new file mode 100644 index 00000000..635f4838 Binary files /dev/null and b/docs/images/joinPart.png differ diff --git a/docs/images/mac_installer_destination.png b/docs/images/mac_installer_destination.png new file mode 100644 index 00000000..54ba84f2 Binary files /dev/null and b/docs/images/mac_installer_destination.png differ diff --git a/docs/images/mac_installer_home.png b/docs/images/mac_installer_home.png new file mode 100644 index 00000000..cacb1ce4 Binary files /dev/null and b/docs/images/mac_installer_home.png differ diff --git a/docs/images/mac_installer_license.png b/docs/images/mac_installer_license.png new file mode 100644 index 00000000..5fea55b9 Binary files /dev/null and b/docs/images/mac_installer_license.png differ diff --git a/docs/images/mac_installer_password.png b/docs/images/mac_installer_password.png new file mode 100644 index 00000000..5f91355d Binary files /dev/null and b/docs/images/mac_installer_password.png differ diff --git a/docs/images/mac_installer_readme.png b/docs/images/mac_installer_readme.png new file mode 100644 index 00000000..2958c1a4 Binary files /dev/null and b/docs/images/mac_installer_readme.png differ diff --git a/docs/images/mac_installer_select_and_browse.png b/docs/images/mac_installer_select_and_browse.png new file mode 100644 index 00000000..f2a418d0 Binary files /dev/null and b/docs/images/mac_installer_select_and_browse.png differ 
diff --git a/docs/images/mac_installer_succesful.png b/docs/images/mac_installer_succesful.png new file mode 100644 index 00000000..84c6834b Binary files /dev/null and b/docs/images/mac_installer_succesful.png differ diff --git a/docs/images/mac_signing_error_1.png b/docs/images/mac_signing_error_1.png new file mode 100644 index 00000000..4395db76 Binary files /dev/null and b/docs/images/mac_signing_error_1.png differ diff --git a/docs/images/mac_signing_error_2.png b/docs/images/mac_signing_error_2.png new file mode 100644 index 00000000..251f188c Binary files /dev/null and b/docs/images/mac_signing_error_2.png differ diff --git a/docs/images/predicate.png b/docs/images/predicate.png new file mode 100644 index 00000000..ebc83fdc Binary files /dev/null and b/docs/images/predicate.png differ diff --git a/docs/images/selectElement.png b/docs/images/selectElement.png new file mode 100644 index 00000000..89ee1812 Binary files /dev/null and b/docs/images/selectElement.png differ diff --git a/docs/images/selectElements.png b/docs/images/selectElements.png new file mode 100644 index 00000000..22d6e43d Binary files /dev/null and b/docs/images/selectElements.png differ diff --git a/docs/images/showFilter.png b/docs/images/showFilter.png new file mode 100644 index 00000000..47dbc074 Binary files /dev/null and b/docs/images/showFilter.png differ diff --git a/docs/images/showStatement.png b/docs/images/showStatement.png new file mode 100644 index 00000000..a1939e7d Binary files /dev/null and b/docs/images/showStatement.png differ diff --git a/docs/images/singleDeleteStatement.png b/docs/images/singleDeleteStatement.png new file mode 100644 index 00000000..9b1a88c4 Binary files /dev/null and b/docs/images/singleDeleteStatement.png differ diff --git a/docs/images/sql.png b/docs/images/sql.png new file mode 100644 index 00000000..39a7546a Binary files /dev/null and b/docs/images/sql.png differ diff --git a/docs/images/tableName.png b/docs/images/tableName.png new file mode 100644 
index 00000000..c3b1c5d0 Binary files /dev/null and b/docs/images/tableName.png differ diff --git a/docs/images/tableSource.png b/docs/images/tableSource.png new file mode 100644 index 00000000..f109f44d Binary files /dev/null and b/docs/images/tableSource.png differ diff --git a/docs/images/tableau_connection.png b/docs/images/tableau_connection.png new file mode 100644 index 00000000..8e318390 Binary files /dev/null and b/docs/images/tableau_connection.png differ diff --git a/docs/images/tableau_connection_error.png b/docs/images/tableau_connection_error.png new file mode 100644 index 00000000..bdedaf85 Binary files /dev/null and b/docs/images/tableau_connection_error.png differ diff --git a/docs/images/tableau_dsn.png b/docs/images/tableau_dsn.png new file mode 100644 index 00000000..0cca8ce5 Binary files /dev/null and b/docs/images/tableau_dsn.png differ diff --git a/docs/images/tableau_sample_data.png b/docs/images/tableau_sample_data.png new file mode 100644 index 00000000..dbb0bb4c Binary files /dev/null and b/docs/images/tableau_sample_data.png differ diff --git a/docs/images/tableau_sample_viz.png b/docs/images/tableau_sample_viz.png new file mode 100644 index 00000000..59a91951 Binary files /dev/null and b/docs/images/tableau_sample_viz.png differ diff --git a/docs/images/windows_dsn_customize.png b/docs/images/windows_dsn_customize.png new file mode 100644 index 00000000..aafcfc0c Binary files /dev/null and b/docs/images/windows_dsn_customize.png differ diff --git a/docs/images/windows_installer_home.png b/docs/images/windows_installer_home.png new file mode 100644 index 00000000..64b2e1eb Binary files /dev/null and b/docs/images/windows_installer_home.png differ diff --git a/docs/images/windows_installer_install.png b/docs/images/windows_installer_install.png new file mode 100644 index 00000000..a99b43e5 Binary files /dev/null and b/docs/images/windows_installer_install.png differ diff --git a/docs/images/windows_installer_select_and_browse.png 
b/docs/images/windows_installer_select_and_browse.png new file mode 100644 index 00000000..5a99f231 Binary files /dev/null and b/docs/images/windows_installer_select_and_browse.png differ diff --git a/docs/images/windows_signing_error_1.png b/docs/images/windows_signing_error_1.png new file mode 100644 index 00000000..2cbd1055 Binary files /dev/null and b/docs/images/windows_signing_error_1.png differ diff --git a/docs/images/windows_singing_error_2.png b/docs/images/windows_singing_error_2.png new file mode 100644 index 00000000..696d8772 Binary files /dev/null and b/docs/images/windows_singing_error_2.png differ diff --git a/docs/images/workbench.gif b/docs/images/workbench.gif new file mode 100644 index 00000000..cbfa312a Binary files /dev/null and b/docs/images/workbench.gif differ diff --git a/docs/sql/basic.md b/docs/sql/basic.md new file mode 100644 index 00000000..63d4bb3a --- /dev/null +++ b/docs/sql/basic.md @@ -0,0 +1,358 @@ +--- +layout: default +title: Basic Queries +parent: SQL +nav_order: 5 +--- + + +# Basic queries + +Use the `SELECT` clause, along with `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, and `LIMIT`, to search and aggregate data. + +Among these clauses, `SELECT` and `FROM` are required, as they specify which fields to retrieve and which indices to retrieve them from. All other clauses are optional. Use them according to your needs. + +### Syntax + +The complete syntax for searching and aggregating data is as follows: + +```sql +SELECT [DISTINCT] (* | expression) [[AS] alias] [, ...] +FROM index_name +[WHERE predicates] +[GROUP BY expression [, ...] + [HAVING predicates]] +[ORDER BY expression [IS [NOT] NULL] [ASC | DESC] [, ...]] +[LIMIT [offset, ] size] +``` + +### Fundamentals + +Apart from the predefined keywords of SQL, the most basic elements are literals and identifiers. +A literal is a numeric, string, date, or boolean constant. An identifier is an Elasticsearch index or field name.
+With arithmetic operators and SQL functions, use literals and identifiers to build complex expressions. + +Rule `expressionAtom`: + +![expressionAtom](../../images/expressionAtom.png) + +Expressions, in turn, can be combined into predicates with logical operators. Use a predicate in the `WHERE` and `HAVING` clauses to filter out data by specific conditions. + +Rule `expression`: + +![expression](../../images/expression.png) + +Rule `predicate`: + +![predicate](../../images/predicate.png) + +### Execution Order + +These SQL clauses execute in an order different from how they appear: + +```sql +FROM index + WHERE predicates + GROUP BY expressions + HAVING predicates + SELECT expressions + ORDER BY expressions + LIMIT size +``` + +## Select + +Specify the fields to be retrieved. + +### Syntax + +Rule `selectElements`: + +![selectElements](../../images/selectElements.png) + +Rule `selectElement`: + +![selectElement](../../images/selectElement.png) + +*Example 1*: Use `*` to retrieve all fields in an index: + +```sql +SELECT * +FROM accounts +``` + +| id | account_number | firstname | gender | city | balance | employer | state | email | address | lastname | age :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- 0 | 1 | Amber | M | Brogan | 39225 | Pyrami | IL | amberduke@pyrami.com | 880 Holmes Lane | Duke | 32 1 | 16 | Hattie | M | Dante | 5686 | Netagy | TN | hattiebond@netagy.com | 671 Bristol Street | Bond | 36 2 | 13 | Nanette | F | Nogal | 32838 | Quility | VA | nanettebates@quility.com | 789 Madison Street | Bates | 28 3 | 18 | Dale | M | Orick | 4180 | | MD | daleadams@boink.com | 467 Hutchinson Court | Adams | 33 + +*Example 2*: Use field name(s) to retrieve only specific fields: + +```sql +SELECT firstname, lastname +FROM accounts +``` + +| id | firstname | lastname :--- | :--- | :--- 0 | Amber | Duke 1 | Hattie | Bond 2 | Nanette | Bates 3 | Dale | Adams + +*Example 3*: Use field aliases instead of field names.
Field aliases are used to make field names more readable: + +```sql +SELECT account_number AS num +FROM accounts +``` + +| id | num :--- | :--- 0 | 1 1 | 6 2 | 13 3 | 18 + +*Example 4*: Use the `DISTINCT` clause to get back only unique field values. You can specify one or more field names: + +```sql +SELECT DISTINCT age +FROM accounts +``` + +| id | age :--- | :--- 0 | 28 1 | 32 2 | 33 3 | 36 + +## From + +Specify the index that you want to search. +You can specify subqueries within the `FROM` clause. + +### Syntax + +Rule `tableName`: + +![tableName](../../images/tableName.png) + +*Example 1*: Prefix field names with the index name or an index alias. To learn about index aliases, see [Index Alias](../elasticsearch/index-alias/). +In this sample query, `acc` is an alias for the `accounts` index: + +```sql +SELECT account_number, accounts.age +FROM accounts +``` + +or + +```sql +SELECT account_number, acc.age +FROM accounts acc +``` + +| id | account_number | age :--- | :--- | :--- 0 | 1 | 32 1 | 6 | 36 2 | 13 | 28 3 | 18 | 33 + +*Example 2*: Use index patterns to query indices that match a specific pattern: + +```sql +SELECT account_number +FROM account* +``` + +| id | account_number :--- | :--- 0 | 1 1 | 6 2 | 13 3 | 18 + +## Where + +Specify a condition to filter the results. + +| Operators | Behavior :--- | :--- `=` | Equal to. `<>` | Not equal to. `>` | Greater than. `<` | Less than. `>=` | Greater than or equal to. `<=` | Less than or equal to. `IN` | Specify multiple `OR` conditions. `BETWEEN` | Similar to a range query. For more information about range queries, see [Range query](../elasticsearch/term/#range). `LIKE` | Use for full text search. For more information about full-text queries, see [Full-text queries](../elasticsearch/full-text/). `IS NULL` | Check if the field value is `NULL`. `IS NOT NULL` | Check if the field value is not `NULL`.
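A few of these operators don't appear in the later examples. As a sketch, `IN`, `BETWEEN`, and `LIKE` might be combined as follows (the field names match the `accounts` index used throughout these examples; the values are illustrative, so adjust them to your data):

```sql
SELECT account_number, firstname
FROM accounts
WHERE age BETWEEN 30 AND 40
  AND state IN ('IL', 'TN')
  AND firstname LIKE 'amb%'
```

Because `LIKE` performs a full-text match against analyzed fields, the pattern is matched against lowercased tokens rather than the raw string.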
+ +Combine comparison operators (`=`, `<>`, `>`, `>=`, `<`, `<=`) with boolean operators `NOT`, `AND`, or `OR` to build more complex expressions. + +*Example 1*: Use comparison operators for numbers, strings, or dates: + +```sql +SELECT account_number +FROM accounts +WHERE account_number = 1 +``` + +| id | account_number :--- | :--- 0 | 1 + +*Example 2*: Elasticsearch allows flexible schemas, so documents in an index may have different fields. Use `IS NULL` or `IS NOT NULL` to retrieve only missing fields or existing fields. We do not differentiate between missing fields and fields explicitly set to `NULL`: + +```sql +SELECT account_number, employer +FROM accounts +WHERE employer IS NULL +``` + +| id | account_number | employer :--- | :--- | :--- 0 | 18 | + +*Example 3*: Delete documents that satisfy the predicates in the `WHERE` clause: + +```sql +DELETE FROM accounts +WHERE age > 30 +``` + +## Group By + +Group documents with the same field value into buckets. + +*Example 1*: Group by fields: + +```sql +SELECT age +FROM accounts +GROUP BY age +``` + +| id | age :--- | :--- 0 | 28 1 | 32 2 | 33 3 | 36 + +*Example 2*: Group by field alias: + +```sql +SELECT account_number AS num +FROM accounts +GROUP BY num +``` + +| id | num :--- | :--- 0 | 1 1 | 6 2 | 13 3 | 18 + +*Example 3*: Use scalar functions in the `GROUP BY` clause: + +```sql +SELECT ABS(age) AS a +FROM accounts +GROUP BY ABS(age) +``` + +| id | a :--- | :--- 0 | 28.0 1 | 32.0 2 | 33.0 3 | 36.0 + +## Having + +Use the `HAVING` clause to filter the buckets produced by the `GROUP BY` clause based on aggregation functions (`COUNT`, `AVG`, `SUM`, `MIN`, and `MAX`): + +*Example 1*: + +```sql +SELECT age, MAX(balance) +FROM accounts +GROUP BY age HAVING MIN(balance) > 10000 +``` + +| id | age | MAX(balance) :--- | :--- | :--- 0 | 28 | 32838 1 | 32 | 39225 + +## Order By + +Use the `ORDER BY` clause to sort results into your desired order.
+ +*Example 1*: Use `ORDER BY` to sort in ascending or descending order. Besides regular field names, ordinals, aliases, and scalar functions are supported: + +```sql +SELECT account_number +FROM accounts +ORDER BY account_number DESC +``` + +| id | account_number :--- | :--- 0 | 18 1 | 13 2 | 6 3 | 1 + +*Example 2*: Specify if documents with missing fields are to be put at the beginning or at the end of the results. The default behavior of Elasticsearch is to return nulls or missing fields at the end. To push them before non-nulls, use the `IS NOT NULL` operator: + +```sql +SELECT employer +FROM accounts +ORDER BY employer IS NOT NULL +``` + +| id | employer :--- | :--- 0 | 1 | Netagy 2 | Pyrami 3 | Quility + +## Limit + +Specify the maximum number of documents that you want to retrieve. This is similar to the `size` parameter in Elasticsearch. Use it to prevent fetching large amounts of data into memory. + +*Example 1*: Specify the number of results to be returned: + +```sql +SELECT account_number +FROM accounts +ORDER BY account_number LIMIT 1 +``` + +| id | account_number :--- | :--- 0 | 1 + +*Example 2*: Specify the document number that you want to start returning the results from. The second argument is equivalent to the `from` parameter in Elasticsearch. Use `ORDER BY` to ensure the same order between pages: + +```sql +SELECT account_number +FROM accounts +ORDER BY account_number LIMIT 1, 1 +``` + +| id | account_number :--- | :--- 0 | 6 diff --git a/docs/sql/cli.md b/docs/sql/cli.md new file mode 100644 index 00000000..4a840a29 --- /dev/null +++ b/docs/sql/cli.md @@ -0,0 +1,101 @@ +--- +layout: default +title: SQL CLI +parent: SQL +nav_order: 2 +--- + +# SQL CLI + +SQL CLI is a stand-alone Python application that you can launch with the `odfesql` command. + +Install the SQL plugin to your Elasticsearch instance, run the CLI on macOS or Linux, and connect to any valid Elasticsearch endpoint.
+ +![SQL CLI](../../images/cli.gif) + +## Features + +SQL CLI has the following features: + +- Multi-line input +- Autocomplete for SQL syntax and index names +- Syntax highlighting +- Formatted output: + - Tabular format + - Field names with color + - Horizontal display by default, switching to vertical display when the output is too wide for your terminal, for better visualization + - Pagination for large output +- Works with or without security enabled +- Supports loading configuration files +- Supports all SQL plugin queries + +## Install + +Launch your local Elasticsearch instance and make sure you have the SQL plugin installed. + +To install the SQL CLI: + +1. We suggest that you install and activate a Python 3 virtual environment to avoid changing your local environment: +``` +pip install virtualenv +virtualenv venv +cd venv +source ./bin/activate +``` + +2. Install the CLI: +``` +pip3 install odfe-sql-cli +``` + +The SQL CLI only works with Python 3. +{: .note } + +3. To launch the CLI, run: +``` +odfesql https://localhost:9200 --username admin --password admin +``` +By default, the `odfesql` command connects to http://localhost:9200. + +## Configure + +When you first launch the SQL CLI, a configuration file is automatically created at `~/.config/odfesql-cli/config` (for macOS and Linux). The configuration is auto-loaded thereafter. + +You can configure the following connection properties: + +- `endpoint`: You do not need to specify an option; anything that follows the `odfesql` launch command is considered the endpoint. If you do not provide an endpoint, by default, the SQL CLI connects to http://localhost:9200. +- `-u/-w`: Supports username and password for HTTP basic authentication, such as with the security plugin or fine-grained access control for Amazon Elasticsearch Service. +- `--aws-auth`: Turns on AWS SigV4 authentication to connect to an Amazon Elasticsearch endpoint.
Use it with the AWS CLI (`aws configure`) to retrieve the local AWS configuration to authenticate and connect. + +For a list of all available configurations, see [clirc](https://github.com/opendistro-for-elasticsearch/sql-cli/blob/master/src/conf/clirc). + +## Using the CLI + +1. Save the sample [accounts test data](https://github.com/opendistro-for-elasticsearch/sql/blob/master/src/test/resources/doctest/testdata/accounts.json) file. + +1. Index the sample data: +``` +curl -H "Content-Type: application/x-ndjson" -POST https://localhost:9200/data/_bulk -u admin:admin --insecure --data-binary "@accounts.json" +``` + +1. Run a sample SQL command: +``` +SELECT * FROM accounts; +``` + +By default, you see a maximum output of 200 rows. To show more results, add a `LIMIT` clause with the desired value. + +## Query options + +Run a single query with the following options: + +- `--help`: Help page for options +- `-q`: Followed by a single query +- `-f`: Specify JDBC or raw format output +- `-v`: Display data vertically +- `-e`: Translate SQL to DSL + +## CLI options + +- `-p`: Always use pager to display output +- `--clirc`: Provide path for the configuration file diff --git a/docs/sql/complex.md b/docs/sql/complex.md new file mode 100644 index 00000000..7147d484 --- /dev/null +++ b/docs/sql/complex.md @@ -0,0 +1,420 @@ +--- +layout: default +title: Complex Queries +parent: SQL +nav_order: 6 +--- + +# Complex queries + +Besides simple SFW (`SELECT-FROM-WHERE`) queries, the SQL plugin supports complex queries such as subqueries, joins, unions, and minus. These queries operate on more than one Elasticsearch index. To examine how these queries execute behind the scenes, use the `explain` operation. + + +## Joins + +Open Distro for Elasticsearch SQL supports inner joins, cross joins, and left outer joins. + +### Constraints + +Joins have a number of constraints: + +1. You can only join two indices. +1. You must use aliases for indices (e.g. `people p`). +1.
Within an `ON` clause, you can only use `AND` conditions. +1. In a `WHERE` clause, don't combine subtrees that contain multiple indices. For example, the following statement works: + + ``` + WHERE (a.type1 > 3 OR a.type1 < 0) AND (b.type2 > 4 OR b.type2 < -1) + ``` + + The following statement does not: + + ``` + WHERE (a.type1 > 3 OR b.type2 < 0) AND (a.type1 > 4 OR b.type2 < -1) + ``` + +1. You can't use `GROUP BY` or `ORDER BY` for results. +1. `LIMIT` with `OFFSET` (e.g. `LIMIT 25 OFFSET 25`) is not supported. + +### Description + +The `JOIN` clause combines columns from one or more indices using values common to each. + +### Syntax + +Rule `tableSource`: + +![tableSource](../../images/tableSource.png) + +Rule `joinPart`: + +![joinPart](../../images/joinPart.png) + +### Example 1: Inner join + +Inner join creates a new result set by combining columns of two indices based on your join predicates. It iterates over the two indices and compares each document to find those that satisfy the join predicates. You can optionally precede the `JOIN` clause with the `INNER` keyword. + +Join predicates are specified in the `ON` clause. + +SQL query: + +```sql +SELECT + a.account_number, a.firstname, a.lastname, + e.id, e.name +FROM accounts a +JOIN employees_nested e + ON a.account_number = e.id +``` + +Explain: + +The `explain` output is complicated, because a `JOIN` clause is associated with two Elasticsearch DSL queries that execute in separate query planner frameworks. You can interpret it by examining the `Physical Plan` and `Logical Plan` objects.
+ +```json +{ + "Physical Plan" : { + "Project [ columns=[a.account_number, a.firstname, a.lastname, e.name, e.id] ]" : { + "Top [ count=200 ]" : { + "BlockHashJoin[ conditions=( a.account_number = e.id ), type=JOIN, blockSize=[FixedBlockSize with size=10000] ]" : { + "Scroll [ employees_nested as e, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "from" : 0, + "_source" : { + "excludes" : [ ], + "includes" : [ + "id", + "name" + ] + } + } + }, + "Scroll [ accounts as a, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "from" : 0, + "_source" : { + "excludes" : [ ], + "includes" : [ + "account_number", + "firstname", + "lastname" + ] + } + } + }, + "useTermsFilterOptimization" : false + } + } + } + }, + "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query", + "Logical Plan" : { + "Project [ columns=[a.account_number, a.firstname, a.lastname, e.name, e.id] ]" : { + "Top [ count=200 ]" : { + "Join [ conditions=( a.account_number = e.id ) type=JOIN ]" : { + "Group" : [ + { + "Project [ columns=[a.account_number, a.firstname, a.lastname] ]" : { + "TableScan" : { + "tableAlias" : "a", + "tableName" : "accounts" + } + } + }, + { + "Project [ columns=[e.name, e.id] ]" : { + "TableScan" : { + "tableAlias" : "e", + "tableName" : "employees_nested" + } + } + } + ] + } + } + } + } +} +``` + +Result set: + +| a.account_number | a.firstname | a.lastname | e.id | e.name :--- | :--- | :--- | :--- | :--- 6 | Hattie | Bond | 6 | Jane Smith + +### Example 2: Cross join + +Cross join, also known as Cartesian join, combines each document from the first index with each document from the second. +The result set is the Cartesian product of the documents of both indices. +This operation is similar to the inner join without the `ON` clause that specifies the join condition.
+ +It's risky to perform cross join on two indices of large or even medium size. It might trigger a circuit breaker that terminates the query to avoid running out of memory. +{: .warning } + +SQL query: + +```sql +SELECT + a.account_number, a.firstname, a.lastname, + e.id, e.name +FROM accounts a +JOIN employees_nested e +``` + +Result set: + +| a.account_number | a.firstname | a.lastname | e.id | e.name :--- | :--- | :--- | :--- | :--- 1 | Amber | Duke | 3 | Bob Smith 1 | Amber | Duke | 4 | Susan Smith 1 | Amber | Duke | 6 | Jane Smith 6 | Hattie | Bond | 3 | Bob Smith 6 | Hattie | Bond | 4 | Susan Smith 6 | Hattie | Bond | 6 | Jane Smith 13 | Nanette | Bates | 3 | Bob Smith 13 | Nanette | Bates | 4 | Susan Smith 13 | Nanette | Bates | 6 | Jane Smith 18 | Dale | Adams | 3 | Bob Smith 18 | Dale | Adams | 4 | Susan Smith 18 | Dale | Adams | 6 | Jane Smith + +### Example 3: Left outer join + +Use left outer join to retain rows from the first index even if they do not satisfy the join predicate. The `OUTER` keyword is optional. + +SQL query: + +```sql +SELECT + a.account_number, a.firstname, a.lastname, + e.id, e.name +FROM accounts a +LEFT JOIN employees_nested e + ON a.account_number = e.id +``` + +Result set: + +| a.account_number | a.firstname | a.lastname | e.id | e.name :--- | :--- | :--- | :--- | :--- 1 | Amber | Duke | null | null 6 | Hattie | Bond | 6 | Jane Smith 13 | Nanette | Bates | null | null 18 | Dale | Adams | null | null + +## Subquery + +A subquery is a complete `SELECT` statement used within another statement and enclosed in parentheses. +From the explain output, you can see that some subqueries are actually transformed into an equivalent join query for execution.
+ +### Example 1: Table subquery + +SQL query: + +```sql +SELECT a1.firstname, a1.lastname, a1.balance +FROM accounts a1 +WHERE a1.account_number IN ( + SELECT a2.account_number + FROM accounts a2 + WHERE a2.balance > 10000 +) +``` + +Explain: + +```json +{ + "Physical Plan" : { + "Project [ columns=[a1.balance, a1.firstname, a1.lastname] ]" : { + "Top [ count=200 ]" : { + "BlockHashJoin[ conditions=( a1.account_number = a2.account_number ), type=JOIN, blockSize=[FixedBlockSize with size=10000] ]" : { + "Scroll [ accounts as a2, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "query" : { + "bool" : { + "filter" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must_not" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must_not" : [ + { + "exists" : { + "field" : "account_number", + "boost" : 1 + } + } + ], + "boost" : 1 + } + } + ], + "boost" : 1 + } + }, + { + "range" : { + "balance" : { + "include_lower" : false, + "include_upper" : true, + "from" : 10000, + "boost" : 1, + "to" : null + } + } + } + ], + "boost" : 1 + } + } + ], + "boost" : 1 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1 + } + }, + "from" : 0 + } + }, + "Scroll [ accounts as a1, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "from" : 0, + "_source" : { + "excludes" : [ ], + "includes" : [ + "firstname", + "lastname", + "balance", + "account_number" + ] + } + } + }, + "useTermsFilterOptimization" : false + } + } + } + }, + "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query", + "Logical Plan" : { + "Project [ columns=[a1.balance, a1.firstname, a1.lastname] ]" : { + "Top [ count=200 ]" : { + "Join [ conditions=( a1.account_number = a2.account_number ) type=JOIN ]" : { + "Group" : [ + { + "Project [ 
columns=[a1.balance, a1.firstname, a1.lastname, a1.account_number] ]" : { + "TableScan" : { + "tableAlias" : "a1", + "tableName" : "accounts" + } + } + }, + { + "Project [ columns=[a2.account_number] ]" : { + "Filter [ conditions=[AND ( AND account_number ISN null, AND balance GT 10000 ) ] ]" : { + "TableScan" : { + "tableAlias" : "a2", + "tableName" : "accounts" + } + } + } + } + ] + } + } + } + } +} +``` + +Result set: + +| a1.firstname | a1.lastname | a1.balance +:--- | :--- | :--- | :--- | :--- | :--- +Amber | Duke | 39225 +Nanette | Bates | 32838 + +### Example 2: From subquery + +SQL query: + +```sql +SELECT a.f, a.l, a.a +FROM ( + SELECT firstname AS f, lastname AS l, age AS a + FROM accounts + WHERE age > 30 +) AS a +``` + +Explain: + +```json +{ + "from" : 0, + "size" : 200, + "query" : { + "bool" : { + "filter" : [ + { + "bool" : { + "must" : [ + { + "range" : { + "age" : { + "from" : 30, + "to" : null, + "include_lower" : false, + "include_upper" : true, + "boost" : 1.0 + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : { + "includes" : [ + "firstname", + "lastname", + "age" + ], + "excludes" : [ ] + } +} +``` + +Result set: + +| f | l | a +:--- | :--- | :--- +Amber | Duke | 32 +Dale | Adams | 33 +Hattie | Bond | 36 diff --git a/docs/sql/delete.md b/docs/sql/delete.md new file mode 100644 index 00000000..32b38333 --- /dev/null +++ b/docs/sql/delete.md @@ -0,0 +1,78 @@ +--- +layout: default +title: Delete +parent: SQL +nav_order: 11 +--- + + +# Delete + +The `DELETE` statement deletes documents that satisfy the predicates in the `WHERE` clause. +If you don't specify the `WHERE` clause, all documents are deleted. 
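Like search statements, a `DELETE` statement can also be submitted over HTTP through the plugin's `_opendistro/_sql` endpoint. A sketch, assuming a local cluster (adjust the host and credentials for your setup):

```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{
  "query" : "DELETE FROM accounts WHERE age > 30"
}'
```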
+ +### Syntax + +Rule `singleDeleteStatement`: + +![singleDeleteStatement](../../images/singleDeleteStatement.png) + +### Example + +SQL query: + +```sql +DELETE FROM accounts +WHERE age > 30 +``` + +Explain: + +```json +{ + "size" : 1000, + "query" : { + "bool" : { + "must" : [ + { + "range" : { + "age" : { + "from" : 30, + "to" : null, + "include_lower" : false, + "include_upper" : true, + "boost" : 1.0 + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : false +} +``` + +Result set: + +```json +{ + "schema" : [ + { + "name" : "deleted_rows", + "type" : "long" + } + ], + "total" : 1, + "datarows" : [ + [ + 3 + ] + ], + "size" : 1, + "status" : 200 +} +``` + +The `datarows` field shows the number of documents deleted. diff --git a/docs/sql/endpoints.md b/docs/sql/endpoints.md index 1cb1e79a..27613acf 100644 --- a/docs/sql/endpoints.md +++ b/docs/sql/endpoints.md @@ -1,23 +1,12 @@ --- layout: default -title: Endpoints +title: Endpoint parent: SQL -nav_order: 4 +nav_order: 12 --- -# Endpoints - ---- - -#### Table of contents -- TOC -{:toc} - - ---- - -## Introduction +# Endpoint To send query request to SQL plugin, you can either use a request parameter in HTTP GET or request body by HTTP POST request. POST request @@ -113,3 +102,123 @@ Explain: } } ``` + + +## Cursor + +### Description + +To get back a paginated response, use the `fetch_size` parameter. The value of `fetch_size` should be greater than 0. The default value is 1,000. A value of 0 will fallback to a non-paginated response. + +The `fetch_size` parameter is only supported for the JDBC response format. 
+{: .note } + + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{ + "fetch_size" : 5, + "query" : "SELECT firstname, lastname FROM accounts WHERE age > 20 ORDER BY state ASC" +}' +``` + +Result set: + +```json +{ + "schema": [ + { + "name": "firstname", + "type": "text" + }, + { + "name": "lastname", + "type": "text" + } + ], + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9", + "total": 956, + "datarows": [ + [ + "Cherry", + "Carey" + ], + [ + "Lindsey", + "Hawkins" + ], + [ + "Sargent", + "Powers" + ], + [ + "Campos", + "Olsen" + ], + [ + "Savannah", + "Kirby" + ] + ], + "size": 5, + "status": 200 +} +``` + +To fetch subsequent pages, use the `cursor` from last response: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{ + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9" +}' +``` + +The result only has the `fetch_size` number of `datarows` and `cursor`. +The last page has only `datarows` and no `cursor`. +The `datarows` can have more than the `fetch_size` number of records in case the nested fields are flattened. 
+ +```json +{ + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMabcde12345", + "datarows": [ + [ + "Abbas", + "Hussain" + ], + [ + "Chen", + "Dai" + ], + [ + "Anirudha", + "Jadhav" + ], + [ + "Peng", + "Huo" + ], + [ + "John", + "Doe" + ] + ] +} +``` + +The `cursor` context is automatically cleared on the last page. +To explicitly clear the cursor context, use the `_opendistro/_sql/close` endpoint: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql/close -d '{ + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9" +}' +``` + +#### Sample response + +```json +{"succeeded":true} +``` diff --git a/docs/sql/functions.md b/docs/sql/functions.md new file mode 100644 index 00000000..7fe1d636 --- /dev/null +++ b/docs/sql/functions.md @@ -0,0 +1,81 @@ +--- +layout: default +title: Functions +parent: SQL +nav_order: 10 +--- + +# Functions + +You must enable fielddata in the document mapping for most string functions to work properly. + +The specification shows the return type of the function with a generic type `T` as the argument. +For example, `abs(number T) -> T` means that the function `abs` accepts a numerical argument of type `T`, which could be any subtype of the `number` type, and it returns the actual type of `T` as the return type.
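Following this specification, function calls can be nested as long as the argument and return types line up. A sketch (the `log` and `abs` functions and the `balance` field are assumptions based on the examples that follow; adjust them to your own index):

```sql
SELECT log(abs(balance)) FROM my-index LIMIT 1
```

Here `abs(number T) -> T` returns a number, which satisfies the numeric argument that `log` expects.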
+ +Function | Specification | Example +:--- | :--- | :--- +abs | `abs(number T) -> T` | `SELECT abs(0.5) FROM my-index LIMIT 1` +acos | `acos(number T) -> double` | `SELECT acos(0.5) FROM my-index LIMIT 1` +add | `add(number T, number) -> T` | `SELECT add(1, 5) FROM my-index LIMIT 1` +ascii | `ascii(string T) -> integer` | `SELECT ascii(name.keyword) FROM my-index LIMIT 1` +asin | `asin(number T) -> double` | `SELECT asin(0.5) FROM my-index LIMIT 1` +atan | `atan(number T) -> double` | `SELECT atan(0.5) FROM my-index LIMIT 1` +atan2 | `atan2(number T, number) -> double` | `SELECT atan2(1, 0.5) FROM my-index LIMIT 1` +cbrt | `cbrt(number T) -> T` | `SELECT cbrt(0.5) FROM my-index LIMIT 1` +ceil | `ceil(number T) -> T` | `SELECT ceil(0.5) FROM my-index LIMIT 1` +concat_ws | `concat_ws(separator, string, string…) -> string` | `SELECT concat_ws("-", "Tutorial", "is", "fun!") FROM my-index LIMIT 1` +cos | `cos(number T) -> double` | `SELECT cos(0.5) FROM my-index LIMIT 1` +cosh | `cosh(number T) -> double` | `SELECT cosh(0.5) FROM my-index LIMIT 1` +cot | `cot(number T) -> double` | `SELECT cot(0.5) FROM my-index LIMIT 1` +curdate | `curdate() -> date` | `SELECT curdate() FROM my-index LIMIT 1` +date | `date(date) -> date` | `SELECT date() FROM my-index LIMIT 1` +date_format | `date_format(date, string) -> string` or `date_format(date, string, string) -> string` | `SELECT date_format(date, 'Y') FROM my-index LIMIT 1` +dayofmonth | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date) FROM my-index LIMIT 1` +degrees | `degrees(number T) -> double` | `SELECT degrees(0.5) FROM my-index LIMIT 1` +divide | `divide(number T, number) -> T` | `SELECT divide(1, 0.5) FROM my-index LIMIT 1` +e | `e() -> double` | `SELECT e() FROM my-index LIMIT 1` +exp | `exp(number T) -> T` | `SELECT exp(0.5) FROM my-index LIMIT 1` +expm1 | `expm1(number T) -> T` | `SELECT expm1(0.5) FROM my-index LIMIT 1` +floor | `floor(number T) -> T` | `SELECT floor(0.5) AS Rounded_Down FROM my-index LIMIT 
1` +if | `if(boolean, es_type, es_type) -> es_type` | `SELECT if(false, 0, 1) FROM my-index LIMIT 1`, `SELECT if(true, 0, 1) FROM my-index LIMIT 1` +ifnull | `ifnull(es_type, es_type) -> es_type` | `SELECT ifnull('hello', 1) FROM my-index LIMIT 1`, `SELECT ifnull(null, 1) FROM my-index LIMIT 1` +isnull | `isnull(es_type) -> integer` | `SELECT isnull(null) FROM my-index LIMIT 1`, `SELECT isnull(1) FROM my-index LIMIT 1` +left | `left(string T, integer) -> T` | `SELECT left('hello', 2) FROM my-index LIMIT 1` +length | `length(string) -> integer` | `SELECT length('hello') FROM my-index LIMIT 1` +ln | `ln(number T) -> double` | `SELECT ln(10) FROM my-index LIMIT 1` +locate | `locate(string, string, integer) -> integer` or `locate(string, string) -> INTEGER` | `SELECT locate('o', 'hello') FROM my-index LIMIT 1`, `SELECT locate('l', 'hello', 3) FROM my-index LIMIT 1` +log | `log(number T) -> double` or `log(number T, number) -> double` | `SELECT log(10) FROM my-index LIMIT 1` +log2 | `log2(number T) -> double` | `SELECT log2(10) FROM my-index LIMIT 1` +log10 | `log10(number T) -> double` | `SELECT log10(10) FROM my-index LIMIT 1` +lower | `lower(string T) -> T` or `lower(string T, string) -> T` | `SELECT lower(name.keyword) FROM my-index LIMIT 1` +ltrim | `ltrim(string T) -> T` | `SELECT ltrim(name.keyword) FROM my-index` +maketime | `maketime(integer, integer, integer) -> date` | `SELECT maketime(1, 2, 3) FROM my-index LIMIT 1` +modulus | `modulus(number T, number) -> T` | `SELECT modulus(2, 3) FROM my-index LIMIT 1` +month | `month(date) -> integer` | `SELECT month(date) FROM my-index` +monthname | `monthname(date) -> string` | `SELECT monthname(date) FROM my-index` +multiply | `multiply(number T, number) -> number` | `SELECT multiply(2, 3) FROM my-index LIMIT 1` +now | `now() -> date` | `SELECT now() FROM my-index LIMIT 1` +pi | `pi() -> double` | `SELECT pi() FROM my-index LIMIT 1` +pow | `pow(number T) -> T` or `pow(number T, number) -> T` | `SELECT pow(2, 3) FROM 
my-index LIMIT 1` +power | `power(number T) -> T` or `power(number T, number) -> T` | `SELECT power(2, 3) FROM my-index LIMIT 1` +radians | `radians(number T) -> double` | `SELECT radians(0.5) FROM my-index LIMIT 1` +rand | `rand() -> number` or `rand(number T) -> T` | `SELECT rand(0.5) FROM my-index LIMIT 1` +replace | `replace(string T, string, string) -> T` | `SELECT replace('hello', 'l', 'x') FROM my-index LIMIT 1` +right | `right(string T, integer) -> T` | `SELECT right('hello', 1) FROM my-index LIMIT 1` +rint | `rint(number T) -> T` | `SELECT rint(1.5) FROM my-index LIMIT 1` +round | `round(number T) -> T` | `SELECT round(1.5) FROM my-index LIMIT 1` +rtrim | `rtrim(string T) -> T` | `SELECT rtrim(name.keyword) FROM my-index LIMIT 1` +sign | `sign(number T) -> T` | `SELECT sign(1.5) FROM my-index LIMIT 1` +signum | `signum(number T) -> T` | `SELECT signum(0.5) FROM my-index LIMIT 1` +sin | `sin(number T) -> double` | `SELECT sin(0.5) FROM my-index LIMIT 1` +sinh | `sinh(number T) -> double` | `SELECT sinh(0.5) FROM my-index LIMIT 1` +sqrt | `sqrt(number T) -> T` | `SELECT sqrt(0.5) FROM my-index LIMIT 1` +substring | `substring(string T, integer, integer) -> T` | `SELECT substring(name.keyword, 2,5) FROM my-index LIMIT 1` +subtract | `subtract(number T, number) -> T` | `SELECT subtract(3, 2) FROM my-index LIMIT 1` +tan | `tan(number T) -> double` | `SELECT tan(0.5) FROM my-index LIMIT 1` +timestamp | `timestamp(date) -> date` | `SELECT timestamp(date) FROM my-index LIMIT 1` +trim | `trim(string T) -> T` | `SELECT trim(name.keyword) FROM my-index LIMIT 1` +upper | `upper(string T) -> T` or `upper(string T, string) -> T` | `SELECT upper(name.keyword) FROM my-index LIMIT 1` +year | `year(date) -> integer` | `SELECT year(date) FROM my-index LIMIT 1` +/ | `number [op] number -> number` | `SELECT 1 / 100 FROM my-index LIMIT 1` +% | `number [op] number -> number` | `SELECT 1 % 100 FROM my-index LIMIT 1` diff --git a/docs/sql/index.md b/docs/sql/index.md index 
f9e4d579..913b0d6e 100644 --- a/docs/sql/index.md +++ b/docs/sql/index.md @@ -9,85 +9,76 @@ has_children: true Open Distro for Elasticsearch SQL lets you write queries in SQL rather than the [Elasticsearch query domain-specific language (DSL)](../elasticsearch/full-text). If you're already familiar with SQL and don't want to learn the query DSL, this feature is a great option. -To use the feature, send requests to the `_opendistro/_sql` URI. You can use a request parameter or the request body (recommended). +## Quick start -```sql -GET https://:/_opendistro/_sql?sql=select * from my-index limit 50 -``` +To get started with the SQL plugin, choose **SQL Workbench** in Kibana. -```json -POST https://:/_opendistro/_sql -{ - "query": "SELECT * FROM my-index LIMIT 50" -} -``` +![Kibana SQL UI plugin](../images/sql.png) -You can query multiple indices by listing them or using wildcards: +### Index data -```json -POST _opendistro/_sql -{ - "query": "SELECT * FROM my-index1,myindex2,myindex3 LIMIT 50" -} - -POST _opendistro/_sql -{ - "query": "SELECT * FROM my-index* LIMIT 50" -} -``` - -For a sample [curl](https://curl.haxx.se/) command, try: - -```bash -curl -XPOST https://localhost:9200/_opendistro/_sql -u admin:admin -k -d '{"query": "SELECT * FROM kibana_sample_data_flights LIMIT 10"}' -H 'Content-Type: application/json' -``` +The SQL plugin is for read-only purposes, so you cannot index or update data using SQL. 
-By default, queries return data in JDBC format, but you can also return data in standard Elasticsearch JSON, CSV, or raw formats: +Use the `bulk` operation to index some sample data: ```json -POST _opendistro/_sql?format=json|csv|raw -{ - "query": "SELECT * FROM my-index LIMIT 50" -} +PUT accounts/_bulk?refresh +{"index":{"_id":"1"}} +{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"} +{"index":{"_id":"6"}} +{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"} +{"index":{"_id":"13"}} +{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"} +{"index":{"_id":"18"}} +{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","email":"daleadams@boink.com","city":"Orick","state":"MD"} ``` -When you return data in CSV or raw format, each row corresponds to a *document*, and each column corresponds to a *field*. Conceptually, you might find it useful to think of each Elasticsearch index as a database table. +Here’s how core SQL concepts map to Elasticsearch: +| SQL | Elasticsearch | Example +:--- | :--- | :--- +Table | Index | `accounts` +Row | Document | `1` +Column | Field | `account_number` -## User interfaces +To list all your indices: -You can test queries using **Dev Tools** in Kibana (`https://:5601`). 
+```sql +SHOW TABLES LIKE % +``` +| id | TABLE_NAME +:--- | :--- +0 | accounts -## Troubleshoot queries +### Read data -The most common error is the dreaded null pointer exception, which can occur during parsing errors or when using the wrong HTTP method (POST vs. GET and vice versa). The POST method and HTTP request body offer the most consistent results: +After you index a document, retrieve it using the following SQL expression: -```json -POST _opendistro/_sql -{ - "query": "SELECT * FROM my-index WHERE ['name.firstname']='saanvi' LIMIT 5" -} +```sql +SELECT * +FROM accounts +WHERE _id = 1 ``` -If a query isn't behaving the way you expect, use the `_explain` API to see the translated query, which you can then troubleshoot. For most operations, `_explain` returns Elasticsearch query DSL. For `UNION`, `MINUS`, and `JOIN`, it returns something more akin to a SQL execution plan. +| id | account_number | firstname | gender | city | balance | employer | state | email | address | lastname | age +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +0 | 1 | Amber | M | Brogan | 39225 | Pyrami | IL | amberduke@pyrami.com | 880 Holmes Lane | Duke | 32 +### Delete data -#### Sample request +To delete a document from an index, use the `DELETE` clause: -```json -POST _opendistro/_sql/_explain -{ - "query": "SELECT * FROM my-index LIMIT 50" -} +```sql +DELETE +FROM accounts +WHERE _id = 0 ``` +| id | deleted_rows +:--- | :--- +0 | 1 -#### Sample response +## Contributing -```json -{ - "from": 0, - "size": 50 -} -``` +To get involved and help us improve the SQL plugin, see the [development guide](https://github.com/opendistro-for-elasticsearch/sql/blob/master/docs/developing.rst) for help setting up your development project. 
diff --git a/docs/sql/jdbc.md b/docs/sql/jdbc.md index 400e1abb..d690b68d 100644 --- a/docs/sql/jdbc.md +++ b/docs/sql/jdbc.md @@ -2,7 +2,7 @@ layout: default title: JDBC Driver parent: SQL -nav_order: 2 +nav_order: 3 --- # JDBC driver diff --git a/docs/sql/joins.md b/docs/sql/joins.md deleted file mode 100644 index 58abff3b..00000000 --- a/docs/sql/joins.md +++ /dev/null @@ -1,28 +0,0 @@ ---- -layout: default -title: Joins -parent: SQL -nav_order: 3 ---- - -# Joins - -Open Distro for Elasticsearch SQL supports inner joins, left outer joins, and cross joins. Joins have a number of constraints: - -1. You can only join two indices. -1. You must use aliases for indices (e.g. `people p`). -1. Within an ON clause, you can only use AND conditions. -1. In a WHERE statement, don't combine trees that contain multiple indices. For example, the following statement works: - - ``` - WHERE (a.type1 > 3 OR a.type1 < 0) AND (b.type2 > 4 OR b.type2 < -1) - ``` - - The following statement does not: - - ``` - WHERE (a.type1 > 3 OR b.type2 < 0) AND (a.type1 > 4 OR b.type2 < -1) - ``` - -1. You can't use GROUP BY or ORDER BY for results. -1. LIMIT with OFFSET (e.g. `LIMIT 25 OFFSET 25`) is not supported. diff --git a/docs/sql/limitation.md b/docs/sql/limitation.md new file mode 100644 index 00000000..ca112921 --- /dev/null +++ b/docs/sql/limitation.md @@ -0,0 +1,119 @@ +--- +layout: default +title: Limitations +parent: SQL +nav_order: 17 +--- + +# Limitations + +The SQL plugin has the following limitations: + +## SELECT FROM WHERE + +### Select literal is not supported + +Select literal expressions, such as `SELECT 1`, are not supported. +For details, see [Issue #256](https://github.com/opendistro-for-elasticsearch/sql/issues/256). + +### Where clause does not support arithmetic operations + +The `WHERE` clause does not support expressions.
For example, `SELECT FlightNum FROM kibana_sample_data_flights WHERE (AvgTicketPrice + 100) <= 1000` is not supported. +For details, see [Issue #234](https://github.com/opendistro-for-elasticsearch/sql/issues/234). + +### Aggregation over expression is not supported + +You can only apply aggregations to fields; aggregations can't accept an expression as a parameter. For example, `avg(log(age))` is not supported. +For details, see [Issue #288](https://github.com/opendistro-for-elasticsearch/sql/issues/288). + +### Conflict type in multiple index query + +Queries that use a wildcard index fail if the matching indices have conflicting types for the same field. +For example, suppose you have two indices with field `a`: + +``` +POST conflict_index_1/_doc/ +{ + "a": { + "b": 1 + } +} + +POST conflict_index_2/_doc/ +{ + "a": { + "b": 1, + "c": 2 + } +} +``` + +Then any query that spans both indices fails because of the field mapping conflict. The wildcard query `SELECT * FROM conflict_index*` fails for the same reason: + +``` +Error occurred in Elasticsearch engine: Different mappings are not allowed for the same field[a]: found [{properties:{b:{type:long},c:{type:long}}}] and [{properties:{b:{type:long}}}] ", + "details": "com.amazon.opendistroforelasticsearch.sql.rewriter.matchtoterm.VerificationException: Different mappings are not allowed for the same field[a]: found [{properties:{b:{type:long},c:{type:long}}}] and [{properties:{b:{type:long}}}] \nFor more details, please send request for Json format to see the raw response from elasticsearch engine.", + "type": "VerificationException +``` + +For details, see [Issue #445](https://github.com/opendistro-for-elasticsearch/sql/issues/445). + +## Subquery in the FROM clause + +A subquery in the `FROM` clause, in the form `SELECT outer FROM (SELECT inner)`, is supported only when it can be merged into a single query.
For example, the following query is supported: + +```sql +SELECT t.f, t.d +FROM ( + SELECT FlightNum as f, DestCountry as d + FROM kibana_sample_data_flights + WHERE OriginCountry = 'US') t +``` + +If the outer query has a `GROUP BY` or `ORDER BY` clause, however, the subquery is not supported. + +## JOIN does not support aggregations on the joined result + +The `join` query does not support aggregations on the joined result. +For example, `SELECT depo.name, avg(empo.age) FROM empo JOIN depo WHERE empo.id == depo.id GROUP BY depo.name` is not supported. +For details, see [Issue #110](https://github.com/opendistro-for-elasticsearch/sql/issues/110). + +## Pagination only supports basic queries + +Pagination lets you retrieve responses in pages. Currently, pagination supports only basic queries. For example, the following query returns data with a cursor ID. + +```json +POST _opendistro/_sql/ +{ + "fetch_size" : 5, + "query" : "SELECT OriginCountry, DestCountry FROM kibana_sample_data_flights ORDER BY OriginCountry ASC" +} +``` + +The response is in JDBC format and includes a cursor ID: + +```json +{ + "schema": [ + { + "name": "OriginCountry", + "type": "keyword" + }, + { + "name": "DestCountry", + "type": "keyword" + } + ], + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFCSllXVTJKVU4yeExiWEJSUkhsNFVrdDVXVEZSYkVKSmR3PT0iLCJjIjpbeyJuYW1lIjoiT3JpZ2luQ291bnRyeSIsInR5cGUiOiJrZXl3b3JkIn0seyJuYW1lIjoiRGVzdENvdW50cnkiLCJ0eXBlIjoia2V5d29yZCJ9XSwiZiI6MSwiaSI6ImtpYmFuYV9zYW1wbGVfZGF0YV9mbGlnaHRzIiwibCI6MTMwNTh9", + "total": 13059, + "datarows": [[ + "AE", + "CN" + ]], + "size": 1, + "status": 200 +} +``` + +Queries with aggregations or joins do not support pagination yet.
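Client code pages through results by treating the cursor as an opaque token: send the query once, then resubmit whatever `cursor` the previous page returned until a page arrives without one. Here is a minimal sketch of that loop; `fetch_page` is a hypothetical stand-in for the HTTP POST to `_opendistro/_sql`, stubbed here with canned pages so the control flow is the focus:

```python
# Canned pages standing in for server responses. The last page has no cursor.
PAGES = [
    {"datarows": [["Cherry", "Carey"], ["Lindsey", "Hawkins"]], "cursor": "d:page2"},
    {"datarows": [["Sargent", "Powers"]], "cursor": "d:page3"},
    {"datarows": [["Campos", "Olsen"]]},
]
BY_CURSOR = {"d:page2": PAGES[1], "d:page3": PAGES[2]}

def fetch_page(body):
    # A real client would POST `body` to _opendistro/_sql and parse the JSON.
    return PAGES[0] if "query" in body else BY_CURSOR[body["cursor"]]

def fetch_all(query, fetch_size=5):
    """Collect datarows across all pages by threading the cursor through."""
    page = fetch_page({"fetch_size": fetch_size, "query": query})
    rows = list(page["datarows"])
    while "cursor" in page:  # the last page carries no cursor, ending the loop
        page = fetch_page({"cursor": page["cursor"]})
        rows.extend(page["datarows"])
    return rows

print(len(fetch_all("SELECT firstname, lastname FROM accounts")))  # 4
```

The same loop works unchanged against a live cluster once `fetch_page` is replaced with a real HTTP call; the client never needs to inspect the cursor's contents.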
diff --git a/docs/sql/metadata.md b/docs/sql/metadata.md new file mode 100644 index 00000000..8a67c367 --- /dev/null +++ b/docs/sql/metadata.md @@ -0,0 +1,70 @@ +--- +layout: default +title: Metadata Queries +parent: SQL +nav_order: 9 +--- + +# Metadata queries + +To see basic metadata about your indices, use the `SHOW` and `DESCRIBE` commands. + +### Syntax + +Rule `showStatement`: + +![showStatement](../../images/showStatement.png) + +Rule `showFilter`: + +![showFilter](../../images/showFilter.png) + +### Example 1: See metadata for indices + +To see metadata for indices that match a specific pattern, use the `SHOW` command. +Use the wildcard `%` to match all indices: + +```sql +SHOW TABLES LIKE % +``` + +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +docker-cluster | null | accounts | BASE TABLE | null | null | null | null | null | null +docker-cluster | null | employees_nested | BASE TABLE | null | null | null | null | null | null + + +### Example 2: See metadata for a specific index + +To see metadata for an index name with a prefix of `acc`: + +```sql +SHOW TABLES LIKE acc% +``` + +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +docker-cluster | null | accounts | BASE TABLE | null | null | null | null | null | null + + +### Example 3: See metadata for fields + +To see metadata for field names that match a specific pattern, use the `DESCRIBE` command: + +```sql +DESCRIBE TABLES LIKE accounts +``` + +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLUMN_DEF | SQL_DATA_TYPE | SQL_DATETIME_SUB | CHAR_OCTET_LENGTH | ORDINAL_POSITION | IS_NULLABLE | SCOPE_CATALOG | SCOPE_SCHEMA | SCOPE_TABLE | SOURCE_DATA_TYPE | IS_AUTOINCREMENT | 
IS_GENERATEDCOLUMN +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +docker-cluster | null | accounts | account_number | null | long | null | null | null | 10 | 2 | null | null | null | null | null | 1 | | null | null | null | null | NO | +docker-cluster | null | accounts | firstname | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 2 | | null | null | null | null | NO | +docker-cluster | null | accounts | address | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 3 | | null | null | null | null | NO | +docker-cluster | null | accounts | balance | null | long | null | null | null | 10 | 2 | null | null | null | null | null | 4 | | null | null | null | null | NO | +docker-cluster | null | accounts | gender | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 5 | | null | null | null | null | NO | +docker-cluster | null | accounts | city | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 6 | | null | null | null | null | NO | +docker-cluster | null | accounts | employer | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 7 | | null | null | null | null | NO | +docker-cluster | null | accounts | state | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 8 | | null | null | null | null | NO | +docker-cluster | null | accounts | age | null | long | null | null | null | 10 | 2 | null | null | null | null | null | 9 | | null | null | null | null | NO | +docker-cluster | null | accounts | email | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 10 | | null | null | null | null | NO | +docker-cluster | null | accounts | lastname | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 11 | | null | null | null | null | NO | diff --git 
a/docs/sql/monitoring.md b/docs/sql/monitoring.md index 5498fbde..d6b80923 100644 --- a/docs/sql/monitoring.md +++ b/docs/sql/monitoring.md @@ -1,22 +1,11 @@ --- layout: default -title: Plugin Monitoring +title: Monitoring parent: SQL -nav_order: 7 +nav_order: 14 --- -# Plugin Monitoring - ---- - -#### Table of contents -- TOC -{:toc} - - ---- - -## Introduction +# Monitoring By a stats endpoint, you are able to collect metrics for the plugin within the interval. Note that only node level statistics collecting is diff --git a/docs/sql/odbc.md b/docs/sql/odbc.md new file mode 100644 index 00000000..c5f567ff --- /dev/null +++ b/docs/sql/odbc.md @@ -0,0 +1,257 @@ +--- +layout: default +title: ODBC Driver +parent: SQL +nav_order: 4 +--- + +# ODBC driver + +The Open Database Connectivity (ODBC) driver is a read-only ODBC driver for Windows and MacOS that lets you connect business intelligence (BI) applications to the SQL plugin. + +## Specifications + +The ODBC driver is compatible with ODBC version 3.51. + +## Supported OS versions + +The following operating systems are supported: + +| Operating System | Version +:--- | :--- +Windows | Windows 10 +MacOS | Catalina 10.15.4 and Mojave 10.14.6 + +## Concepts + +| Term | Definition +:--- | :--- +| **DSN** | A DSN (Data Source Name) is used to store driver information in the system. By storing the information in the system, the information does not need to be specified each time the driver connects. +| **.tdc** file | The TDC file contains configuration information that Tableau applies to any connection matching the database vendor name and driver name defined in the file. This configuration allows you to fine-tune parts of your ODBC data connection and turn on/off certain features not supported by the data source. + +## Install driver + +To install the driver, download the bundled distribution installer from [here](https://opendistro.github.io/for-elasticsearch/downloads.html) or build it from source.
+ + +### Windows + +1. Open the downloaded `ODFE SQL ODBC Driver--Windows.msi` installer. +- If the installer is unsigned, the following pop-up appears. Choose **More info**, and then choose **Run anyway**. +Choose **Next** to proceed with the installation. + +

+ + +

+ +

+ +

+ +2. Accept the agreement and choose **Next**. + +3. The installer comes bundled with documentation and useful resource files for connecting to various BI tools, for example, a `.tdc` file for Tableau. You can choose to keep or remove the documentation and resources. You can also choose the download location. Choose **Next**. + +

+ +

+ +4. Choose **Install**, **Finish**. + +

+ +

+ +5. The **DSN** is already set up as part of the installation. +- The following connection information is set up as part of the default DSN: + +``` +Host: localhost +Port: 9200 +Auth: NONE +``` + +To customize the DSN, use the **ODBC Data Source Administrator**, which comes pre-installed on Windows 10. + +

+ +

+ + +### MacOS + +1. Open the downloaded `ODFE SQL ODBC Driver--Darwin.pkg` installer. +- If the installer is unsigned, the following pop-up appears. Right-click the installer and select **Open**. +Choose **Continue** to proceed with the installation. + +

+ + +

+ + +

+ +

+ +2. Choose **Continue** to move past the **Introduction** and **Readme**. + +3. Accept the agreement and choose **Continue**. + +

+ +

+ + +4. Choose the **Destination** to install the driver files. + +5. The installer comes bundled with documentation and useful resource files for connecting to various BI tools, for example, a `.tdc` file for Tableau. You can choose to keep or remove the documentation and resources. Choose **Customize** to select the files you need. Choose **Continue**. + +

+ +

+ +6. Choose **Install**, **Close**. + +

+ +

+ +7. Currently, the **DSN** is not set up as part of the installation and needs to be configured manually. + +Make sure to install `iODBC Driver Manager` before installing the ODBC Driver on Mac. + +- Open `iODBC Administrator`: + +``` +sudo /Applications/iODBC/iODBC\ Administrator64.app/Contents/MacOS/iODBC\ Administrator64 +``` + +This gives the application permissions to save the driver and DSN configurations. + +- Choose **ODBC Drivers** tab. +- Choose **Add a Driver** and fill in the following details: + - **Description of the Driver**: Enter the driver name that you used for the ODBC connection (for example, ODFE SQL ODBC Driver). + - **Driver File Name**: Enter the path to the driver file (default: `/bin/libodfesqlodbc.dylib`). + - **Setup File Name**: Enter the path to the setup file (default: `/bin/libodfesqlodbc.dylib`). + - Choose the user driver. + - Choose **OK** to save the options. +- Choose the **User DSN** tab. +- Select **Add**. + - Choose the driver that you added above. + - For **Data Source Name (DSN)**, enter the name of the DSN used to store connection options (for example, ODFE SQL ODBC DSN). + - For **Comment**, add an optional comment. + - Add key-value pairs by using the `+` button. We recommend the following options for a default local Elasticsearch installation: + - **Host**: `localhost` - Elasticsearch server endpoint + - **Port**: `9200` - The server port + - **Auth**: `NONE` - The authentication mode + - **Username**: `(blank)` - The username used for BASIC auth + - **Password**: `(blank)`- The password used for BASIC auth + - **ResponseTimeout**: `10` - The number of seconds to wait for a response from the server + - **UseSSL**: `0` - Do not use SSL for connections + - Choose **OK** to save the DSN configuration. +- Choose **OK** to exit the iODBC Administrator. + +## Customizing the ODBC driver + +The driver is in the form of a library file: `odfesqlodbc.dll` for Windows and `libodfesqlodbc.dylib` for MacOS. 
+ +If you're using ODBC-compatible BI tools, refer to your BI tool documentation for configuring a new ODBC driver. +Typically, all that's required is to make the BI tool aware of the location of the driver library file and then use it to set up the database (that is, Elasticsearch) connection. + +### Connection strings and other settings + +The ODBC driver uses an ODBC connection string. +Connection strings are semicolon-delimited strings that specify the set of options to use for a connection. +Typically, a connection string either: + - Specifies a Data Source Name (DSN) that contains a pre-configured set of options (`DSN=xxx;User=xxx;Password=xxx;`). + - Or configures options explicitly using the string (`Host=xxx;Port=xxx;LogLevel=ES_DEBUG;...`). + +You can configure the following driver options using a DSN or connection string: + +All option names are case-insensitive. +{: .note } + +#### Basic options + +| Option | Description | Type | Default +:--- | :--- | :--- | :--- +`DSN` | Data source name that you used for configuring the connection. | `string` | - +`Host / Server` | Hostname or IP address for the target cluster. | `string` | - +`Port` | Port number on which the Elasticsearch cluster's REST interface is listening. | `string` | - + +#### Authentication options + +| Option | Description | Type | Default +:--- | :--- | :--- | :--- +`Auth` | Authentication mechanism to use. | `BASIC` (basic HTTP), `AWS_SIGV4` (AWS auth), or `NONE` | `NONE` +`User / UID` | [`Auth=BASIC`] Username for the connection. | `string` | - +`Password / PWD` | [`Auth=BASIC`] Password for the connection. | `string` | - +`Region` | [`Auth=AWS_SIGV4`] Region used for signing requests. | `AWS region (for example, us-west-1)` | - + +#### Advanced options + +| Option | Description | Type | Default +:--- | :--- | :--- | :--- +`UseSSL` | Whether to establish the connection over SSL/TLS.
| `boolean (0 or 1)` | `false (0)` +`HostnameVerification` | Indicates whether certificate hostname verification should be performed for an SSL/TLS connection. | `boolean` (0 or 1) | `true (1)` +`ResponseTimeout` | The maximum time to wait for responses from the host, in seconds. | `integer` | `10` + +#### Logging options + +| Option | Description | Type | Default +:--- | :--- | :--- | :--- +`LogLevel` | Severity level for driver logs. | one of `ES_OFF`, `ES_FATAL`, `ES_ERROR`, `ES_INFO`, `ES_DEBUG`, `ES_TRACE`, `ES_ALL` | `ES_WARNING` +`LogOutput` | Location for storing driver logs. | `string` | `WIN: C:\`, `MAC: /tmp` + +You need administrative privileges to change the logging options. +{: .note } + + +## Connecting to Tableau + +Prerequisites: + +- Make sure the DSN is already set up. +- Make sure Elasticsearch is running on the _host_ and _port_ configured in the DSN. +- Make sure the `.tdc` file is copied to `/Documents/My Tableau Repository/Datasources` on both MacOS and Windows. + +1. Start Tableau. Under the **Connect** section, go to **To a Server** and choose **Other Databases (ODBC)**. + +

+ +

+ +2. In the **DSN** drop-down list, select the Elasticsearch DSN you set up in the previous steps. The options you added are automatically filled in under **Connection Attributes**. + +

+ +

+ +3. Select **Sign In**. After a few seconds, Tableau connects to your Elasticsearch server. Once connected, you are directed to the **Datasource** window. The **Database** field is already populated with the name of the Elasticsearch cluster. +To list all the indices, click the search icon under **Table**. + +

+ +

+ +4. Explore your data by dragging a table to the connection area. Choose **Update Now** or **Automatically Update** to populate the table data. + +

+ +

+ +### Troubleshooting + +**Problem:** Unable to connect to the server. An error window like the following appears after you sign in. + +

+ +

+ +**Workaround:** + +This is most likely because the Elasticsearch server is not running on the **host** and **port** configured in the DSN. +Confirm that the **host** and **port** are correct and that the Elasticsearch server is running with the ODFE SQL plugin. +Also make sure the `.tdc` file that was downloaded with the installer is copied correctly to the `/Documents/My Tableau Repository/Datasources` directory. diff --git a/docs/sql/operations.md b/docs/sql/operations.md deleted file mode 100644 index 745b2620..00000000 --- a/docs/sql/operations.md +++ /dev/null @@ -1,110 +0,0 @@ ---- -layout: default -title: Supported Operations -parent: SQL -nav_order: 1 ---- - -# Supported operations - -Open Distro for Elasticsearch supports the following SQL operations. - - ---- - -#### Table of contents -- TOC -{:toc} - - ---- - -## Statements - -Statement | Example -:--- | :--- -Select | `SELECT * FROM my-index` -Delete | `DELETE FROM my-index WHERE _id=1` -Where | `SELECT * FROM my-index WHERE ['field']='value'` -Order by | `SELECT * FROM my-index ORDER BY _id asc` -Group by | `SELECT * FROM my-index GROUP BY range(age, 20,30,39)` -Limit | `SELECT * FROM my-index LIMIT 50` (default is 200) -Union | `SELECT * FROM my-index1 UNION SELECT * FROM my-index2` -Minus | `SELECT * FROM my-index1 MINUS SELECT * FROM my-index2` - -Like any complex query, large UNION and MINUS statements can strain or even crash your cluster.
-{: .warning } - - -## Conditions - -Condition | Example -:--- | :--- -Like | `SELECT * FROM my-index WHERE name LIKE 'j%'` -And | `SELECT * FROM my-index WHERE name LIKE 'j%' AND age > 21` -Or | `SELECT * FROM my-index WHERE name LIKE 'j%' OR age > 21` -Count distinct | `SELECT count(distinct age) FROM my-index` -In | `SELECT * FROM my-index WHERE name IN ('alejandro', 'carolina')` -Not | `SELECT * FROM my-index WHERE name NOT IN ('jane')` -Between | `SELECT * FROM my-index WHERE age BETWEEN 20 AND 30` -Aliases | `SELECT avg(age) AS Average_Age FROM my-index` -Date | `SELECT * FROM my-index WHERE birthday='1990-11-15'` -Null | `SELECT * FROM my-index WHERE name IS NULL` - - -## Aggregations - -Aggregation | Example -:--- | :--- -avg() | `SELECT avg(age) FROM my-index` -count() | `SELECT count(age) FROM my-index` -max() | `SELECT max(age) AS Highest_Age FROM my-index` -min() | `SELECT min(age) AS Lowest_Age FROM my-index` -sum() | `SELECT sum(age) AS Age_Sum FROM my-index` - - -## Include and exclude fields - -Pattern | Example -:--- | :--- -include() | `SELECT include('a*'), exclude('age') FROM my-index` -exclude() | `SELECT exclude('*name') FROM my-index` - - -## Functions - -You must enable fielddata in the document mapping for most string functions to work properly. - -Function | Example -:--- | :--- -floor | `SELECT floor(number) AS Rounded_Down FROM my-index` -trim | `SELECT trim(name) FROM my-index` -log | `SELECT log(number) FROM my-index` -log10 | `SELECT log10(number) FROM my-index` -substring | `SELECT substring(name, 2,5) FROM my-index` -round | `SELECT round(number) FROM my-index` -sqrt | `SELECT sqrt(number) FROM my-index` -concat_ws | `SELECT concat_ws(' ', age, height) AS combined FROM my-index` -/ | `SELECT number / 100 FROM my-index` -% | `SELECT number % 100 FROM my-index` -date_format | `SELECT date_format(date, 'Y') FROM my-index` - - -## Joins - -See [Joins](../joins) for constraints and limitations. 
-
-Join | Example
-:--- | :---
-Inner join | `SELECT p.firstname, p.lastname, p.gender, dogs.name FROM people p JOIN dogs d ON d.holdersName = p.firstname WHERE p.age > 12 AND d.age > 1`
-Left outer join | `SELECT p.firstname, p.lastname, p.gender, dogs.name FROM people p LEFT JOIN dogs d ON d.holdersName = p.firstname`
-Cross join | `SELECT p.firstname, p.lastname, p.gender, dogs.name FROM people p CROSS JOIN dogs d`
-
-
-## Show
-
-Show commands, well, show you indices and mappings that match an index pattern. You can use `*` or `%` for wildcards.
-
-Show | Example
-:--- | :---
-Show tables like | `SHOW TABLES LIKE logs-*`
diff --git a/docs/sql/partiql.md b/docs/sql/partiql.md
new file mode 100644
index 00000000..c7ca96a9
--- /dev/null
+++ b/docs/sql/partiql.md
@@ -0,0 +1,215 @@
+---
+layout: default
+title: JSON Support
+parent: SQL
+nav_order: 7
+---
+
+# JSON Support
+
+The SQL plugin supports JSON by following the [PartiQL](https://partiql.org/) specification, a SQL-compatible query language that lets you query semi-structured and nested data in any data format. The SQL plugin supports only a subset of the PartiQL specification.
+
+## Querying nested collections
+
+PartiQL extends SQL to let you query and unnest nested collections. In Elasticsearch, this is useful for querying a JSON index with nested objects or fields.
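+The translated queries in the examples below use Elasticsearch `nested` queries, which require the `projects` field to be mapped as `nested`; otherwise Elasticsearch flattens the inner objects. A sketch of creating the index with that mapping before indexing the sample data:
+
+```json
+PUT employees_nested
+{
+  "mappings": {
+    "properties": {
+      "projects": { "type": "nested" }
+    }
+  }
+}
+```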
+
+To follow along, use the `bulk` operation to index some sample data:
+
+```json
+POST employees_nested/_bulk?refresh
+{"index":{"_id":"1"}}
+{"id":3,"name":"Bob Smith","title":null,"projects":[{"name":"SQL Spectrum querying","started_year":1990},{"name":"SQL security","started_year":1999},{"name":"Elasticsearch security","started_year":2015}]}
+{"index":{"_id":"2"}}
+{"id":4,"name":"Susan Smith","title":"Dev Mgr","projects":[]}
+{"index":{"_id":"3"}}
+{"id":6,"name":"Jane Smith","title":"Software Eng 2","projects":[{"name":"SQL security","started_year":1998},{"name":"Hello security","started_year":2015,"address":[{"city":"Dallas","state":"TX"}]}]}
+```
+
+### Example 1: Unnesting a nested collection
+
+This example finds the nested documents (`projects`) with a field value (`name`) that satisfies the predicate (contains `security`). Because each parent document can have more than one nested document, each matching nested document is flattened. In other words, the final result is the Cartesian product of the parent and matching nested documents.
+
+```sql
+SELECT e.name AS employeeName,
+       p.name AS projectName
+FROM employees_nested AS e,
+     e.projects AS p
+WHERE p.name LIKE '%security%'
+```
+
+Explain:
+
+```json
+{
+  "from" : 0,
+  "size" : 200,
+  "query" : {
+    "bool" : {
+      "filter" : [
+        {
+          "bool" : {
+            "must" : [
+              {
+                "nested" : {
+                  "query" : {
+                    "wildcard" : {
+                      "projects.name" : {
+                        "wildcard" : "*security*",
+                        "boost" : 1.0
+                      }
+                    }
+                  },
+                  "path" : "projects",
+                  "ignore_unmapped" : false,
+                  "score_mode" : "none",
+                  "boost" : 1.0,
+                  "inner_hits" : {
+                    "ignore_unmapped" : false,
+                    "from" : 0,
+                    "size" : 3,
+                    "version" : false,
+                    "seq_no_primary_term" : false,
+                    "explain" : false,
+                    "track_scores" : false,
+                    "_source" : {
+                      "includes" : [
+                        "projects.name"
+                      ],
+                      "excludes" : [ ]
+                    }
+                  }
+                }
+              }
+            ],
+            "adjust_pure_negative" : true,
+            "boost" : 1.0
+          }
+        }
+      ],
+      "adjust_pure_negative" : true,
+      "boost" : 1.0
+    }
+  },
+  "_source" : {
+    "includes" : [
+      "name"
+    ],
+    "excludes" : [ ]
+  }
+}
+```
+
+Result set:
+
+| employeeName | projectName
+:--- | :---
+Bob Smith | Elasticsearch security
+Bob Smith | SQL security
+Jane Smith | Hello security
+Jane Smith | SQL security
+
+### Example 2: Unnesting in an existential subquery
+
+You can also unnest a nested collection in an existential subquery to check whether it satisfies a condition:
+
+```sql
+SELECT e.name AS employeeName
+FROM employees_nested AS e
+WHERE EXISTS (
+  SELECT *
+  FROM e.projects AS p
+  WHERE p.name LIKE '%security%'
+)
+```
+
+Explain:
+
+```json
+{
+  "from" : 0,
+  "size" : 200,
+  "query" : {
+    "bool" : {
+      "filter" : [
+        {
+          "bool" : {
+            "must" : [
+              {
+                "nested" : {
+                  "query" : {
+                    "bool" : {
+                      "must" : [
+                        {
+                          "bool" : {
+                            "must" : [
+                              {
+                                "bool" : {
+                                  "must_not" : [
+                                    {
+                                      "bool" : {
+                                        "must_not" : [
+                                          {
+                                            "exists" : {
+                                              "field" : "projects",
+                                              "boost" : 1.0
+                                            }
+                                          }
+                                        ],
+                                        "adjust_pure_negative" : true,
+                                        "boost" : 1.0
+                                      }
+                                    }
+                                  ],
+                                  "adjust_pure_negative" : true,
+                                  "boost" : 1.0
+                                }
+                              },
+                              {
+                                "wildcard" : {
+                                  "projects.name" : {
+                                    "wildcard" : "*security*",
+
"boost" : 1.0 + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "path" : "projects", + "ignore_unmapped" : false, + "score_mode" : "none", + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : { + "includes" : [ + "name" + ], + "excludes" : [ ] + } +} +``` + +Result set: + +| employeeName | +:--- | :--- +Bob Smith | +Jane Smith | diff --git a/docs/sql/protocol.md b/docs/sql/protocol.md index 29637eda..a280698f 100644 --- a/docs/sql/protocol.md +++ b/docs/sql/protocol.md @@ -2,22 +2,11 @@ layout: default title: Protocol parent: SQL -nav_order: 5 +nav_order: 13 --- # Protocol ---- - -#### Table of contents -- TOC -{:toc} - - ---- - -## Introduction - For the protocol, SQL plugin provides multiple response formats for different purposes while the request format is same for all. Among them JDBC format is widely used because it provides schema information and diff --git a/docs/sql/settings.md b/docs/sql/settings.md index c60cc9aa..4e6d51f9 100644 --- a/docs/sql/settings.md +++ b/docs/sql/settings.md @@ -1,297 +1,39 @@ --- layout: default -title: Plugin Settings +title: Settings parent: SQL -nav_order: 6 +nav_order: 15 --- -# Plugin Settings - ---- - -#### Table of contents -- TOC -{:toc} - - ---- - -## Introduction +# Settings When Elasticsearch bootstraps, SQL plugin will register a few settings in Elasticsearch cluster settings. Most of the settings are able to change dynamically so you can control the behavior of SQL plugin without need to bounce your cluster. -## opendistro.sql.enabled - -### Description - -You can disable SQL plugin to reject all coming requests. - -1. The default value is true. -2. This setting is node scope. -3. This setting can be updated dynamically. - -### Example 1 - -You can update the setting with a new value like this. 
- -SQL query: - -```console ->> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "opendistro.sql.enabled" : false - } -}' -``` - -Result set: - -```json -{ - "acknowledged": true, - "persistent": {}, - "transient": { - "opendistro": { - "sql": { - "enabled": "false" - } - } - } -} -``` - -### Example 2 - -Query result after the setting updated is like: - -SQL query: - -```console ->> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{ - "query" : "SELECT * FROM accounts" -}' -``` - -Result set: - -```json -{ - "error": { - "reason": "Invalid SQL query", - "details": "Either opendistro.sql.enabled or rest.action.multi.allow_explicit_index setting is false", - "type": "SQLFeatureDisabledException" - }, - "status": 400 -} -``` - -## opendistro.sql.query.slowlog - -### Description - -You can configure the time limit (seconds) for slow query which would be -logged as 'Slow query: elapsed=xxx (ms)' in elasticsearch.log. - -1. The default value is 2. -2. This setting is node scope. -3. This setting can be updated dynamically. - -### Example - -You can update the setting with a new value like this. - -SQL query: - -```console ->> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "opendistro.sql.query.slowlog" : 10 - } -}' -``` - -Result set: - -```json -{ - "acknowledged": true, - "persistent": {}, - "transient": { - "opendistro": { - "sql": { - "query": { - "slowlog": "10" - } - } - } - } -} -``` - -## opendistro.sql.query.analysis.enabled - -### Description - -You can disable query analyzer to bypass strict syntactic and semantic -analysis. - -1. The default value is true. -2. This setting is node scope. -3. This setting can be updated dynamically. - -### Example - -You can update the setting with a new value like this. 
- -SQL query: - -```console ->> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "opendistro.sql.query.analysis.enabled" : false - } -}' -``` - -Result set: - -```json -{ - "acknowledged": true, - "persistent": {}, - "transient": { - "opendistro": { - "sql": { - "query": { - "analysis": { - "enabled": "false" - } - } - } - } - } -} -``` - -## opendistro.sql.query.analysis.semantic.suggestion - -### Description - -You can enable query analyzer to suggest correct field names for quick -fix. - -1. The default value is false. -2. This setting is node scope. -3. This setting can be updated dynamically. - -### Example 1 - -You can update the setting with a new value like this. - -SQL query: - -```console ->> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "opendistro.sql.query.analysis.semantic.suggestion" : true - } -}' -``` - -Result set: - -```json -{ - "acknowledged": true, - "persistent": {}, - "transient": { - "opendistro": { - "sql": { - "query": { - "analysis": { - "semantic": { - "suggestion": "true" - } - } - } - } - } - } -} -``` - -### Example 2 - -Query result after the setting updated is like: +You can update a setting with a new value like this. SQL query: -```console ->> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{ - "query" : "SELECT first FROM accounts" -}' -``` - -Result set: - ```json -{ - "error": { - "reason": "Invalid SQL query", - "details": "Field [first] cannot be found or used here. Did you mean [firstname]?", - "type": "SemanticAnalysisException" - }, - "status": 400 -} -``` - -## opendistro.sql.query.analysis.semantic.threshold - -### Description - -Because query analysis needs to build semantic context in memory, index -with large number of field would be skipped. You can update it to apply -analysis to smaller or larger index as needed. - -1. The default value is 200. -2. 
This setting is node scope.
-3. This setting can be updated dynamically.
-
-### Example
-
-You can update the setting with a new value like this.
-
-SQL query:
-
-```console
->> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{
+>> curl -H 'Content-Type: application/json' -u admin:admin -k -XPUT https://localhost:9200/_cluster/settings -d '{
  "transient" : {
-    "opendistro.sql.query.analysis.semantic.threshold" : 50
+    "opendistro.sql.enabled" : false
  }
}'
```

-Result set:
+You can update the following settings:

-```json
-{
-  "acknowledged": true,
-  "persistent": {},
-  "transient": {
-    "opendistro": {
-      "sql": {
-        "query": {
-          "analysis": {
-            "semantic": {
-              "threshold": "50"
-            }
-          }
-        }
-      }
-    }
-  }
-}
-```
+Setting | Default | Description
+:--- | :--- | :---
+`opendistro.sql.enabled` | True | You can disable the SQL plugin to reject all incoming requests.
+`opendistro.sql.query.slowlog` | 2 seconds | You can configure the time limit (in seconds) for slow queries, which are logged as `Slow query: elapsed=xxx (ms)` in `elasticsearch.log`.
+`opendistro.sql.query.analysis.enabled` | True | You can disable the query analyzer to bypass strict syntactic and semantic analysis.
+`opendistro.sql.query.analysis.semantic.suggestion` | False | You can enable the query analyzer to suggest correct field names for a quick fix.
+`opendistro.sql.query.analysis.semantic.threshold` | 200 | Because query analysis needs to build a semantic context in memory, indexes with more fields than this threshold are skipped. You can raise or lower this value to apply analysis to larger or smaller indexes as needed.
+`opendistro.sql.query.response.format` | JDBC | You can set the default response format for queries. The supported formats are JDBC, JSON, CSV, raw, and table.
+`opendistro.sql.cursor.enabled` | False | You can enable or disable pagination for all supported queries.
+`opendistro.sql.cursor.fetch_size` | 1,000 | You can set the default `fetch_size` for all queries that support pagination. An explicit `fetch_size` passed in the request overrides this value.
+`opendistro.sql.cursor.keep_alive` | 1 minute | You can set this value to indicate how long the cursor context is kept open. Cursor contexts are resource-intensive, so we recommend using a lower value, if possible.
diff --git a/docs/sql/sql-full-text.md b/docs/sql/sql-full-text.md
new file mode 100644
index 00000000..736cd2d4
--- /dev/null
+++ b/docs/sql/sql-full-text.md
@@ -0,0 +1,119 @@
+---
+layout: default
+title: Full-Text Search
+parent: SQL
+nav_order: 8
+---
+
+# Full-text search
+
+Use SQL commands for full-text search. The SQL plugin supports a subset of the full-text queries available in Elasticsearch.
+
+To learn about full-text queries in Elasticsearch, see [Full-text queries](../../elasticsearch/full-text/).
+
+## Match
+
+To search for text in a single field, use the `MATCHQUERY` or `MATCH_QUERY` function.
+
+Pass in your search query and the field name that you want to search against.
+
+
+```sql
+SELECT account_number, address
+FROM accounts
+WHERE MATCH_QUERY(address, 'Holmes')
+```
+
+Alternate syntax:
+
+```sql
+SELECT account_number, address
+FROM accounts
+WHERE address = MATCH_QUERY('Holmes')
+```
+
+
+| account_number | address
+:--- | :---
+1 | 880 Holmes Lane
+
+
+## Multi match
+
+To search for text in multiple fields, use the `MULTI_MATCH`, `MULTIMATCH`, or `MULTIMATCHQUERY` function.
+
+For example, search for `Dale` in either the `firstname` or `lastname` field:
+
+
+```sql
+SELECT firstname, lastname
+FROM accounts
+WHERE MULTI_MATCH('query'='Dale', 'fields'='*name')
+```
+
+
+| firstname | lastname
+:--- | :---
+Dale | Adams
+
+
+## Query string
+
+To split text based on operators, use the `QUERY` function.
+
+
+```sql
+SELECT account_number, address
+FROM accounts
+WHERE QUERY('address:Lane OR address:Street')
+```
+
+
+| account_number | address
+:--- | :---
+1 | 880 Holmes Lane
+6 | 671 Bristol Street
+13 | 789 Madison Street
+
+
+The `QUERY` function supports logical connectives, wildcards, regex, and proximity search.
+
+
+## Match phrase
+
+To search for exact phrases, use the `MATCHPHRASE`, `MATCH_PHRASE`, or `MATCHPHRASEQUERY` function.
+
+
+```sql
+SELECT account_number, address
+FROM accounts
+WHERE MATCH_PHRASE(address, '880 Holmes Lane')
+```
+
+
+| account_number | address
+:--- | :---
+1 | 880 Holmes Lane
+
+
+## Score query
+
+To return a relevance score along with every matching document, use the `SCORE`, `SCOREQUERY`, or `SCORE_QUERY` function.
+
+You need to pass in two arguments. The first is the `MATCH_QUERY` expression. The second is an optional floating-point number to boost the score (the default value is 1.0).
+
+
+```sql
+SELECT account_number, address, _score
+FROM accounts
+WHERE SCORE(MATCH_QUERY(address, 'Lane'), 0.5) OR
+  SCORE(MATCH_QUERY(address, 'Street'), 100)
+ORDER BY _score
+```
+
+
+| account_number | address | _score
+:--- | :--- | :---
+1 | 880 Holmes Lane | 0.5
+6 | 671 Bristol Street | 100
+13 | 789 Madison Street | 100
diff --git a/docs/sql/troubleshoot.md b/docs/sql/troubleshoot.md
new file mode 100644
index 00000000..a229afb6
--- /dev/null
+++ b/docs/sql/troubleshoot.md
@@ -0,0 +1,90 @@
+---
+layout: default
+title: Troubleshooting
+parent: SQL
+nav_order: 16
+---
+
+# Troubleshooting
+
+The SQL plugin is stateless, so troubleshooting is mostly focused on why a particular query fails.
+
+The most common error is the dreaded null pointer exception, which can occur during parsing errors or when using the wrong HTTP method (POST vs. GET and vice versa).
The POST method and HTTP request body offer the most consistent results: + +```json +POST _opendistro/_sql +{ + "query": "SELECT * FROM my-index WHERE ['name.firstname']='saanvi' LIMIT 5" +} +``` + +If a query isn't behaving the way you expect, use the `_explain` API to see the translated query, which you can then troubleshoot. For most operations, `_explain` returns Elasticsearch query DSL. For `UNION`, `MINUS`, and `JOIN`, it returns something more akin to a SQL execution plan. + +#### Sample request + +```json +POST _opendistro/_sql/_explain +{ + "query": "SELECT * FROM my-index LIMIT 50" +} +``` + + +#### Sample response + +```json +{ + "from": 0, + "size": 50 +} +``` + +## Syntax analysis exception + +You might receive the following error if the plugin can't parse your query: + +```json +{ + "reason": "Invalid SQL query", + "details": "Failed to parse query due to offending symbol [:] at: 'SELECT * FROM xxx WHERE xxx:' <--- HERE... + More details: Expecting tokens in {, 'AND', 'BETWEEN', 'GROUP', 'HAVING', 'IN', 'IS', 'LIKE', 'LIMIT', + 'NOT', 'OR', 'ORDER', 'REGEXP', '*', '/', '%', '+', '-', 'DIV', 'MOD', '=', '>', '<', '!', + '|', '&', '^', '.', DOT_ID}", + "type": "SyntaxAnalysisException" +} +``` + +To resolve this error: + +1. Check if your syntax follows the [MySQL grammar](https://dev.mysql.com/doc/refman/8.0/en/). +2. If your syntax is correct, disable strict query analysis: + + ```json + PUT _cluster/settings + { + "persistent" : { + "opendistro.sql.query.analysis.enabled" : false + } + } + ``` + +3. Run the query again to see if it works. 
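+
+Once the query works, you can re-enable strict analysis with the same cluster settings request, setting `opendistro.sql.query.analysis.enabled` back to `true` (a sketch that mirrors the request in step 2):
+
+```json
+PUT _cluster/settings
+{
+  "persistent" : {
+    "opendistro.sql.query.analysis.enabled" : true
+  }
+}
+```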
+
+## Index mapping verification exception
+
+If you see the following verification exception:
+
+```json
+{
+  "error": {
+    "reason": "There was internal problem at backend",
+    "details": "When using multiple indices, the mappings must be identical.",
+    "type": "VerificationException"
+  },
+  "status": 503
+}
+```
+
+Make sure the index in your query is not an index pattern and doesn't have multiple types.
+
+If these steps don't work, submit a GitHub issue [here](https://github.com/opendistro-for-elasticsearch/sql/issues).
diff --git a/docs/sql/workbench.md b/docs/sql/workbench.md
new file mode 100644
index 00000000..6885960d
--- /dev/null
+++ b/docs/sql/workbench.md
@@ -0,0 +1,13 @@
+---
+layout: default
+title: Workbench
+parent: SQL
+nav_order: 1
+---
+
+
+# Workbench
+
+Use the SQL workbench to easily run on-demand SQL queries, translate SQL into its REST equivalent, and view and save results as text, JSON, JDBC, or CSV.
+
+![SQL workbench](../../images/workbench.gif)