PPL describe command #646

Merged: 20 commits, Jun 28, 2022
1 change: 1 addition & 0 deletions docs/category.json
@@ -8,6 +8,7 @@
],
"ppl_cli": [
"user/ppl/cmd/dedup.rst",
"user/ppl/cmd/describe.rst",
"user/ppl/cmd/eval.rst",
"user/ppl/cmd/fields.rst",
"user/ppl/cmd/head.rst",
65 changes: 65 additions & 0 deletions docs/user/ppl/cmd/describe.rst
@@ -0,0 +1,65 @@
=============
describe
=============

.. rubric:: Table of contents

.. contents::
   :local:
   :depth: 2


Description
============
| Use the ``describe`` command to query the metadata of an index. The ``describe`` command can only be used as the first command in a PPL query.


Syntax
============
describe <index>

* index: mandatory. The describe command must specify which index to query.


Example 1: Fetch all the metadata
=================================

The example describes the accounts index.

PPL query::

    os> describe accounts;
    fetched rows / total rows = 11/11
    +----------------+---------------+--------------+----------------+-------------+-------------+---------------+-----------------+------------------+------------------+------------+-----------+--------------+-----------------+--------------------+---------------------+--------------------+---------------+-----------------+----------------+---------------+--------------------+--------------------+----------------------+
    | TABLE_CAT      | TABLE_SCHEM   | TABLE_NAME   | COLUMN_NAME    | DATA_TYPE   | TYPE_NAME   | COLUMN_SIZE   | BUFFER_LENGTH   | DECIMAL_DIGITS   | NUM_PREC_RADIX   | NULLABLE   | REMARKS   | COLUMN_DEF   | SQL_DATA_TYPE   | SQL_DATETIME_SUB   | CHAR_OCTET_LENGTH   | ORDINAL_POSITION   | IS_NULLABLE   | SCOPE_CATALOG   | SCOPE_SCHEMA   | SCOPE_TABLE   | SOURCE_DATA_TYPE   | IS_AUTOINCREMENT   | IS_GENERATEDCOLUMN   |
    |----------------+---------------+--------------+----------------+-------------+-------------+---------------+-----------------+------------------+------------------+------------+-----------+--------------+-----------------+--------------------+---------------------+--------------------+---------------+-----------------+----------------+---------------+--------------------+--------------------+----------------------|
    | docTestCluster | null          | accounts     | account_number | null        | long        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 0                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | firstname      | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 1                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | address        | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 2                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | balance        | null        | long        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 3                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | gender         | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 4                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | city           | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 5                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | employer       | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 6                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | state          | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 7                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | age            | null        | long        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 8                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | email          | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 9                  |               | null            | null           | null          | null               | NO                 |                      |
    | docTestCluster | null          | accounts     | lastname       | null        | text        | null          | null            | null             | 10               | 2          | null      | null         | null            | null               | null                | 10                 |               | null            | null           | null          | null               | NO                 |                      |
    +----------------+---------------+--------------+----------------+-------------+-------------+---------------+-----------------+------------------+------------------+------------+-----------+--------------+-----------------+--------------------+---------------------+--------------------+---------------+-----------------+----------------+---------------+--------------------+--------------------+----------------------+

Example 2: Fetch metadata with condition and filter
===================================================

The example retrieves the names of the columns of type ``long`` in the accounts index.

PPL query::

    os> describe accounts | where TYPE_NAME="long" | fields COLUMN_NAME;
    fetched rows / total rows = 3/3
    +----------------+
    | COLUMN_NAME    |
    |----------------|
    | account_number |
    | balance        |
    | age            |
    +----------------+
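Because the result of ``describe`` is itself a table, commands piped after it filter and project metadata rows, not documents. As a rough illustration only, the following Java sketch evaluates the equivalent of Example 2 over an in-memory copy of the ``COLUMN_NAME`` and ``TYPE_NAME`` values shown in Example 1. ``MetadataRow``, ``DescribeExample`` and ``columnsOfType`` are hypothetical names; this is not how the plugin executes the query.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of two columns of the describe result. */
final class MetadataRow {
  final String columnName;
  final String typeName;

  MetadataRow(String columnName, String typeName) {
    this.columnName = columnName;
    this.typeName = typeName;
  }
}

final class DescribeExample {
  // COLUMN_NAME and TYPE_NAME values copied from Example 1 (accounts index), in row order.
  static final List<MetadataRow> ACCOUNTS = List.of(
      new MetadataRow("account_number", "long"),
      new MetadataRow("firstname", "text"),
      new MetadataRow("address", "text"),
      new MetadataRow("balance", "long"),
      new MetadataRow("gender", "text"),
      new MetadataRow("city", "text"),
      new MetadataRow("employer", "text"),
      new MetadataRow("state", "text"),
      new MetadataRow("age", "long"),
      new MetadataRow("email", "text"),
      new MetadataRow("lastname", "text"));

  /** Rough analogue of: describe accounts | where TYPE_NAME="long" | fields COLUMN_NAME */
  static List<String> columnsOfType(List<MetadataRow> rows, String typeName) {
    return rows.stream()
        .filter(row -> row.typeName.equals(typeName)) // where TYPE_NAME = typeName
        .map(row -> row.columnName)                   // fields COLUMN_NAME
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Prints [account_number, balance, age], matching the rows of Example 2.
    System.out.println(columnsOfType(ACCOUNTS, "long"));
  }
}
```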

2 changes: 1 addition & 1 deletion docs/user/ppl/cmd/syntax.rst
@@ -10,7 +10,7 @@ Syntax

Command Order
=============
- The PPL query started with ``search`` command to reference a table search from. All the following command could be in any order. In the following example, ``search`` command refer the accounts index as the source, then using fields and where command to do the further processing.
+ The PPL query starts with either the ``search`` command, which references a table to search from, or the ``describe`` command, which references a table to get its metadata. All subsequent commands can appear in any order. In the following example, the ``search`` command references the accounts index as the source, and then the ``fields`` and ``where`` commands perform further processing.

.. code-block::
2 changes: 2 additions & 0 deletions docs/user/ppl/index.rst
@@ -40,6 +40,8 @@ The query start with search command and then flowing a set of command delimited

- `dedup command <cmd/dedup.rst>`_

- `describe command <cmd/describe.rst>`_

- `eval command <cmd/eval.rst>`_

- `fields command <cmd/fields.rst>`_
@@ -0,0 +1,90 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */


package org.opensearch.sql.ppl;

import org.json.JSONObject;
import org.junit.jupiter.api.Test;
import org.opensearch.client.Request;
import org.opensearch.client.ResponseException;

import java.io.IOException;

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_DOG;
import static org.opensearch.sql.util.MatcherUtils.columnName;
import static org.opensearch.sql.util.MatcherUtils.verifyColumn;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;

public class DescribeCommandIT extends PPLIntegTestCase {

  @Override
  public void init() throws IOException {
    loadIndex(Index.DOG);
  }

  @Test
  public void testDescribeAllFields() throws IOException {
    JSONObject result = executeQuery(String.format("describe %s", TEST_INDEX_DOG));
    // The result schema consists of the metadata columns documented in describe.rst.
    verifyColumn(
        result,
        columnName("TABLE_CAT"),
        columnName("TABLE_SCHEM"),
        columnName("TABLE_NAME"),
        columnName("COLUMN_NAME"),
        columnName("DATA_TYPE"),
        columnName("TYPE_NAME"),
        columnName("COLUMN_SIZE"),
        columnName("BUFFER_LENGTH"),
        columnName("DECIMAL_DIGITS"),
        columnName("NUM_PREC_RADIX"),
        columnName("NULLABLE"),
        columnName("REMARKS"),
        columnName("COLUMN_DEF"),
        columnName("SQL_DATA_TYPE"),
        columnName("SQL_DATETIME_SUB"),
        columnName("CHAR_OCTET_LENGTH"),
        columnName("ORDINAL_POSITION"),
        columnName("IS_NULLABLE"),
        columnName("SCOPE_CATALOG"),
        columnName("SCOPE_SCHEMA"),
        columnName("SCOPE_TABLE"),
        columnName("SOURCE_DATA_TYPE"),
        columnName("IS_AUTOINCREMENT"),
        columnName("IS_GENERATEDCOLUMN")
    );
  }

  @Test
  public void testDescribeFilterFields() throws IOException {
    // Commands piped after describe operate on the metadata table,
    // so fields selects metadata columns rather than document fields.
    JSONObject result = executeQuery(
        String.format("describe %s | fields TABLE_NAME, COLUMN_NAME, TYPE_NAME", TEST_INDEX_DOG));
    verifyColumn(
        result,
        columnName("TABLE_NAME"),
        columnName("COLUMN_NAME"),
        columnName("TYPE_NAME")
    );
  }

  @Test
  public void testDescribeWithSpecialIndexName() throws IOException {
    // Index names containing '-' and '.' must be accepted as table sources.
    // Called without expected rows: the indices are created without any field mappings.
    executeRequest(new Request("PUT", "/logs-2021.01.11"));
    verifyDataRows(executeQuery("describe logs-2021.01.11"));

    executeRequest(new Request("PUT", "/logs-7.10.0-2021.01.11"));
    verifyDataRows(executeQuery("describe logs-7.10.0-2021.01.11"));
  }

  @Test
  public void testDescribeCommandWithoutIndexShouldFailToParse() throws IOException {
    try {
      executeQuery("describe");
      fail();
    } catch (ResponseException e) {
      assertTrue(e.getMessage().contains("RuntimeException"));
      assertTrue(e.getMessage().contains("Failed to parse query due to offending symbol"));
    }
  }
}
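The integration tests above send their queries through the plugin's REST interface, and the review discussion uses curl against the same interface. A minimal Java 11 sketch of an equivalent request follows. It assumes the PPL endpoint is `POST /_plugins/_ppl` with a JSON body of the form `{"query": "..."}`, an unsecured cluster at `http://localhost:9200`, and an existing `accounts` index; `PplDescribeClient` and its methods are hypothetical helpers, not part of the plugin.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper that sends a describe query to the PPL REST endpoint. */
final class PplDescribeClient {

  /** Minimal JSON string escaping: handles backslashes and double quotes only. */
  static String escapeJson(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /** Builds the request body {"query": "<ppl>"}. */
  static String requestBody(String pplQuery) {
    return "{\"query\": \"" + escapeJson(pplQuery) + "\"}";
  }

  /** Builds (but does not send) a POST request for "describe <index>". */
  static HttpRequest describeRequest(String baseUrl, String index) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/_plugins/_ppl"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(requestBody("describe " + index)))
        .build();
  }

  public static void main(String[] args) throws Exception {
    // Assumes an unsecured local cluster; add authentication for a secured one.
    HttpRequest request = describeRequest("http://localhost:9200", "accounts");
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

Keeping request construction separate from sending makes the body and URI easy to check without a running cluster; a real client would use a proper JSON library instead of the hand-written escaping.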
1 change: 1 addition & 0 deletions ppl/src/main/antlr/OpenSearchPPLLexer.g4
@@ -11,6 +11,7 @@ channels { WHITESPACE, ERRORCHANNEL }

// COMMAND KEYWORDS
SEARCH: 'SEARCH';
DESCRIBE: 'DESCRIBE';
FROM: 'FROM';
WHERE: 'WHERE';
FIELDS: 'FIELDS';
19 changes: 16 additions & 3 deletions ppl/src/main/antlr/OpenSearchPPLParser.g4
@@ -14,10 +14,15 @@ root

/** statement */
pplStatement
- : searchCommand (PIPE commands)*
+ : pplCommands (PIPE commands)*
;

/** commands */
pplCommands
: searchCommand
| describeCommand
;

commands
: whereCommand | fieldsCommand | renameCommand | statsCommand | dedupCommand | sortCommand | evalCommand | headCommand
| topCommand | rareCommand | parseCommand | kmeansCommand | adCommand;
@@ -28,6 +33,10 @@ searchCommand
| (SEARCH)? logicalExpression fromClause #searchFilterFrom
;

describeCommand
: DESCRIBE tableSourceClause
;

whereCommand
: WHERE logicalExpression
;
@@ -119,8 +128,12 @@ adParameter

/** clauses */
fromClause
- : SOURCE EQUAL tableSource (COMMA tableSource)*
- | INDEX EQUAL tableSource (COMMA tableSource)*
+ : SOURCE EQUAL tableSourceClause
+ | INDEX EQUAL tableSourceClause
;

tableSourceClause
: tableSource (COMMA tableSource)*
;

renameClasue
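To make the shape of the new rules concrete, here is a hand-written Java toy that accepts the describe branch of `pplStatement : pplCommands (PIPE commands)*` with `describeCommand : DESCRIBE tableSourceClause` and `tableSourceClause : tableSource (COMMA tableSource)*`. It is not the ANTLR-generated parser: it splits naively on `|` and `,`, keeps each piped command as an opaque string, and approximates a table source as a run of letters, digits, `_`, `.`, `-` and `*`, which is an assumption based on the index names used in the tests. `ToyDescribeParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Toy model of the describe branch of pplStatement; not the real PPL parser. */
final class ToyDescribeParser {

  // Assumed approximation of a tableSource, e.g. t, logs-2021.01.11, logs-7.10.0-2021.01.11.
  private static final Pattern TABLE_SOURCE = Pattern.compile("[A-Za-z0-9_.*-]+");

  /** Parsed statement: tables named by describe plus the raw piped commands. */
  static final class Statement {
    final List<String> tables;
    final List<String> pipedCommands;

    Statement(List<String> tables, List<String> pipedCommands) {
      this.tables = tables;
      this.pipedCommands = pipedCommands;
    }
  }

  static Statement parse(String query) {
    String[] segments = query.split("\\|");
    // describeCommand : DESCRIBE tableSourceClause ; the keyword is matched case-insensitively here.
    String[] keywordAndRest = segments[0].trim().split("\\s+", 2);
    if (!keywordAndRest[0].toUpperCase(Locale.ROOT).equals("DESCRIBE") || keywordAndRest.length < 2) {
      throw new IllegalArgumentException("Failed to parse query: expected 'describe <index>'");
    }
    // tableSourceClause : tableSource (COMMA tableSource)* ; limit -1 keeps empty trailing items.
    List<String> tables = new ArrayList<>();
    for (String item : keywordAndRest[1].split(",", -1)) {
      String name = item.trim();
      if (!TABLE_SOURCE.matcher(name).matches()) {
        throw new IllegalArgumentException("Failed to parse query: bad table source '" + name + "'");
      }
      tables.add(name);
    }
    // (PIPE commands)* : every remaining segment is kept as an opaque command string.
    List<String> piped = new ArrayList<>();
    for (String segment : Arrays.asList(segments).subList(1, segments.length)) {
      piped.add(segment.trim());
    }
    return new Statement(tables, piped);
  }
}
```

Like the parser tests in this PR, the toy accepts `describe t`, `describe t,u` and `describe t | fields a,b`, and rejects `describe` and `describe source=t` (the `=` is not part of a table source).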
22 changes: 20 additions & 2 deletions ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
@@ -6,7 +6,9 @@

package org.opensearch.sql.ppl.parser;

import static org.opensearch.sql.ast.dsl.AstDSL.qualifiedName;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.DedupCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.DescribeCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.EvalCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.FieldsCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.FromClauseContext;
@@ -19,8 +21,10 @@
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.SearchFromFilterContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.SortCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.StatsCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.TableSourceClauseContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.TopCommandContext;
import static org.opensearch.sql.ppl.antlr.parser.OpenSearchPPLParser.WhereCommandContext;
import static org.opensearch.sql.utils.SystemIndexUtils.mappingTable;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
@@ -79,11 +83,11 @@ public class AstBuilder extends OpenSearchPPLParserBaseVisitor<UnresolvedPlan> {

  @Override
  public UnresolvedPlan visitPplStatement(PplStatementContext ctx) {
-   UnresolvedPlan search = visit(ctx.searchCommand());
+   UnresolvedPlan pplCommand = visit(ctx.pplCommands());
    return ctx.commands()
        .stream()
        .map(this::visit)
-       .reduce(search, (r, e) -> e.attach(r));
+       .reduce(pplCommand, (r, e) -> e.attach(r));
  }

  /**
@@ -106,6 +110,15 @@ public UnresolvedPlan visitSearchFilterFrom(SearchFilterFromContext ctx) {
        visit(ctx.fromClause()));
  }

  /**
   * Describe command.
   * Resolves the table source clause and returns a relation over the index's mapping
   * (metadata) table, so that describe and any commands piped after it query the
   * index metadata instead of its documents.
   */
  @Override
  public UnresolvedPlan visitDescribeCommand(DescribeCommandContext ctx) {
    final Relation table = (Relation) visitTableSourceClause(ctx.tableSourceClause());
    return new Relation(qualifiedName(mappingTable(table.getTableName())));
  }

  /**
   * Where command.
   */
@@ -286,6 +299,11 @@ public UnresolvedPlan visitTopCommand(TopCommandContext ctx) {
   */
  @Override
  public UnresolvedPlan visitFromClause(FromClauseContext ctx) {
    return visitTableSourceClause(ctx.tableSourceClause());
  }

  @Override
  public UnresolvedPlan visitTableSourceClause(TableSourceClauseContext ctx) {
    return new Relation(ctx.tableSource()
        .stream().map(this::internalVisitExpression)
        .collect(Collectors.toList()));
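`visitPplStatement` builds the plan of the first command and then folds each piped command onto it with `reduce(pplCommand, (r, e) -> e.attach(r))`, so every later command wraps the plan built so far; `visitDescribeCommand` makes the first plan a `Relation` over `mappingTable(...)` of the index rather than the index itself. The Java sketch below reproduces only that folding idea with hypothetical, heavily simplified node classes (`PlanNode`, `RelationNode`, `CommandNode`). The `.MAPPINGS` suffix used for the metadata table name is an illustrative assumption, not necessarily what `SystemIndexUtils.mappingTable` returns.

```java
import java.util.List;

/** Hypothetical, heavily simplified stand-in for UnresolvedPlan. */
abstract class PlanNode {
  PlanNode child;

  /** Like attach in the AST: record the input plan and return this node as the new root. */
  PlanNode attach(PlanNode child) {
    this.child = child;
    return this;
  }

  abstract String render();
}

/** Leaf that plays the role of Relation. */
final class RelationNode extends PlanNode {
  private final String table;

  RelationNode(String table) {
    this.table = table;
  }

  @Override
  String render() {
    return "Relation(" + table + ")";
  }
}

/** A piped command such as where or fields, reduced to its name. */
final class CommandNode extends PlanNode {
  private final String name;

  CommandNode(String name) {
    this.name = name;
  }

  @Override
  String render() {
    return name + "(" + child.render() + ")";
  }
}

final class PlanFolding {
  // Assumed naming for illustration only; the real name comes from SystemIndexUtils.mappingTable.
  static String metadataTable(String index) {
    return index + ".MAPPINGS";
  }

  /** Builds describe <index> | cmd1 | cmd2 ... with a reduce/attach fold like visitPplStatement. */
  static PlanNode describe(String index, List<String> pipedCommands) {
    PlanNode first = new RelationNode(metadataTable(index));
    return pipedCommands.stream()
        .<PlanNode>map(CommandNode::new)
        .reduce(first, (plan, command) -> command.attach(plan));
  }

  public static void main(String[] args) {
    // Prints fields(where(Relation(accounts.MAPPINGS)))
    System.out.println(describe("accounts", List.of("where", "fields")).render());
  }
}
```

The fold relies on a sequential stream: the combiner is not associative, which is fine here because each step simply makes the next command the new root.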
@@ -112,5 +112,31 @@ public void can_parse_simple_query_string_relevance_function() {
        "SOURCE=test | WHERE simple_query_string([\"Tags\" ^ 1.5, Title, `Body` 4.2], 'query',"
            + "analyzer=keyword, quote_field_suffix=\".exact\", fuzzy_prefix_length = 4)"));
  }

  @Test
  public void testDescribeCommandShouldPass() {
    ParseTree tree = new PPLSyntaxParser().analyzeSyntax("describe t");
    assertNotEquals(null, tree);
  }

  @Test
  public void testDescribeCommandWithMultipleIndicesShouldPass() {
    ParseTree tree = new PPLSyntaxParser().analyzeSyntax("describe t,u");
    assertNotEquals(null, tree);
  }

  @Test
  public void testDescribeFieldsCommandShouldPass() {
    ParseTree tree = new PPLSyntaxParser().analyzeSyntax("describe t | fields a,b");
Review comment (Member):

What is the use case for this example? It seems a little odd, because someone who wants a,b as the result would specify a,b directly instead of describing the table and filtering again.

The fields command projects the mentioned columns from the result set. A plausible use case could be describe t | fields 2, 3, which implies "give me the second and third column names".

Also, if someone appends other commands to describe, what is the expected behavior? I am assuming we will be calculating on the result set provided by the prior describe command.

seankao-az (Collaborator, Author), Jun 24, 2022:

> I am assuming we will be calculating on the result set provided by prior describe command.

That's correct. When other commands are appended to describe, they query the metadata table instead of the data table itself (as expected for the pipe syntax).

An example of the usage of fields can be seen here.

The fields do not refer to the data table's fields but to the metadata table's fields, because the result set of the describe command is a metadata table. Here you can see the full list of such fields.

seankao-az (Collaborator, Author), Jun 25, 2022:

> A plausible usecase could be describe t | fields 2, 3 which implies give me second and third column names.

Interestingly, we could support that using the following syntax:

describe t | where ORDINAL_POSITION=2 or ORDINAL_POSITION=3 | fields COLUMN_NAME

However, it doesn't work yet because of a type mismatch (snippet 1, snippet 2). Also, most of the metadata is meaningless right now, including the order of the columns.

$ curl .... '{"query": "describe opensearch_dashboards_sample_data_flights | where ORDINAL_POSITION=0"}'

{
  "error": {
    "reason": "Invalid Query",
    "details": "= function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[STRING,STRING],[BOOLEAN,BOOLEAN],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [STRING,INTEGER]",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}

$ curl .... '{"query": "describe opensearch_dashboards_sample_data_flights | where ORDINAL_POSITION=\"0\""}'

{
  "error": {
    "reason": "Invalid Query",
    "details": "invalid to get integerValue from value of type STRING",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}

    assertNotEquals(null, tree);
  }

  @Test
  public void testDescribeCommandWithSourceShouldFail() {
    exceptionRule.expect(RuntimeException.class);
    exceptionRule.expectMessage("Failed to parse query due to offending symbol");

    new PPLSyntaxParser().analyzeSyntax("describe source=t");
  }
}
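The pipeline proposed in the review thread, `describe t | where ORDINAL_POSITION=2 or ORDINAL_POSITION=3 | fields COLUMN_NAME`, amounts to picking column names by ordinal position. As a hedged illustration of the intended result only (not of how the plugin evaluates it, and keeping in mind the author's caveat that column order in the metadata is not yet meaningful), this Java sketch applies the same selection to the `accounts` column names in the order listed in Example 1 of describe.rst. There `ORDINAL_POSITION` starts at 0, so positions 2 and 3 select the third and fourth columns. `OrdinalSelection` and `columnsAt` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class OrdinalSelection {
  // COLUMN_NAME values of the accounts index, indexed by ORDINAL_POSITION 0..10 as in Example 1.
  static final List<String> ACCOUNTS_COLUMNS = List.of(
      "account_number", "firstname", "address", "balance", "gender", "city",
      "employer", "state", "age", "email", "lastname");

  /** Analogue of: where ORDINAL_POSITION in positions | fields COLUMN_NAME */
  static List<String> columnsAt(List<String> columns, Set<Integer> positions) {
    List<String> selected = new ArrayList<>();
    for (int ordinal = 0; ordinal < columns.size(); ordinal++) {
      // The ordinal is compared as an integer here; the review thread shows the plugin
      // rejecting ORDINAL_POSITION=0 at the time with a STRING/INTEGER type mismatch.
      if (positions.contains(ordinal)) {
        selected.add(columns.get(ordinal));
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // Prints [address, balance]
    System.out.println(columnsAt(ACCOUNTS_COLUMNS, Set.of(2, 3)));
  }
}
```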
