Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] SQL grammar spec #4916

Open
leiysky opened this issue Apr 18, 2022 · 5 comments
Open

[RFC] SQL grammar spec #4916

leiysky opened this issue Apr 18, 2022 · 5 comments
Assignees
Labels
C-documentation Category: documentation C-proposal Category: proposal

Comments

@leiysky
Copy link
Contributor

leiysky commented Apr 18, 2022

A formal SQL specification is needed by parser.

@leiysky leiysky added C-documentation Category: documentation C-proposal Category: proposal labels Apr 18, 2022
@flaneur2020
Copy link
Member

can we embed some docs about statement in the grammar spec?

so we can generate the docs like https://databend.rs/doc/reference/sql/ddl/stage/ddl-create-stage from the spec, maybe this could make it easier to keep the docs consistent with the code.

@leiysky
Copy link
Contributor Author

leiysky commented Apr 18, 2022

Statement list

Statements can be categorized into:

  • Query statements, e.g. SELECT * FROM t.
  • DDL statements, e.g. CREATE TABLE t(a INT), ALTER TABLE t(a INT).
  • DML statements, e.g. INSERT INTO t VALUES(...), COPY INTO t FROM ...
  • Other operational statements, e.g. SHOW TABLES

Here we give the full list of statements in EBNF form.

The reference of EBNF notation comes from this railroad diagram generator.

You can generate preview of railroad diagram with the website above.

Example:
image

Query statements

select_statement ::= "SELECT" ("DISTINCT")? select_list "FROM" table_reference_list ("WHERE" expression)? ("GROUP" "BY" group_by_list)? ("HAVING" expression)? ("ORDER" "BY" expression ("ASC" | "DESC")?)? ("LIMIT" (expression | (expression "," expression)))?

select_list ::= select_target ("," select_target)* 

table_reference_list ::= table_reference ("," table_reference)*

select_target ::= expr ("AS" identifier)?
                  | qualified_wildcard
                  | "*"

qualified_wildcard ::= identifier ("." identifier)? "." "*"

table_reference ::= aliased_table
                    | parenthesized_subquery ("AS")? identifier
                    | parenthesized_joined_tables (("AS")? identifier)?

parenthesized_subquery ::= "(" parenthesized_subquery ")"
                           | subquery

parenthesized_joined_tables ::= "(" parenthesized_joined_tables ")"
                                | joined_tables

joined_tables ::= table_reference (("INNER")? | (("LEFT" | "RIGHT" | "FULL") ("OUTER")?)) "JOIN" table_reference "ON" expression
                  | table_reference (("INNER")? | (("LEFT" | "RIGHT" | "FULL") ("OUTER")?)) "JOIN" table_reference "USING" "(" join_key_list ")"
                  | table_reference "NATURAL" (("INNER")? | (("LEFT" | "RIGHT" | "FULL") ("OUTER")?)) "JOIN" table_reference

join_key_list ::= identifier ("," identifier)*

aliased_table ::= identifier (("AS")? identifier)?

group_by_list ::= expression ("," expression)*

DDL Statements

Create table:


create_table_statement ::= "CREATE" "TABLE" ("IF" "NOT" "EXISTS")? identifier "(" column_definition_list ")"

column_definition_list ::= column_definition ("," column_definition)*

column_definition ::= identifier data_type (column_constraint)*

data_type ::= "BOOLEAN"
              | "TINYINT"
              | "TINYINT" "UNSIGNED"
              | "SMALLINT"
              | "SMALLINT" "UNSIGNED"
              | "INT"
              | "INT" "UNSIGNED"
              | "BIGINT"
              | "BIGINT" "UNSIGNED"
              | "FLOAT"
              | "DOUBLE"
              | "DATE"
              | "DATETIME"
              | "TIMESTAMP"
              | "VARCHAR"
              | "ARRAY"
              | "OBJECT"
              | "VARIANT"

column_constraint ::= "NULL"
                      | "NOT" "NULL"
                      | "DEFAULT" expression

Alter table:

alter_table_statement ::= "ALTER" "TABLE" ("IF" "EXISTS")? identifier action_list

action_list ::= action ("," action)*

action ::= "RENAME" "TO" identifier

Drop table:

drop_table_statement ::= "DROP" "TABLE" ("IF" "EXISTS")? identifier

Truncate table:

truncate_table_statement ::= "TRUNCATE" "TABLE" identifier

Database:

create_database_statement ::= "CREATE" "DATABASE" ("IF" "NOT" "EXISTS") identifier

drop_database_statement ::= "DROP" ("DATABASE" | "SCHEMA") ("IF" "EXISTS")? identifier

Stage:

create_stage_statement ::= "CREATE" "STAGE" ("IF" "NOT" "EXISTS")? identifier (external_stage_params)? (file_format)? (copy_options)? (comment)?

external_stage_params ::= "URL" "=" string_literal ("CREDENTIALS" "=" "(" "AWS_KEY_ID" "=" string_literal "AWS_SECRET_KEY" "=" string_literal ")") ("ENDPOINT_URL" "=" string_literal)

file_format ::= "FILE_FORMAT" "=" "(" "TYPE" "=" ("CSV" | "PARQUET") (format_options)? ")"

format_options ::= ("RECORD_DELIMITER" "=" string_literal)? ("FIELD_DELIMITER" "=" string_literal)? ("SKIP_HEADER" "=" integer_literal)?

copy_options ::= "COPY_OPTIONS" "=" "(" "SIZE_LIMIT" "=" integer_literal ")"

comment ::= "COMMENT" "=" string_literal

drop_stage_statement ::= "DROP" "STAGE" ("IF" "EXISTS")? identifier

desc_stage_statement ::= "DESC" "STAGE" identifier

list_stage_files_statement ::= "LIST" identifier ("PATTERN" "=" string_literal)

DML Statements

Insert statement:

insert_statement ::= "INSERT" ("INTO" | "OVERWRITE") identifier (column_list)? "VALUES" value_list ("," value_list)*

column_list ::= "(" identifier ("," identifier)* ")"

value_list ::= "(" literal ("," literal)* ")"

insert_select_statement ::= "INSERT" ("INTO" | "OVERWRITE") identifier (column_list)? select_statement

Copy into statement:

copy_into_statement ::= "COPY" "INTO" identifier "FROM" (internal_stage | external_stage | external_location) (files)? (pattern)? (file_format)? (copy_options)?

files ::= "FILES" "=" (string_literal ("," string_literal)*)

pattern ::= "PATTERN" "=" string_literal

Scalar expression

TODO

@leiysky
Copy link
Contributor Author

leiysky commented Apr 18, 2022

can we embed some docs about statement in the grammar spec?

so we can generate the docs like https://databend.rs/doc/reference/sql/ddl/stage/ddl-create-stage from the spec, maybe this could make it easier to keep the docs consistent with the code.

Yes, it can be integrated with CI or else in the future.

This RFC is aiming to provide a formal specification of Databend SQL grammar. The documentation work won't be involved for now.

@leiysky
Copy link
Contributor Author

leiysky commented Apr 18, 2022

Reserved words

  • SELECT
  • DISTINCT
  • FROM
  • AS
  • WHERE
  • GROUP
  • BY
  • HAVING
  • ORDER
  • LIMIT
  • ASC
  • DESC
  • Wildcard ("*")
  • Paren ("()")
  • INNER
  • LEFT
  • RIGHT
  • FULL
  • OUTER
  • JOIN
  • ON
  • USING
  • NATURAL
  • CREATE
  • TABLE
  • IF
  • NOT
  • EXISTS
  • BOOLEAN
  • TINYINT
  • SMALLINT
  • INT
  • BIGINT
  • UNSIGNED
  • FLOAT
  • DOUBLE
  • DATE
  • DATETIME
  • TIMESTAMP
  • VARCHAR
  • ARRAY
  • OBJECT
  • VARIANT
  • NULL
  • DEFAULT
  • ALTER
  • RENAME
  • TO
  • DROP
  • DATABASE
  • SCHEMA
  • STAGE
  • URL
  • CREDENTIALS
  • AWS_KEY_ID
  • AWS_SECRET_KEY
  • ENDPOINT_URL
  • FILE_FORMAT
  • TYPE
  • CSV
  • PARQUET
  • RECORD_DELIMITER
  • Equality sign ("=")
  • FIELD_DELIMITER
  • SKIP_HEADER
  • COPY_OPTIONS
  • SIZE_LIMIT
  • COMMENT
  • LIST
  • PATTERN
  • INSERT
  • OVERWRITE
  • VALUES
  • COPY
  • FILES
  • CASE
  • WHEN
  • THEN
  • EXTRACT
  • INTERVAL
  • YEAR
  • MONTH
  • DAY

@Xuanwo
Copy link
Member

Xuanwo commented Sep 16, 2022

I like this! We can use this spec to generate a better reading diagram in SQL docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-documentation Category: documentation C-proposal Category: proposal
Projects
None yet
Development

No branches or pull requests

4 participants