diff --git a/docs/concepts/collections/index.md b/docs/concepts/collections/index.md index 400434286443..d417745fc5a3 100644 --- a/docs/concepts/collections/index.md +++ b/docs/concepts/collections/index.md @@ -35,8 +35,7 @@ defined schema. Only data that conforms to the schema can be added to the collection. ksqlDB supports two abstractions for representing collections: -[streams](streams.md) and [tables](tables.md). Both operate under a simple -key/value model. +streams and tables. Both operate under a simple key/value model. Streams ------- diff --git a/docs/concepts/collections/streams.md b/docs/concepts/collections/streams.md deleted file mode 100644 index 254723e61023..000000000000 --- a/docs/concepts/collections/streams.md +++ /dev/null @@ -1,92 +0,0 @@ ---- -layout: page -title: Streams -tagline: Stream collections in ksqlDB -description: Learn about streams of events in ksqlDB. -keywords: ksqldb, collection, stream ---- - -A stream is a durable, partitioned sequence of immutable events. When a new -event is added a stream, it's appended to the partition that its key belongs -to. Streams are useful for modeling a historical sequence of activity. For -example, you might use a stream to model a series of customer purchases or a -sequence of readings from a sensor. Under the hood, streams are simply stored -as {{ site.aktm }} topics with an enforced schema. You can create a stream from -scratch or declare a stream on top of an existing {{ site.ak }} topic. In both -cases, you can specify a variety of configuration options. - -Create a stream from scratch ------------------------------- - -When you create a stream from scratch, a backing {{ site.ak }} topic is created -automatically. Use the CREATE STREAM statement to create a stream from scratch, -and give it a name, schema, and configuration options. The following statement -registers a `publications` stream on a topic named `publication_events`. Events -in the `publications` stream are distributed over 3 partitions, are keyed on -the `author` column, and are serialized in the Avro format. - -```sql -CREATE STREAM publications ( - author VARCHAR KEY, - title VARCHAR - ) WITH ( - kafka_topic = 'publication_events', - value_format = 'avro', - partitions = 3 - ); -``` - -In this example, a new stream named `publications` is created with two columns: -`author` and `title`. Both are of type `VARCHAR`. ksqlDB automatically creates -an underlying `publication_events` topic that you can access freely. The topic -has 3 partitions, and any new events that are appended to the stream are hashed -according to the value of the `author` column. Because {{ site.ak }} can store -data in a variety of formats, we let ksqlDB know that we want the value portion -of each row stored in the Avro format. You can use a variety of configuration -options in the final `WITH` clause. - -!!! note - If you create a stream from scratch, you must supply the number of - partitions. - -Create a stream over an existing Kafka topic --------------------------------------------- - -You can also create a stream on top of an existing {{ site.ak }} topic. -Internally, ksqlDB simply registers the topic with the provided schema -and doesn't create anything new. - -```sql -CREATE STREAM publications ( - author VARCHAR KEY, - title VARCHAR - ) WITH ( - kafka_topic = 'publication_events', - value_format = 'avro' - ); -``` - -Because the topic already exists, you do not need to specify the number of partitions. 
- -It's important that the columns you define match the data in the existing topic. -In this case, the message would need a `KAFKA` serialized `VARCHAR` in the message key -and an `AVRO` serialized record containing a `title` field in the message value. - -If both the `author` and `title` columns are in the message value, you can write: - -```sql -CREATE STREAM publications ( - author VARCHAR, - title VARCHAR - ) WITH ( - kafka_topic = 'publication_events', - value_format = 'avro' - ); -``` - -Notice the `author` column is no longer marked with the `KEY` keyword, so it is now -read from the message value. - -If an underlying event in the {{ site.ak }} topic doesn’t conform to the given -stream schema, the event is discarded at read-time, and an error is added to the -[processing log](../../developer-guide/test-and-debug/processing-log.md). diff --git a/docs/concepts/collections/tables.md b/docs/concepts/collections/tables.md deleted file mode 100644 index 81377a18fd09..000000000000 --- a/docs/concepts/collections/tables.md +++ /dev/null @@ -1,93 +0,0 @@ ---- -layout: page -title: Tables -tagline: Table collections in ksqlDB -description: Learn about tables of events in ksqlDB. -keywords: ksqldb, collection, table ---- - -A table is a durable, partitioned collection that models change over time. -It's the mutable counterpart to the immutable [stream](streams.md). By contrast -to streams, which represent a historical sequence of events, tables represent -what is true as of “now”. For example, you might model the locations that -someone has lived at as a stream: first Miami, then New York, then London, -and so forth. You can use a table to roll up this information and tell you -where they live right now. Tables can also be used to materialize a view by -incrementally aggregating a stream of events. - -Tables work by leveraging the keys of each event. Keys are used to denote -identity. If a sequence of events shares a key, the last event for a given key -represents the most up-to-date information. Under the hood, ksqlDB uses Kafka’s -notion of a *compacted topic* to make this work. Compaction is a process that -periodically deletes all but the newest events for each key. For more -information, see -[Log Compaction](https://kafka.apache.org/documentation/#compaction). - -You can create a table from scratch or declare a table on top of an existing -{{ site.aktm }} topic. You can supply a variety of configuration options. In -either case, the table is not *materialized*, which limits its ability to be -queried. Only tables that are derived from other collections are materialized. -For more information, see [Materialized Views](../materialized-views.md). - -Create a table from scratch ---------------------------- - -When you create a table from scratch, a backing compacted {{ site.ak }} topic -is created automatically. Use the -[CREATE TABLE](../../developer-guide/ksqldb-reference/create-table.md) -statement to create a table from scratch, and give it a name, schema, and -configuration options. The following statement registers a `movies` table on a -topic named `movies`. Events in the `movies` table are distributed over 5 -partitions, are keyed on the `title` column, and are serialized in the Avro -format. - -```sql -CREATE TABLE movies ( - title VARCHAR PRIMARY KEY, - release_year INT - ) WITH ( - kafka_topic = 'movies', - value_format = 'avro', - partitions = 5 - ); -``` - -In this example, a new table named `movies` is created with two columns: -`title` and `release_year`. 
ksqlDB automatically creates an underlying `movies` -topic that you can access freely. The topic has 5 partitions, and any new -events that are integrated into the table are hashed according to the value -of the `title` column. Because {{ site.ak }} can store data in a variety of -formats, we let ksqlDB know that we want the value portion of each row stored -in the Avro format. You can use a variety of configuration options in the final -WITH clause. - -!!! note - If you create a table from scratch, you must supply the number of - partitions. - -Create a table over an existing Kafka topic -------------------------------------------- - -You can also create a table on top of an existing {{ site.ak }} topic. -Internally, ksqlDB simply registers the topic with the provided schema -and doesn't create anything new. - -```sql -CREATE TABLE movies ( - title VARCHAR PRIMARY KEY, - release_year INT - ) WITH ( - kafka_topic = 'movies', - value_format = 'avro' - ); -``` - -Because the topic already exists, you do not need to specify the number of partitions. - -It's important that the columns you define match the data in the existing topic. -In this case, the message would need a `KAFKA` serialized `VARCHAR` in the message key -and an `AVRO` serialized record containing a `release_year` field in the message value. - -If an underlying event in the {{ site.ak }} topic doesn’t conform to the given -table schema, the event is discarded at read-time, and an error is added to the -[processing log](../../developer-guide/test-and-debug/processing-log.md). diff --git a/docs/developer-guide/ksqldb-reference/create-stream-as-select.md b/docs/developer-guide/ksqldb-reference/create-stream-as-select.md index 3de4bf7a2d49..2dd3bdae7d25 100644 --- a/docs/developer-guide/ksqldb-reference/create-stream-as-select.md +++ b/docs/developer-guide/ksqldb-reference/create-stream-as-select.md @@ -1,116 +1,116 @@ ---- -layout: page -title: CREATE STREAM AS SELECT -tagline: ksqlDB CREATE STREAM AS SELECT statement -description: Syntax for the CREATE STREAM AS SELECT statement in ksqlDB -keywords: ksqlDB, create, stream, push query ---- - -CREATE STREAM AS SELECT -======================= - -Synopsis --------- - -```sql -CREATE [OR REPLACE] STREAM stream_name - [WITH ( property_name = expression [, ...] )] - AS SELECT select_expr [, ...] - FROM from_stream - [[ LEFT | FULL | INNER ] JOIN [join_table | join_stream] [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria]* - [ WHERE condition ] - [PARTITION BY column_name] - EMIT CHANGES; -``` - -Description ------------ - -Create a new materialized stream view, along with the corresponding Kafka topic, and -stream the result of the query into the topic. - -The PARTITION BY clause, if supplied, is applied to the source _after_ any JOIN or WHERE clauses, -and _before_ the SELECT clause, in much the same way as GROUP BY. - -Joins to streams can use any stream column. If the join criteria is not the key column of the stream -ksqlDB will internally repartition the data. - -!!! important - {{ site.ak }} guarantees the relative order of any two messages from - one source partition only if they are also both in the same partition - *after* the repartition. Otherwise, {{ site.ak }} is likely to interleave - messages. The use case will determine if these ordering guarantees are - acceptable. - -Joins to tables must use the table's PRIMARY KEY as the join criteria: none primary key joins are -[not yet supported](https://github.com/confluentinc/ksql/issues/4424). 
-For more information, see [Join Event Streams with ksqlDB](../joins/join-streams-and-tables.md). - -See [Partition Data to Enable Joins](../joins/partition-data.md) for more information about how to -correctly partition your data for joins. - -For stream-stream joins, you must specify a WITHIN clause for matching -records that both occur within a specified time interval. For valid time -units, see [Time Units](../syntax-reference.md#time-units). - -The key of the resulting stream is determined by the following rules, in order of priority: - 1. if the query has a `PARTITION BY`: - 1. if the `PARITION BY` is on a single source column reference, the key will match the - name, type and contents of the source column. - 1. otherwise the key will have a system generated name, unless you provide an alias in the - projection, and will match the type and contents of the result of the expression. - 1. if the query has a join see [Join Synthetic Key Columns](../joins/synthetic-keys) for more info. - 1. otherwise, the primary key will match the name, unless you provide an alias in the projection, - and type of the source stream's key. - -The projection must include all columns required in the result, including any key columns. - -For supported [serialization formats](../serialization.md#serialization-formats), -ksqlDB can integrate with [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html). -ksqlDB registers the value schema of the new stream with {{ site.sr }} automatically. -The schema is registered under the subject `-value`. - -The WITH clause for the result supports the following properties: - -| Property | Description | -| ----------------- | ---------------------------------------------------------------------------------------------------- | -| KAFKA_TOPIC | The name of the Kafka topic that backs this stream. If this property is not set, then the name of the stream in upper case will be used as default. | -| VALUE_FORMAT | Specifies the serialization format of the message value in the topic. For supported formats, see [Serialization Formats](../serialization.md#serialization-formats). If this property is not set, the format of the input stream/table is used. | -| VALUE_DELIMITER | Used when VALUE_FORMAT='DELIMITED'. Supports single character to be a delimiter, defaults to ','. For space and tab delimited values you must use the special values 'SPACE' or 'TAB', not an actual space or tab character. | -| PARTITIONS | The number of partitions in the backing topic. If this property is not set, then the number of partitions of the input stream/table will be used. In join queries, the property values are taken from the left-side stream or table. | -| REPLICAS | The replication factor for the topic. If this property is not set, then the number of replicas of the input stream or table will be used. In join queries, the property values are taken from the left-side stream or table. | -| TIMESTAMP | Sets a column within this stream's schema to be used as the default source of `ROWTIME` for any downstream queries. Downstream queries that use time-based operations, such as windowing, will process records in this stream based on the timestamp in this column. The column will be used to set the timestamp on any records emitted to Kafka. Timestamps have a millisecond accuracy. If not supplied, the `ROWTIME` of the source stream is used.
**Note**: This doesn't affect the processing of the query that populates this stream. For example, given the following statement:
CREATE STREAM foo WITH (TIMESTAMP='t2') AS
SELECT * FROM bar
WINDOW TUMBLING (size 10 seconds);
EMIT CHANGES;
The window into which each row of `bar` is placed is determined by bar's `ROWTIME`, not `t2`. | -| TIMESTAMP_FORMAT | Used in conjunction with TIMESTAMP. If not set, ksqlDB timestamp column must be of type `bigint`. When set, the TIMESTAMP column must be of type `varchar` and have a format that can be parsed with the Java `DateTimeFormatter`. If your timestamp format has characters requiring single quotes, you can escape them with two successive single quotes, `''`, for example: `'yyyy-MM-dd''T''HH:mm:ssX'`. For more information on timestamp formats, see [DateTimeFormatter](https://cnfl.io/java-dtf). | -| WRAP_SINGLE_VALUE | Controls how values are serialized where the values schema contains only a single column. This setting controls how the query serializes values with a single-column schema.
If set to `true`, ksqlDB serializes the column as a named column within a record.
If set to `false`, ksqlDB serializes the column as an anonymous value.
If not supplied, the system default, defined by [ksql.persistence.wrap.single.values](../../operate-and-deploy/installation/server-config/config-reference.md#ksqlpersistencewrapsinglevalues), then the format's default is used.
**Note:** `null` values have special meaning in ksqlDB. Care should be taken when dealing with single-column schemas where the value can be `null`. For more information, see [Single column (un)wrapping](../serialization.md#single-field-unwrapping).
**Note:** Supplying this property for formats that do not support wrapping, for example `DELIMITED`, or when the value schema has multiple columns, results in an error. | - - -!!! note - - To use Avro, you must have {{ site.sr }} enabled and - `ksql.schema.registry.url` must be set in the ksqlDB server configuration - file. See [Configure ksqlDB for Avro, Protobuf, and JSON schemas](../../operate-and-deploy/installation/server-config/avro-schema.md). - - Avro field names are not case sensitive in ksqlDB. This matches the ksqlDB - column name behavior. - -Example -------- - -```sql --- Create a view that filters an existing stream: -CREATE STREAM filtered AS - SELECT - a, - few, - columns - FROM source_stream; - --- Create a view that enriches a stream with a table lookup: -CREATE STREAM enriched AS - SELECT - cs.*, - u.name, - u.classification, - u.level - FROM clickstream cs - JOIN users u ON u.id = cs.userId; -``` - +--- +layout: page +title: CREATE STREAM AS SELECT +tagline: ksqlDB CREATE STREAM AS SELECT statement +description: Syntax for the CREATE STREAM AS SELECT statement in ksqlDB +keywords: ksqlDB, create, stream, push query +--- + +CREATE STREAM AS SELECT +======================= + +Synopsis +-------- + +```sql +CREATE [OR REPLACE] STREAM stream_name + [WITH ( property_name = expression [, ...] )] + AS SELECT select_expr [, ...] + FROM from_stream + [[ LEFT | FULL | INNER ] JOIN [join_table | join_stream] [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria]* + [ WHERE condition ] + [PARTITION BY column_name] + EMIT CHANGES; +``` + +Description +----------- + +Create a new materialized stream view, along with the corresponding Kafka topic, and +stream the result of the query into the topic. + +The PARTITION BY clause, if supplied, is applied to the source _after_ any JOIN or WHERE clauses, +and _before_ the SELECT clause, in much the same way as GROUP BY. + +Joins to streams can use any stream column. If the join criteria is not the key column of the stream +ksqlDB will internally repartition the data. + +!!! important + {{ site.ak }} guarantees the relative order of any two messages from + one source partition only if they are also both in the same partition + *after* the repartition. Otherwise, {{ site.ak }} is likely to interleave + messages. The use case will determine if these ordering guarantees are + acceptable. + +Joins to tables must use the table's PRIMARY KEY as the join criteria: none primary key joins are +[not yet supported](https://github.com/confluentinc/ksql/issues/4424). +For more information, see [Join Event Streams with ksqlDB](../joins/join-streams-and-tables.md). + +See [Partition Data to Enable Joins](../joins/partition-data.md) for more information about how to +correctly partition your data for joins. + +For stream-stream joins, you must specify a WITHIN clause for matching +records that both occur within a specified time interval. For valid time +units, see [Time Units](../syntax-reference.md#time-units). + +The key of the resulting stream is determined by the following rules, in order of priority: + 1. if the query has a `PARTITION BY`: + 1. if the `PARITION BY` is on a single source column reference, the key will match the + name, type and contents of the source column. + 1. otherwise the key will have a system generated name, unless you provide an alias in the + projection, and will match the type and contents of the result of the expression. + 1. 
if the query has a join see [Join Synthetic Key Columns](../joins/synthetic-keys) for more info. + 1. otherwise, the primary key will match the name, unless you provide an alias in the projection, + and type of the source stream's key. + +The projection must include all columns required in the result, including any key columns. + +For supported [serialization formats](../../developer-guide/serialization.md), +ksqlDB can integrate with [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html). +ksqlDB registers the value schema of the new stream with {{ site.sr }} automatically. +The schema is registered under the subject `-value`. + +The WITH clause for the result supports the following properties: + +| Property | Description | +| ----------------- | ---------------------------------------------------------------------------------------------------- | +| KAFKA_TOPIC | The name of the Kafka topic that backs this stream. If this property is not set, then the name of the stream in upper case will be used as default. | +| VALUE_FORMAT | Specifies the serialization format of the message value in the topic. For supported formats, see [Serialization Formats](../serialization.md#serialization-formats). If this property is not set, the format of the input stream/table is used. | +| VALUE_DELIMITER | Used when VALUE_FORMAT='DELIMITED'. Supports single character to be a delimiter, defaults to ','. For space and tab delimited values you must use the special values 'SPACE' or 'TAB', not an actual space or tab character. | +| PARTITIONS | The number of partitions in the backing topic. If this property is not set, then the number of partitions of the input stream/table will be used. In join queries, the property values are taken from the left-side stream or table. | +| REPLICAS | The replication factor for the topic. If this property is not set, then the number of replicas of the input stream or table will be used. In join queries, the property values are taken from the left-side stream or table. | +| TIMESTAMP | Sets a column within this stream's schema to be used as the default source of `ROWTIME` for any downstream queries. Downstream queries that use time-based operations, such as windowing, will process records in this stream based on the timestamp in this column. The column will be used to set the timestamp on any records emitted to Kafka. Timestamps have a millisecond accuracy. If not supplied, the `ROWTIME` of the source stream is used.
**Note**: This doesn't affect the processing of the query that populates this stream. For example, given the following statement:
CREATE STREAM foo WITH (TIMESTAMP='t2') AS
SELECT * FROM bar
WINDOW TUMBLING (size 10 seconds)
EMIT CHANGES;
The window into which each row of `bar` is placed is determined by `bar`'s `ROWTIME`, not `t2`. | +| TIMESTAMP_FORMAT | Used in conjunction with TIMESTAMP. If not set, the ksqlDB timestamp column must be of type `bigint`. When set, the TIMESTAMP column must be of type `varchar` and have a format that can be parsed with the Java `DateTimeFormatter`. If your timestamp format has characters requiring single quotes, you can escape them with two successive single quotes, `''`, for example: `'yyyy-MM-dd''T''HH:mm:ssX'`. For more information on timestamp formats, see [DateTimeFormatter](https://cnfl.io/java-dtf). | +| WRAP_SINGLE_VALUE | Controls how the query serializes values when the value schema contains only a single column.
If set to `true`, ksqlDB serializes the column as a named column within a record.
If set to `false`, ksqlDB serializes the column as an anonymous value.
If not supplied, the system default, defined by [ksql.persistence.wrap.single.values](../../operate-and-deploy/installation/server-config/config-reference.md#ksqlpersistencewrapsinglevalues), is used; if that is also not set, the format's default is used.
**Note:** `null` values have special meaning in ksqlDB. Care should be taken when dealing with single-column schemas where the value can be `null`. For more information, see [Single column (un)wrapping](../serialization.md#single-field-unwrapping).
**Note:** Supplying this property for formats that do not support wrapping, for example `DELIMITED`, or when the value schema has multiple columns, results in an error. | + + +!!! note + - To use Avro, you must have {{ site.sr }} enabled and + `ksql.schema.registry.url` must be set in the ksqlDB server configuration + file. See [Configure ksqlDB for Avro, Protobuf, and JSON schemas](../../operate-and-deploy/installation/server-config/avro-schema.md). + - Avro field names are not case sensitive in ksqlDB. This matches the ksqlDB + column name behavior. + +Example +------- + +```sql +-- Create a view that filters an existing stream: +CREATE STREAM filtered AS + SELECT + a, + few, + columns + FROM source_stream; + +-- Create a view that enriches a stream with a table lookup: +CREATE STREAM enriched AS + SELECT + cs.*, + u.name, + u.classification, + u.level + FROM clickstream cs + JOIN users u ON u.id = cs.userId; +``` + diff --git a/docs/developer-guide/ksqldb-reference/create-table-as-select.md b/docs/developer-guide/ksqldb-reference/create-table-as-select.md index 621584fcc117..dd3d3491b647 100644 --- a/docs/developer-guide/ksqldb-reference/create-table-as-select.md +++ b/docs/developer-guide/ksqldb-reference/create-table-as-select.md @@ -1,120 +1,120 @@ ---- -layout: page -title: CREATE TABLE AS SELECT -tagline: ksqlDB CREATE TABLE AS SELECT statement -description: Syntax for the CREATE TABLE AS SELECT statement in ksqlDB -keywords: ksqlDB, create, table, push query ---- - -CREATE TABLE AS SELECT -====================== - -Synopsis --------- - -```sql -CREATE [OR REPLACE] TABLE table_name - [WITH ( property_name = expression [, ...] )] - AS SELECT select_expr [, ...] - FROM from_item - [[ LEFT | FULL | INNER ] JOIN [join_table | join_stream] ON join_criteria]* - [ WINDOW window_expression ] - [ WHERE condition ] - [ GROUP BY grouping_expression ] - [ HAVING having_expression ] - [ EMIT output_refinement ]; -``` - -Description ------------ - -Create a new ksqlDB materialized table view, along with the corresponding Kafka topic, and -stream the result of the query as a changelog into the topic. - -Note that the WINDOW clause can only be used if the `from_item` is a stream and the query contains -a `GROUP BY` clause. - -Note that EMIT `output_refinement` defaults to `CHANGES` unless explicitly set to `FINAL` on a -windowed aggregation. - -Joins to streams can use any stream column. If the join criteria is not the key column of the stream -ksqlDB will internally repartition the data. - -!!! important - {{ site.ak }} guarantees the relative order of any two messages from - one source partition only if they are also both in the same partition - *after* the repartition. Otherwise, {{ site.ak }} is likely to interleave - messages. The use case will determine if these ordering guarantees are - acceptable. - -Joins to tables must use the table's PRIMARY KEY as the join criteria: none primary key joins are -[not yet supported](https://github.com/confluentinc/ksql/issues/4424). -For more information, see [Join Event Streams with ksqlDB](../joins/join-streams-and-tables.md). - -See [Partition Data to Enable Joins](../joins/partition-data.md) for more information about how to -correctly partition your data for joins. - -The primary key of the resulting table is determined by the following rules, in order of priority: - 1. if the query has a `GROUP BY`: - 1. if the `GROUP BY` is on a single source column reference, the primary key will match the - name, type and contents of the source column. - 1. 
if the `GROUP BY` is any other single expression, the primary key will have a system - generated name, unless you provide an alias in the projection, and will match the type and - contents of the result of the expression. - 1. otherwise, the primary key will have a system generated name, and will be of type `STRING` - and contain the grouping expression concatenated together. - 1. if the query has a join see [Join Synthetic Key Columns](../joins/synthetic-keys) for more info. - 1. otherwise, the primary key will match the name, unless you provide an alias in the projection, - and type of the source table's primary key. - -The projection must include all columns required in the result, including any primary key columns. - -For supported [serialization formats](../serialization.md#serialization-formats), -ksqlDB can integrate with the [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html). -ksqlDB registers the value schema of the new table with {{ site.sr }} automatically. -The schema is registered under the subject `-value`. - -The WITH clause supports the following properties: - -| Property | Description | -| ----------------- | ---------------------------------------------------------------------------------------------------- | -| KAFKA_TOPIC | The name of the Kafka topic that backs this table. If this property is not set, then the name of the table will be used as default. | -| VALUE_FORMAT | Specifies the serialization format of the message value in the topic. For supported formats, see [Serialization Formats](../serialization.md#serialization-formats). If this property is not set, then the format of the input stream/table is used. | -| VALUE_DELIMITER | Used when VALUE_FORMAT='DELIMITED'. Supports single character to be a delimiter, defaults to ','. For space and tab delimited values you must use the special values 'SPACE' or 'TAB', not an actual space or tab character. | -| PARTITIONS | The number of partitions in the backing topic. If this property is not set, then the number of partitions of the input stream/table will be used. In join queries, the property values are taken from the left-side stream or table. | -| REPLICAS | The replication factor for the topic. If this property is not set, then the number of replicas of the input stream or table will be used. In join queries, the property values are taken from the left-side stream or table. | -| TIMESTAMP | Sets a column within this stream's schema to be used as the default source of `ROWTIME` for any downstream queries. Downstream queries that use time-based operations, such as windowing, will process records in this stream based on the timestamp in this column. The column will be used to set the timestamp on any records emitted to Kafka. Timestamps have a millisecond accuracy. If not supplied, the `ROWTIME` of the source stream is used.
**Note**: This doesn't affect the processing of the query that populates this stream. For example, given the following statement:
CREATE STREAM foo WITH (TIMESTAMP='t2') AS
SELECT * FROM bar
WINDOW TUMBLING (size 10 seconds);
EMIT CHANGES;
The window into which each row of `bar` is placed is determined by bar's `ROWTIME`, not `t2`. | -| TIMESTAMP_FORMAT | Used in conjunction with TIMESTAMP. If not set the timestamp column must be of type `bigint`. If it is set, then the TIMESTAMP column must be of type varchar and have a format that can be parsed with the Java `DateTimeFormatter`. If your timestamp format has characters requiring single quotes, you can escape them with two successive single quotes, `''`, for example: `'yyyy-MM-dd''T''HH:mm:ssX'`. For more information on timestamp formats, see [DateTimeFormatter](https://cnfl.io/java-dtf). | -| WRAP_SINGLE_VALUE | Controls how values are serialized where the values schema contains only a single column. The setting controls how the query will serialize values with a single-column schema.
If set to `true`, ksqlDB will serialize the column as a named column within a record.
If set to `false`, ksqlDB will serialize the column as an anonymous value.
If not supplied, the system default, defined by [ksql.persistence.wrap.single.values](../../operate-and-deploy/installation/server-config/config-reference.md#ksqlpersistencewrapsinglevalues), then the format's default is used.
**Note:** `null` values have special meaning in ksqlDB. Care should be taken when dealing with single-column schemas where the value can be `null`. For more information, see [Single column (un)wrapping](../serialization.md#single-field-unwrapping).
**Note:** Supplying this property for formats that do not support wrapping, for example `DELIMITED`, or when the value schema has multiple columns, will result in an error. | - - -!!! note - - To use Avro or Protobuf, you must have {{ site.sr }} enabled and - `ksql.schema.registry.url` must be set in the ksqlDB server configuration - file. See [Configure ksqlDB for Avro, Protobuf, and JSON schemas](../../operate-and-deploy/installation/server-config/avro-schema.md#configure-avro-and-schema-registry-for-ksql). - - Avro and Protobuf field names are not case sensitive in ksqlDB. This matches the ksqlDB - column name behavior. - -Example -------- - -```sql --- Derive a new view from an existing table: -CREATE TABLE derived AS - SELECT - a, - b, - d - FROM source - WHERE A is not null; - --- Or, join a stream of play events to a songs table, windowing weekly, to create a weekly chart: -CREATE TABLE weeklyMusicCharts AS - SELECT - s.songName, - count(1) AS playCount - FROM playStream p - JOIN songs s ON p.song_id = s.id - WINDOW TUMBLING (7 DAYS) - GROUP BY s.songName; -``` +--- +layout: page +title: CREATE TABLE AS SELECT +tagline: ksqlDB CREATE TABLE AS SELECT statement +description: Syntax for the CREATE TABLE AS SELECT statement in ksqlDB +keywords: ksqlDB, create, table, push query +--- + +CREATE TABLE AS SELECT +====================== + +Synopsis +-------- + +```sql +CREATE [OR REPLACE] TABLE table_name + [WITH ( property_name = expression [, ...] )] + AS SELECT select_expr [, ...] + FROM from_item + [[ LEFT | FULL | INNER ] JOIN [join_table | join_stream] ON join_criteria]* + [ WINDOW window_expression ] + [ WHERE condition ] + [ GROUP BY grouping_expression ] + [ HAVING having_expression ] + [ EMIT output_refinement ]; +``` + +Description +----------- + +Create a new ksqlDB materialized table view, along with the corresponding Kafka topic, and +stream the result of the query as a changelog into the topic. + +Note that the WINDOW clause can only be used if the `from_item` is a stream and the query contains +a `GROUP BY` clause. + +Note that EMIT `output_refinement` defaults to `CHANGES` unless explicitly set to `FINAL` on a +windowed aggregation. + +Joins to streams can use any stream column. If the join criteria is not the key column of the stream +ksqlDB will internally repartition the data. + +!!! important + {{ site.ak }} guarantees the relative order of any two messages from + one source partition only if they are also both in the same partition + *after* the repartition. Otherwise, {{ site.ak }} is likely to interleave + messages. The use case will determine if these ordering guarantees are + acceptable. + +Joins to tables must use the table's PRIMARY KEY as the join criteria: none primary key joins are +[not yet supported](https://github.com/confluentinc/ksql/issues/4424). +For more information, see [Join Event Streams with ksqlDB](../joins/join-streams-and-tables.md). + +See [Partition Data to Enable Joins](../joins/partition-data.md) for more information about how to +correctly partition your data for joins. + +The primary key of the resulting table is determined by the following rules, in order of priority: + 1. if the query has a `GROUP BY`: + 1. if the `GROUP BY` is on a single source column reference, the primary key will match the + name, type and contents of the source column. + 1. 
if the `GROUP BY` is any other single expression, the primary key will have a system + generated name, unless you provide an alias in the projection, and will match the type and + contents of the result of the expression. + 1. otherwise, the primary key will have a system generated name, and will be of type `STRING` + and contain the grouping expression concatenated together. + 1. if the query has a join see [Join Synthetic Key Columns](../joins/synthetic-keys) for more info. + 1. otherwise, the primary key will match the name, unless you provide an alias in the projection, + and type of the source table's primary key. + +The projection must include all columns required in the result, including any primary key columns. + +For supported [serialization formats](../../developer-guide/serialization.md), +ksqlDB can integrate with the [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html). +ksqlDB registers the value schema of the new table with {{ site.sr }} automatically. +The schema is registered under the subject `-value`. + +The WITH clause supports the following properties: + +| Property | Description | +| ----------------- | ---------------------------------------------------------------------------------------------------- | +| KAFKA_TOPIC | The name of the Kafka topic that backs this table. If this property is not set, then the name of the table will be used as default. | +| VALUE_FORMAT | Specifies the serialization format of the message value in the topic. For supported formats, see [Serialization Formats](../serialization.md#serialization-formats). If this property is not set, then the format of the input stream/table is used. | +| VALUE_DELIMITER | Used when VALUE_FORMAT='DELIMITED'. Supports single character to be a delimiter, defaults to ','. For space and tab delimited values you must use the special values 'SPACE' or 'TAB', not an actual space or tab character. | +| PARTITIONS | The number of partitions in the backing topic. If this property is not set, then the number of partitions of the input stream/table will be used. In join queries, the property values are taken from the left-side stream or table. | +| REPLICAS | The replication factor for the topic. If this property is not set, then the number of replicas of the input stream or table will be used. In join queries, the property values are taken from the left-side stream or table. | +| TIMESTAMP | Sets a column within this stream's schema to be used as the default source of `ROWTIME` for any downstream queries. Downstream queries that use time-based operations, such as windowing, will process records in this stream based on the timestamp in this column. The column will be used to set the timestamp on any records emitted to Kafka. Timestamps have a millisecond accuracy. If not supplied, the `ROWTIME` of the source stream is used.
**Note**: This doesn't affect the processing of the query that populates this stream. For example, given the following statement:
CREATE STREAM foo WITH (TIMESTAMP='t2') AS
SELECT * FROM bar
WINDOW TUMBLING (size 10 seconds)
EMIT CHANGES;
The window into which each row of `bar` is placed is determined by `bar`'s `ROWTIME`, not `t2`. | +| TIMESTAMP_FORMAT | Used in conjunction with TIMESTAMP. If not set, the timestamp column must be of type `bigint`. If it is set, then the TIMESTAMP column must be of type `varchar` and have a format that can be parsed with the Java `DateTimeFormatter`. If your timestamp format has characters requiring single quotes, you can escape them with two successive single quotes, `''`, for example: `'yyyy-MM-dd''T''HH:mm:ssX'`. For more information on timestamp formats, see [DateTimeFormatter](https://cnfl.io/java-dtf). | +| WRAP_SINGLE_VALUE | Controls how the query serializes values when the value schema contains only a single column.
If set to `true`, ksqlDB serializes the column as a named column within a record.
If set to `false`, ksqlDB serializes the column as an anonymous value.
If not supplied, the system default, defined by [ksql.persistence.wrap.single.values](../../operate-and-deploy/installation/server-config/config-reference.md#ksqlpersistencewrapsinglevalues), is used; if that is also not set, the format's default is used.
**Note:** `null` values have special meaning in ksqlDB. Care should be taken when dealing with single-column schemas where the value can be `null`. For more information, see [Single column (un)wrapping](../serialization.md#single-field-unwrapping).
**Note:** Supplying this property for formats that do not support wrapping, for example `DELIMITED`, or when the value schema has multiple columns, will result in an error. | + + +!!! note + - To use Avro or Protobuf, you must have {{ site.sr }} enabled and + `ksql.schema.registry.url` must be set in the ksqlDB server configuration + file. See [Configure ksqlDB for Avro, Protobuf, and JSON schemas](../../operate-and-deploy/installation/server-config/avro-schema.md#configure-avro-and-schema-registry-for-ksql). + - Avro and Protobuf field names are not case sensitive in ksqlDB. This matches the ksqlDB + column name behavior. + +Example +------- + +```sql +-- Derive a new view from an existing table: +CREATE TABLE derived AS + SELECT + a, + b, + d + FROM source + WHERE A is not null; + +-- Or, join a stream of play events to a songs table, windowing weekly, to create a weekly chart: +CREATE TABLE weeklyMusicCharts AS + SELECT + s.songName, + count(1) AS playCount + FROM playStream p + JOIN songs s ON p.song_id = s.id + WINDOW TUMBLING (7 DAYS) + GROUP BY s.songName; +``` diff --git a/docs/overview/apache-kafka-primer.md b/docs/overview/apache-kafka-primer.md new file mode 100644 index 000000000000..4961c457f0f0 --- /dev/null +++ b/docs/overview/apache-kafka-primer.md @@ -0,0 +1,247 @@ +--- +layout: page +title: Apache Kafka® primer +tagline: Kafka concepts you need to use ksqlDB +description: Learn the minimum number of Kafka concepts to use ksqlDB effectively +keywords: ksqldb, kafka +--- + +ksqlDB is an event streaming database built specifically for {{ site.aktm }}. +Although it's designed to give you a higher-level set of primitives than +{{ site.ak }} has, it's inevitable that all of {{ site.ak }}'s concepts can't be, and +shouldn't be, abstracted away entirely. This section describes the minimum +number of {{ site.ak }} concepts that you need to use ksqlDB effectively. +For more information, consult the official [Apache Kafka documentation](https://kafka.apache.org/documentation/). + +## Records + +The primary unit of data in {{ site.ak }} is the event. An event models +something that happened in the world at a point in time. In {{ site.ak }}, +you represent each event using a data construct known as a record. A record +carries a few different kinds of data in it: key, value, timestamp, topic, partition, offset, and headers. + +The _key_ of a record is an arbitrary piece of data that denotes the identity +of the event. If the events are clicks on a web page, a suitable key might be +the ID of the user who did the clicking. + +The _value_ is also an arbitrary piece of data that represents the primary data of +interest. The value of a click event probably contains the page that it +happened on, the DOM element that was clicked, and other interesting tidbits +of information. + +The _timestamp_ denotes when the event happened. There are a few different "kinds" +of time that can be tracked. These aren’t discussed here, but they’re useful to +[learn about](../../../concepts/time-and-windows-in-ksqldb-queries/#time-semantics) nonetheless. + +The _topic_ and _partition_ describe which larger collection and subset of events +this particular event belongs to, and the _offset_ describes its exact position within +that larger collection (more on that below). + +Finally, the _headers_ carry arbitrary, user-supplied metadata about the record. + +ksqlDB abstracts over some of these pieces of information so you don’t need to +think about them. 
Others are exposed directly and are an integral part of the +programming model. For example, the fundamental unit of data in ksqlDB is the +_row_. A row is a helpful abstraction over a {{ site.ak }} record. Rows have +columns of two kinds: key columns and value columns. They also carry +pseudocolumns for metadata, like a `timestamp`. + +In general, ksqlDB avoids raising up {{ site.ak }}-level implementation details +that don’t contribute to a high-level programming model. + +## Topics + +Topics are named collections of records. Their purpose is to let you hold +events of mutual interest together. A series of click records might get stored +in a "clicks" topic so that you can access them all in one place. Topics are +append-only. Once you add a record to a topic, you can’t change or delete it +individually. + +There are no rules for what kinds of records can be placed into topics. They +don't need to conform to the same structure, relate to the same situation, or +anything like that. The way you manage publication to topics is entirely a +matter of user convention and enforcement. + +ksqlDB provides higher-level abstractions over a topic through +_[streams](../reference/sql/data-definition.md#streams)_ and +_[tables](../reference/sql/data-definition.md#tables)_. +A stream or table associates a schema with a {{ site.ak }} topic. +The schema controls the shape of records that are allowed to be stored in the +topic. This kind of static typing makes it easier to understand what sort of +rows are in your topic and generally helps you make fewer mistakes in your +programs that process them. + +## Partitions + +When a record is placed into a topic, it is placed into a particular partition. +A partition is a totally ordered sequence of records by offset. Topics may have multiple +partitions to make storage and processing more scalable. When you create a +topic, you choose how many partitions it has. + +When you append a record to a topic, a partitioning strategy chooses which +partition it is stored in. There are many partitioning strategies. The most common +one is to hash the contents of the record's key against the total number of +partitions. This has the effect of placing all records with the same identity +into the same partition, which is useful because of the strong ordering +guarantees. + +The order of the records is tracked by a piece of data known as an offset, +which is set when the record is appended. A record with offset of _10_ happened +earlier than a record in the same partition with offset of _20_. + +Much of the mechanics here are handled automatically by ksqlDB on your behalf. +When you create a stream or table, you choose the number of partitions for the +underlying topic so that you can have control over its scalability. When you +declare a schema, you choose which columns are part of the key and which are +part of the value. Beyond this, you don't need to think about individual partitions +or offsets. Here are some examples of that. + +When a record is processed, its key content is hashed so that its new downstream +partition will be consistent with all other records with the same key. When records are +appended, they follow the correct offset order, even in the presence of +failures or faults. When a stream's key content changes because of how a query +wants to process the rows (via `GROUP BY` or `PARTITION BY`), the underlying +records keys are recalculated, and the records are sent to a new partition in +the new topic set to perform the computation. 
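+
+As a minimal sketch of how these ideas surface in ksqlDB's SQL (the stream name
+and columns below are hypothetical, not taken from the ksqlDB docs), partitioning
+shows up in two places: the `PARTITIONS` and `KEY` declarations when a stream is
+created, and `PARTITION BY` when a query re-keys its rows.
+
+```sql
+-- Declare a stream whose backing topic has 6 partitions; each record is
+-- routed to a partition by hashing the key column (user_id).
+CREATE STREAM clicks (
+    user_id VARCHAR KEY,
+    page    VARCHAR
+  ) WITH (
+    kafka_topic = 'clicks',
+    value_format = 'json',
+    partitions = 6
+  );
+
+-- Re-key the stream by page: ksqlDB recomputes each record's key and writes
+-- the record to the matching partition of a new backing topic.
+CREATE STREAM clicks_by_page AS
+  SELECT page, user_id
+  FROM clicks
+  PARTITION BY page
+  EMIT CHANGES;
+```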
+ +## Producers and consumers + +Producers and consumers facilitate the movement of records to and from topics. +When an application wants to either publish records or subscribe to them, it +invokes the APIs (generally called the _client_) to do so. Clients communicate +with the brokers (see below) over a structured network protocol. + +When consumers read records from a topic, they never delete them or mutate +them in any way. This pattern of being able to repeatedly read the same +information is helpful for building multiple applications over the same data +set in a non-conflicting way. It's also the primary building block for +supporting "replay", where an application can rewind its event stream and read +old information again. + +Producers and consumers expose a fairly low-level API. You need to construct +your own records, manage their schemas, configure their serialization, and +handle what you send where. + +ksqlDB behaves as a high-level, continuous producer and consumer. You simply +declare the shape of your records, then issue high-level SQL commands that +describe how to populate, alter, and query the data. These SQL programs are +translated into low-level client API invocations that take care of the details +for you. + +## Brokers + +The brokers are servers that store and manage access to topics. Multiple brokers +can cluster together to replicate topics in a highly-available, fault-tolerant +manner. Clients communicate with the brokers to read and write records. + +When you run a ksqlDB server or cluster, each of its nodes communicates with +the {{ site.ak }} brokers to do its processing. From the {{ site.ak }} brokers' +point of view, each ksqlDB server is like a client. No processing takes place +on the broker. ksqlDB's servers do all of their computation on their own nodes. + +## Serializers + +Because no data format is a perfect fit for all problems, {{ site.ak }} was +designed to be agnostic to the data contents in the key and value portions of +its records. When records move from client to broker, the user payload (key and +value) must be transformed to byte arrays. This enables {{ site.ak }} to work +with an opaque series of bytes without needing to know anything about what they +are. When records are delivered to a consumer, those byte arrays need to be +transformed back into their original topics to be meaningful to the application. +The processes that convert to and from byte representations are called +_serialization_ and _deserialization_, respectively. + +When a producer sends a record to a topic, it must decide which serializers to +use to convert the key and value to byte arrays. The key and value +serializers are chosen independently. When a consumer receives a record, it +must decide which deserializer to use to convert the byte arrays back to +their original values. Serializers and deserializers come in pairs. If you use +a different deserializer, you won't be able to make sense of the byte contents. + +ksqlDB raises the abstraction of serialization substantially. Instead of +configuring serializers manually, you declare formats using configuration +options at stream/table creation time. Instead of having to keep track of which +topics are serialized which way, ksqlDB maintains metadata about the byte +representations of each stream and table. Consumers are configured automatically +to use the correct deserializers. 
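+
+For example, here is a minimal sketch of how serialization is declared in ksqlDB
+rather than wired into each client. The topic and column names are illustrative
+only, and the Avro example assumes {{ site.sr }} integration is configured.
+
+```sql
+-- value_format picks the serializer/deserializer pair for the message value;
+-- ksqlDB records it as metadata for the stream.
+CREATE STREAM readings (
+    sensor_id VARCHAR KEY,
+    temperature DOUBLE
+  ) WITH (
+    kafka_topic = 'sensor_readings',
+    value_format = 'json',
+    partitions = 3
+  );
+
+-- A derived stream can re-serialize the same rows into another format by
+-- overriding VALUE_FORMAT; readers never handle the byte arrays directly.
+CREATE STREAM readings_avro
+  WITH (value_format = 'avro') AS
+  SELECT * FROM readings
+  EMIT CHANGES;
+```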
+ +## Schemas + +Although the records serialized to {{ site.ak }} are opaque bytes, they must have +some rules about their structure to make it possible to process them. One aspect of this +structure is the schema of the data, which defines its shape and fields. Is it +an integer? Is it a map with keys `foo`, `bar`, and `baz`? Something else? + +Without any mechanism for enforcement, schemas are implicit. A consumer, +somehow, needs to know the form of the produced data. Frequently this happens +by getting a group of people to agree verbally on the schema. This approach, +however, is error prone. It's often better if the schema can be managed +centrally, audited, and enforced programmatically. + +[Confluent {{ site.sr }}](https://docs.confluent.io/current/schema-registry/index.html), a project outside of {{ site.ak }}, helps with schema +management. {{ site.sr }} enables producers to register a topic with a schema +so that when any further data is produced, it is rejected if it doesn't +conform to the schema. Consumers can consult {{ site.sr }} to find the schema +for topics they don't know about. + +Rather than having you glue together producers, consumers, and schema +configuration, ksqlDB integrates transparently with {{ site.sr }}. By enabling +a configuration option so that the two systems can talk to each other, ksqlDB +stores all stream and table schemas in {{ site.sr }}. These schemas can then be +downloaded and used by any application working with ksqlDB data. Moreover, +ksqlDB can infer the schemas of existing topics automatically, so that you +don't need to declare their structure when you define the stream or table over +it. + +## Consumer groups + +When a consumer program boots up, it registers itself into a _consumer group_, +which multiple consumers can enter. Each time a record is eligible to be +consumed, exactly one consumer in the group reads it. This effectively provides +a way for a set of processes to coordinate and load balance the consumption of +records. + +Because the records in a single topic are meant to be consumed by one process in the group, each +partition in the subscription is read by only one consumer at a time. The number +of partitions that each consumer is responsible for is defined by the total +number of source partitions divided by the number of consumers. If a consumer +dynamically joins the group, the ownership is recomputed and the partitions +reassigned. If a consumer leaves the group, the same computation takes place. + +ksqlDB builds on this powerful load balancing primitive. When you deploy a +persistent query to a cluster of ksqlDB servers, the workload is distributed +across the cluster according to the number of source partitions. You don't need +to manage group membership explicitly, because all of this happens automatically. + +For example, if you deploy a persistent query with ten source partitions to a +ksqlDB cluster with two nodes, each node processes five partitions. If you lose +a server, the sole remaining server will rebalance automatically and process +all ten. If you now add four more servers, each rebalances to process two partitions. + +## Retention and compaction + +It is often desirable to clean up older records after some period of time. +Retention and compaction are two different options for doing this. They are both +optional and can be used in conjunction. + +Retention defines how long a record is stored before it's deleted. Retention is one of the +only ways to delete a record in a topic. 
This parameter is +particularly important in stream processing because it defines the time +horizon that you can replay a stream of events. Replay is useful if you're +fixing a bug, building a new application, or backtesting some existing piece of +logic. + +ksqlDB enables you to control the retention of the underlying topics of base +streams and tables directly, so it's important to understand the concept. For +more information see [Topics and Logs in the Kafka docs](https://kafka.apache.org/documentation/#intro_topics). + +Compaction, by contrast, is a process that runs in the background on each {{ site.ak }} +broker that periodically deletes all but the latest record per key. It is an +optional, opt-in process. Compaction is particularly useful when your records +represent some kind of updates to a piece of a state, and the latest update is +the only one that matters in the end. + +ksqlDB directly leverages compaction to support the underlying changelogs that +back its materialized tables. They allow ksqlDB to store the minimum amount of +information needed to rebuild a table in the event of a failover. For more +information see [Log Compaction in the Kafka docs](https://kafka.apache.org/documentation/#compaction). \ No newline at end of file diff --git a/docs/reference/sql/appendix.md b/docs/reference/sql/appendix.md new file mode 100644 index 000000000000..bd3d2874a7b9 --- /dev/null +++ b/docs/reference/sql/appendix.md @@ -0,0 +1,150 @@ +--- +layout: page +title: ksqlDB SQL keywords and operators +tagline: SQL language keywords +description: Tables listing all valid keywords and operators in ksqlDB SQL +keywords: ksqldb, sql, keyword, operators +--- + +## Keywords + +The following table shows all keywords in the language. + +| keyword | description | example | +|--------------|-----------------------------------------|----------------------------------------------------------------------| +| `ADVANCE` | hop size in hopping window | `WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS)` | +| `ALL` | list hidden topics | `SHOW ALL TOPICS` | +| `AND` | logical "and" operator | `WHERE userid<>'User_1' AND userid<>'User_2'` | +| `ARRAY` | one-indexed array of elements | `SELECT ARRAY[1, 2] FROM s1 EMIT CHANGES;` | +| `AS` | alias a column, expression, or type | | +| `BEGINNING` | print from start of topic | `PRINT FROM BEGINNING;` | +| `BETWEEN` | constrain a value to a range | `SELECT event FROM events WHERE event_id BETWEEN 10 AND 20 …` | +| `BY` | specify expression | `GROUP BY regionid`, `ADVANCE BY 10 SECONDS`, `PARTITION BY userid` | +| `CASE` | select a condition from expressions | `SELECT CASE WHEN condition THEN result [ WHEN … THEN … ] … END` | +| `CAST` | change expression type | `SELECT id, CONCAT(CAST(COUNT(*) AS VARCHAR), '_HELLO') FROM views …`| +| `CHANGES` | specify incremental refinement type | `SELECT * FROM users EMIT CHANGES;` | +| `CONNECTOR` | manage a connector | `CREATE SOURCE CONNECTOR 'jdbc-connector' WITH( …` | +| `CONNECTORS` | list all connectors | `SHOW CONNECTORS;` | +| `CREATE` | create an object | `CREATE STREAM rock_songs (artist VARCHAR, title VARCHAR) …` | +| `DAY` | time unit of one day for a window | `WINDOW TUMBLING (SIZE 30 SECONDS, RETENTION 1 DAY)` | +| `DAYS` | time unit of days for a window | `WINDOW TUMBLING (SIZE 30 SECONDS, RETENTION 1000 DAYS)` | +| `DECIMAL` | decimal numeric type | | +| `DELETE` | remove a {{ site.ak}} topic | `DROP TABLE DELETE TOPIC;` | +| `DESCRIBE` | list details for an object | `DESCRIBE PAGEVIEWS;` | +| `DROP` | 
delete an object | `DROP CONNECTOR ;` | +| `ELSE` | condition in `WHEN` statement | `CASE WHEN units<2 THEN 'sm' WHEN units<4 THEN 'med' ELSE 'large' …` | +| `EMIT` | specify push query | `SELECT * FROM users EMIT CHANGES;` | +| `END` | close a `CASE` block | `SELECT CASE WHEN condition THEN result [ WHEN … THEN … ] … END` | +| `EXISTS` | test whether object exists | `DROP STREAM IF EXISTS ;` | +| `EXPLAIN` | show execution plan | `EXPLAIN ;` or `EXPLAIN ;` | +| `EXTENDED` | list details for an object | `DESCRIBE EXTENDED ;` | +| `FALSE` | Boolean value of false | | +| `FINAL` | specify pull query | `SELECT * FROM users EMIT FINAL;` | +| `FROM` | specify record source for queries | `SELECT * FROM users;` | +| `FULL` | specify `FULL JOIN` | `CREATE TABLE t AS SELECT * FROM l FULL OUTER JOIN r ON l.ID = r.ID;`| +| `FUNCTION` | list details for a function | `DESCRIBE FUNCTION ;` | +| `FUNCTIONS` | list all functions | `SHOW FUNCTIONS;` | +| `GRACE` | grace period for a tumbling window | `WINDOW TUMBLING (SIZE 1 HOUR, GRACE PERIOD 2 HOURS)` | +| `GROUP` | group rows with the same values | `SELECT regionid, COUNT(*) FROM pageviews GROUP BY regionid` | +| `HAVING` | condition expression | `GROUP BY card_number HAVING COUNT(*) > 3` | +| `HOPPING` | specify a hopping window | `WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS)` | +| `HOUR` | time unit of one hour for a window | `WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 1 DAY)` | +| `HOURS` | time unit of hours for a window | `WINDOW TUMBLING (SIZE 2 HOURS, RETENTION 1 DAY)` | +| `IF` | test whether object exists | `DROP STREAM IF EXISTS ;` | +| `IN` | specify multiple values | `WHERE name IN (value1, value2, ...)` | +| `INNER` | specify `INNER JOIN` | `CREATE TABLE t AS SELECT * FROM l INNER JOIN r ON l.ID = r.ID;` | +| `INSERT` | insert new records in a stream/table | `INSERT INTO ...` | +| `INTEGER` | integer numeric type | `CREATE TABLE profiles (id INTEGER PRIMARY KEY, …` | +| `INTERVAL` | number of messages to skip in `PRINT` | `PRINT INTERVAL 5;` | +| `INTO` | stream/table to insert values | `INSERT INTO stream_name ...` | +| `IS` | | | +| `JOIN` | match records in streams/tables | `CREATE TABLE t AS SELECT * FROM l INNER JOIN r ON l.ID = r.ID;` | +| `KEY` | specify key column | `CREATE TABLE users (userId INT PRIMARY KEY, …` | +| `LEFT` | specify `LEFT JOIN` | `CREATE TABLE t AS SELECT * FROM l LEFT JOIN r ON l.ID = r.ID;` | +| `LIKE` | match pattern | `WHERE UCASE(gender)='FEMALE' AND LCASE (regionid) LIKE '%_6'` | +| `LIMIT` | number of records to output | `SELECT * FROM users EMIT CHANGES LIMIT 5;` | +| `LIST` | list objects | `SHOW STREAMS;` | +| `MAP` | `map` data type | `SELECT MAP(k1:=v1, k2:=v1*2) FROM s1 EMIT CHANGES;` | +| `MILLISECOND` | time unit of one ms for a window | `WINDOW TUMBLING (SIZE 1 MILLISECOND, RETENTION 1 DAY)` | +| `MILLISECONDS` | time unit of ms for a window | `WINDOW TUMBLING (SIZE 100 MILLISECONDS, RETENTION 1 DAY)` | +| `MINUTE` | time unit of one min for a window | `WINDOW TUMBLING (SIZE 1 MINUTE, RETENTION 1 DAY)` | +| `MINUTES` | time unit of mins for a window | `WINDOW TUMBLING (SIZE 30 MINUTES, RETENTION 1 DAY)` | +| `MONTH` | time unit of one month for a window | `WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 1 MONTH)` | +| `MONTHS` | time unit of months for a window | `WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 2 MONTHs)` | +| `NOT` | logical "not" operator | | +| `NULL` | field with no value | | +| `ON` | specify join criteria | `LEFT JOIN users ON pageviews.userid = users.userid` | +| `OR` | logical "or" 
operator | `WHERE userid='User_1' OR userid='User_2'` |
+| `OUTER` | specify `OUTER JOIN` | `CREATE TABLE t AS SELECT * FROM l FULL OUTER JOIN r ON l.ID = r.ID;`|
+| `PARTITION` | repartition a stream | `PARTITION BY ` |
+| `PARTITIONS` | partitions to distribute keys over | `CREATE STREAM users_rekeyed WITH (PARTITIONS=6) AS …` |
+| `PERIOD` | grace period for a tumbling window | `WINDOW TUMBLING (SIZE 1 HOUR, GRACE PERIOD 2 HOURS)` |
+| `PRIMARY` | specify primary key column | `CREATE TABLE users (userId INT PRIMARY KEY, …` |
+| `PRINT` | output records in a topic | `PRINT FROM BEGINNING;` |
+| `PROPERTIES` | list all properties | `SHOW PROPERTIES;` |
+| `QUERIES` | list all queries | `SHOW QUERIES;` |
+| `REPLACE` | string replace | `REPLACE(col1, 'foo', 'bar')` |
+| `RETENTION` | time to retain past windows | `WINDOW TUMBLING (SIZE 30 SECONDS, RETENTION 1000 DAYS)` |
+| `RIGHT` | | |
+| `RUN` | execute queries from a file | `RUN SCRIPT ;` |
+| `SCRIPT` | execute queries from a file | `RUN SCRIPT ;` |
+| `SECOND` | time unit of one sec for a window | `WINDOW TUMBLING (SIZE 1 SECOND, RETENTION 1 DAY)` |
+| `SECONDS` | time unit of secs for a window | `WINDOW TUMBLING (SIZE 30 SECONDS, RETENTION 1 DAY)` |
+| `SELECT` | query a stream or table | |
+| `SESSION` | specify a session window | `WINDOW SESSION (60 SECONDS)` |
+| `SET` | assign a property value | `SET 'auto.offset.reset'='earliest';` |
+| `SHOW` | list objects | `SHOW FUNCTIONS;` |
+| `SINK` | create a sink connector | `CREATE SINK CONNECTOR …` |
+| `SIZE` | time length of a window | `WINDOW TUMBLING (SIZE 5 SECONDS)` |
+| `SOURCE` | create a source connector | `CREATE SOURCE CONNECTOR …` |
+| `STREAM` | register a stream on a topic | `CREATE STREAM users_orig AS SELECT * FROM users EMIT CHANGES;` |
+| `STREAMS` | list all streams | `SHOW STREAMS;` |
+| `STRUCT` | struct data type | `SELECT STRUCT(f1 := v1, f2 := v2) FROM s1 EMIT CHANGES;` |
+| `TABLE` | register a table on a topic | `CREATE TABLE users (id BIGINT PRIMARY KEY, …` |
+| `TABLES` | list all tables | `SHOW TABLES;` |
+| `TERMINATE` | end a persistent query | `TERMINATE query_id;` |
+| `THEN` | return expression in a CASE block | `CASE WHEN units<2 THEN 'sm' WHEN units<4 THEN 'med' ELSE 'large' …` |
+| `TIMESTAMP` | specify a timestamp column | `CREATE STREAM pageviews WITH (TIMESTAMP='viewtime', …` |
+| `TOPIC` | specify {{site.ak}} topic to delete | `DROP TABLE DELETE TOPIC;` |
+| `TOPICS` | list all topics | `SHOW TOPICS;` |
+| `TRUE` | Boolean value of true | |
+| `TUMBLING` | specify a tumbling window | `WINDOW TUMBLING (SIZE 5 SECONDS)` |
+| `TYPE` | alias a complex type declaration | `CREATE TYPE AS ;` |
+| `TYPES` | list all custom type aliases | `SHOW TYPES;` |
+| `UNSET` | unassign a property value | `UNSET 'auto.offset.reset';` |
+| `VALUES` | list of values to insert | `INSERT INTO foo VALUES ('key', 'A');` |
+| `WHEN` | specify condition in a `CASE` block | `SELECT CASE WHEN condition THEN result [ WHEN … THEN … ] …` |
+| `WHERE` | filter records by a condition | `SELECT * FROM pageviews WHERE pageid < 'Page_20'` |
+| `WINDOW` | groups rows with the same keys | `SELECT userid, COUNT(*) FROM users WINDOW SESSION (60 SECONDS) …` |
+| `WITH` | specify object creation params | `CREATE STREAM pageviews WITH (TIMESTAMP='viewtime', …` |
+| `WITHIN` | time range in a windowed join | `SELECT * FROM impressions i JOIN clicks c WITHIN 1 minute …` |
+| `YEAR` | time unit of one year for a window | `WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 1 YEAR)` |
+| `YEARS` | time
unit of years for a window | `WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 2 YEARS)` | + +## Operators + +The following table shows all operators in the language. + +| operator | meaning | applies to +|--------------|--------------------------------|----------------- +| `=` | is equal to | string, numeric +| `!=` or `<>` | is not equal to | string, numeric +| `<` | is less than | string, numeric +| `<=` | is less than or equal to | string, numeric +| `>` | is greater than | string, numeric +| `>=` | is greater than or equal to | string, numeric +| `+` | addition for numeric, concatenation for string | string, numeric +| `-` | subtraction | numeric +| `*` | multiplication | numeric +| `/` | division | numeric +| `%` | modulus | numeric +| `||` or `+` | concatenation | string +| `:=` | assignment | all +| `->` | struct field dereference | struct +| `.` | source dereference | table, stream +| `E` or `e` | exponent | numeric +| `NOT` | logical NOT | boolean +| `AND` | logical AND | boolean +| `OR` | logical OR | boolean +| `BETWEEN` | test if value within range | numeric, string +| `LIKE` | match a pattern | string \ No newline at end of file diff --git a/docs/reference/sql/data-definition.md b/docs/reference/sql/data-definition.md new file mode 100644 index 000000000000..dbc8b796589e --- /dev/null +++ b/docs/reference/sql/data-definition.md @@ -0,0 +1,228 @@ +--- +layout: page +title: Data definition +tagline: Use DDL to structure data +description: How to use DDL to structure data in ksqlDB +keywords: ksqldb, sql, ddl +--- + +This section covers how you create the structures that store your events. +ksqlDB abstracts events as rows with columns and stores them in streams +and tables. + +## Rows and columns + +Streams and tables help you model collections of events that accrete over time. +Both are represented as a series of rows and columns with a schema, much like a +relational database table. Rows represent individual events. Columns represent +the attributes of those events. + +Each column has a data type. The data type limits the span of permissible values +that you can assign. For example, if a column is declared as type `INT`, it can't +be assigned the value of string `'foo'`. + +In contrast to relational database tables, the columns of a row in ksqlDB are +divided into _key_ and _value_ columns. The key columns control which partition +a row resides in. The value columns, by convention, store the main data of +interest. Controlling the key columns is useful for manipulating the underlying +data locality, and enables you to integrate with the wider {{ site.ak }} +ecosystem, which uses the same key/value data model. By default, a column is a +value column. Marking a column as a `(PRIMARY) KEY` makes it a key column. + +Internally, each row is backed by a [Kafka record](../../../overview/apache-kafka-primer/#records). +In {{ site.ak }}, the key and value parts of a record are +[serialized](../../../overview/apache-kafka-primer/#serializers) independently. +ksqlDB enables you to exercise this same flexibility and builds on the semantics +of {{ site.ak }} records, rather than hiding them. + +There is no theoretical limit on the number of columns in a stream or table. +In practice, the limit is determined by the maximum message size that {{ site.ak }} +can store and the resources dedicated to ksqlDB. + +## Streams + +A stream is a partitioned, immutable, append-only collection that represents a +series of historical facts. 
For example, the rows of a stream could model a
+sequence of financial transactions, like "Alice sent $100 to Bob", followed by
+"Charlie sent $50 to Bob".
+
+Once a row is inserted into a stream, it can never change. New rows can be
+appended at the end of the stream, but existing rows can never be updated or
+deleted.
+
+Each row is stored in a particular partition. Every row, implicitly or explicitly,
+has a key that represents its identity. All rows with the same key reside in the
+same partition.
+
+To create a stream, use the `CREATE STREAM` command. The following example
+statement specifies a name for the new stream, the names of the columns, and
+the data type of each column.
+
+```sql
+CREATE STREAM s1 (
+    k VARCHAR KEY,
+    v1 INT,
+    v2 VARCHAR
+) WITH (
+    kafka_topic = 's1',
+    partitions = 3,
+    value_format = 'json'
+);
+```
+
+This creates a new stream named `s1` with three columns: `k`, `v1`, and `v2`.
+The column `k` is designated as the key of this stream, which controls the
+partition that each row is stored in. When the data is stored, the value
+portion of each row's underlying {{ site.ak }} record is serialized in the
+JSON format.
+
+Under the hood, each stream corresponds to a [Kafka topic](../../../overview/apache-kafka-primer/#topics)
+with a registered schema. If the backing topic for a stream doesn't exist when
+you declare it, ksqlDB creates it on your behalf, as shown in the previous
+example statement.
+
+You can also declare a stream on top of an existing topic. When you do that,
+ksqlDB simply registers its associated schema. If topic `s2` already exists,
+the following statement registers a new stream over it:
+
+```sql
+CREATE STREAM s2 (
+    k1 VARCHAR KEY,
+    v1 VARCHAR
+) WITH (
+    kafka_topic = 's2',
+    value_format = 'json'
+);
+```
+
+!!! tip
+    When you create a stream on an existing topic, you don't need to declare
+    the number of partitions for the topic. ksqlDB infers the partition count
+    from the existing topic.
+
+## Tables
+
+A table is a mutable, partitioned collection that models change over time. In
+contrast with a stream, which represents a historical sequence of events, a
+table represents what is true as of "now". For example, you might model the
+locations where someone has lived as a stream: first Miami, then New York,
+then London, and so forth. A table, by contrast, would hold only the most
+recent location.
+
+Tables work by leveraging the keys of each row. If a sequence of rows shares a
+key, the last row for a given key represents the most up-to-date information
+for that key's identity. A background process periodically runs and deletes all
+but the newest rows for each key.
+
+Syntactically, declaring a table is similar to declaring a stream. The following
+example statement declares a `current_location` table that has a primary key
+column named `person`.
+
+```sql
+CREATE TABLE current_location (
+    person VARCHAR PRIMARY KEY,
+    location VARCHAR
+) WITH (
+    kafka_topic = 'current_location',
+    partitions = 3,
+    value_format = 'json'
+);
+```
+
+As with a stream, you can declare a table directly on top of an existing
+{{ site.ak }} topic by omitting the number of partitions in the `WITH` clause.
+
+## Keys
+
+You can mark a column with the `KEY` keyword to indicate that it's a key
+column. Key columns constitute the key portion of the row's underlying
+{{ site.ak }} record. Only streams can mark columns as keys, and it's optional
+for them to do so. Tables must use the `PRIMARY KEY` constraint instead.
+
+In the following example statement, `k1`'s data is stored in the key portion of
+the row, and `v1`'s data is stored in the value.
+
+```sql
+CREATE STREAM s3 (
+    k1 VARCHAR KEY,
+    v1 VARCHAR
+) WITH (
+    kafka_topic = 's3',
+    value_format = 'json'
+);
+```
+
+The ability to declare key columns explicitly is especially useful when you're
+creating a stream over an existing topic. If ksqlDB can't infer what data is in
+the key of the underlying {{ site.ak }} record, it must perform a repartition
+of the rows internally. If you're not sure what data is in the key or you simply
+don't need it, you can omit the `KEY` keyword.
+
+## Default values
+
+If a column is declared in a schema, but no attribute is present in the
+underlying {{ site.ak }} record, the value for the row's column is populated as
+`null`.
+
+## Pseudocolumns
+
+A pseudocolumn is a column that's automatically populated by ksqlDB and contains
+meta-information that can be inferred about the row at creation time. By default,
+pseudocolumns aren't returned when selecting all columns with the star (`*`)
+special character. You must select them explicitly, as shown in the following
+example statement.
+
+```sql
+SELECT ROWTIME, * FROM s1 EMIT CHANGES;
+```
+
+The following table lists all pseudocolumns.
+
+| pseudocolumn | meaning                        |
+|--------------|--------------------------------|
+| `ROWTIME`    | Row timestamp, inferred from the underlying Kafka record if not overridden. |
+
+You can't create additional pseudocolumns beyond these.
+
+## Constraints
+
+Although data types help limit the range of values that can be accepted by
+ksqlDB, sometimes it's useful to have more sophisticated restrictions.
+_Constraints_ enable you to exercise this type of logic directly in your schema.
+
+### Primary key constraints
+
+In a relational database, a primary key indicates that a column will be used as
+a unique identifier for all rows in a table. If you have a table that has a row
+with primary key `5`, you can't insert another row whose primary key is also `5`.
+
+ksqlDB uses primary keys in a similar way, but there are a few differences,
+because ksqlDB is an event streaming database, not a relational database.
+
+- Only tables can have primary keys. Streams do not support them.
+- Adding multiple rows to a table with the same primary key doesn't cause the
+  subsequent rows to be rejected.
+
+The reason for both of these behaviors is the same: the purpose of tables is to
+model changes to particular identities, but streams are used to accrete facts.
+When you insert multiple rows with the same primary key into a table, ksqlDB
+interprets these rows as changes to a single identity.
+
+Primary keys can't be null, and they must be used in all declared tables. In
+the following example statement, `id` acts as the primary key for table `users`:
+
+```sql
+CREATE TABLE users (
+    id BIGINT PRIMARY KEY,
+    name VARCHAR
+  ) WITH (
+    kafka_topic = 'users',
+    partitions = 3,
+    value_format = 'json'
+  );
+```
+
+### Not-null constraints
+
+A _not-null constraint_ designates that a column can't contain a null value.
+ksqlDB doesn't support this constraint, but you can track its progress in
+[GitHub issue 4436](https://github.com/confluentinc/ksql/issues/4436).
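+
+Until that lands, the practical effect of the constraints described above can
+be seen with plain `INSERT INTO ... VALUES` statements. The following sketch
+assumes the `users` table declared in the previous example; the values are
+illustrative only.
+
+```sql
+-- `name` is omitted from the column list, so the row is stored with
+-- name = NULL; no not-null constraint exists to prevent this.
+INSERT INTO users (id) VALUES (3);
+
+-- A second row with the same primary key isn't rejected. ksqlDB treats it
+-- as an update: the latest row now represents the state of identity 3.
+INSERT INTO users (id, name) VALUES (3, 'Vera');
+```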
\ No newline at end of file
diff --git a/docs/reference/sql/syntax/lexical-structure.md b/docs/reference/sql/syntax/lexical-structure.md
new file mode 100644
index 000000000000..0209fcacf52b
--- /dev/null
+++ b/docs/reference/sql/syntax/lexical-structure.md
@@ -0,0 +1,201 @@
+---
+layout: page
+title: Lexical structure
+tagline: Structure of SQL commands and statements in ksqlDB
+description: Details about SQL commands and statements in ksqlDB
+keywords: ksqldb, sql, keyword, identifier, constant, operator
+---
+
+SQL is a domain-specific language for managing and manipulating data. It’s
+used primarily to work with structured data, where the types and relationships
+across entities are well-defined. Originally adopted for relational databases,
+SQL is rapidly becoming the language of choice for stream processing. It’s
+declarative, expressive, and ubiquitous.
+
+The American National Standards Institute (ANSI) maintains a standard for the
+specification of SQL. SQL-92, the third revision to the standard, is generally
+the most recognized form of the specification. Beyond the standard, there are
+many flavors and extensions to SQL so that it can express programs beyond
+what's possible with the SQL-92 grammar.
+
+ksqlDB’s SQL grammar was built initially around Presto's grammar and has been
+extended judiciously. ksqlDB goes beyond SQL-92, because the standard currently
+has no constructs for streaming queries, which are a core aspect of this project.
+
+## Syntax
+
+SQL inputs are made up of a series of statements. Each statement is made up of
+a series of tokens and ends in a semicolon (`;`). The tokens that apply depend
+on the statement being invoked.
+
+A token is any keyword, identifier, backticked identifier, literal, or special
+character. By convention, tokens are separated by whitespace, unless there is
+no ambiguity in the grammar. This happens when tokens flank a special character.
+
+The following example statements are syntactically valid ksqlDB SQL input:
+
+```sql
+INSERT INTO s1 (a, b) VALUES ('k1', 'v1');
+
+CREATE STREAM s2 AS
+    SELECT a, b
+    FROM s1
+    EMIT CHANGES;
+
+SELECT * FROM t1 WHERE k1='foo' EMIT CHANGES;
+```
+
+## Keywords
+
+Some tokens, such as `SELECT`, `INSERT`, and `CREATE`, are _keywords_.
+Keywords are reserved tokens that have a specific meaning in ksqlDB's syntax.
+They control their surrounding allowable tokens and execution semantics.
+Keywords are case-insensitive, meaning `SELECT` and `select` are equivalent.
+You can't create an identifier that is already a reserved word, unless you use
+backticked identifiers.
+
+A complete list of keywords can be found in the [appendix](../appendix.md#keywords).
+
+## Identifiers
+
+Identifiers are symbols that represent user-defined entities, like streams,
+tables, columns, and other objects. For example, if you have a stream named
+`s1`, `s1` is an _identifier_ for that stream. By default, identifiers are
+case-insensitive, meaning `s1` and `S1` refer to the same stream. Under the
+hood, ksqlDB converts the identifier to uppercase and uses that form for all
+future display purposes.
+
+Unless an identifier is backticked, it may be composed only of letters, numbers,
+and underscores. There is no imposed limit on the number of characters.
+
+To make it possible to use any character in an identifier, you can enclose it
+in backticks (``` ` ```) when you declare and use it.
A _backticked identifier_
+is useful when you don't control the data, so it might have special characters,
+or even keywords. When you use backticked identifiers, ksqlDB captures the case
+exactly, and any future references to the identifier become case-sensitive. For
+example, if you declare the following stream:
+
+```sql
+CREATE STREAM `s1` (
+    k VARCHAR KEY,
+    `@MY-identifier-stream-column!` INT
+) WITH (
+    kafka_topic = 's1',
+    partitions = 3,
+    value_format = 'json'
+);
+```
+
+You must select from it by backticking the stream name and column name and
+using the original casing:
+
+```sql
+SELECT `@MY-identifier-stream-column!` FROM `s1` EMIT CHANGES;
+```
+
+## Constants
+
+There are three implicitly typed constants, or literals, in ksqlDB: strings,
+numbers, and booleans.
+
+### String constants
+
+A string constant is an arbitrary series of characters surrounded by single
+quotes (`'`), like `'Hello world'`. To include a quote inside of a string
+literal, escape the quote by prefixing it with another quote, for example
+`'You can call me ''Stuart'', or Stu.'`
+
+### Numeric constants
+
+Numeric constants are accepted in the following forms:
+
+1. **_`digits`_**
+2. **_`digits`_**`.[`**_`digits`_**`][e[+-]`**_`digits`_**`]`
+3. `[`**_`digits`_**`].`**_`digits`_**`[e[+-]`**_`digits`_**`]`
+4. **_`digits`_**`e[+-]`**_`digits`_**
+
+where **_`digits`_** is one or more single-digit integers (`0` through `9`).
+
+- At least one digit must be present before or after the decimal point, if
+  there is one.
+- At least one digit must follow the exponent symbol `e`, if there is one.
+- No spaces, underscores, or any other characters are allowed in the constant.
+- Numeric constants may also have a `+` or `-` prefix, but this is considered to
+  be a function applied to the constant, not the constant itself.
+
+Here are some examples of valid numeric constants:
+
+- `5`
+- `7.2`
+- `0.0087`
+- `1.`
+- `.5`
+- `1e-3`
+- `1.332434e+2`
+- `+100`
+- `-250`
+
+### Boolean constants
+
+A boolean constant is represented as either the identifier `true` or `false`.
+Boolean constants are not case-sensitive, meaning `true` evaluates to the same
+value as `TRUE`.
+
+## Operators
+
+Operators are infix functions composed of special characters. A complete list
+of operators can be found in the [appendix](../appendix.md#operators). ksqlDB
+doesn't allow you to add user-space operators.
+
+## Special characters
+
+Some characters have a particular meaning that doesn't correspond to an
+operator. The following list describes the special characters and their
+purpose.
+
+- Parentheses (`()`) retain their usual meaning in programming languages for
+  grouping expressions and controlling the order of evaluation.
+- Brackets (`[]`) are used to work with arrays, both in their construction and
+  subscript access. They also allow you to key into maps.
+- Commas (`,`) delineate a discrete list of entities.
+- The semicolon (`;`) terminates a SQL command.
+- The asterisk (`*`) is used in some contexts as an "all" qualifier, most
+  commonly in a `SELECT` command to retrieve all columns.
+- The period (`.`) accesses a column in a stream or table.
+- The arrow (`->`) accesses a field in a struct data type.
+
+## Comments
+
+A comment is a string beginning with two dashes. It includes all of the
+content from the dashes to the end of the line:
+
+```sql
+-- Here is a comment.
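+-- Everything from the dashes to the end of the line is part of the comment,
+-- so a comment can also sit at the end of a line inside a statement.
+-- For example, assuming the stream `s1` from the earlier examples:
+SELECT *          -- an end-of-line comment
+FROM s1
+EMIT CHANGES;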
+``` + +You can also span a comment over multiple lines by using C-style syntax: + +```sql +/* Here is + another comment. + */ +``` + +## Lexical precedence + +Operators are evaluated using the following order of precedence: + +1. `*`, `/`, `%` +2. `+`, `-` +3. `=`, `>`, `<`, `>=`, `<=`, `<>`, `!=` +4. `NOT` +5. `AND` +6. `BETWEEN`, `LIKE`, `OR` + +In an expression, when two operators have the same precedence level, they're +evaluated left-to-right based on their position. + +You can enclose an expression in parentheses to force precedence or clarify +precedence, for example, `(5 + 2) * 3`. \ No newline at end of file diff --git a/docs/requirements.txt b/docs/requirements.txt index 9a1d8bdf37de..de6290cd024e 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -7,3 +7,4 @@ Pygments==2.4.2 mkdocs-material==5.1.3 python-dateutil==2.8.1 mkdocs-redirects==1.0.1 +mdx_truly_sane_lists==1.2 diff --git a/mkdocs.yml b/mkdocs.yml index 9ef8aedbf7fb..fdda899ebc09 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -28,13 +28,12 @@ extra_javascript: nav: - Overview: index.md - Getting started: quickstart.md # links to Derek's quickstart at ksqldb.io + - Apache Kafka primer: overview/apache-kafka-primer.md - Concepts: - Concepts: concepts/index.md - Events: concepts/events.md - Collections: - Collections Overview: concepts/collections/index.md - - Streams: concepts/collections/streams.md - - Tables: concepts/collections/tables.md - Inserting events: concepts/collections/inserting-events.md - Stream Processing: concepts/stream-processing.md - Materialized Views: concepts/materialized-views.md @@ -79,6 +78,11 @@ nav: - Control the case of identifiers: how-to-guides/control-the-case-of-identifiers.md - Reference: - Syntax Reference: developer-guide/syntax-reference.md + - The SQL language: + - SQL syntax: + - Lexical structure: reference/sql/syntax/lexical-structure.md + - Data definition: reference/sql/data-definition.md + - Appendix: reference/sql/appendix.md - Statements: - SQL quick reference: developer-guide/ksqldb-reference/quick-reference.md - Statement Index: developer-guide/ksqldb-reference/index.md @@ -156,7 +160,7 @@ nav: - ksqlDB with Embedded Connect: tutorials/embedded-connect.md - Integrate with PostgreSQL: tutorials/connect-integration.md - Troubleshooting: troubleshoot-ksqldb.md - - Frequently Asked Questions: faq.md + - Frequently asked questions: faq.md markdown_extensions: - toc: @@ -170,6 +174,7 @@ markdown_extensions: - mdx_gh_links: user: confluentinc repo: ksqldb + - mdx_truly_sane_lists plugins: - search @@ -178,6 +183,8 @@ plugins: - redirects: redirect_maps: developer-guide/implement-a-udf.md: how-to-guides/create-a-user-defined-function.md + concepts/collections/streams.md: reference/sql/data-definition.md + concepts/collections/tables.md: reference/sql/data-definition.md extra: site: