New sections for lexical structure, ddl, and primer docs #6041
Conversation
@vpapavas Can you please review the new section on the lexical structure of the language? These docs are part of a larger revamp of the reference section. We're hoping to ship the bit that's already ready to go.
@vcrfxia Can you please review the Apache Kafka primer section of this patch? See above for context. The dedicated primer section helps us avoid explaining basic Kafka concepts in multiple parts of the docs.
Great new content!
Hey @MichaelDrogalis , the Apache Kafka primer looks wonderful! LGTM with a bunch of nits/minor things inline.
ksqlDB provides higher-level abstractions over a topic through _streams_ and
_tables_. A stream or table is a {{ site.ak }} topic with a registered schema.
The schema controls the shape of records that are allowed to be stored in the
nit: don't love the word "shape". How about "structure" or "data layout" (or similar)?
My gut also says "specifies" rather than "controls" but I'm just nit-picking :)
I'm with Vic here on "controls". That suggests some kind of check to ensure messages with the wrong schema are not produced to a topic by some other producer, and such a check does not exist. This is a particularly important distinction for source topics, which ksqlDB only reads from.
Though it's very common for all messages to have the same schema, even this isn't a requirement for some formats, e.g. JSON can handle topics where values have different schemas: the declared stream or table can either define the superset of fields, some common subset, or any other combination.
How about just saying something about ksqlDB adding a SQL abstraction over the data held in Kafka, and that the SQL is statically typed?
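To make the point concrete, here's a sketch of what that SQL abstraction looks like in practice (the topic name and columns are hypothetical, modeled on the ksqlDB quickstart):

```sql
-- Hypothetical example: declare a typed stream over an existing topic.
-- The schema is a projection ksqlDB applies when reading; it does not
-- prevent other producers from writing differently-shaped records
-- to the underlying 'locations' topic.
CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON');
```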
Thanks @vcrfxia! Great feedback, addressed it.
LGTM with minor comments
Super, thank you @vpapavas!
+1 this is super cool, adds lots of clarity to concepts I find myself explaining multiple times
The following table shows all keywords in the language.

| keyword | description | example |
this is awesome, but I'm worried about it staying in sync with the code 🤔
Me too.
SELECT ROWTIME, * FROM s1 EMIT CHANGES;
The following table lists all pseudocolumns. |
What is the verdict on WINDOWSTART and WINDOWEND?
They're currently not defined as pseudocolumns; they are included by default with a `SELECT *`.
I guess at the moment they're defined as 'system columns'. Their names are reserved for system use.
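For context, WINDOWSTART and WINDOWEND become selectable on the result of a windowed aggregation. A sketch, assuming a hypothetical stream `s1` with a grouping column `k`:

```sql
-- WINDOWSTART and WINDOWEND are system columns available on the
-- result of a windowed aggregation (and included by a SELECT *).
SELECT k, WINDOWSTART, WINDOWEND, COUNT(*) AS cnt
  FROM s1
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY k
  EMIT CHANGES;
```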
The _topic_ and _partition_ describe which larger collection and subset of events
this particular event belongs to, and the _offset_ describes its exact position within
that larger collection (more on that below).
I find this overly confusing.
All this talk of "larger collection" and "subset of events" is unnecessarily vague and uses alternative nomenclature to what Kafka uses. Given this page is all about the Kafka terms users will need to understand, does it not make sense to use the terms Kafka uses?
Would it not be better to just give a quick outline of what a topic is in Kafka, including that it's broken into partitions, then explain that the offset is the offset into a particular topic-partition?
Or, given you explain these concepts in more depth below, simply say:
The offset denotes the position of the record within a specific partition of a topic (more on these below).
This avoids terms such as "collections" (topic) and "subsets" (partitions).
partition will be consistent with all other records with the same key. When records are
appended, they follow the correct offset order, even in the presence of
failures or faults. When a stream's key content changes because of how a query
"When records are appended, they follow the correct offset order, even in the presence of failures or faults"
It's unclear to me what this means. Either a message is or is not appended to a partition. This is controlled by the Kafka broker. It doesn't make sense to me to say that we ensure records are appended with the correct offset order.
- Adding multiple rows to a table with the same primary key doesn't cause the
subsequent rows to be rejected.
This is mixing up the concept of a table with a changelog.
The table can only have one row with a specific primary key. A changelog can contain multiple rows with the same key. The changelog can be materialized into a table.
This is the same as a traditional database.
We should explain this table/changelog duality somewhere ;)
Yes, unfortunately at the moment we can INSERT VALUES into a table with a key for a row that already exists, and it will work. However, this is a bug IMHO (it should be UPSERT, not INSERT).
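The table/changelog duality described above can be sketched as follows (the `users` table and its columns are hypothetical):

```sql
-- Two inserts with the same primary key append two records to the
-- underlying changelog topic; the materialized table keeps only the
-- latest value per key.
INSERT INTO users (id, name) VALUES ('alice', 'Alice v1');
INSERT INTO users (id, name) VALUES ('alice', 'Alice v2');

-- A query against the materialized table sees a single row for 'alice',
-- holding the most recent value.
SELECT * FROM users WHERE id = 'alice';
```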
- At least one digit must be present before or after the decimal point, if
there is one.
- At least one digit must follow the exponent symbol `e`, if there is one.
FYI, `e` is case-insensitive, so it can be `E` too.
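Applying those rules, here's a sketch of which numeric literals pass and which don't (the stream `s1` is hypothetical; the literals follow the rules quoted above):

```sql
-- Valid: a digit appears before or after the decimal point, and the
-- exponent marker (case-insensitive) is followed by at least one digit.
SELECT 3.14, .5, 5., 1e6, 2.5E-3 FROM s1 EMIT CHANGES;

-- Invalid: '.' (no digit on either side of the point),
--          '1.5e' (no digit after the exponent symbol).
```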
* docs: ports lexical structure, ddl, and primer docs
* docs: remove old docs
* docs: fix redirects
* docs: remove dead links
* docs: fixes more dead links
* docs: adds appendix
* docs: copy edit new ddl and lexical structure topics (DOCS-5143) (#6046)
* docs: copy edit new sql reference topics
* docs: copy edit lexical structure topic
* docs: more carefully describe partition
* docs: clarify
* docs: link to s/t
* docs: suggestions
* docs: clarify language
* docs: ombine retention & compaction
* docs: address Vicky's feedback
* docs: almog feedback

Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>
Co-authored-by: Michael Drogalis <michael.drogalis@confluent.io>
Ports selective pages from #5935.