New sections for lexical structure, ddl, and primer docs #6041
Conversation
@vpapavas Can you please review the new section on the lexical structure of the language? These docs are part of a larger revamp of the reference section. We're hoping to ship the bit that's already ready to go.
@vcrfxia Can you please review the Apache Kafka primer section of this patch? See above for context. The dedicated primer section helps us avoid explaining basic Kafka concepts in multiple parts of the docs.
Great new content!
Hey @MichaelDrogalis , the Apache Kafka primer looks wonderful! LGTM with a bunch of nits/minor things inline.
ksqlDB provides higher-level abstractions over a topic through _streams_ and
_tables_. A stream or table is a {{ site.ak }} topic with a registered schema.
The schema controls the shape of records that are allowed to be stored in the
nit: don't love the word "shape". How about "structure" or "data layout" (or similar)?
My gut also says "specifies" rather than "controls" but I'm just nit-picking :)
I'm with Vic here on "controls". That suggests some kind of check to ensure messages with the wrong schema are not produced to a topic by some other producer, and such a check does not exist. This is a particularly important distinction for source topics, which ksqlDB only reads from.
Though it's very common for all messages to have the same schema, even this isn't a requirement for some formats, e.g. JSON can handle topics where values have different schemas: the declared stream or table can either define the superset of fields, some common subset, or any other combination.
How about just saying something about ksqlDB adding a SQL abstraction over the data held in Kafka, and that the SQL is statically typed?
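To make the point concrete, here's a sketch of what that SQL abstraction looks like in practice (the topic name and columns are hypothetical, modeled on the ksqlDB quickstart):

```sql
-- Hypothetical example: declare a typed stream over an existing topic.
-- The schema is a projection ksqlDB applies when reading; it does not
-- prevent other producers from writing differently-shaped records
-- to the underlying 'locations' topic.
CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON');
```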
Thanks @vcrfxia! Great feedback, addressed it.
LGTM with minor comments
Super, thank you @vpapavas!
+1 this is super cool, adds lots of clarity to concepts I find myself explaining multiple times
The following table shows all keywords in the language.

| keyword | description | example |
this is awesome, but I'm worried about it staying in sync with the code 🤔
Me too.
SELECT ROWTIME, * FROM s1 EMIT CHANGES;
The following table lists all pseudocolumns. |
What is the verdict on WINDOWSTART and WINDOWEND?
They're currently not defined as pseudocolumns; they are included by default with a `SELECT *`.
I guess at the moment they're defined as 'system columns'. Their names are reserved for system use.
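For context, WINDOWSTART and WINDOWEND become selectable on the result of a windowed aggregation. A sketch, assuming a hypothetical stream `s1` with a grouping column `k`:

```sql
-- WINDOWSTART and WINDOWEND are system columns available on the
-- result of a windowed aggregation (and included by a SELECT *).
SELECT k, WINDOWSTART, WINDOWEND, COUNT(*) AS cnt
  FROM s1
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY k
  EMIT CHANGES;
```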
The _topic_ and _partition_ describe which larger collection and subset of events
this particular event belongs to, and the _offset_ describes its exact position within
that larger collection (more on that below).
I find this overly confusing.
All this talk of "larger collection" and "subset of events" is unnecessarily vague and uses alternative nomenclature to what Kafka uses. Given this page is all about the Kafka terms users will need to understand, does it not make sense to use the terms Kafka uses?
Would it not be better to just give a quick outline of what a topic is in Kafka, including that it's broken into partitions, then explain that the offset is the offset into a particular topic-partition?
Or, given you explain these concepts in more depth below, simply say:
The offset denotes the position of the record within a specific partition of a topic (more on these below).
This avoids terms such as "collections" (topic) and "subsets" (partitions).
partition will be consistent with all other records with the same key. When records are
appended, they follow the correct offset order, even in the presence of
failures or faults. When a stream's key content changes because of how a query
"When records are appended, they follow the correct offset order, even in the presence of failures or faults"
It's unclear to me what this means. Either a message is or is not appended to a partition. This is controlled by the Kafka broker. It doesn't make sense to me to say that we ensure records are appended with the correct offset order.
- Adding multiple rows to a table with the same primary key doesn't cause the
subsequent rows to be rejected.
This is mixing up the concept of a table with a changelog.
The table can only have one row with a specific primary key. A changelog can contain multiple rows with the same key. The changelog can be materialized into a table.
This is the same as a traditional database.
We should explain this table/changelog duality somewhere ;)
Yes, unfortunately at the moment we can INSERT VALUES into a table with a key for a row that already exists, and it will work. However, this is a bug IMHO (it should be UPSERT, not INSERT).
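The table/changelog duality described above can be sketched as follows (the `users` table and its columns are hypothetical):

```sql
-- Two inserts with the same primary key append two records to the
-- underlying changelog topic; the materialized table keeps only the
-- latest value per key.
INSERT INTO users (id, name) VALUES ('alice', 'Alice v1');
INSERT INTO users (id, name) VALUES ('alice', 'Alice v2');

-- A query against the materialized table sees a single row for 'alice',
-- holding the most recent value.
SELECT * FROM users WHERE id = 'alice';
```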
- At least one digit must be present before or after the decimal point, if
there is one.
- At least one digit must follow the exponent symbol `e`, if there is one.
FYI, `e` is case-insensitive, so it can be `E` too.
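Applying those rules, here's a sketch of which numeric literals pass and which don't (the stream `s1` is hypothetical; the literals follow the rules quoted above):

```sql
-- Valid: a digit appears before or after the decimal point, and the
-- exponent marker (case-insensitive) is followed by at least one digit.
SELECT 3.14, .5, 5., 1e6, 2.5E-3 FROM s1 EMIT CHANGES;

-- Invalid: '.' (no digit on either side of the point),
--          '1.5e' (no digit after the exponent symbol).
```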
* docs: ports lexical structure, ddl, and primer docs
* docs: remove old docs
* docs: fix redirects
* docs: remove dead links
* docs: fixes more dead links
* docs: adds appendix
* docs: copy edit new ddl and lexical structure topics (DOCS-5143) (#6046)
* docs: copy edit new sql reference topics
* docs: copy edit lexical structure topic
* docs: more carefully describe partition
* docs: clarify
* docs: link to s/t
* docs: suggestions
* docs: clarify language
* docs: ombine retention & compaction
* docs: address Vicky's feedback
* docs: almog feedback

Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>
Co-authored-by: Michael Drogalis <michael.drogalis@confluent.io>
Ports selective pages from #5935.