
New sections for lexical structure, ddl, and primer docs #6041

Merged

MichaelDrogalis merged 16 commits into master from mdrogalis/sql-docs on Sep 1, 2020

Conversation

MichaelDrogalis
Contributor

Ports selected pages from #5935.

@MichaelDrogalis
Contributor Author

@vpapavas Can you please review the new section on the lexical structure of the language? These docs are part of a larger revamp of the reference section. We're hoping to ship the bit that's already ready to go.

@MichaelDrogalis
Contributor Author

MichaelDrogalis commented Aug 17, 2020

@derekjn / @agavra Can you please review the DDL section in this patch? See context above.

@MichaelDrogalis
Contributor Author

@vcrfxia Can you please review the Apache Kafka primer section of this patch? See above for context. The dedicated primer section is meant to help us avoid having to explain basic Kafka concepts in multiple parts of the docs.

Member

@JimGalasyn left a comment

Great new content!

Contributor

@vcrfxia left a comment

Hey @MichaelDrogalis, the Apache Kafka primer looks wonderful! LGTM with a bunch of nits/minor things inline.


ksqlDB provides higher-level abstractions over a topic through _streams_ and
_tables_. A stream or table is a {{ site.ak }} topic with a registered schema.
The schema controls the shape of records that are allowed to be stored in the
Contributor

nit: don't love the word "shape". How about "structure" or "data layout" (or similar)?

My gut also says "specifies" rather than "controls" but I'm just nit-picking :)

Contributor

I'm with Vic here on 'controls'. That suggests some kind of check to ensure messages with the wrong schema are not produced to a topic by some other producer, and such a check does not exist. This is a particularly important distinction for source topics, which ksqlDB only reads from.

Though it's very common for all messages to have the same schema, even this isn't a requirement for some formats, e.g. JSON can handle topics where values have different schemas: the declared stream or table can define the superset of fields, some common subset, or any other combination.

How about just saying something about ksqlDB adding a SQL abstraction over the data held in Kafka, and that this SQL layer is statically typed?
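For concreteness, here is roughly what registering a schema over a topic looks like in ksqlDB; the stream name, topic, and columns below are made up for illustration:

```sql
-- Declares a typed SQL view over an existing topic. This does not stop
-- other producers from writing differently-shaped records to the
-- underlying topic; ksqlDB only interprets what it reads.
CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE)
  WITH (kafka_topic='locations', value_format='JSON');
```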

@MichaelDrogalis
Contributor Author

Thanks @vcrfxia! Great feedback, addressed it.

Member

@vpapavas left a comment

LGTM with minor comments

@MichaelDrogalis
Contributor Author

Super, thank you @vpapavas!

Contributor

@agavra left a comment

+1 this is super cool, adds lots of clarity to concepts I find myself explaining multiple times


The following table shows all keywords in the language.

| keyword | description | example |
Contributor

this is awesome, but I'm worried about it staying in sync with the code 🤔

Contributor

Me too.
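As a side note on how the keyword list matters in practice, a hedged sketch of backtick-quoting a keyword so it can be used as an identifier (the stream and column names are hypothetical):

```sql
-- FROM is a keyword, so to use it as a column name it must be
-- quoted with backticks.
CREATE STREAM clicks (`from` VARCHAR, url VARCHAR)
  WITH (kafka_topic='clicks', value_format='JSON');

SELECT `from`, url FROM clicks EMIT CHANGES;
```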

```sql
SELECT ROWTIME, * FROM s1 EMIT CHANGES;
```

The following table lists all pseudocolumns.
Contributor

what is the verdict on WINDOWSTART and WINDOWEND?

Contributor

They're currently not defined as pseudocolumns. They are included by default with a SELECT *.

I guess at the moment they're defined as 'system columns'. Their names are reserved for system use.
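For reference, a hedged sketch of where those system column names surface today, in the projection of a windowed aggregation (stream and column names are made up):

```sql
-- WINDOWSTART and WINDOWEND are reserved for system use and can be
-- selected in the projection of a windowed aggregation like this one.
SELECT profileId,
       WINDOWSTART,
       WINDOWEND,
       COUNT(*) AS cnt
FROM riderLocations
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY profileId
EMIT CHANGES;
```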

Comment on lines +36 to +38
The _topic_ and _partition_ describe which larger collection and subset of events
this particular event belongs to, and the _offset_ describes its exact position within
that larger collection (more on that below).
Contributor

I find this overly confusing.

All this talk of larger collection and subset of events is unnecessarily vague and using alternative nomenclature to what Kafka uses. Given this page is all about the Kafka terms users will need to understand, does it not make sense to use the terms Kafka uses?

Would it not be better to just give a quick outline of what a topic is in Kafka, including that it's broken into partitions, and then explain that the offset is the position within a particular topic-partition?

Or, given you explain these concepts in more depth below, simply say:

> The offset denotes the position of the record within a specific partition of a topic (more on these below).

This avoids terms such as collections (topic) and subsets (partitions).


Comment on lines +100 to +102
partition will be consistent with all other records with the same key. When records are
appended, they follow the correct offset order, even in the presence of
failures or faults. When a stream's key content changes because of how a query
Contributor

> When records are appended, they follow the correct offset order, even in the presence of failures or faults

It's unclear to me what this means. Either a message is or is not appended to a partition. This is controlled by the Kafka broker. It doesn't make sense to me to say that we ensure records are appended with the correct offset order.
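For what it's worth, the re-keying case the excerpt alludes to is the PARTITION BY path; a minimal sketch, with hypothetical stream and column names:

```sql
-- Re-keys the stream: records are redistributed so that all records
-- with the same profileId land in the same partition of the new topic.
CREATE STREAM locations_by_profile AS
  SELECT *
  FROM riderLocations
  PARTITION BY profileId;
```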


Comment on lines +202 to +203
- Adding multiple rows to a table with the same primary key doesn't cause the
subsequent rows to be rejected.
Contributor

This is mixing up the concept of a table with a changelog.

The table can only have one row with a specific primary key. A changelog can contain multiple rows with the same key. The changelog can be materialized into a table.

This is the same as a traditional database.

We should explain this table/changelog duality somewhere ;)

Yes, unfortunately at the moment we can INSERT VALUES into a table with a key for a row that already exists and it will work. However, this is a bug IMHO (it should be UPSERT, not INSERT).
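To make the table/changelog point concrete, a hedged sketch of the current behavior; the table name and values are made up:

```sql
-- The materialized table keeps only the latest row per primary key,
-- while the underlying changelog topic retains both records for 'u1'.
CREATE TABLE users (id VARCHAR PRIMARY KEY, name VARCHAR)
  WITH (kafka_topic='users', value_format='JSON', partitions=1);

INSERT INTO users (id, name) VALUES ('u1', 'Alice');
-- Accepted today and effectively behaves as an upsert rather than being rejected.
INSERT INTO users (id, name) VALUES ('u1', 'Alicia');
```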


- At least one digit must be present before or after the decimal point, if
there is one.
- At least one digit must follow the exponent symbol `e`, if there is one.
Contributor

FYI, e is case-insensitive, so it can be E too.
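For illustration, a few literal forms the rules above allow (hedged; s1 is just the stream from the earlier snippet):

```sql
-- Each literal has at least one digit adjacent to the decimal point
-- and at least one digit after the exponent symbol (e or E).
SELECT 0.5 AS half, 3.14 AS pi, 2e3 AS two_thousand, 1.5E-2 AS small
FROM s1
EMIT CHANGES;
```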

@MichaelDrogalis MichaelDrogalis merged commit d38746b into master Sep 1, 2020
@MichaelDrogalis MichaelDrogalis deleted the mdrogalis/sql-docs branch September 1, 2020 22:55
JimGalasyn added a commit that referenced this pull request Sep 2, 2020
* docs: ports lexical structure, ddl, and primer docs

* docs: remove old docs

* docs: fix redirects

* docs: remove dead links

* docs: fixes more dead links

* docs: adds appendix

* docs: copy edit new ddl and lexical structure topics (DOCS-5143) (#6046)

* docs: copy edit new sql reference topics

* docs: copy edit lexical structure topic

* docs: more carefully describe partition

* docs: clarify

* docs: link to s/t

* docs: suggestions

* docs: clarify language

* docs: combine retention & compaction

* docs: address Vicky's feedback

* docs: almog feedback

Co-authored-by: Jim Galasyn <jim.galasyn@confluent.io>

Co-authored-by: Michael Drogalis <michael.drogalis@confluent.io>
sarwarbhuiyan pushed a commit to sarwarbhuiyan/ksql that referenced this pull request Sep 4, 2020