-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Process] Added the RFC template docs [#2594] #2601
Conversation
protocol_rfcs/index.md
Outdated
@@ -0,0 +1,5 @@ | |||
## Table of RFCs | |||
|
|||
Format: <RFC file name>, proposed <date>, accepted/rejected <date> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: We should differentiate proposed vs. accepted vs. stabilized/rejected dates, no?
As in, somebody proposes feature X; after discussion, the proposal is accepted (= can now start iterating on the implementation and spec changes), or rejected (= thanks but no thanks).
If a deal-breaker is encountered during the development process, then the RFC would transition from "accepted" to "rejected" (or, if we wanted finer grained tracking, "abandoned").
Eventually, somebody proposes to stabilize X. It's either stabilized, or left as "accepted" for the time being, or could be rejected/abandoned (tho the latter outcome would arguably be a community process failure, to let it get so far only to be rejected at the end like that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I just realized that "proposed" here probably means "RFC PR merged" while "accepted" means "RFC formally accepted and PROTOCOL.md updated" ?
That would definitely simplify things!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would it make sense to have an actual table, so the above becomes actual column headers?
Format: <RFC file name>, proposed <date>, accepted/rejected <date> | |
|Link to RFC | RFC title | Date proposed | Date accepted | Date rejected | | |
|-|-|-|-|-| | |
|...||||| |
... which should render as:
Link to RFC | RFC title | Date proposed | Date accepted | Date rejected |
---|---|---|---|---|
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, should we consider having a different table for each state, so it's easier to tell what's going on?
Imagine a feature that spends a long time in development, so it's still in "proposed" state but far down the list below several other features that already "accepted" and/or "rejected." It would be pretty easy to move an entry's row to a different section, and the result might look like this:
Proposed
Link to RFC | RFC title | Date proposed |
---|---|---|
... |
Accepted
Link to RFC | RFC title | Date proposed | Date accepted |
---|---|---|---|
... |
Rejected
Link to RFC | RFC title | Date proposed | Date rejected |
---|---|---|---|
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to link the corresponding github issue as an additional column in the table, since that's where all the discussion is supposed to live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like that totally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, proposed means RFC merged as the RFC merging and index updating will be done together in one PR.
protocol_rfcs/index.md
Outdated
|
||
| RFC name | RFC file name | Github issue | Date proposed | | ||
|-|-|-|-| | ||
| My Table Feature | my_table_feature.md | https://github.com/delta-io/delta/issues/XXXX | 2023-02-05 | <!-- remove this when adding the first RFC --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it already exists, would it make sense to use some recently added table feature as an example? e.g.
Accepted
Date proposed | Date accepted | Github issue | RFC link | Short description |
---|---|---|---|---|
2023-09-18 | 2023-10-06 | #2072 | log_compaction.md | Roll up small commits to allow less-frequent checkpointing |
This PR could even create a (post facto) .md for the log compaction feature, just so we have a concrete example to link to (can link to it by [file_name.md](file_name.md)
). The RFC doc should clearly state that it was created post-facto, but it would otherwise be a "real" RFC (that just happens to be very small and conveniently digestible).
It was basically just two changes: A new section called Log compaction files, and expanding Metadata cleanup to say:
- Delete all delta log entries and checkpoint files before the cutOffCheckpoint checkpoint. Also delete all the log compaction files having startVersion <= cutOffCheckpoint's version.
(above uses <ins>...</ins>
tags to underline added text, which could complement ~~...~~
strikethrough for deleted text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: TIL that |:-|
in markdown table definition left-justifies that column.
In theory, non-breaking dash (‑
) and space ( 
) should prevent the date column from getting wrapped stupidly like that... but it didn't work when I tried it. Instead, it just causes wrapping within words instead of between them.
But an empty image tag such as <img width=60></img>
seems to produce an unbreakable (yet invisible) something, so we can put them in a blank row to enforce minimum column widths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, TimestampNTZ (proposed 2023-03-01 and accepted 2023-05-04) is probably a better example, since it's actually a table feature, while still being small and simple:
The spec changes it makes:
- New section: Timestamp without timezone (TimestampNTZ)
- Add a new row to Valid feature names in table features table
- Add a new row to Partition value serialization table
- Add a new row to Primitive types table, plus the following new text below the table: "See Parquet timestamp type for more details about timestamp and
isAdjustedToUTC
."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new proposal seems to be the best candidate to add as an example - #2598
How about we work together to add that using this process? it will be a good real exercise to polish this process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh i agree. lets use this!
thank you for your cooperation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor nits, but they don't need to block initial merge if we have consensus to adopt the RFC process.
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Two last clarifications and I expect we're good to go?
protocol_rfcs/README.md
Outdated
|
||
| RFC name | RFC file | Github issue | Date proposed | | ||
|:-|:-|:-|:-| | ||
| My Table Feature | my_table_feature.md | https://github.com/delta-io/delta/issues/XXXX | 2023-02-05 | <!-- remove this when adding the first RFC --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're likely sorting these by date, it should probably be the first column, not the last?
Putting the RFC name/title last would also give more space to grow? e.g.
Date proposed | RFC file | Github issue | RFC title |
---|---|---|---|
2024-01-17 | in-commit-timestamps.md | #2532 | Reliable time travel with in-commit timestamps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah. great point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
Merging this to unblock the other proposals. If there are tweaks we can do it in those PRs as well. |
Which Delta project/connector is this regarding?
Description
This PR add the necessary template files to formalize the new proposed RFC process to make protocol changes. See #2594 for more details.
How was this patch tested?
N/A
Does this PR introduce any user-facing changes?
No