Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Process] Added the RFC template docs [#2594] #2601

Merged
merged 11 commits into from
Feb 7, 2024
Merged

Conversation

tdas
Copy link
Contributor

@tdas tdas commented Feb 3, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other - general process

Description

This PR add the necessary template files to formalize the new proposed RFC process to make protocol changes. See #2594 for more details.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

No

@tdas tdas linked an issue Feb 3, 2024 that may be closed by this pull request
@tdas tdas changed the title [Process] Added the RFC template docs [Process] Added the RFC template docs [#2594] Feb 3, 2024
@@ -0,0 +1,5 @@
## Table of RFCs

Format: <RFC file name>, proposed <date>, accepted/rejected <date>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: We should differentiate proposed vs. accepted vs. stabilized/rejected dates, no?

As in, somebody proposes feature X; after discussion, the proposal is accepted (= can now start iterating on the implementation and spec changes), or rejected (= thanks but no thanks).

If a deal-breaker is encountered during the development process, then the RFC would transition from "accepted" to "rejected" (or, if we wanted finer grained tracking, "abandoned").

Eventually, somebody proposes to stabilize X. It's either stabilized, or left as "accepted" for the time being, or could be rejected/abandoned (tho the latter outcome would arguably be a community process failure, to let it get so far only to be rejected at the end like that).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I just realized that "proposed" here probably means "RFC PR merged" while "accepted" means "RFC formally accepted and PROTOCOL.md updated" ?

That would definitely simplify things!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it make sense to have an actual table, so the above becomes actual column headers?

Suggested change
Format: <RFC file name>, proposed <date>, accepted/rejected <date>
|Link to RFC | RFC title | Date proposed | Date accepted | Date rejected |
|-|-|-|-|-|
|...|||||

... which should render as:

Link to RFC RFC title Date proposed Date accepted Date rejected
...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, should we consider having a different table for each state, so it's easier to tell what's going on?

Imagine a feature that spends a long time in development, so it's still in "proposed" state but far down the list below several other features that already "accepted" and/or "rejected." It would be pretty easy to move an entry's row to a different section, and the result might look like this:

Proposed

Link to RFC RFC title Date proposed
...

Accepted

Link to RFC RFC title Date proposed Date accepted
...

Rejected

Link to RFC RFC title Date proposed Date rejected
...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also want to link the corresponding github issue as an additional column in the table, since that's where all the discussion is supposed to live?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like that totally.

Copy link
Contributor Author

@tdas tdas Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, proposed means RFC merged as the RFC merging and index updating will be done together in one PR.


| RFC name | RFC file name | Github issue | Date proposed |
|-|-|-|-|
| My Table Feature | my_table_feature.md | https://github.com/delta-io/delta/issues/XXXX | 2023-02-05 | <!-- remove this when adding the first RFC -->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it already exists, would it make sense to use some recently added table feature as an example? e.g.

Accepted

Date proposed Date accepted Github issue RFC link Short description
2023-09-18 2023-10-06 #2072 log_compaction.md Roll up small commits to allow less-frequent checkpointing

This PR could even create a (post facto) .md for the log compaction feature, just so we have a concrete example to link to (can link to it by [file_name.md](file_name.md)). The RFC doc should clearly state that it was created post-facto, but it would otherwise be a "real" RFC (that just happens to be very small and conveniently digestible).

It was basically just two changes: A new section called Log compaction files, and expanding Metadata cleanup to say:

  1. Delete all delta log entries and checkpoint files before the cutOffCheckpoint checkpoint. Also delete all the log compaction files having startVersion <= cutOffCheckpoint's version.

(above uses <ins>...</ins> tags to underline added text, which could complement ~~...~~ strikethrough for deleted text)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: TIL that |:-| in markdown table definition left-justifies that column.

In theory, non-breaking dash (&#x2011;) and space (&#160;) should prevent the date column from getting wrapped stupidly like that... but it didn't work when I tried it. Instead, it just causes wrapping within words instead of between them.

But an empty image tag such as <img width=60></img> seems to produce an unbreakable (yet invisible) something, so we can put them in a blank row to enforce minimum column widths.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, TimestampNTZ (proposed 2023-03-01 and accepted 2023-05-04) is probably a better example, since it's actually a table feature, while still being small and simple:

The spec changes it makes:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new proposal seems to be the best candidate to add as an example - #2598

How about we work together to add that using this process? it will be a good real exercise to polish this process.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in-commit timestamps looks even more suitable: it not only has an issue - #2532 - but also an RFC .md proposal #2599. Plus, it's a dependency of managed commits #2598

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh i agree. lets use this!
thank you for your cooperation!

Copy link
Collaborator

@ryan-johnson-databricks ryan-johnson-databricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor nits, but they don't need to block initial merge if we have consensus to adopt the RFC process.

protocol_rfcs/README.md Show resolved Hide resolved
protocol_rfcs/README.md Outdated Show resolved Hide resolved
tdas and others added 2 commits February 6, 2024 19:02
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
protocol_rfcs/README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@ryan-johnson-databricks ryan-johnson-databricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Two last clarifications and I expect we're good to go?

protocol_rfcs/README.md Outdated Show resolved Hide resolved

| RFC name | RFC file | Github issue | Date proposed |
|:-|:-|:-|:-|
| My Table Feature | my_table_feature.md | https://github.com/delta-io/delta/issues/XXXX | 2023-02-05 | <!-- remove this when adding the first RFC -->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're likely sorting these by date, it should probably be the first column, not the last?
Putting the RFC name/title last would also give more space to grow? e.g.

Date proposed RFC file Github issue RFC title
2024-01-17 in-commit-timestamps.md #2532 Reliable time travel with in-commit timestamps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah. great point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

tdas and others added 3 commits February 7, 2024 17:05
@tdas
Copy link
Contributor Author

tdas commented Feb 7, 2024

Merging this to unblock the other proposals. If there are tweaks we can do it in those PRs as well.

@tdas tdas merged commit e7959fe into delta-io:master Feb 7, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants