This RFC proposes a new Writer table feature called In-Commit Timestamps. When enabled, commit metadata includes a monotonically increasing timestamp that allows for reliable TIMESTAMP AS OF time travel even if filesystem operations change a commit file's modification timestamp.
For further discussion of this protocol change, please refer to GitHub issue #2532.
Change to existing section
A delta file can optionally contain additional provenance information about what higher-level operation was being performed as well as who executed it.
Implementations are free to store any valid JSON object literal as the commitInfo action unless some table feature (e.g. In-Commit Timestamps) imposes additional requirements on the data.
When In-Commit Timestamps are enabled, writers are required to include a commitInfo action with every commit, which must include the inCommitTimestamp field.
Change to existing section
... 3. Change data readers should return the following extra columns:
Field Name | Data Type | Description |
---|---|---|
_commit_version | Long | The table version containing the change. This can be derived from the name of the Delta log file that contains actions. |
_commit_timestamp | Timestamp | The timestamp associated with when the commit was created. This is the inCommitTimestamp stored in the commitInfo action of the Delta log file with the version _commit_version. |
New Section after the Clustered Table section
The In-Commit Timestamps writer feature strongly associates a monotonically increasing timestamp with each commit by storing it in the commit's metadata.
Enablement:
- The table must be on Writer Version 7.
- The feature inCommitTimestamps must exist in the table protocol's writerFeatures.
- The table property delta.enableInCommitTimestamps must be set to true.
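The three enablement conditions can be checked together. The following is a minimal sketch, assuming the protocol action and the table properties have already been parsed into plain dictionaries. The function name, the dictionary shapes, and the minWriterVersion key used to read the writer version are assumptions made for illustration and are not part of this RFC.

```python
def in_commit_timestamps_enabled(protocol: dict, table_properties: dict) -> bool:
    """Return True only when all three enablement conditions hold.

    Assumed (hypothetical) input shapes:
      protocol         -> {"minWriterVersion": 7, "writerFeatures": [...]}
      table_properties -> {"delta.enableInCommitTimestamps": "true", ...}
    """
    # Condition 1: the table must be on Writer Version 7.
    on_writer_version_7 = protocol.get("minWriterVersion") == 7
    # Condition 2: the feature name must be listed in writerFeatures.
    feature_listed = "inCommitTimestamps" in protocol.get("writerFeatures", [])
    # Condition 3: the table property must be set to true.
    # Property values are assumed to be stored as strings.
    raw_value = str(table_properties.get("delta.enableInCommitTimestamps", "false"))
    property_set = raw_value.lower() == "true"
    return on_writer_version_7 and feature_listed and property_set
```

A table that lists the feature but leaves the property unset is treated as not enabled, which matches the requirement that all three conditions hold.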
When In-Commit Timestamps is enabled, then:
- Writers must write the commitInfo (see Commit Provenance Information) action in the commit.
- The commitInfo action must be the first action in the commit.
- The commitInfo action must include a field named inCommitTimestamp, of type long (see Primitive Types), which represents the time (in milliseconds since the Unix epoch) when the commit is considered to have succeeded. It is the larger of two values:
  - The Unix wall clock time at which the writer attempted the commit.
  - One millisecond later than the previous commit's inCommitTimestamp.
- If the table has commits from a period when this feature was not enabled, provenance information around when this feature was enabled must be tracked in table properties:
  - The property delta.inCommitTimestampEnablementVersion must be used to track the version of the table when this feature was enabled.
  - The property delta.inCommitTimestampEnablementTimestamp must be the same as the inCommitTimestamp of the commit when this feature was enabled.
- The inCommitTimestamp of the commit that enables this feature must be greater than the file modification time of the immediately preceding commit.
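The writer rules for choosing the timestamp and ordering the actions can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function names and parameters are hypothetical, and the commit is modelled as a plain list of action dictionaries. For the commit that enables the feature, the sketch assumes the caller passes the file modification time of the immediately preceding commit as the previous timestamp, which makes the result strictly greater than it.

```python
import time
from typing import Optional


def next_in_commit_timestamp(previous_ts_ms: Optional[int],
                             now_ms: Optional[int] = None) -> int:
    """Pick the inCommitTimestamp for a new commit, in epoch milliseconds.

    previous_ts_ms: the previous commit's inCommitTimestamp, or (assumption)
    the preceding commit's file modification time when this commit enables
    the feature, or None when there is no preceding commit.
    now_ms: wall clock override, mainly useful for testing.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if previous_ts_ms is None:
        return now_ms
    # The larger of the wall clock time and one millisecond past the previous value.
    return max(now_ms, previous_ts_ms + 1)


def build_commit_actions(other_actions: list, in_commit_timestamp_ms: int) -> list:
    """Return the commit's actions with commitInfo placed first."""
    commit_info = {"commitInfo": {"inCommitTimestamp": in_commit_timestamp_ms}}
    return [commit_info] + list(other_actions)
```

Taking the maximum keeps the sequence strictly increasing even when a writer's clock lags behind the clock of the writer that produced the previous commit.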
For tables with In-Commit Timestamps enabled, readers should use the inCommitTimestamp as the commit timestamp for operations like time travel.
If a table has commits from a period before In-Commit Timestamps were enabled, the table properties delta.inCommitTimestampEnablementVersion and delta.inCommitTimestampEnablementTimestamp are set and can be used to identify commits that don't have inCommitTimestamp.
To correctly determine the commit timestamp for these tables, readers can use the following rules:
- For commits with version >= delta.inCommitTimestampEnablementVersion, readers should use the inCommitTimestamp field of the commitInfo action.
- For commits with version < delta.inCommitTimestampEnablementVersion, readers should use the file modification timestamp.
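The two reader rules above can be sketched as a single lookup. The function and parameter names are hypothetical, and the sketch assumes the feature is enabled on the table, so that an unset delta.inCommitTimestampEnablementVersion (passed as None) means every commit carries an inCommitTimestamp.

```python
from typing import Optional


def commit_timestamp_ms(version: int,
                        enablement_version: Optional[int],
                        in_commit_timestamp_ms: Optional[int],
                        file_mtime_ms: int) -> int:
    """Resolve the commit timestamp a reader should use for one commit.

    enablement_version: value of delta.inCommitTimestampEnablementVersion,
    or None when the property is not set (assumed to mean that all commits
    were written with the feature enabled).
    """
    if enablement_version is None or version >= enablement_version:
        if in_commit_timestamp_ms is None:
            # The writer rules require this field once the feature is enabled.
            raise ValueError(
                f"commit {version} is missing inCommitTimestamp in commitInfo"
            )
        return in_commit_timestamp_ms
    # Commits written before enablement fall back to the file modification time.
    return file_mtime_ms
```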
Furthermore, when attempting timestamp-based time travel where table state must be fetched as of timestamp X, readers should use the following rules:
- If timestamp X >= delta.inCommitTimestampEnablementTimestamp, only table versions >= delta.inCommitTimestampEnablementVersion should be considered for the query.
- Otherwise, only table versions less than delta.inCommitTimestampEnablementVersion should be considered for the query.
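The version-range rule for timestamp-based time travel can be sketched as below. The function name and parameters are hypothetical. The sketch also assumes that table versions start at 0 and that every version up to latest_version is still available in the log, which may not hold for a real table.

```python
def candidate_versions(timestamp_x_ms: int,
                       enablement_version: int,
                       enablement_timestamp_ms: int,
                       latest_version: int) -> range:
    """Return the table versions to consider for time travel as of timestamp X.

    enablement_version and enablement_timestamp_ms are the values of
    delta.inCommitTimestampEnablementVersion and
    delta.inCommitTimestampEnablementTimestamp.
    """
    if timestamp_x_ms >= enablement_timestamp_ms:
        # Only versions written with In-Commit Timestamps enabled.
        return range(enablement_version, latest_version + 1)
    # Only versions from before the feature was enabled.
    return range(0, enablement_version)
```

Splitting the search this way means the two timestamp sources (inCommitTimestamp and file modification time) are never compared against each other within a single lookup.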