DataShard: read iterators may have a broken external reference precharging #7769
Labels
area/datashard
Issues related to datashard tablets (relational table partitions)
bug
Something isn't working
When reviewing code for #7674 I noticed that the `LastProcessedKey` value may be incorrect when precharging missing references. The problem is that @SammyVimes changed `LastProcessedKeyErased` to `LastProcessedKeyErasedOrMissing`, which seemed ok at first glance. In reality this flag is not serialized in responses, and the change may cause rows to be missing from the result.

Previously `LastProcessedKeyErased` was used as a hack/optimization when resuming internally, to force erased sub-ranges to overlap, with the side effect of stitching them together. The result remains correct even when the query is restarted externally with the key from the last reported continuation token. When the query is restarted externally this flag is lost and effectively becomes false, which is totally fine because erased rows are skipped and the last erased row was already accounted for. Resuming with this flag set to false is correct (in the sense that the same row is not visited twice), though it may produce sub-optimal caching behavior.

Note, however, that when we start precharging missing references we are positioned on a result row that is not part of the result yet. When `LastProcessedKey` is set to that row's key and we already have some rows processed, we produce a result that marks this row as "processed" when in reality it is not. After a temporary network failure the read actor will restart the query externally from this last known key, which causes it to skip the row. This row, however, has never been part of the result, and now it never will be.

Since the iterator cannot travel back in time and the last known key is already updated, we need to always page fault when encountering missing references. The `LastProcessedKeyErased` flag can only be used as an erased range stitching hack.
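
To make the failure mode concrete, here is a minimal toy model in Python. It is not DataShard code: `read_batch`, `read_all`, and the `buggy` / `external_restart` switches are hypothetical names invented for illustration, and the fix is modelled simply as "do not advance the key past the last emitted row", standing in for a page fault. The only properties taken from the description above are that the continuation key is serialized while the flag is not, and that an external restart resumes strictly after the last reported key.

```python
def read_batch(rows, missing, start_key, inclusive, buggy):
    """Read rows in key order, stopping at the first missing reference.

    Returns (result_rows, last_processed_key, flag, done).
    `inclusive` models the internal-only flag: resume AT start_key, not after it.
    """
    result = []
    last_key = start_key
    for key in sorted(rows):
        if start_key is not None:
            if key < start_key or (key == start_key and not inclusive):
                continue  # already covered by an earlier batch
        if key in missing:
            # Model "precharging": the reference is available on the next attempt.
            missing.discard(key)
            if buggy:
                # Buggy variant: the key advances to a row that was never emitted,
                # relying on the flag to revisit it on resume.
                return result, key, True, False
            # Fixed variant (assumption: stands in for a page fault):
            # the key stays at the last row that is actually in the result.
            return result, last_key, False, False
        result.append((key, rows[key]))
        last_key = key
    return result, last_key, False, True


def read_all(rows, missing_keys, buggy, external_restart):
    """Drive read_batch to completion, optionally losing the flag between batches."""
    missing = set(missing_keys)
    out, key, flag, done = [], None, False, False
    while not done:
        batch, key, flag, done = read_batch(rows, missing, key, flag, buggy)
        out.extend(batch)
        if external_restart:
            # The flag is not serialized, so an external restart only sees the key.
            flag = False
    return out
```

In this model, with rows keyed 1 through 4 and a missing reference on key 3, the buggy variant returns all four rows when resumed internally but only keys 1, 2 and 4 after an external restart. The variant that leaves the key on the last emitted row returns all four rows in both cases.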