CHANGELOG: Fix #13998, which appeared in 0.2.58 and prevented reading from a networked filesystem mounted within the worker node's filesystem for certain pipelines (those that did not trigger "lowering").
We use the `IndexReader` in `PartitionNativeIntervalReader`, `PartitionNativeReaderIndexed`, and `PartitionZippedIndexedNativeReader`. `PartitionNativeIntervalReader` is only used by `query_table`. `PartitionNativeReaderIndexed` is only used by `IndexedRVDSpec2.readTableStage`, which is used by `TableNativeReader` when there is a new partitioner. `PartitionZippedIndexedNativeReader` is only used by `AbstractRVDSpec.readZippedLowered` when there is a new partitioner. The second reader is for tables; the third is for matrix tables.

In `readZippedLowered` we explicitly drop the file protocol:

We have done this, by various names, since this lowered code path was added. I added `removeFileProtocol` because stripping the protocol in Query-on-Batch prevented the reading and writing of `gs://` URIs, the only URIs I could read in QoB. `uriPath` (the function whose use I replaced with `removeFileProtocol`) was added by Cotton a very long time ago. It seems he added it so that he could use HDFS to generate a temporary file path on the local filesystem but pass that path to binary tools that know nothing of HDFS or `file://` URIs.

#9522 added the lowered code path and thus introduced this bug. It attempted to mirror the extant code in `readIndexedPartitions`, which does not strip any protocols from the path.

This has gone undetected because we never try to read data through the OS's filesystem. We always use `gs://`, Azure, or `s3://` URIs because we do not test in environments that have a networked filesystem mounted in the OS's filesystem. Replicating this bug (and adding a test for it) would require a cluster with a Lustre filesystem (or another networked filesystem), which would be a fairly large lift. The fix is trivial: just never intentionally strip the protocol!
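To make the three behaviors concrete, here is a minimal sketch using `java.net.URI`. It contrasts stripping every protocol (assumed to approximate what the old `uriPath` did, given that it broke `gs://` URIs), stripping only `file://` (assumed to approximate `removeFileProtocol`), and leaving the URI untouched (the fix). The object `ProtocolHandling` and its methods `stripAllProtocols`, `stripFileProtocol`, `keepProtocol`, and `scheme` are hypothetical stand-ins written for illustration, not Hail's actual code; the bucket name and mount point are made up.

```scala
import java.net.URI

// Hypothetical helpers for illustration only. They approximate the behaviors
// described in this PR; they are NOT Hail's uriPath or removeFileProtocol.
object ProtocolHandling {
  // Assumed uriPath-like behavior: keep only the path component, discarding
  // the scheme and the authority (for gs:// URIs, the authority is the bucket).
  def stripAllProtocols(uri: String): String =
    new URI(uri).getPath

  // Assumed removeFileProtocol-like behavior: drop a leading "file:" scheme
  // and leave every other URI alone.
  def stripFileProtocol(uri: String): String =
    if (uri.startsWith("file://")) uri.stripPrefix("file://")
    else if (uri.startsWith("file:")) uri.stripPrefix("file:")
    else uri

  // The fix: never strip the protocol; hand the URI to the filesystem as-is.
  def keepProtocol(uri: String): String = uri

  // The scheme a hypothetical scheme-dispatching filesystem layer would see.
  def scheme(uri: String): Option[String] =
    Option(new URI(uri).getScheme)

  def main(args: Array[String]): Unit = {
    val gcs = "gs://my-bucket/data/a.mt"        // made-up bucket
    val lustre = "file:///mnt/lustre/data/a.mt" // made-up mount point

    // Stripping everything loses both the scheme and the bucket.
    println(stripAllProtocols(gcs))             // /data/a.mt
    // Stripping only file:// leaves gs:// URIs intact...
    println(stripFileProtocol(gcs))             // gs://my-bucket/data/a.mt
    // ...but turns the local URI into a bare path with no scheme.
    println(scheme(stripFileProtocol(lustre)))  // None
    // Keeping the protocol preserves the scheme.
    println(scheme(keepProtocol(lustre)))       // Some(file)
  }
}
```

How a scheme-less path is then resolved depends on the filesystem layer that receives it; this sketch does not model that step. It only shows that once the scheme is stripped, the information about which filesystem the path belongs to is gone.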