Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
CHANGELOG: Fix #13998 which appeared in 0.2.58 and prevented reading from a networked filesystem mounted within the filesystem of the worker node for certain pipelines (those that did not trigger "lowering"). We use the IndexReader in `PartitionNativeIntervalReader`, `PartitionNativeReaderIndexed`, and `PartitionZippedIndexedNativeReader`. 1. `PartitionNativeIntervalReader` is only used by `query_table`. 2. `PartitionNativeReaderIndexed` is only used by `IndexedRVDSpec2.readTableStage` which is used by `TableNativeReader` when there is a new partitioner. 3. `PartitionZippedIndexedNativeReader` is only sued by `AbstractRVDSpec.readZippedLowered` when there is a new partitioner. Two is for tables, three is for matrix tables. In `readZippedLowered` we explicitly [drop the file protocol](https://github.com/hail-is/hail/blob/1dedf3c63f9aabf1b6ce538165360056f82f76e4/hail/src/main/scala/is/hail/rvd/AbstractRVDSpec.scala#L154-L155): ``` val absPathLeft = removeFileProtocol(pathLeft) val absPathRight = removeFileProtocol(pathRight) ``` We have done this, by various names, since this lowered code path was added. I added `removeFileProtocol` because stripping the protocol in Query-on-Batch prevented the reading and writing of gs:// URIs, the only URIs I could read in QoB. `uriPath` (the function whose use I replaced with `removeFileProtocol`) was added by Cotton [a very long time ago](92a9936). It seems he added it so that he could use HDFS to generate a temporary file path on the local filesystem but pass the file path to binary tools that know nothing of HDFS and file:// URIs. [`readIndexedPartitions`](https://github.com/hail-is/hail/blob/2b0aded9206849252b453dd80710cea8d2156793/hail/src/main/scala/is/hail/HailContext.scala#L421-L440) which *does not* strip any protocols from the path. This has gone undetected because we never try to read data through the OS's filesystem. We always use gs://, Azure, or s3:// because we do not test in environments that have a networked file system mounted in the OS's filesystem. To replicate this bug (and add a test for it), we would need a cluster with a lustre file system (or another networked filesystem). This would be a fairly large lift. The fix is trivial: just never intentionally strip the protocol!
- Loading branch information