Skip to content

Commit

Permalink
[query] fix #13998
Browse files Browse the repository at this point in the history
CHANGELOG: Fix #13998 which appeared in 0.2.58 and prevented reading from a networked filesystem mounted within the filesystem of the worker node for certain pipelines (those that did not trigger "lowering").

We use the IndexReader in `PartitionNativeIntervalReader`, `PartitionNativeReaderIndexed`, and
`PartitionZippedIndexedNativeReader`.

1. `PartitionNativeIntervalReader` is only used by `query_table`.

2. `PartitionNativeReaderIndexed` is only used by `IndexedRVDSpec2.readTableStage` which is used by
   `TableNativeReader` when there is a new partitioner.

3. `PartitionZippedIndexedNativeReader` is only sued by `AbstractRVDSpec.readZippedLowered` when
   there is a new partitioner.

Two is for tables, three is for matrix tables. In `readZippedLowered` we explicitly [drop the file
protocol](https://github.com/hail-is/hail/blob/1dedf3c63f9aabf1b6ce538165360056f82f76e4/hail/src/main/scala/is/hail/rvd/AbstractRVDSpec.scala#L154-L155):

```
        val absPathLeft = removeFileProtocol(pathLeft)
        val absPathRight = removeFileProtocol(pathRight)
```

We have done this, by various names, since this lowered code path was added. I added
`removeFileProtocol` because stripping the protocol in Query-on-Batch prevented the reading and
writing of gs:// URIs, the only URIs I could read in QoB. `uriPath` (the function whose use I
replaced with `removeFileProtocol`) was added by Cotton [a very long time
ago](92a9936). It seems he
added it so that he could use HDFS to generate a temporary file path on the local filesystem but
pass the file path to binary tools that know nothing of HDFS and file:// URIs.

[`readIndexedPartitions`](https://github.com/hail-is/hail/blob/2b0aded9206849252b453dd80710cea8d2156793/hail/src/main/scala/is/hail/HailContext.scala#L421-L440)
which *does not* strip any protocols from the path.

This has gone undetected because we never try to read data through the OS's filesystem. We always
use gs://, Azure, or s3:// because we do not test in environments that have a networked file system
mounted in the OS's filesystem. To replicate this bug (and add a test for it), we would need a
cluster with a lustre file system (or another networked filesystem). This would be a fairly large
lift. The fix is trivial: just never intentionally strip the protocol!
  • Loading branch information
Dan King committed Nov 30, 2023
1 parent 1dedf3c commit 8abc83f
Show file tree
Hide file tree
Showing 2 changed files with 3 additions and 12 deletions.
6 changes: 3 additions & 3 deletions hail/src/main/scala/is/hail/rvd/AbstractRVDSpec.scala
Original file line number Diff line number Diff line change
Expand Up @@ -151,8 +151,8 @@ object AbstractRVDSpec {
indexSpecLeft, indexSpecRight,
specLeft.key, uidFieldName)

val absPathLeft = removeFileProtocol(pathLeft)
val absPathRight = removeFileProtocol(pathRight)
val absPathLeft = pathLeft
val absPathRight = pathRight
val partsAndIntervals: IndexedSeq[(String, Interval)] = if (specLeft.key.isEmpty) {
specLeft.partFiles.map { p => (p, null) }
} else {
Expand Down Expand Up @@ -438,7 +438,7 @@ case class IndexedRVDSpec2(
val rSpec = typedCodecSpec
val reader = ir.PartitionNativeReaderIndexed(rSpec, indexSpec, part.kType.fieldNames, uidFieldName)

val absPath = removeFileProtocol(path)
val absPath = path
val partPaths = tmpPartitioner.rangeBounds.map { b => partFiles(part.lowerBoundInterval(b)) }


Expand Down
9 changes: 0 additions & 9 deletions hail/src/main/scala/is/hail/utils/package.scala
Original file line number Diff line number Diff line change
Expand Up @@ -351,15 +351,6 @@ package object utils extends Logging

def uriPath(uri: String): String = new URI(uri).getPath

def removeFileProtocol(uriString: String): String = {
val uri = new URI(uriString)
if (uri.getScheme == "file") {
uri.getPath
} else {
uri.toString
}
}

// NB: can't use Nothing here because it is not a super type of Null
private object flattenOrNullInstance extends FlattenOrNull[Array]

Expand Down

0 comments on commit 8abc83f

Please sign in to comment.