[query-service] teach query service to read MTs and Ts created by Spark (#10184)

* [query-service] teach query service to read MTs and Ts created by Spark

Hail-on-Spark uses HadoopFS, which emulates directories by creating zero-byte objects named like
`gs://bucket/dirname/`. Note that the object name literally ends in a slash. Such marker objects
should not be included in the output of `listStatus` (they should always be empty anyway). Unfortunately,
my fix in #9914 was wrong: `GoogleStorageFileStatus` removes the trailing slash, so the status's
path could never equal `path`, which always ends in a `/`. This change compares the raw blob name instead.

* fix
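The mismatch described in the commit message can be reproduced with a small Scala sketch. `FakeBlob` and `FakeFileStatus` are hypothetical stand-ins for the GCS `Blob` and Hail's `GoogleStorageFileStatus`; the only behavior they model is the one the message describes, namely that building the status strips the trailing slash. The sample listing is likewise invented for illustration.

```scala
// Hypothetical stand-ins for the GCS Blob and Hail's GoogleStorageFileStatus;
// they model only the behavior described in the commit message.
final case class FakeBlob(name: String, isDirectory: Boolean)

final case class FakeFileStatus(path: String, isDirectory: Boolean)

object FakeFileStatus {
  // Assumption: like GoogleStorageFileStatus, construction strips the trailing slash.
  def apply(b: FakeBlob): FakeFileStatus =
    FakeFileStatus(b.name.stripSuffix("/"), b.isDirectory)
}

object DirectoryMarkerDemo {
  // The listing prefix; it always ends in "/".
  val path = "dir/"

  // Entries a delimiter-based listing of "dir/" might return, including the
  // zero-byte marker object "dir/" left behind by HadoopFS.
  val listing = Seq(
    FakeBlob("dir/", isDirectory = true),
    FakeBlob("dir/part-0", isDirectory = false),
    FakeBlob("dir/sub/", isDirectory = true)
  )

  // Old filter: compares the slash-stripped status path, so it never matches.
  def oldListStatus(blobs: Seq[FakeBlob]): Seq[FakeFileStatus] =
    blobs.map(b => FakeFileStatus(b))
      .filter(fs => !(fs.isDirectory && fs.path == path))

  // New filter: compares the raw blob name, which keeps its trailing slash.
  def newListStatus(blobs: Seq[FakeBlob]): Seq[FakeFileStatus] =
    blobs.map(b => (b, FakeFileStatus(b)))
      .filter { case (b, fs) => !(fs.isDirectory && b.name == path) }
      .map { case (_, fs) => fs }

  def main(args: Array[String]): Unit = {
    println(oldListStatus(listing).map(_.path)) // List(dir, dir/part-0, dir/sub)
    println(newListStatus(listing).map(_.path)) // List(dir/part-0, dir/sub)
  }
}
```

With the old filter the marker survives as a spurious `dir` entry; with the new one only the real children remain.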
danking authored Mar 11, 2021
1 parent 72dc5ee commit 2d8ba29
Showing 1 changed file with 3 additions and 2 deletions.
hail/src/main/scala/is/hail/io/fs/GoogleStorageFS.scala (3 additions, 2 deletions)
@@ -317,8 +317,9 @@ class GoogleStorageFS(serviceAccountKey: String) extends FS {
     val blobs = storage.list(bucket, BlobListOption.prefix(path), BlobListOption.currentDirectory())
 
     blobs.getValues.iterator.asScala
-      .map(b => GoogleStorageFileStatus(b))
-      .filter(fs => !(fs.isDirectory && fs.getPath == path))
+      .map(b => (b, GoogleStorageFileStatus(b)))
+      .filter { case (b, fs) => !(fs.isDirectory && b.getName == path) } // elide directory markers created by Hadoop
+      .map { case (b, fs) => fs }
       .toArray
   }
 
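The patch pairs each `Blob` with its status only so that the filter can still see the raw object name after the status has dropped the trailing slash. The same result can be obtained in a single pass. The sketch below uses hypothetical `Entry`/`Status` stand-ins rather than the real GCS and Hail classes, and assumes, as the commit message states, that building the status strips the trailing slash.

```scala
// Hypothetical stand-ins; not the real com.google.cloud.storage.Blob or
// Hail's GoogleStorageFileStatus.
final case class Entry(name: String, isDirectory: Boolean)
final case class Status(path: String, isDirectory: Boolean)

object SinglePassVariant {
  // Assumption: the status drops the trailing slash, as the commit message says.
  def toStatus(e: Entry): Status = Status(e.name.stripSuffix("/"), e.isDirectory)

  // Equivalent to map-to-pair, filter, map-back: each status is built once,
  // and the raw entry name is still in scope for the comparison.
  def listStatus(entries: Iterator[Entry], path: String): Array[Status] =
    entries.flatMap { e =>
      val s = toStatus(e)
      // Drop the zero-byte directory marker, whose raw name equals the prefix.
      if (s.isDirectory && e.name == path) None else Some(s)
    }.toArray
}
```

Either form works; the tuple version in the patch keeps the original `map`/`filter` shape, which makes the diff smaller.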