[query] LocalBackend hangs if given a gs:// URL #13904

Closed
danking opened this issue Oct 25, 2023 · 5 comments · Fixed by #14407
danking (Contributor) commented Oct 25, 2023

What happened?

The following should not hang, but it does.

(base) dking@wm28c-761 hail % HAIL_QUERY_BACKEND=local ipython
Python 3.10.9 (main, Jan 11 2023, 09:18:18) [Clang 14.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import hail as hl
   ...: import os
   ...: hl.utils.range_table(1).write('gs://danking/test_hail_in_notebook.ht')

Writing to a local file path causes no issue.
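For contrast, a minimal sketch of the same write against a local path (the path below is illustrative), which completes normally:

import hail as hl

# Same pipeline as above, but writing to a local path instead of gs:// —
# this completes without hanging.
hl.utils.range_table(1).write('/tmp/test_hail_in_notebook.ht', overwrite=True)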

Version

0.2.124

Relevant log output

No response

ehigham (Member) commented Feb 29, 2024

Requires gcs connector

ehigham self-assigned this Feb 29, 2024
ehigham (Member) commented Mar 1, 2024

I got a timeout!

SocketTimeoutException: connect timed out

  File "/home/edmund/.local/src/hail/hail/python/hail/backend/py4j_backend.py", line 223, in _rpc
    raise fatal_error_from_java_error_triplet(
  File "/home/edmund/.local/src/hail/hail/python/hail/backend/backend.py", line 190, in execute
    raise e.maybe_user_error(ir) from None
  File "/home/edmund/.local/src/hail/hail/python/hail/backend/backend.py", line 190, in execute
    raise e.maybe_user_error(ir) from None
  File "/home/edmund/.local/src/hail/hail/python/hail/table.py", line 2002, in write
    Env.backend().execute(
  File "/home/edmund/.local/src/hail/hail/python/hail/typecheck/check.py", line 584, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/edmund/.local/src/hail/test.py", line 6, in <module>
    ht.write('gs://ehigham-hail-tmp/test_hail_in_notebook.ht')
hail.utils.java.FatalError: SocketTimeoutException: connect timed out

Java stack trace:
java.io.IOException: Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:254)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredential(CredentialFactory.java:406)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1471)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createGcsFs(GoogleHadoopFileSystemBase.java:1630)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1612)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:507)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at is.hail.io.fs.HadoopFSURL.<init>(HadoopFS.scala:76)
	at is.hail.io.fs.HadoopFS.parseUrl(HadoopFS.scala:88)
	at is.hail.io.fs.HadoopFS.parseUrl(HadoopFS.scala:85)
	at is.hail.io.fs.FS.exists(FS.scala:618)
	at is.hail.io.fs.FS.exists$(FS.scala:618)
	at is.hail.io.fs.HadoopFS.exists(HadoopFS.scala:85)
	at __C5Compiled.apply(Emit.scala)
	at is.hail.backend.local.LocalBackend.$anonfun$_jvmLowerAndExecute$3(LocalBackend.scala:223)
	at is.hail.backend.local.LocalBackend.$anonfun$_jvmLowerAndExecute$3$adapted(LocalBackend.scala:223)
	at is.hail.backend.ExecuteContext.$anonfun$scopedExecution$1(ExecuteContext.scala:144)
	at is.hail.utils.package$.using(package.scala:664)
	at is.hail.backend.ExecuteContext.scopedExecution(ExecuteContext.scala:144)
	at is.hail.backend.local.LocalBackend.$anonfun$_jvmLowerAndExecute$2(LocalBackend.scala:223)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:84)
	at is.hail.backend.local.LocalBackend._jvmLowerAndExecute(LocalBackend.scala:223)
	at is.hail.backend.local.LocalBackend._execute(LocalBackend.scala:249)
	at is.hail.backend.local.LocalBackend.$anonfun$execute$2(LocalBackend.scala:314)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:84)
	at is.hail.backend.local.LocalBackend.$anonfun$execute$1(LocalBackend.scala:309)
	at is.hail.backend.local.LocalBackend.$anonfun$execute$1$adapted(LocalBackend.scala:308)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:78)
	at is.hail.utils.package$.using(package.scala:664)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:78)
	at is.hail.utils.package$.using(package.scala:664)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:13)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:65)
	at is.hail.backend.local.LocalBackend.$anonfun$withExecuteContext$2(LocalBackend.scala:144)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:55)
	at is.hail.utils.ExecutionTimer$.logTime(ExecutionTimer.scala:62)
	at is.hail.backend.local.LocalBackend.withExecuteContext(LocalBackend.scala:130)
	at is.hail.backend.local.LocalBackend.execute(LocalBackend.scala:308)
	at is.hail.backend.BackendHttpHandler.handle(BackendServer.scala:88)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
	at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:692)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:664)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$DefaultExecutor.execute(ServerImpl.java:159)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Dispatcher.handle(ServerImpl.java:442)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Dispatcher.run(ServerImpl.java:408)
	at java.base/java.lang.Thread.run(Thread.java:834)

java.net.SocketTimeoutException: connect timed out
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.Socket.connect(Socket.java:591)
	at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
	at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
	at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1242)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1181)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1075)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1009)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:151)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:196)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:470)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:251)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredential(CredentialFactory.java:406)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1471)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createGcsFs(GoogleHadoopFileSystemBase.java:1630)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1612)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:507)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at is.hail.io.fs.HadoopFSURL.<init>(HadoopFS.scala:76)
	at is.hail.io.fs.HadoopFS.parseUrl(HadoopFS.scala:88)
	at is.hail.io.fs.HadoopFS.parseUrl(HadoopFS.scala:85)
	at is.hail.io.fs.FS.exists(FS.scala:618)
	at is.hail.io.fs.FS.exists$(FS.scala:618)
	at is.hail.io.fs.HadoopFS.exists(HadoopFS.scala:85)
	at __C5Compiled.apply(Emit.scala)
	at is.hail.backend.local.LocalBackend.$anonfun$_jvmLowerAndExecute$3(LocalBackend.scala:223)
	at is.hail.backend.local.LocalBackend.$anonfun$_jvmLowerAndExecute$3$adapted(LocalBackend.scala:223)
	at is.hail.backend.ExecuteContext.$anonfun$scopedExecution$1(ExecuteContext.scala:144)
	at is.hail.utils.package$.using(package.scala:664)
	at is.hail.backend.ExecuteContext.scopedExecution(ExecuteContext.scala:144)
	at is.hail.backend.local.LocalBackend.$anonfun$_jvmLowerAndExecute$2(LocalBackend.scala:223)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:84)
	at is.hail.backend.local.LocalBackend._jvmLowerAndExecute(LocalBackend.scala:223)
	at is.hail.backend.local.LocalBackend._execute(LocalBackend.scala:249)
	at is.hail.backend.local.LocalBackend.$anonfun$execute$2(LocalBackend.scala:314)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:84)
	at is.hail.backend.local.LocalBackend.$anonfun$execute$1(LocalBackend.scala:309)
	at is.hail.backend.local.LocalBackend.$anonfun$execute$1$adapted(LocalBackend.scala:308)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:78)
	at is.hail.utils.package$.using(package.scala:664)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:78)
	at is.hail.utils.package$.using(package.scala:664)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:13)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:65)
	at is.hail.backend.local.LocalBackend.$anonfun$withExecuteContext$2(LocalBackend.scala:144)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:55)
	at is.hail.utils.ExecutionTimer$.logTime(ExecutionTimer.scala:62)
	at is.hail.backend.local.LocalBackend.withExecuteContext(LocalBackend.scala:130)
	at is.hail.backend.local.LocalBackend.execute(LocalBackend.scala:308)
	at is.hail.backend.BackendHttpHandler.handle(BackendServer.scala:88)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
	at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:692)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:664)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$DefaultExecutor.execute(ServerImpl.java:159)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Dispatcher.handle(ServerImpl.java:442)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Dispatcher.run(ServerImpl.java:408)
	at java.base/java.lang.Thread.run(Thread.java:834)




Hail version: 0.2.128-ce3ca9c77507
Error summary: SocketTimeoutException: connect timed out

ehigham (Member) commented Mar 5, 2024

Wonder if this is somehow related to #14158 (comment)

ehigham (Member) commented Mar 8, 2024

After rtfm (https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/README.md#getting-the-connector), it seems credentials need to be configured in core-site.xml. I don't know where we do this.

ehigham (Member) commented Mar 8, 2024

Indeed, pyspark/conf/spark-defaults.conf defines the following:

spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.google.cloud.auth.service.account.json.keyfile $HOME/.config/gcloud/application_default_credentials.json
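
For reference, a sketch of supplying those same two properties programmatically when initialising the Spark backend; this assumes hl.init's spark_conf argument and the gcloud application-default-credentials keyfile location quoted above, and it does not address the LocalBackend case:

import os
import hail as hl

# Mirror the spark-defaults.conf settings quoted above (Spark backend only).
# The keyfile path is the usual gcloud application-default-credentials location.
keyfile = os.path.expanduser('~/.config/gcloud/application_default_credentials.json')
hl.init(spark_conf={
    'spark.hadoop.google.cloud.auth.service.account.enable': 'true',
    'spark.hadoop.google.cloud.auth.service.account.json.keyfile': keyfile,
})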

ehigham added a commit to ehigham/hail that referenced this issue Mar 12, 2024
ehigham added a commit to ehigham/hail that referenced this issue Mar 14, 2024
ehigham added a commit to ehigham/hail that referenced this issue Mar 20, 2024
hail-ci-robot pushed a commit that referenced this issue Apr 10, 2024
A long-standing fixme in the LocalBackend was to not rely on HadoopFS,
which we use with the SparkBackend for compatibility with dataproc and
hdfs urls.

By default, the HadoopFS doesn't understand gs urls. Users need to
install the gcs-hadoop-connector (preinstalled in dataproc) to
communicate with google cloud storage. Spark handles supplying
credentials to the connector.

Issue #13904 is caused by failing to properly supply the
gcs-hadoop-connector with credentials in the LocalBackend. In the
absence of config, the connector hangs while trying to fetch a token
from a non-existent metadata server.

The LocalBackend was designed to be a testing ground for lowered and
compiled code that would eventually be run on batch, where we use the
RouterFS. I propose a pragmatic fix for #13904 that ditches the HadoopFS
for all but local filesystem access in the LocalBackend instead of
identifying and fixing the root cause.

In doing so, I made a couple of changes to how the RouterFS is
configured: In the absence of the `HAIL_CLOUD` environment variable,
RouterFS can handle gs and az urls iff credentials are not supplied. If
the user supplies credentials, we use `HAIL_CLOUD` to decide which cloud
to route to.

fixes #13904
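
With that change merged (#14407), the original reproduction should no longer hang; a re-check sketch, assuming valid application-default credentials and a writable bucket (bucket name illustrative):

import os
os.environ['HAIL_QUERY_BACKEND'] = 'local'  # select the LocalBackend before importing hail

import hail as hl

# With the RouterFS change this should either succeed (given credentials and a
# writable bucket) or fail fast with a clear error, rather than hanging.
hl.utils.range_table(1).write('gs://my-bucket/test_hail_in_notebook.ht', overwrite=True)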