-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] LocalBackend hangs if given a gs:// URL #13904
Comments
Requires gcs connector |
I got a timeout!
|
Wonder if this is somehow related to #14158 (comment) |
after rtfm (https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/README.md#getting-the-connector), seems credentials need to be configured in |
Indeed,
|
ehigham
added a commit
to ehigham/hail
that referenced
this issue
Mar 12, 2024
A long-standing fixme in the LocalBackend was to not rely on HadoopFS, which we use with the SparkBackend for compatibility with dataproc and hdfs urls. By default, the HadoopFS doesn't understand gs urls. Users need to install the gcs-hadoop-connector (preinstalled in dataproc) to communicate with google cloud storage. Spark handles supplying credentials to the connector. Issue hail-is#13904 is caused by failing to properly supply the gcs-hadoop-connector with credentials in the LocalBackend. In the absence of config, the connector hangs while trying to fetch a token form a non-existant metadata server. The LocalBackend was designed to be a testing ground for lowered and compiled code that would eventually be run on batch, where we use the RouterFS. I propose a pragmatic fix for hail-is#13904 that ditches the HadoopFS for all but local filesystem access in the LocalBackend instead of identifying and fixing the root cause. In doing so, I made a couple of changes to how the RouterFS is configured: In the absence of the `HAIL_CLOUD` environment variable, RouterFS can handle gs and az urls iff credentials are not supplied. If the user supplies creditials, we use `HAIL_CLOUD` to decide which cloud to route to. fixes hail-is#13904
ehigham
added a commit
to ehigham/hail
that referenced
this issue
Mar 14, 2024
A long-standing fixme in the LocalBackend was to not rely on HadoopFS, which we use with the SparkBackend for compatibility with dataproc and hdfs urls. By default, the HadoopFS doesn't understand gs urls. Users need to install the gcs-hadoop-connector (preinstalled in dataproc) to communicate with google cloud storage. Spark handles supplying credentials to the connector. Issue hail-is#13904 is caused by failing to properly supply the gcs-hadoop-connector with credentials in the LocalBackend. In the absence of config, the connector hangs while trying to fetch a token form a non-existant metadata server. The LocalBackend was designed to be a testing ground for lowered and compiled code that would eventually be run on batch, where we use the RouterFS. I propose a pragmatic fix for hail-is#13904 that ditches the HadoopFS for all but local filesystem access in the LocalBackend instead of identifying and fixing the root cause. In doing so, I made a couple of changes to how the RouterFS is configured: In the absence of the `HAIL_CLOUD` environment variable, RouterFS can handle gs and az urls iff credentials are not supplied. If the user supplies creditials, we use `HAIL_CLOUD` to decide which cloud to route to. fixes hail-is#13904
ehigham
added a commit
to ehigham/hail
that referenced
this issue
Mar 20, 2024
A long-standing fixme in the LocalBackend was to not rely on HadoopFS, which we use with the SparkBackend for compatibility with dataproc and hdfs urls. By default, the HadoopFS doesn't understand gs urls. Users need to install the gcs-hadoop-connector (preinstalled in dataproc) to communicate with google cloud storage. Spark handles supplying credentials to the connector. Issue hail-is#13904 is caused by failing to properly supply the gcs-hadoop-connector with credentials in the LocalBackend. In the absence of config, the connector hangs while trying to fetch a token form a non-existant metadata server. The LocalBackend was designed to be a testing ground for lowered and compiled code that would eventually be run on batch, where we use the RouterFS. I propose a pragmatic fix for hail-is#13904 that ditches the HadoopFS for all but local filesystem access in the LocalBackend instead of identifying and fixing the root cause. In doing so, I made a couple of changes to how the RouterFS is configured: In the absence of the `HAIL_CLOUD` environment variable, RouterFS can handle gs and az urls iff credentials are not supplied. If the user supplies creditials, we use `HAIL_CLOUD` to decide which cloud to route to. fixes hail-is#13904
hail-ci-robot
pushed a commit
that referenced
this issue
Apr 10, 2024
A long-standing fixme in the LocalBackend was to not rely on HadoopFS, which we use with the SparkBackend for compatibility with dataproc and hdfs urls. By default, the HadoopFS doesn't understand gs urls. Users need to install the gcs-hadoop-connector (preinstalled in dataproc) to communicate with google cloud storage. Spark handles supplying credentials to the connector. Issue #13904 is caused by failing to properly supply the gcs-hadoop-connector with credentials in the LocalBackend. In the absence of config, the connector hangs while trying to fetch a token form a non-existant metadata server. The LocalBackend was designed to be a testing ground for lowered and compiled code that would eventually be run on batch, where we use the RouterFS. I propose a pragmatic fix for #13904 that ditches the HadoopFS for all but local filesystem access in the LocalBackend instead of identifying and fixing the root cause. In doing so, I made a couple of changes to how the RouterFS is configured: In the absence of the `HAIL_CLOUD` environment variable, RouterFS can handle gs and az urls iff credentials are not supplied. If the user supplies creditials, we use `HAIL_CLOUD` to decide which cloud to route to. fixes #13904
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
What happened?
The following should not hang, but it does.
Writing to a local file path causes no issue.
Version
0.2.124
Relevant log output
No response
The text was updated successfully, but these errors were encountered: