Support S3A proxy configuration properties #1132

Closed
chancez opened this issue Jul 16, 2019 · 8 comments
Labels
enhancement New feature or request

Comments

@chancez
Member

chancez commented Jul 16, 2019

I'm trying to get Presto to work in a restricted network where I can only reach S3/GCS through an HTTP/HTTPS proxy.

Hadoop's core-site.xml has options such as fs.s3a.proxy.host. It would be useful if Presto supported equivalent options in the Hive connector's S3 configuration properties.

@electrum added the enhancement and good first issue labels on Jul 17, 2019
@yuokada
Contributor

yuokada commented Jul 26, 2019

How about the properties below? Do we need to add any others?

  • fs.s3a.proxy.host
  • fs.s3a.proxy.port
  • fs.s3a.proxy.username
  • fs.s3a.proxy.password
  • fs.s3a.proxy.domain
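
To make the proposal concrete, here is a minimal sketch of how these five properties might be read into a single settings object. The class `S3ProxySettings`, its method names, and the caller-supplied `defaultPort` are hypothetical and exist only for illustration; this is not Presto's or Hadoop's actual API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder for the proposed proxy properties; not an actual Presto class. */
final class S3ProxySettings {
    final String host;
    final int port;
    final Optional<String> username;
    final Optional<String> password;
    final Optional<String> domain;

    private S3ProxySettings(String host, int port, Optional<String> username,
            Optional<String> password, Optional<String> domain) {
        this.host = host;
        this.port = port;
        this.username = username;
        this.password = password;
        this.domain = domain;
    }

    /**
     * Returns Optional.empty() when fs.s3a.proxy.host is missing or blank,
     * so callers can treat "no host" as "no proxy".
     * defaultPort is an assumption supplied by the caller, used when no port is set.
     */
    static Optional<S3ProxySettings> fromProperties(Map<String, String> props, int defaultPort) {
        Optional<String> host = nonBlank(props.get("fs.s3a.proxy.host"));
        if (!host.isPresent()) {
            return Optional.empty();
        }
        int port = nonBlank(props.get("fs.s3a.proxy.port"))
                .map(Integer::parseInt)
                .orElse(defaultPort);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Invalid proxy port: " + port);
        }
        return Optional.of(new S3ProxySettings(
                host.get(),
                port,
                nonBlank(props.get("fs.s3a.proxy.username")),
                // The password is kept verbatim; trimming could alter it.
                Optional.ofNullable(props.get("fs.s3a.proxy.password")),
                nonBlank(props.get("fs.s3a.proxy.domain"))));
    }

    /** Treats null, empty, and whitespace-only values as absent. */
    private static Optional<String> nonBlank(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }
}
```

Returning an empty `Optional` when the host is absent keeps the "no proxy" case explicit instead of relying on an empty string.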

@chancez
Member Author

chancez commented Jul 26, 2019

That seems sufficient.

@tooptoop4
Contributor

@chancez did you solve this?

@chancez
Member Author

chancez commented Jan 20, 2020

I'm not using Presto currently as I've changed jobs. I haven't fixed this.

@tooptoop4
Contributor

tooptoop4 commented Jan 24, 2020

I checked: prestodb 220 does not use the proxy, but prestosql does. It gets the proxy from Linux environment variables: https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/ClientConfiguration.java#L2437-L2442

This looks to have been introduced in AWS SDK 1.11.580, which is why prestodb on 1.11.445 does not use it while prestosql on 1.11.602 does. After running a Hive query you can see this in server.log:
2020-01-24T10:12:57.755Z INFO hive-hive-0 com.amazonaws.http.AmazonHttpClient Configuring Proxy. Proxy Host: redact Proxy Port: redact

The problem is that it does not allow setting them to empty.
I tried this in Presto's jvm.config:
-Dhttp.proxyHost=
-Dhttp.proxyPort=
-Dhttps.proxyHost=
-Dhttps.proxyPort=

Query 20200124_111055_00000_rnt6x failed: Host name may not be empty
io.prestosql.spi.PrestoException: Host name may not be empty
        at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:216)
        at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
        at io.prestosql.$gen.Presto_328__0_220____20200124_111038_2.run(Unknown Source)
        at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Host name may not be empty
        at org.apache.http.util.Args.containsNoBlanks(Args.java:84)
        at org.apache.http.HttpHost.<init>(HttpHost.java:80)
        at com.amazonaws.http.apache.SdkProxyRoutePlanner.<init>(SdkProxyRoutePlanner.java:43)
        at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.addProxyConfig(ApacheHttpClientFactory.java:94)
        at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:75)
        at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)
        at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:324)
        at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:308)
        at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:229)
        at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:215)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:682)
        at com.amazonaws.services.s3.AmazonS3Builder$1.apply(AmazonS3Builder.java:35)
        at com.amazonaws.services.s3.AmazonS3Builder$1.apply(AmazonS3Builder.java:32)
        at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:64)
        at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:28)
        at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
        at io.prestosql.plugin.hive.s3.PrestoS3FileSystem.createAmazonS3Client(PrestoS3FileSystem.java:756)
        at io.prestosql.plugin.hive.s3.PrestoS3FileSystem.initialize(PrestoS3FileSystem.java:252)
        at org.apache.hadoop.fs.PrestoFileSystemCache.createFileSystem(PrestoFileSystemCache.java:125)
        at org.apache.hadoop.fs.PrestoFileSystemCache.getInternal(PrestoFileSystemCache.java:92)
        at org.apache.hadoop.fs.PrestoFileSystemCache.get(PrestoFileSystemCache.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at io.prestosql.plugin.hive.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:71)
        at io.prestosql.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
        at io.prestosql.plugin.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:70)
        at io.prestosql.plugin.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:64)
        at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:306)
        at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:280)
        at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:105)
        at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:209)
        ... 6 more

I then saw that aws/aws-sdk-java@54d9b9b (AWS SDK 1.11.697) handles setting no proxy.
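
A plausible reading of the trace above is that `-Dhttp.proxyHost=` sets the property to an empty string rather than leaving it unset, so a plain null check still treats the proxy as configured and the empty host reaches `HttpHost`. The sketch below only illustrates that distinction with the standard library; `ProxyPropertyCheck` and `isProxyHostConfigured` are hypothetical names, and this is not the AWS SDK's actual code.

```java
final class ProxyPropertyCheck {
    /** Treats a missing, empty, or whitespace-only value as "no proxy". */
    static boolean isProxyHostConfigured(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Has the same effect on this property as starting the JVM with -Dhttp.proxyHost=
        System.setProperty("http.proxyHost", "");
        String host = System.getProperty("http.proxyHost");

        // A null-only check sees the empty string as a configured host.
        System.out.println("null check says configured: " + (host != null));
        // A blank-aware check treats it as "no proxy".
        System.out.println("blank-aware check says configured: " + isProxyHostConfigured(host));
    }
}
```

With the empty property set, the first line prints `true` and the second prints `false`.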

@findepi removed the good first issue label on Jan 24, 2020
tooptoop4 added a commit to tooptoop4/presto-1 that referenced this issue Mar 5, 2020
electrum pushed a commit that referenced this issue Mar 9, 2020
@timflannagan
Member

Can this be closed now that #3016 has been merged?

@tooptoop4
Contributor

tooptoop4 commented Mar 14, 2020

I don't think this can be closed: with the settings below in jvm.config, the proxy (or no-proxy) setting is applied to EVERYTHING, not just S3A. For example, Ranger URL lookups fail. I think this issue should be about supporting per-catalog proxy settings.

-Dhttp.proxyHost=
-Dhttp.proxyPort=0
-Dhttps.proxyHost=
-Dhttps.proxyPort=0
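
To illustrate the difference between JVM-wide and per-client settings, here is a rough sketch using only `java.net.Proxy` from the standard library. The class `PerClientProxy`, the method `proxyFor`, and the host `proxy.example.internal` are hypothetical; how a per-catalog setting would actually be wired into the S3 client is not shown here.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

final class PerClientProxy {
    /** Builds a proxy for one client only; a null or blank host means a direct connection. */
    static Proxy proxyFor(String host, int port) {
        if (host == null || host.trim().isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // createUnresolved avoids a DNS lookup at configuration time.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host.trim(), port));
    }

    public static void main(String[] args) {
        // Hypothetical: the S3 client goes through a proxy, the Ranger client does not.
        Proxy s3Proxy = proxyFor("proxy.example.internal", 3128);
        Proxy rangerProxy = proxyFor("", 0);

        System.out.println(s3Proxy.type());     // HTTP
        System.out.println(rangerProxy.type()); // DIRECT
    }
}
```

Because each client holds its own `Proxy` value, nothing is written to the `http.proxyHost` or `https.proxyHost` system properties, so other HTTP clients in the same JVM are unaffected.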

timflannagan pushed a commit to timflannagan/presto that referenced this issue Aug 28, 2020
@hashhar
Member

hashhar commented Mar 4, 2022

@aczajkowski implemented the ability to proxy connections to S3 in #11255.

Please reopen if it doesn't address the use case.

@hashhar closed this as completed on Mar 4, 2022
7 participants