
Handle invalid S3 hostname exceptions with older aws-java-sdk versions #254

Closed

Conversation

@jameshou commented Aug 5, 2016

We've seen a lot of messages lately about the "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint" exception, so we thought we'd contribute our two cents and the code changes that worked for us. We tried many of the approaches listed in that thread, including using the spark.executor.extraClassPath and spark.driver.extraClassPath configuration properties to prepend to the classpath, and including the newer SDK in the assembled jar or as a shaded jar. Unfortunately, most of these approaches failed, mainly because the machines themselves carry the older aws-java-sdk jar, which usually takes precedence. We ended up going with what Josh mentioned earlier: changing the S3 URL in the spark-redshift code to append the endpoint to the host (*.s3.amazonaws.com).

This logic tries to instantiate a new AmazonS3URI and, if that fails, retries after appending the default Amazon S3 domain to the host.
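In outline, the change looks like the following sketch (the addEndpointToUrl helper appears in the diff below; the createS3URI wrapper name and the exact URI reassembly are illustrative, not lifted verbatim from the patch):

import java.net.URI
import com.amazonaws.services.s3.AmazonS3URI

// Rewrite e.g. "s3n://my-bucket/key" as "s3n://my-bucket.s3.amazonaws.com/key".
def addEndpointToUrl(url: String, domain: String = "s3.amazonaws.com"): String = {
  val uri = new URI(url)
  val hostWithEndpoint = uri.getHost + "." + domain
  new URI(uri.getScheme, uri.getUserInfo, hostWithEndpoint, uri.getPort,
    uri.getPath, uri.getQuery, uri.getFragment).toString
}

def createS3URI(url: String): AmazonS3URI = {
  try {
    // Newer SDK versions accept the URL as-is.
    new AmazonS3URI(url)
  } catch {
    case e: IllegalArgumentException if e.getMessage.startsWith(
        "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") =>
      // Older SDK versions reject the bare bucket hostname, so retry
      // with the default S3 domain appended.
      new AmazonS3URI(addEndpointToUrl(url))
  }
}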

@codecov-io commented Aug 5, 2016

Current coverage is 89.33% (diff: 100%)

Merging #254 into master will increase coverage by 0.09%

@@             master       #254   diff @@
==========================================
  Files            12         12          
  Lines           641        647     +6   
  Methods         559        564     +5   
  Messages          0          0          
  Branches         82         83     +1   
==========================================
+ Hits            572        578     +6   
  Misses           69         69          
  Partials          0          0          

Powered by Codecov. Last update 95d92cd...b936648

} catch {
  case e: java.lang.IllegalArgumentException => {
    if (e.getMessage().
      startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint")) {
Contributor commented:

Is there a way to programmatically get the AWS SDK version so that you don't have to use exceptions for control flow?

Contributor commented:

It turns out that you can do this with VersionInfoUtils.getVersion (https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/util/VersionInfoUtils.html#getVersion--), but it looks like the implementation of that method relies on being able to load a properties file from the SDK JAR, and that might break under certain repackaging scenarios. So, while a bit ugly, I think this is fine for now. Just go ahead and clean things up so we don't have to explicitly re-throw.
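For reference, the version lookup in question is a one-liner (VersionInfoUtils.getVersion() is the real static method linked above; the caveat is that its answer comes from a properties file inside the SDK JAR):

import com.amazonaws.util.VersionInfoUtils

// Returns the SDK version string, e.g. "1.7.4". Because this is loaded
// from a properties file bundled in the SDK JAR, it can break or return
// a placeholder when the JAR has been shaded or repackaged.
val sdkVersion: String = VersionInfoUtils.getVersion()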

@JoshRosen commented

I'm going to test this out later by running all of the integration tests with an old AWS SDK. It might be cool to reconfigure the build so we can add an old SDK version to our test matrix. If you'd like to take a shot at this and don't mind SBT magic, take a look at how we handle the Spark and Avro versions in the build and test configurations (the key bit of trickiness is how we build against one version and test against another so that we catch binary compatibility problems).

 */
def addEndpointToUrl(url: String, domain: String = "s3.amazonaws.com"): String = {
  val uri = new URI(url)
  val hostWithEndpoint = uri.getHost() + "." + domain
Contributor commented:

You can omit the () parens in these get*() calls; they're not necessary and trigger style suggestions in IntelliJ.
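Applied to the line above, the suggested style is:

val hostWithEndpoint = uri.getHost + "." + domain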

@JoshRosen commented

I fear that this will break if we don't test it, so in 39776a6 I've added end-to-end integration tests that run against the 1.7.4 version of the SDK. Could you cherry-pick that change, address my comments, and push? Then I'll retest and merge.

  // try to instantiate AmazonS3URI with url
  new AmazonS3URI(url)
} catch {
  case e: IllegalArgumentException if e.getMessage.startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") => {
Contributor commented:

Scalastyle is complaining that this line is too long. Try wrapping the string to the next line.
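One way to wrap it (a sketch; the exact break is the author's call):

case e: IllegalArgumentException if e.getMessage.startsWith(
    "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") => {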

@JoshRosen JoshRosen added this to the 2.0.1 milestone Aug 8, 2016
@JoshRosen JoshRosen changed the title Handle invalid s3 hostname exceptions with older aws-java-sdk lib Handle invalid S3 hostname exceptions with older aws-java-sdk versions Aug 9, 2016
@JoshRosen commented

LGTM, so I'm going to merge this to master and will try to backport it to branch-1.x. Thanks @jameshou!

@jameshou commented Aug 9, 2016

Yay! Thanks for merging this in so quickly. Can't wait to start using the official branch in our builds rather than including this in our patched-up jar. Let me know if any issues come up and I will try to address them.

@JoshRosen commented

I'll try to package either a 2.0.1 or 2.1.0 release soon, but I'm currently blocking on resolving databricks/spark-avro#147 so that I can bump the spark-avro version in that next release.

JoshRosen pushed a commit that referenced this pull request Aug 20, 2016

Author: James Hou <jameshou@data101.udemy.com>
Author: James Hou <james.hou@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #254 from jameshou/feature/add-s3-full-endpoint-v1.
JoshRosen pushed a commit that referenced this pull request Aug 20, 2016