Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance S3 Web Identity authentication & security mapping #24645

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

imsayari404
Copy link
Contributor

@imsayari404 imsayari404 commented Feb 27, 2025

Description

This PR enhances S3 Web Identity authentication by updating Presto’s security mapping to support Web Identity credentials. It introduces new configurations and ensures compatibility with existing S3 authentication mechanisms.

Changes

  • Updated createAwsCredentialsProvider to support Web Identity authentication.
  • Modified PrestoS3ConfigurationUpdater, S3ConfigurationUpdater, and PrestoS3FileSystem.
  • Introduced new config properties:
    • hive.s3.web-identity-token-file
    • hive.s3.webidentity.enabled
  • Updated TestHiveS3Config to validate Web Identity authentication.

Motivation and Context

  • Enables Web Identity-based authentication for S3 without requiring static credentials.
  • Improves security by leveraging IAM roles for authentication.

Impact

  • Ensured backward compatibility with existing authentication methods.

Test Plan

  • Unit tests updated to validate Web Identity auth flow.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Hive Changes
* Added support for Web Identity authentication in S3 security mapping.
* Introduced `hive.s3.web-identity-token-file` and `hive.s3.webidentity.enabled` for Web Identity authentication.

Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imsayari404 thanks for the enhancement. It mostly looks good. I've added a few minor suggestions for improvement.

return s3WebIdentityEnabled;
}

@Config("hive.s3.webidentity.enabled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
@Config("hive.s3.webidentity.enabled")
@Config("hive.s3.web.identity.auth.enabled")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions here.

Comment on lines 131 to 132
if (webIdentityEnabled) {
config.setBoolean(S3_WEB_IDENTITY_ENABLED, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can always set the config irrespective of true/false and set the token file only if webIdentityEnabled is true.

Suggested change
if (webIdentityEnabled) {
config.setBoolean(S3_WEB_IDENTITY_ENABLED, true);
config.setBoolean(S3_WEB_IDENTITY_ENABLED, webIdentityEnabled);
if (webIdentityEnabled) {

Comment on lines 858 to 859
boolean webIdentityEnabled = conf.getBoolean("presto.hive.s3.webidentity.enabled", false);
String webIdentityTokenFile = conf.get("presto.hive.s3.web-identity-token-file");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's make these two class variables and move their initialization to the initialize method, where other configs are retrieved.

public void initialize(URI uri, Configuration conf)
throws IOException
{
requireNonNull(uri, "uri is null");
requireNonNull(conf, "conf is null");
super.initialize(uri, conf);
setConf(conf);
this.uri = URI.create(uri.getScheme() + "://" + uri.getAuthority());
this.workingDirectory = new Path(PATH_SEPARATOR).makeQualified(this.uri, new Path(PATH_SEPARATOR));
HiveS3Config defaults = new HiveS3Config();
this.stagingDirectory = new File(conf.get(S3_STAGING_DIRECTORY, defaults.getS3StagingDirectory().toString()));
this.maxAttempts = conf.getInt(S3_MAX_CLIENT_RETRIES, defaults.getS3MaxClientRetries()) + 1;
this.maxBackoffTime = Duration.valueOf(conf.get(S3_MAX_BACKOFF_TIME, defaults.getS3MaxBackoffTime().toString()));
this.maxRetryTime = Duration.valueOf(conf.get(S3_MAX_RETRY_TIME, defaults.getS3MaxRetryTime().toString()));
int maxErrorRetries = conf.getInt(S3_MAX_ERROR_RETRIES, defaults.getS3MaxErrorRetries());
boolean sslEnabled = conf.getBoolean(S3_SSL_ENABLED, defaults.isS3SslEnabled());
Duration connectTimeout = Duration.valueOf(conf.get(S3_CONNECT_TIMEOUT, defaults.getS3ConnectTimeout().toString()));
Duration socketTimeout = Duration.valueOf(conf.get(S3_SOCKET_TIMEOUT, defaults.getS3SocketTimeout().toString()));
int maxConnections = conf.getInt(S3_MAX_CONNECTIONS, defaults.getS3MaxConnections());
this.multiPartUploadMinFileSize = conf.getLong(S3_MULTIPART_MIN_FILE_SIZE, defaults.getS3MultipartMinFileSize().toBytes());
this.multiPartUploadMinPartSize = conf.getLong(S3_MULTIPART_MIN_PART_SIZE, defaults.getS3MultipartMinPartSize().toBytes());
this.isPathStyleAccess = conf.getBoolean(S3_PATH_STYLE_ACCESS, defaults.isS3PathStyleAccess());
this.useInstanceCredentials = conf.getBoolean(S3_USE_INSTANCE_CREDENTIALS, defaults.isS3UseInstanceCredentials());
this.pinS3ClientToCurrentRegion = conf.getBoolean(S3_PIN_CLIENT_TO_CURRENT_REGION, defaults.isPinS3ClientToCurrentRegion());
this.s3IamRole = conf.get(S3_IAM_ROLE, defaults.getS3IamRole());
this.s3IamRoleSessionName = conf.get(S3_IAM_ROLE_SESSION_NAME, defaults.getS3IamRoleSessionName());
verify(!(useInstanceCredentials && conf.get(S3_IAM_ROLE) != null),
"Invalid configuration: either use instance credentials or specify an iam role");
verify((pinS3ClientToCurrentRegion && conf.get(S3_ENDPOINT) == null) || !pinS3ClientToCurrentRegion,
"Invalid configuration: either endpoint can be set or S3 client can be pinned to the current region");
this.sseEnabled = conf.getBoolean(S3_SSE_ENABLED, defaults.isS3SseEnabled());
this.sseType = PrestoS3SseType.valueOf(conf.get(S3_SSE_TYPE, defaults.getS3SseType().name()));
this.sseKmsKeyId = conf.get(S3_SSE_KMS_KEY_ID, defaults.getS3SseKmsKeyId());
this.s3AclType = PrestoS3AclType.valueOf(conf.get(S3_ACL_TYPE, defaults.getS3AclType().name()));
String userAgentPrefix = conf.get(S3_USER_AGENT_PREFIX, defaults.getS3UserAgentPrefix());
this.skipGlacierObjects = conf.getBoolean(S3_SKIP_GLACIER_OBJECTS, defaults.isSkipGlacierObjects());
this.s3StorageClass = conf.getEnum(S3_STORAGE_CLASS, defaults.getS3StorageClass());
ClientConfiguration configuration = new ClientConfiguration()
.withMaxErrorRetry(maxErrorRetries)
.withProtocol(sslEnabled ? Protocol.HTTPS : Protocol.HTTP)
.withConnectionTimeout(toIntExact(connectTimeout.toMillis()))
.withSocketTimeout(toIntExact(socketTimeout.toMillis()))
.withMaxConnections(maxConnections)
.withUserAgentPrefix(userAgentPrefix)
.withUserAgentSuffix(S3_USER_AGENT_SUFFIX);
this.credentialsProvider = createAwsCredentialsProvider(uri, conf);
this.s3 = createAmazonS3Client(conf, configuration);
}


if (!isNullOrEmpty(s3IamRole)) {
if (webIdentityEnabled) {
log.info("Using Web Identity Token Credentials Provider.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We should minimize info logs in production and use them only when absolutely necessary. If logging is required in this case, we can add a debug log.

}
return providerBuilder.build();
}
log.info("Using STS Assume Role Session Credentials Provider.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same here

if (webIdentityEnabled) {
log.info("Using Web Identity Token Credentials Provider.");
WebIdentityTokenCredentialsProvider.Builder providerBuilder = WebIdentityTokenCredentialsProvider.builder()
.roleArn(s3IamRole);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please set roleSessionName here along with roleArn.

@@ -28,6 +28,8 @@ public interface S3ConfigurationUpdater
String S3_PIN_CLIENT_TO_CURRENT_REGION = "presto.s3.pin-client-to-current-region";
String S3_USE_INSTANCE_CREDENTIALS = "presto.s3.use-instance-credentials";
String S3_IAM_ROLE = "presto.hive.s3.iam-role";
String S3_WEB_IDENTITY_TOKEN_FILE = "presto.hive.s3.web-identity-token-file";
String S3_WEB_IDENTITY_ENABLED = "presto.hive.s3.webidentity.enabled";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can align this with config name change above.

Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention, let's also add a test in TestPrestoS3FileSystem.

For reference:

public void testAssumeRoleCredentials()
throws Exception
{
Configuration config = new Configuration();
config.set(S3_IAM_ROLE, "role");
config.setBoolean(S3_USE_INSTANCE_CREDENTIALS, false);
try (PrestoS3FileSystem fs = new PrestoS3FileSystem()) {
fs.initialize(new URI("s3n://test-bucket/"), config);
assertInstanceOf(getAwsCredentialsProvider(fs), STSAssumeRoleSessionCredentialsProvider.class);
}
}

@imsayari404
Copy link
Contributor Author

imsayari404 commented Feb 28, 2025

@imjalpreet , I’ve made the suggested changes. Could you review again when you have time? Thanks!

Comment on lines 157 to 158
config.setBoolean("presto.hive.s3.web.identity.auth.enabled", true);
config.set("presto.hive.s3.web-identity-token-file", "/path/to/token");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's use static variables defined in S3ConfigurationUpdater for the config names here

Comment on lines 131 to 134
config.setBoolean(S3_WEB_IDENTITY_ENABLED, webIdentityEnabled);
if (webIdentityEnabled) {
if (webIdentityTokenFile != null) {
config.set(S3_WEB_IDENTITY_TOKEN_FILE, webIdentityTokenFile);
}
}
Copy link
Member

@agrawalreetika agrawalreetika Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these 2 are dependent config? Then we can set both after the condition check?

Suggested change
config.setBoolean(S3_WEB_IDENTITY_ENABLED, webIdentityEnabled);
if (webIdentityEnabled) {
if (webIdentityTokenFile != null) {
config.set(S3_WEB_IDENTITY_TOKEN_FILE, webIdentityTokenFile);
}
}
if (webIdentityEnabled && webIdentityTokenFile != null) {
config.setBoolean(S3_WEB_IDENTITY_ENABLED, webIdentityEnabled);
config.set(S3_WEB_IDENTITY_TOKEN_FILE, webIdentityTokenFile);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two configurations are related but not strictly dependent. S3_WEB_IDENTITY_ENABLED determines if web identity authentication is used, but the token file is optional. Keeping them separate ensures flexibility, as web identity mode can be enabled without necessarily requiring a token file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok in that case we can modify it to something like -

if (webIdentityEnabled && webIdentityTokenFile != null) {
    config.set(S3_WEB_IDENTITY_TOKEN_FILE, webIdentityTokenFile);
}

if (!isNullOrEmpty(webIdentityTokenFile)) {
providerBuilder.webIdentityTokenFile(webIdentityTokenFile);
}
return providerBuilder.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can s3 identity auth work w/o token file as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, web identity auth can work without explicitly setting the token file in the config. If the environment variable AWS_WEB_IDENTITY_TOKEN_FILE is set, AWS SDK will pick it up automatically, and authentication will still function.

@@ -124,6 +128,10 @@ public void updateConfiguration(Configuration config)
if (sseKmsKeyId != null) {
config.set(S3_SSE_KMS_KEY_ID, sseKmsKeyId);
}
if (webIdentityEnabled && webIdentityTokenFile != null) {
config.setBoolean(S3_WEB_IDENTITY_ENABLED, webIdentityEnabled);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imsayari404 You had mentioned, We can still use webidentity auth if flag is true but token file is not given right?

If that's the case then you should set config.setBoolean(S3_WEB_IDENTITY_ENABLED, webIdentityEnabled); outside condition itself? Please verify

Copy link
Contributor Author

@imsayari404 imsayari404 Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! Since web identity auth can work without explicitly setting the token file, S3_WEB_IDENTITY_ENABLED should be set outside the condition. My bad, I'll update it accordingly.

- Updated `createAwsCredentialsProvider` to support Web Identity.

- Modified `PrestoS3ConfigurationUpdater`, `S3ConfigurationUpdater`, and `PrestoS3FileSystem` for Web Identity auth.

- Web Identity configs added `hive.s3.web-identity-token-file` and `hive.s3.webidentity.enabled`

- Updated `TestHiveS3Config` to validate changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants