Infer artifact store endpoint in metadata writer #3530

Merged

13 changes: 12 additions & 1 deletion backend/metadata_writer/src/metadata_writer.py
@@ -16,6 +16,7 @@
 import hashlib
 import os
 import sys
+import re
 import kubernetes
 import yaml
 from time import sleep
@@ -83,11 +84,21 @@ def output_name_to_argo(name: str) -> str:
     import re
     return re.sub('-+', '-', re.sub('[^-0-9a-z]+', '-', name.lower())).strip('-')

+def is_s3_endpoint(endpoint: str) -> bool:

Contributor: Maybe is_amazon_s3_endpoint.

Member (Author): Sure, let me make that change.

+    return re.search('^.*s3.*amazonaws.com.*$', endpoint) is not None
+
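As a quick illustration (not part of the diff): the regex accepts any endpoint that mentions both "s3" and "amazonaws.com". The endpoint strings below are made-up examples.

# Illustration only: sample endpoints exercising is_s3_endpoint.
assert is_s3_endpoint('s3.amazonaws.com')
assert is_s3_endpoint('bucket.s3.us-west-2.amazonaws.com')
assert not is_s3_endpoint('minio-service.kubeflow:9000')
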
+def get_object_store_provider(endpoint: str) -> str:
+    if is_s3_endpoint(endpoint):

Contributor: Should we also add Google Cloud Storage endpoint detection? Probably storage.googleapis.com.

Contributor: But then I guess the frontend will have to support it. Maybe add it later.

Member (Author): I can help add this.

+        return 's3'
+    else:
+        return 'minio'
+
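A possible shape for the GCS detection discussed in the thread above, purely as a sketch: the helper name is_gcs_endpoint and the 'gs' provider string are assumptions, not part of this PR, and the frontend would also need to understand the new scheme.

# Sketch of the GCS detection discussed above; not part of this PR.
# The helper name and the 'gs' provider string are assumptions.
def is_gcs_endpoint(endpoint: str) -> bool:
    return 'storage.googleapis.com' in endpoint

def get_object_store_provider(endpoint: str) -> str:
    if is_s3_endpoint(endpoint):
        return 's3'
    if is_gcs_endpoint(endpoint):
        return 'gs'
    return 'minio'
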
 def argo_artifact_to_uri(artifact: dict) -> str:
     # 's3' here means S3-compatible object storage, not AWS S3 specifically.
     if 's3' in artifact:
         s3_artifact = artifact['s3']
-        return 'minio://{bucket}/{key}'.format(
+        return '{provider}://{bucket}/{key}'.format(

Contributor: What if we always used the s3://<endpoint>/bucket/key format in both Metadata Writer and the UX (or 's3_with_endpoint://...')? That way we could retain all the data in a single URI for Minio, GCS, and AWS.

Member (Author): Let's have that discussion in #3531.

+            provider=get_object_store_provider(s3_artifact['endpoint']),
             bucket=s3_artifact.get('bucket', ''),
             key=s3_artifact.get('key', ''),
         )
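
To make the end-to-end behavior concrete, a small usage sketch; the artifact dict below is a made-up example of the S3 artifact structure Argo reports.

# Usage sketch with a made-up Argo artifact (illustration only).
artifact = {
    's3': {
        'endpoint': 's3.amazonaws.com',
        'bucket': 'my-bucket',
        'key': 'runs/run-1/output.tgz',
    },
}
print(argo_artifact_to_uri(artifact))  # s3://my-bucket/runs/run-1/output.tgz
# A Minio endpoint such as 'minio-service.kubeflow:9000' would instead
# yield minio://my-bucket/runs/run-1/output.tgz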
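
The endpoint-bearing URI format floated in the last thread was deferred to #3531. Purely to illustrate the idea, it could look something like this (hypothetical; nothing below is implemented in this PR):

# Hypothetical endpoint-bearing URI from the review discussion; deferred
# to #3531 and not implemented in this PR.
def argo_artifact_to_endpoint_uri(s3_artifact: dict) -> str:
    return 's3://{endpoint}/{bucket}/{key}'.format(
        endpoint=s3_artifact.get('endpoint', ''),
        bucket=s3_artifact.get('bucket', ''),
        key=s3_artifact.get('key', ''),
    )

Keeping the endpoint inside the URI would let a single string carry all the data needed to locate the artifact across Minio, GCS, and AWS, at the cost of every consumer having to parse the new scheme.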