feature request pipeline to fetch upload data to hugging face #1239
Thanks @koch3092, I left some initial comments. We also need to support creating a dataset card for camel; see https://huggingface.co/docs/datasets/en/dataset_card
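For context, a dataset card on the Hub is the repository's README.md, which opens with a YAML metadata block between `---` markers. A minimal standard-library sketch of rendering one is below. `render_dataset_card` is a hypothetical helper, not part of camel or huggingface_hub, and it serializes each value with `json.dumps` (JSON scalars and lists are valid YAML flow syntax) purely to avoid a YAML dependency.

```python
import json


def render_dataset_card(metadata: dict, description: str) -> str:
    """Render README.md text: YAML front matter followed by a description.

    Hypothetical helper for illustration only. Values are written with
    json.dumps because JSON scalars and lists are valid YAML flow syntax.
    """
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    lines.append("")
    lines.append(description)
    return "\n".join(lines) + "\n"


card = render_dataset_card(
    {"license": "mit", "tags": ["camel"], "pretty_name": "demo"},
    "# demo\n\nExample dataset card.",
)
```

The real implementation may prefer a dedicated card class from the Hub client library; this only shows the file layout the card has to end up with.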
…oad-data-to-hugging-face
Can we support picture-type data and examples?
…oad-data-to-hugging-face # Conflicts: # poetry.lock
1. Throw NotImplementedError in abstract methods
2. Update logging to camel.logger
3. Check the validity of JSON in upload_file()
4. Update poetry.lock
5. Update the token assignment method in test
camel/datahubs/clients/base.py
Outdated
        Returns:
            str: The URL of the created dataset.
        """
        raise NotImplementedError("Method not implemented.")
Use `pass` instead.
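A minimal sketch of why `pass` can be enough here, assuming the base class derives from `abc.ABC` and marks the method with `@abstractmethod` (the snippet above does not show this). In that case Python already refuses to instantiate any class that leaves the method unimplemented. `BaseDatasetManager` and `InMemoryManager` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class BaseDatasetManager(ABC):
    """Hypothetical stand-in for the base class under review."""

    @abstractmethod
    def create_dataset(self, name: str) -> str:
        """Create a dataset and return its URL."""
        pass


class InMemoryManager(BaseDatasetManager):
    """Concrete subclass that implements the abstract method."""

    def create_dataset(self, name: str) -> str:
        return f"memory://{name}"


try:
    BaseDatasetManager()  # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False

url = InMemoryManager().create_dataset("demo")
```

One trade-off: a subclass that calls `super().create_dataset(...)` silently gets `None` back with `pass`, whereas `raise NotImplementedError` would fail loudly.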
class RepoType(str, Enum):
    DATASET = "dataset"
    MODEL = "model"
    SPACE = "space"
Move this to `camel.types`.
if not authors:
    authors = []

if not tags:
    tags = []
`authors` and `tags` could be unified and set within `metadata`.
metadata = {
    "license": license,
    "task_categories": task_categories if task_categories else [],
    "language": language if language else [],
    "tags": tags,
    "pretty_name": dataset_name,
    "size_categories": [size_category] if size_category else [],
}
Could we remove keys with None values by using `metadata = {k: v for k, v in metadata.items() if v}`?
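One detail worth checking before adopting this: `if v` drops every falsy value (empty lists, empty strings, `0`), not only `None`. The sketch below compares it with an explicit `is not None` test; the sample values, including the `downloads` key, are made up for illustration.

```python
metadata = {
    "license": None,
    "task_categories": [],
    "language": ["en"],
    "pretty_name": "demo",
    "downloads": 0,  # hypothetical key, only to show a falsy non-None value
}

# Drops None, empty lists, empty strings, and 0.
truthy_only = {k: v for k, v in metadata.items() if v}

# Drops only keys whose value is exactly None.
not_none = {k: v for k, v in metadata.items() if v is not None}
```

For dataset-card metadata, dropping empty lists as well is probably the desired outcome, in which case the suggested `if v` form does the job.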
if not existing_records:
    raise ValueError(
        f"Dataset '{dataset_name}' does not have an existing file to "
        f"update. Use `add_records` first."
    )
Could we add the records directly and give the user a warning message instead?
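A rough sketch of that fallback, using an in-memory dict in place of the Hub and the standard `logging` module in place of the project's logger. `update_records`, the `store` layout, and the `"id"` merge key are all assumptions for illustration, not the PR's actual code.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def update_records(
    store: Dict[str, List[Dict[str, Any]]],
    dataset_name: str,
    records: List[Dict[str, Any]],
) -> None:
    """Hypothetical in-memory version of the update path.

    Merges by an assumed "id" key. If the dataset has no records yet,
    logs a warning and adds the new records instead of raising.
    """
    existing = store.get(dataset_name)
    if not existing:
        logger.warning(
            "Dataset '%s' has no existing records; adding them instead.",
            dataset_name,
        )
        store[dataset_name] = list(records)
        return
    # Later records replace earlier ones that share the same id.
    by_id = {r["id"]: r for r in existing}
    by_id.update({r["id"]: r for r in records})
    store[dataset_name] = list(by_id.values())


store: Dict[str, List[Dict[str, Any]]] = {}
update_records(store, "demo", [{"id": 1, "text": "a"}])  # warns, then adds
update_records(store, "demo", [{"id": 1, "text": "b"}, {"id": 2, "text": "c"}])
```

The first call takes the warning path and the second merges into the records it created, so callers no longer need to know whether a file already exists.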
Thanks @Wendong-Fan, it looks good to me.
…oad-data-to-hugging-face
Description
Add CRUD operations for Hugging Face datasets and records.
Motivation and Context
Closes #1213
Types of changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!