JSON lines with missing struct fields raise TypeError: Couldn't cast array #7159

Closed
albertvillanova opened this issue Sep 23, 2024 · 1 comment · Fixed by #7160
Labels
bug Something isn't working

Comments

albertvillanova (Member) commented Sep 23, 2024

Loading JSON Lines files in which some records omit struct fields raises TypeError: Couldn't cast array of type.

See example: https://huggingface.co/datasets/wikimedia/structured-wikipedia/discussions/5

One would expect the missing struct fields to be added with null values.
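The expected behavior can be sketched in plain Python. This is only an illustration of "fill missing struct fields with null", not the datasets implementation or the fix in #7160. The helper `fill_missing_fields`, the `template` dict standing in for a schema, and the outer field name `image` are all hypothetical; the inner field names are borrowed from the error message reported in this thread.

```python
import json


def fill_missing_fields(value, template):
    """Return a copy of `value` with every key from `template` present.

    `template` is a nested dict whose leaves are None. It is a hypothetical
    stand-in for a schema, not a `datasets.Features` object.
    """
    # Leaves, nulls, and non-struct values are returned unchanged.
    if not isinstance(template, dict) or not isinstance(value, dict):
        return value
    filled = dict(value)
    for key, sub_template in template.items():
        # A missing key yields None, which plays the role of a null value.
        filled[key] = fill_missing_fields(value.get(key), sub_template)
    return filled


# Hypothetical schema and JSON Lines input; the second record omits one field.
template = {
    "image": {
        "content_url": None,
        "width": None,
        "height": None,
        "alternative_text": None,
    }
}
jsonl = "\n".join(
    [
        '{"image": {"content_url": "a.png", "width": 1, "height": 2, "alternative_text": "a"}}',
        '{"image": {"content_url": "b.png", "width": 3, "height": 4}}',
    ]
)

records = [fill_missing_fields(json.loads(line), template) for line in jsonl.splitlines()]
for record in records:
    print(record)
```

After this step every record has the same set of struct keys, so a single struct type could describe all of them.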

@albertvillanova albertvillanova added the enhancement New feature or request label Sep 23, 2024
@albertvillanova albertvillanova self-assigned this Sep 23, 2024
@albertvillanova albertvillanova changed the title Dataset items with missing struct fields raise TypeError: Couldn't cast array of type Dataset items with missing struct fields raise TypeError: Couldn't cast array Sep 23, 2024
@albertvillanova albertvillanova changed the title Dataset items with missing struct fields raise TypeError: Couldn't cast array JSON lines with missing struct fields raise TypeError: Couldn't cast array Sep 23, 2024
@albertvillanova albertvillanova added bug Something isn't working and removed enhancement New feature or request labels Sep 25, 2024
Aremaki commented Oct 21, 2024

Hello,

I still have the same issue when loading the dataset with the new version:
https://huggingface.co/datasets/wikimedia/structured-wikipedia/discussions/5

I downloaded and unzipped the wikimedia/structured-wikipedia dataset locally, but loading it from the local path fails with the same error.

import datasets

dataset = datasets.load_dataset("/gpfsdsdir/dataset/HuggingFace/wikimedia/structured-wikipedia/20240916.fr")
TypeError: Couldn't cast array of type
struct<content_url: string, width: int64, height: int64, alternative_text: string>
to
{'content_url': Value(dtype='string', id=None), 'width': Value(dtype='int64', id=None), 'height': Value(dtype='int64', id=None)}

The above exception was the direct cause of the following exception:

I am using datasets version 3.0.1.
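To make the mismatch in the traceback above easier to read, here is a small sketch that compares the two lists of field names. The helper `diff_struct_fields` is hypothetical and does not call into datasets; the field names are copied by hand from the error message.

```python
def diff_struct_fields(actual_fields, expected_fields):
    """Return (extra, missing) field names, preserving the input order."""
    extra = [name for name in actual_fields if name not in expected_fields]
    missing = [name for name in expected_fields if name not in actual_fields]
    return extra, missing


# Field names copied from the traceback: the array type, then the target features.
actual = ["content_url", "width", "height", "alternative_text"]
expected = ["content_url", "width", "height"]

extra, missing = diff_struct_fields(actual, expected)
print("only in the data:", extra)
print("only in the expected features:", missing)
```

In this case the data carries an `alternative_text` field that the expected features do not list, which suggests that the expected struct type was derived from records that lack that field.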
