Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R: Full support of lists and structs in R #8503

Merged
merged 31 commits into from
Aug 18, 2023
Merged

Conversation

krlmlr
Copy link
Collaborator

@krlmlr krlmlr commented Aug 7, 2023

This adds support for lists of atomic vectors. All elements must be of the same type for this to work. The commits make sense individually, but mostly depend on each other.

ALTREP support can be built on top of that.

Closes #2841.

@krlmlr krlmlr requested a review from hannes August 7, 2023 14:51
@krlmlr krlmlr changed the title R: Towards full support of lists and structs in R R: Full support of lists and structs in R Aug 8, 2023
@github-actions github-actions bot marked this pull request as draft August 8, 2023 04:31
@Tmonster Tmonster self-requested a review August 8, 2023 07:29
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Would just like to see more test cases for other types.

If I'm not mistaken, nested types are registered per every nested child object?

tools/rpkg/src/types.cpp Show resolved Hide resolved
tools/rpkg/tests/testthat/test_register.R Show resolved Hide resolved
tools/rpkg/src/scan.cpp Outdated Show resolved Hide resolved
tools/rpkg/tests/testthat/test_struct.R Show resolved Hide resolved
tools/rpkg/src/scan.cpp Outdated Show resolved Hide resolved

private:
RTypeId id_;
child_list_t<RType> aux_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does aux_ stand for in this case? Just auxiliary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, auxiliary information. Can you think of a better name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe child_Rtypes? auxiliary feels to broad to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aux_ holds the list child type, struct child types and names, or factor levels. It's really a union type, but luckily everything fits into the child_list_t . Could be extra_, or perhaps child_types_or_enum_levels_, or a proper virtual base class with subclasses (but then how do we name that member?).

tools/rpkg/src/scan.cpp Show resolved Hide resolved
@krlmlr
Copy link
Collaborator Author

krlmlr commented Aug 8, 2023

nested types are registered per every nested child object

Can you please rephrase?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Aug 8, 2023

I'll follow up with more tests.

@krlmlr krlmlr requested a review from Tmonster August 8, 2023 15:35
@krlmlr
Copy link
Collaborator Author

krlmlr commented Aug 8, 2023

I'm not too sure about my usage of std::move() .

Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was trying to test around and it seems like the following fails

df <- vctrs::data_frame(a = 1:4, b = list(4:6, 2:3, 1L, 5))
dbWriteTable(con, "df", df)

while

df <- vctrs::data_frame(a = 1:4, b = list(4:6, 2:3, 1L, 5L))
dbWriteTable(con, "df", df)

The error is Error: rapi_execute: Can't convert R type. I know that all nested structs/lists need the same type, but I think the user would appreciate feedback in this case that we don't yet support two types in the same nested object.

Might also be nice to have a RTypeToString() method for debugging? Possibly also for the above mentioned case?

Should we also test nested lists/structs that have above the default chunk size?

@Tmonster
Copy link
Contributor

Tmonster commented Aug 9, 2023

I'm not too sure about my usage of std::move() .

Your usage of move() looks ok to me

@krlmlr krlmlr marked this pull request as ready for review August 9, 2023 09:13
@krlmlr
Copy link
Collaborator Author

krlmlr commented Aug 9, 2023

Can you please construct a data frame that would exceed the chunk size limits?

Happy to tackle the detailed comments in a follow-up PR. The vctrs package in R also has a list_of type -- we can detect objects of this class (no need to drag in a dependency) to avoid scanning the entire list.

@github-actions github-actions bot marked this pull request as draft August 9, 2023 09:17
@krlmlr krlmlr marked this pull request as ready for review August 9, 2023 09:17
@Tmonster
Copy link
Contributor

Can you please construct a data frame that would exceed the chunk size limits?
Meant the STANDARD_VECTOR_SIZE, but it doesn't seem to be an issue.

Happy to tackle the detailed comments in a follow-up PR. The vctrs package in R also has a list_of type -- we can detect objects of this class (no need to drag in a dependency) to avoid scanning the entire list.

Yea, I think a follow up PR would be fine, this looks good to me.

@Tmonster Tmonster self-requested a review August 10, 2023 13:10
@hannes
Copy link
Member

hannes commented Aug 16, 2023

Looks good to me, can we maybe now remove some of the excludes in test_types.R?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Aug 16, 2023

Absolutely. Can we do that in a follow-up PR?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Aug 17, 2023

(FWIW, test-types.R isn't affected by this PR. We can already output nested types today, this is about ingestion.)

@github-actions github-actions bot marked this pull request as draft August 17, 2023 12:08
@krlmlr krlmlr marked this pull request as ready for review August 17, 2023 12:10
@hannes hannes merged commit dc25e98 into duckdb:master Aug 18, 2023
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
R: Full support of lists and structs in R
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
R: Full support of lists and structs in R
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
R: Full support of lists and structs in R
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
- Merge pull request duckdb/duckdb#8378 from Mytherin/enumcatalogremoval: Format serialization - add missing serialization logic for DelimJoin, AsOfJoin, CreateIndex and enums

- Merge pull request duckdb/duckdb#8503 from krlmlr/f-2841-nested

- Merge pull request duckdb/duckdb#8626 from carlopi/fixR: R: Fix warning on mismatched integer comparison
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 5, 2023
- Merge pull request duckdb/duckdb#8378 from Mytherin/enumcatalogremoval: Format serialization - add missing serialization logic for DelimJoin, AsOfJoin, CreateIndex and enums

- Merge pull request duckdb/duckdb#8503 from krlmlr/f-2841-nested

- Merge pull request duckdb/duckdb#8626 from carlopi/fixR: R: Fix warning on mismatched integer comparison
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

R list column support (R is freezing)
4 participants