Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: With duckdb(environment_scan = TRUE), data frame objects are available as views in duckdb SQL queries #164

Merged
merged 5 commits into from
Dec 7, 2024

Conversation

hannes
Copy link
Member

@hannes hannes commented May 8, 2024

duckdb:::sql('from iris')

now "just" works

CC @krlmlr

Closes #140.

@hannes
Copy link
Member Author

hannes commented May 8, 2024

Needs still tests and some protect stuff I guess

Copy link
Collaborator

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice!

src/register.cpp Outdated
return nullptr;
}
if (TYPEOF(df) == PROMSXP) {
df = Rf_eval(df, rho);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpp11 has a global protection mechanism, but how to control the lifetime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed lifetime is a problematic issue. For example, once someone creates a view of this and then deletes the df and gc runs we might be in trouble.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, support for lifetime control has landed via duckdb/duckdb#11761, which is why we now have conflicts here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already had support for lifetime control, but it was broken in a couple ways

We use the ExternalDependency class for lifetime control, but this wasn't being kept alive long enough, which this PR aimed to address.

We changed the prototype of the replacement scan, we changed the const string &table_name into ReplacementScanInput

I have a follow up PR scheduled that removes the newly added table_ref &TableRef from this struct though
Instead we will collect replacement scans inside QueryRelation::Bind (virtual overload) and push these replacement scans as CTEs onto the parsed QueryNode stored inside the QueryRelation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Thijs. May I get back to you for help with lifetime control after that follow-up PR has made it into the R package? I don't find that PR in https://github.com/duckdb/duckdb/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc+author%3ATishj.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tishj: how would I go about controlling the lifetime of the bound object in this PR?

src/register.cpp Show resolved Hide resolved
return nullptr;
}
// TODO: do utf conversion
auto table_function = make_uniq<TableFunctionRef>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps subclass TableFunctionRef ?

@hannes
Copy link
Member Author

hannes commented Dec 3, 2024

@krlmlr can you get this merged? I think the secondary lifetime issues might be something for a follow-up PR

@krlmlr krlmlr added this to the 1.1.3-1 milestone Dec 4, 2024
@krlmlr
Copy link
Collaborator

krlmlr commented Dec 4, 2024

Sure! I'd like to make it "opt in" though, otherwise it may (and probably will) break user code.

@eitsupi
Copy link
Contributor

eitsupi commented Dec 4, 2024

it may (and probably will) break user code.

Is it really so?
Wouldn't such an external table be queried if there is no table of that name in the existing connection (i.e., the query would fail without this feature)?
#140 (comment)

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 4, 2024

Right, we discussed that. For 1.1.3-1, I'd make it opt-in, we can make it opt-out later.

@krlmlr krlmlr force-pushed the replacementscandf branch from 0fed172 to 2ce02b5 Compare December 7, 2024 15:55
@krlmlr krlmlr changed the title Inital stab at lazy data frame scan for R feat: With duckdb(environment_scan = TRUE), data frame objects are available as views in duckdb SQL queries Dec 7, 2024
@krlmlr krlmlr enabled auto-merge (squash) December 7, 2024 19:26
@krlmlr krlmlr merged commit 52c90db into main Dec 7, 2024
13 checks passed
@krlmlr krlmlr deleted the replacementscandf branch December 7, 2024 20:28
krlmlr added a commit that referenced this pull request Dec 8, 2024
…available as views in duckdb SQL queries (#164)

Co-authored-by: Hannes Mühleisen <hannes@duckdblabs.com>
Co-authored-by: Kirill Müller <krlmlr@users.noreply.github.com>
Co-authored-by: Kirill Müller <kirill@cynkra.com>
krlmlr added a commit that referenced this pull request Dec 8, 2024
…available as views in duckdb SQL queries (#164)

Co-authored-by: Hannes Mühleisen <hannes@duckdblabs.com>
Co-authored-by: Kirill Müller <krlmlr@users.noreply.github.com>
Co-authored-by: Kirill Müller <kirill@cynkra.com>
@NateNohling
Copy link

Is there a reprex for how should this work? I can't seem to get this to work even on duckdb_1.1.3-1

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 12, 2024

There are tests.

@hannes
Copy link
Member Author

hannes commented Dec 13, 2024

cc @viktorleis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Execute queries without registering Data Frames / Arrow Tables
5 participants