Execute queries without registering Data Frames / Arrow Tables #140

Closed
eitsupi opened this issue Apr 6, 2024 · 6 comments · Fixed by #164
Labels
feature a feature request or enhancement

Comments

@eitsupi
Contributor

eitsupi commented Apr 6, 2024

From duckdb/duckdb#6771

In the Python client, a query can refer directly to a pandas.DataFrame (and similar objects) without registering it first, which is convenient. It would be nice to have the same functionality in R.
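
For context, here is a minimal sketch of the explicit registration step that this request would make unnecessary. It assumes the DBI and duckdb packages are installed; the view name `df_view` is arbitrary and chosen only for illustration.

```r
library(DBI)

# In-memory DuckDB connection
con <- dbConnect(duckdb::duckdb())

df <- data.frame(bar = 2)

# Current workflow: the data frame has to be registered under a name
# before SQL can refer to it.
duckdb::duckdb_register(con, "df_view", df)
res <- dbGetQuery(con, "SELECT bar FROM df_view")

dbDisconnect(con, shutdown = TRUE)
```

With the requested feature, the `duckdb_register()` call would no longer be needed and the query could name the data frame directly.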

@krlmlr
Collaborator

krlmlr commented Apr 24, 2024

Are you proposing to make all data-frame-like objects accessible by name? From what environment -- the current environment plus all parent (enclosing) environments?

How would we control this feature? I suspect just enabling it might lead to surprises for existing code.

What about name clashes? Which object gets priority if a table by that name exists already?
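
To make the environment question concrete, the following base R sketch shows what a lookup through the calling environment and its enclosing environments could look like. `find_data_frame()` is a hypothetical helper written for this illustration; it is not part of duckdb and says nothing about how the package actually resolves names.

```r
# Hypothetical helper: resolve a name to a data frame, starting in the
# caller's environment and walking up through the enclosing environments.
find_data_frame <- function(name, envir = parent.frame()) {
  obj <- get0(name, envir = envir, inherits = TRUE)
  if (is.data.frame(obj)) obj else NULL
}

outer_fn <- function() {
  tbl <- data.frame(x = 1)
  # `tbl` is not defined inside inner_fn(), only in the enclosing function
  inner_fn <- function() find_data_frame("tbl")
  inner_fn()
}

found <- outer_fn()                                # the data frame from outer_fn()
missing <- find_data_frame("no_such_object_xyz")   # nothing by that name
```

A lookup like this also hints at the possible surprises: a name such as `df` may resolve to the function `stats::df()` when no data frame of that name is in scope, so a non-data-frame object has to be treated as "not found".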

@eitsupi
Contributor Author

eitsupi commented Apr 24, 2024

> What about name clashes? Which object gets priority if a table by that name exists already?

I forget where I read this, but I believe DuckDB can look elsewhere for a table that does not exist in the database and use that instead.

For example, the behavior is already different when the table test.csv exists and when it does not exist, as shown below.
(Needless to say, if the table test.csv does not exist, it will look for a CSV file named test.csv and use it as a virtual table.)

data.frame(bar = 2) |>
  write.csv("test.csv", row.names = FALSE)
duckdb:::sql('CREATE SCHEMA "test"; CREATE TABLE "test.csv" AS SELECT 1 AS foo; FROM "test.csv"')
#>   foo
#> 1   1

Created on 2024-04-24 with reprex v2.0.2

data.frame(bar = 2) |>
  write.csv("test.csv", row.names = FALSE)
duckdb:::sql('FROM "test.csv"')
#>   bar
#> 1   2

Created on 2024-04-24 with reprex v2.0.2

> How would we control this feature? I suspect just enabling it might lead to surprises for existing code.

Given that this behavior is already in use in the Python API, I think this is hardly a problem.

> From what environment -- the current environment plus all parent (enclosing) environments?

Perhaps an additional argument is needed to specify the environment.

@hannes
Member

hannes commented May 8, 2024

Indeed, this should be straightforward: the R package can define a so-called replacement scan for this.
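
As a rough model of the idea, a replacement scan acts as a fallback that is consulted only when a name is not found among the existing tables, which matches the test.csv behavior shown earlier in the thread. The sketch below models that lookup order with plain named lists; `resolve_table()`, `catalog`, and `data_frames` are hypothetical names for this illustration and do not reflect DuckDB's actual implementation.

```r
# Hypothetical model of the lookup order; not DuckDB's actual implementation.
resolve_table <- function(name, catalog, data_frames) {
  # 1. A table that already exists in the catalog takes priority.
  if (name %in% names(catalog)) {
    return(catalog[[name]])
  }
  # 2. Otherwise a fallback source may supply a replacement.
  if (name %in% names(data_frames)) {
    return(data_frames[[name]])
  }
  stop("Table '", name, "' does not exist")
}

catalog <- list(t1 = data.frame(foo = 1))
data_frames <- list(t1 = data.frame(bar = 2), t2 = data.frame(bar = 3))

from_catalog <- resolve_table("t1", catalog, data_frames)   # catalog table is used
from_fallback <- resolve_table("t2", catalog, data_frames)  # falls back to the data frame
```

Under this ordering, existing code that queries real tables keeps its behavior, and the fallback only changes queries that would otherwise fail with a missing-table error.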

hannes self-assigned this May 8, 2024

@hannes
Member

hannes commented May 8, 2024

I looked into this; regrettably, it's not that straightforward for Arrow. So for now, let's just do this for data.frames.

@hannes
Member

hannes commented May 8, 2024

First stab is here: #164

@viktorleis

Amazing, thanks!

4 participants