Execute queries without registering Data Frames / Arrow Tables #140

eitsupi · 2024-04-06T15:26:24Z

It is convenient in the Python client to specify the target of a query without having to register pandas.DataFrame, etc., so it would be nice to have the same functionality in R.
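For context, here is a minimal sketch of the explicit registration step this request would make unnecessary. It assumes the `DBI` and `duckdb` packages are installed; the names `my_df` and `my_view` are made up for illustration.

```r
library(DBI)

# In-memory DuckDB connection
con <- dbConnect(duckdb::duckdb())

# Hypothetical example data
my_df <- data.frame(x = 1:3)

# Today the data frame has to be registered under a name
# before SQL can refer to it
duckdb::duckdb_register(con, "my_view", my_df)

res <- dbGetQuery(con, "SELECT sum(x) AS total FROM my_view")
print(res)

dbDisconnect(con, shutdown = TRUE)
```

The request is for the query to be able to refer to `my_df` directly, without the `duckdb_register()` call.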


krlmlr · 2024-04-24T06:44:29Z

Are you proposing to make available all data-frame-like things for accessing them by name? From what environment -- the current environment plus all parent (enclosing) environments?

How would we control this feature? I suspect just enabling it might lead to surprises for existing code.

What about name clashes? Which object gets priority if a table by that name exists already?

eitsupi · 2024-04-24T13:50:38Z

> What about name clashes? Which object gets priority if a table by that name exists already?

I forget where I read this, but I believe DuckDB can look in other places for tables that do not exist in the database and use those instead.

For example, the behavior is already different when the table test.csv exists and when it does not exist, as shown below.
(Needless to say, if the table test.csv does not exist, it will look for a CSV file named test.csv and use it as a virtual table.)

data.frame(bar = 2) |>
  write.csv("test.csv", row.names = FALSE)
duckdb:::sql('CREATE SCHEMA "test"; CREATE TABLE "test.csv" AS SELECT 1 AS foo; FROM "test.csv"')
#>   foo
#> 1   1

Created on 2024-04-24 with reprex v2.0.2

data.frame(bar = 2) |>
  write.csv("test.csv", row.names = FALSE)
duckdb:::sql('FROM "test.csv"')
#>   bar
#> 1   2

Created on 2024-04-24 with reprex v2.0.2

> How would we control this feature? I suspect just enabling it might lead to surprises for existing code.

Given that this is already in use in the Python API, this is hardly a problem.

> From what environment -- the current environment plus all parent (enclosing) environments?

Perhaps an additional argument is needed to specify the environment.

hannes · 2024-05-08T07:55:24Z

Indeed, this should be straightforward: the R package can define a so-called replacement scan for this.

hannes · 2024-05-08T08:22:05Z

I looked into this; regrettably, it's not that straightforward for Arrow. So for now let's just do this for data.frames.

hannes · 2024-05-08T09:17:09Z

First stab is here: #164

viktorleis · 2024-12-13T09:14:20Z

Amazing, thanks!

krlmlr added the enhancement label Apr 24, 2024

hannes self-assigned this May 8, 2024

krlmlr added the feature (a feature request or enhancement) label and removed the enhancement label Aug 16, 2024

eitsupi mentioned this issue Dec 4, 2024

feat: With duckdb(environment_scan = TRUE), data frame objects are available as views in duckdb SQL queries #164

Merged

krlmlr closed this as completed in #164 Dec 7, 2024
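
Going by the title of #164, usage of the merged feature might look like the sketch below. The `environment_scan = TRUE` argument is taken from that title; the data frame name `my_df` is made up for illustration, and this thread does not state which environments are searched, so the data frame is defined at top level here.

```r
library(DBI)

# Opt in to the environment scan when creating the driver
con <- dbConnect(duckdb::duckdb(environment_scan = TRUE))

# Hypothetical example data, defined at top level and never registered
my_df <- data.frame(x = 1:3)

# The query refers to the R object by name
res <- dbGetQuery(con, "SELECT sum(x) AS total FROM my_df")
print(res)

dbDisconnect(con, shutdown = TRUE)
```

Because the feature is opt-in, existing code that uses the default `duckdb()` driver should not see any change in behavior.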
