dbplyr 2.4.0 breaks duckdb's ability to treat files as tables #38
Comments
This is not an actual bug in the DuckDB backend; it is caused by a breaking change in dbplyr. The functionality can probably be restored quite easily by using |
As a workaround for now, you can use the following:

    duckdb::duckdb() |>
      DBI::dbConnect(drv = _, read_only = TRUE) |>
      dplyr::tbl("'metadata.0.2.3.parquet'")

Although the version above works, it still gives some warnings, so you could instead use the following:

    duckdb::duckdb() |>
      DBI::dbConnect(drv = _, read_only = TRUE) |>
      dplyr::tbl(dplyr::sql("FROM 'metadata.0.2.3.parquet'"))

Or an alternative version:

    duckdb::duckdb() |>
      DBI::dbConnect(drv = _, read_only = TRUE) |>
      dplyr::tbl(dplyr::sql("FROM read_parquet('metadata.0.2.3.parquet')"))
|
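Since this import syntax seems to break

    # Helper: build a lazy table from a Parquet file via DuckDB's read_parquet().
    # Wrapping the FROM clause in sql() stops dbplyr from quoting it as a table name.
    read_parquet <- function(conn, path) {
      from_clause <- glue::glue("FROM read_parquet('{path}')") |> dplyr::sql()
      dplyr::tbl(conn, from_clause)
    }
|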
Thanks. Reprex:

    options(conflicts.policy = list(warn = FALSE))
    data <- data.frame(a = 1)
    arrow::write_parquet(data, "data.parquet")
    library("duckdb")
    #> Loading required package: DBI
    duckdb_con <- dbConnect(duckdb())
    dplyr::tbl(duckdb_con, "data.parquet")
    #> It looks like you tried to incorrectly use a table in a schema as source.
    #> ℹ If you want to specify a schema use `in_schema()` or `in_catalog()`.
    #> ℹ If your table actually contains "." in the name use `check_from = FALSE` to
    #> silence this message.
    #> Error in `collect()`:
    #> ! Failed to collect lazy table.
    #> Caused by error:
    #> ! rapi_prepare: Failed to prepare query SELECT "data.parquet".*
    #> FROM "data.parquet"
    #> LIMIT 11
    #> Error: Binder Error: Referenced table "data.parquet" not found!
    #> Candidate tables: "data"

Created on 2024-02-24 with reprex v2.1.0 |
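For comparison, here is a minimal sketch of the same scenario using the `sql()` workaround suggested earlier in this thread. It assumes the duckdb, DBI, dplyr and dbplyr packages are installed. The Parquet file is written with DuckDB's own `COPY` statement instead of arrow, only to keep the dependencies down, and the names `con`, `lazy` and `result` are illustrative.

```r
library(DBI)

con <- dbConnect(duckdb::duckdb())

# Write a one-row Parquet file using DuckDB itself (assumption: avoids needing arrow).
dbExecute(con, "COPY (SELECT 1 AS a) TO 'data.parquet' (FORMAT PARQUET)")

# Pass the file reference as raw SQL so dbplyr treats it as a query
# rather than as a table identifier that it would quote.
lazy <- dplyr::tbl(con, dplyr::sql("FROM read_parquet('data.parquet')"))
result <- dplyr::collect(lazy)

dbDisconnect(con, shutdown = TRUE)
```

The point of difference from the failing reprex is only the `dplyr::sql()` wrapper: the quoted identifier `"data.parquet"` is what DuckDB's binder rejects in the error above.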
@multimeric: Thanks for the hint, I went for this approach in the revamped |
Let's not add new functions for now. |
I'm confused. This seems to work well with both dbplyr 2.3.4 and 2.4.0, with no changes made to the code. I suspect a dplyr update fixed this in the meantime. Can you confirm? |
In the meantime, I have also implemented |
DuckDB's ability to execute SELECT statements directly on files (CSV, Parquet, etc.) worked fine with dbplyr up until 2.3.4. However, with the upgrade to 2.4.0, it fails. With the below example I'm querying a specific file called metadata.0.2.3.parquet, but you can replace it with any Parquet file and the same issue will occur:

According to tidyverse/dbplyr#1390, this bug is something that needs to be fixed in the duckdb package.