dbplyr 2.4.0 breaks duckdb's ability to treat files as tables #1390
Comments
While the update to 2.4.0 is the reason this failed, I think this is something that should be fixed by duckdb. Please open an issue in the duckdb-r repository.

Can you please elaborate on what part of the dbplyr or DBI API the duckdb package is misusing, or how it should be fixed?

SELECT "metadata.0.2.3.parquet".*
FROM "metadata.0.2.3.parquet"
which is correct. DuckDB has special support for reading directly from files, but only if the table name is not quoted.

I'm confused, is this not identical to what I'm doing?

I explained why it doesn't work 😉 You want the query
SELECT "metadata.0.2.3.parquet".*
FROM "metadata.0.2.3.parquet"
You could use a workaround with

The DuckDB backend handles this by checking whether the table exists; if it does not, it removes the quotes so that the DuckDB parser is able to do the magic, i.e. to perform a replacement scan (such as reading from a file) or execute a DuckDB function such as
So putting an explicit helper function there is, in my opinion, something that breaks the magic, i.e. you would have to know (for no reason) when you are dealing with actual database tables and when you are just querying some other source of data. I haven't checked, but probably
In any case, the detected problem is not a bug in the DuckDB backend, but is caused by a breaking change in how dbplyr works. That is unfortunate, as this probably breaks a large proportion of data processing pipelines that use DuckDB's dbplyr backend (using Parquet or CSV files). Those real-world use cases are not in any package tests, so the magnitude of the breaking change was probably difficult to see.
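
The lookup-then-unquote behaviour described in the comment above can be sketched in a few lines. This is only a language-neutral illustration of that described logic, written in Python; it is not the actual duckdb-r implementation. The names `quote_identifier`, `render_table_ref`, and the `existing_tables` set are hypothetical, and a real backend would ask the database whether the table exists rather than consult an in-memory set.

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def render_table_ref(name: str, existing_tables: set[str]) -> str:
    """Return the text to place after FROM for `name`.

    Known tables are quoted as identifiers. Anything else is passed
    through unquoted, so the database parser is free to interpret it
    (for example as a file name or a function call).
    """
    if name in existing_tables:
        return quote_identifier(name)
    return name


# Hypothetical catalog contents, standing in for a real existence check.
tables = {"flights"}

print("SELECT * FROM " + render_table_ref("flights", tables))
print("SELECT * FROM " + render_table_ref("metadata.0.2.3.parquet", tables))
```

The design point raised in the thread is visible here: the caller passes the same kind of string in both cases and does not need to know whether it names a real table or some other data source.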
DuckDB can execute SELECT statements directly on files (CSV, Parquet, etc.) without needing to use a function. In this case, the file name is treated as the table name. This is explained in the docs: https://duckdb.org/docs/archive/0.9.1/data/parquet/overview
Previously this functionality worked fine with dbplyr, up until 2.3.4. However, with the upgrade to 2.4.0, it fails. In the example below I'm querying a specific file called metadata.0.2.3.parquet, but you can replace it with any Parquet file and the same issue will occur:
This may be related to #1388, but the error messages are somewhat different.