How to deal with pq__varchar columns #444

Open
dmkaplan2000 opened this issue Oct 10, 2023 · 15 comments

Comments

@dmkaplan2000

I have some queries that return columns of type pq__varchar. I was wondering if there is a simple way to convert these data into a more R-friendly format once they are inside R?

Thanks,
David

@krlmlr
Member

krlmlr commented Oct 10, 2023

Thanks. Could you please provide a reproducible example? Please see https://reprex.tidyverse.org/ for guidance.

@dmkaplan2000
Author

Given any RPostgres connection con to a PostgreSQL database, try the following:

x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
class(x$v)

@dmkaplan2000
Author

Ideally I would be able to treat the elements of x$v like vectors/arrays in R, extracting elements as needed, etc., but it isn't clear how to do this with a pq__varchar. I was wondering if there is some straightforward function or method for transforming these into something more familiar to R users.

@dmkaplan2000
Author

dmkaplan2000 commented Oct 26, 2023

The less-than-ideal solution I eventually used for this situation was to parse the string representation of the array in R. This only works easily because I have 1-D arrays whose strings contain no quote characters or commas. A better solution working directly from the pq__varchar variable would be preferable. Here is an example of a solution:

x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")

# Rewrite each "{a,b}" string as the R expression "c('a','b')" and evaluate it
x$v_list = lapply(x$v, \(.) eval(parse(text = sub("[{]", "c('", sub("[}]", "')", gsub(",", "','", .))))))
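
For the example data above, this should give a plain list of character vectors, along these lines (written out here for reference rather than captured from a session):

x$v_list
#> [[1]]
#> [1] "a" "b"
#>
#> [[2]]
#> [1] "abc" "def" "fgh"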

@krlmlr
Member

krlmlr commented Apr 1, 2024

Reprex:

con <- DBI::dbConnect(RPostgres::Postgres())

x <- DBI::dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
x
#>               v
#> 1         {a,b}
#> 2 {abc,def,fgh}
class(x$v)
#> [1] "pq__varchar"

Created on 2024-04-01 with reprex v2.1.0

@paleolimbot: Is there support for nested data in adbcpostgresql?

@paleolimbot

It seems to!

library(adbcdrivermanager)

con <- adbcpostgresql::adbcpostgresql() |> 
  adbc_database_init(uri = "postgresql://localhost:5432/postgres?user=postgres&password=password") |> 
  adbc_connection_init()

con |> 
  read_adbc(
    "WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a"
  ) |> 
  as.data.frame() |> 
  dplyr::pull()
#> <list_of<character>[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "abc" "def" "fgh"

Created on 2024-04-01 with reprex v2.1.0

I say this lightly, but I actually spent quite a bit of time getting this to work. It also works for Postgres' "record" type, where it should give you back a nested data frame. There is a slight limitation on reading the array type: I believe multidimensional arrays will just be returned as a flat list_of and will (currently) lose the dimensions.

@krlmlr
Member

krlmlr commented Apr 1, 2024

Thanks, this is wonderful! I suspect it will also work via adbi, https://adbi.r-dbi.org/. @dmkaplan2000: can you confirm?

@krlmlr
Member

krlmlr commented Apr 1, 2024

This brings up memories of my work bringing nested data to the duckdb R client. I do appreciate the effort!

@dmkaplan2000
Author

I tried getting this to work with adbi, but I must admit I can't figure out how to successfully connect to and query a database. First I tried opening a connection with:

con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432/DBNAME?user=USER&password=PASSWORD")

That worked without error, but when I tried to use the connection to execute a query I got:

> dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
Error in adbcdrivermanager::adbc_statement_get_parameter_schema(stmt) : 
  NOT_IMPLEMENTED

Based on what I could find online, I tried reducing the URI to uri="postgresql://localhost:5432". That also worked for starting the connection, but not for executing a query.

Do you know what I am doing wrong? The help on adbi and adbcpostgresql is pretty spartan...

@krlmlr
Member

krlmlr commented Apr 15, 2024

It's a moonshot, but can you try dbGetQuery(immediate = TRUE)? This is DBI's current way of saying "we don't want a prepared query, just execute". I wonder if adbi takes action based on that flag, and if not, whether it should.

CC @nbenn.

@dmkaplan2000
Author

immediate=TRUE fixed the issue!

> con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432")
> x=dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a",immediate=TRUE)
> x$v
<list_of<character>[2]>
[[1]]
[1] "a" "b"

[[2]]
[1] "abc" "def" "fgh"

So this query works with adbi, but I am not sure it really solves my issue, in the sense that it fixes one thing by breaking another. My real queries involve, in addition to character arrays, spatial data provided by the PostGIS extension to PostgreSQL. RPostgres plays nicely with the sf package, so the spatial data can be processed, but I do not think that adbi can be used with sf; at least my initial examination does not suggest that it can.
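
For context, the RPostgres + sf workflow referred to here is roughly the following sketch; the table and geometry column names (my_table, geom) are placeholders, not from this thread:

library(sf)

con <- DBI::dbConnect(RPostgres::Postgres())

# st_read() accepts a DBI connection and turns a PostGIS geometry column
# into an sf geometry column alongside the ordinary attribute columns
pts <- st_read(con, query = "SELECT id, geom FROM my_table")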

@dmkaplan2000
Author

dmkaplan2000 commented Sep 26, 2024

Hi, I recently had to find a generic solution for converting 1-D PostgreSQL arrays in a pq__varchar column into a list of character vectors. The solution I found is detailed on Stack Overflow here, but the essential code is:

# Generate data
con <- DBI::dbConnect(RPostgres::Postgres())
q = "WITH a(v) AS (VALUES(ARRAY['''a''','b']::varchar[]),(ARRAY['abc, def','the \"big\" one','fgh']::varchar[])) SELECT * FROM a"
x <- DBI::dbGetQuery(con,q)

# Function that does the work on individual varchar arrays imported into R
pq_varchar_array_to_char = function(x) {
  x |>
    gsub("^[{]|[}]$", "", x = _) |>
    gsub("[\\][\"]","%myspecialquote%",x = _) |>
    scan(text = _, quote = '"', what = "", sep = ",", quiet = TRUE) |>
    gsub("%myspecialquote%","\"",x = _)
}

# Apply function to each element of column of x

## Non-tidyverse version
out <- transform(x, v = lapply(v, pq_varchar_array_to_char))

## Tidyverse version
library(tidyverse)
out2 <- mutate(x, v = lapply(v, pq_varchar_array_to_char))

Perhaps some version of pq_varchar_array_to_char could be added to RPostgres, either to be applied directly to these column types or as a helper function for users? (Note: my function will fail if any of the values happens to contain the text %myspecialquote%, but this could be fixed by testing candidates such as %, %%, %%%, ... until one is found that does not occur in the array, and then using that as the temporary replacement for the escaped quotes; see the sketch below.)
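
A rough sketch of that placeholder idea (pick_placeholder and pq_varchar_array_to_char2 are hypothetical names, not existing RPostgres functions): keep lengthening the candidate until it no longer occurs in the input, then use it as the temporary stand-in for the escaped quotes.

# Pick a stand-in string guaranteed not to occur in the input
pick_placeholder <- function(x, base = "%") {
  p <- base
  while (grepl(p, x, fixed = TRUE)) p <- paste0(p, base)
  p
}

pq_varchar_array_to_char2 <- function(x) {
  p <- pick_placeholder(x)
  x |>
    gsub("^[{]|[}]$", "", x = _) |>
    gsub("[\\][\"]", p, x = _) |>
    scan(text = _, quote = '"', what = "", sep = ",", quiet = TRUE) |>
    gsub(p, "\"", x = _, fixed = TRUE)
}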

This function could also be extended to work with various types of Postgresql arrays (for example, with a generic convert_pq_array_to_r_vector function with methods for each column type that would apply the base function and then cast vector elements to the desired final value type).
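
A minimal sketch of that generic idea (convert_pq_array_to_r_vector is a hypothetical name): parse the array text with the base function, then cast the elements to the target R type.

convert_pq_array_to_r_vector <- function(col, cast = identity) {
  lapply(col, function(el) cast(pq_varchar_array_to_char(el)))
}

# e.g. for an integer[] column imported as pq__int4:
# out <- transform(x, v = convert_pq_array_to_r_vector(v, cast = as.integer))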

This function could probably be extended to the case of N-D arrays, but that is not my use case for the time being...

@krlmlr
Member

krlmlr commented Sep 26, 2024

Thanks. Does the VALUES() expression return JSON in this example by any chance? Can you share an example of the input that pq_varchar_array_to_char() would see?

@dmkaplan2000
Author

No, I don't think VALUES() returns JSON in this example. What pq_varchar_array_to_char() would see is as follows:

> con <- DBI::dbConnect(RPostgres::Postgres())
> q = "WITH a(v) AS (VALUES(ARRAY['''a''','b']::varchar[]),(ARRAY['abc, def','the \"big\" one','fgh']::varchar[])) SELECT * FROM a"
> x <- DBI::dbGetQuery(con,q)
> x
                                     v
1                              {'a',b}
2 {"abc, def","the \\"big\\" one",fgh}
> x$v[2]
[1] "{\"abc, def\",\"the \\\"big\\\" one\",fgh}"
> class(x)
[1] "data.frame"
> class(x$v)
[1] "pq__varchar"
> class(x$v[2])
[1] "character"

Is that what you wanted?

@krlmlr
Member

krlmlr commented Sep 27, 2024

Yes, thanks. Maybe we could leverage the PostgreSQL json_agg() function to return proper JSON and parse it in R with jsonlite?

https://stackoverflow.com/a/24006432/946850
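
A minimal sketch of that idea, using array_to_json() per row rather than json_agg() (untested here, so treat it as an assumption rather than a recommendation):

con <- DBI::dbConnect(RPostgres::Postgres())

# Serialise each array to JSON text on the server, then parse it in R
q <- "WITH a(v) AS (VALUES(ARRAY['''a''','b']::varchar[]),(ARRAY['abc, def','the \"big\" one','fgh']::varchar[])) SELECT array_to_json(v)::text AS v FROM a"
x <- DBI::dbGetQuery(con, q)
x$v_list <- lapply(x$v, jsonlite::fromJSON)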
