Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorised switch (aka SQL DECODE) #1710

Closed
datalove opened this issue Mar 13, 2016 · 4 comments
Closed

Vectorised switch (aka SQL DECODE) #1710

datalove opened this issue Mar 13, 2016 · 4 comments
Labels
feature a feature request or enhancement
Milestone

Comments

@datalove
Copy link

I do a lot of SQL querying in Oracle and often find myself reaching for the DECODE SQL function (link), which is a bit like a lightweight CASE statement, with simple replacements and an optional default replacement.

I thought it might also be useful with tbl_df using mutate in R so I wrote a decode function that similarly takes pairs of arguments like so:

decode(x, target_1, replacement_1, target_2, replacement_2, ... ,optional_default)

And set it up to work with named arguments (because I'm lazy).

mtcars %>% mutate(cyl_name = decode(cyl,4,'four',6,'six',8,'eight'))
mtcars %>% mutate(cyl_name = decode(cyl,4,'four',6,'six','guzzler'))
mtcars %>% mutate(cyl_name = decode(cyl,4,four,6,six,guzzler))
mtcars %>% mutate(cyl_name = decode(cyl,4,four,6,six))

Is it worth submitting a pull request to this decode function? I'm sure I can improve what's below.

I think adding a decode function reduces a bit of the need to solve this issue (#631) relating to a general purpose SQL-like CASE function.

decode <- function(x, ...) {

  odds <- function(x) { unlist(x[1:length(x) %% 2 == 1][1:floor(length(x)/2)]) }
  even <- function(x) { unlist(x[1:length(x) %% 2 == 0]) }
  last <- function(x) { unlist(if(length(x) %% 2 == 1) tail(x,1)) }

  interpret_args <- function(x) { if(is.call(x)) {eval(x)} else if(is.name(x)) {as.character(x)} else {x} }

  args <- eval(substitute(alist(...)))
  args <- lapply(args, interpret_args)

  targets      <- odds(args)
  replacements <- even(args)
  default      <- last(args)

  res <- x

  if(!is.null(default))
    res[! x %in% targets & ! is.na(x)] <- default

  for(i in seq_along(targets)) {
    t <- targets[[i]];  r <- replacements[[i]]
    res[res == t | (is.na(t) & is.na(res))] <- r    
  }

  if(inherits(x, "factor"))
    res <- as.factor(res)

  res
} 
@hadley hadley added feature a feature request or enhancement database labels Mar 14, 2016
@hadley hadley added this to the 0.5 milestone Mar 14, 2016
@hadley
Copy link
Member

hadley commented Mar 14, 2016

I think it would make more sense to work like a vectorised switch(), but I agree that it's a good idea and would be a nice accompaniment to the new na_if() and coalesce().

@hadley
Copy link
Member

hadley commented Mar 14, 2016

Note that DECODE appears to be an oracle extension, so would need to be converted to a CASE statement by default.

@joranE
Copy link
Contributor

joranE commented Mar 14, 2016

FWIW, I am frequently also using decode with an Oracle db, and have ended up with the following R implementations for character and factor arguments:

decode.character <- function(x,...,default = NULL){
    replacements <- unlist(list(...))
    unique_x <- unique(x)
    replacements <- replacements[names(replacements) %in% unique_x]
    if (is.null(default)){
        key <- setNames(unique_x,unique_x)
    }else{
        key <- setNames(rep(default,length(unique_x)),unique_x)
    }
    key[names(replacements)] <- replacements
    return(unname(key[x]))
}
decode.factor <- function(x,...,default = NULL){
    replacements <- unlist(list(...))
    unique_x <- levels(x)
    replacements <- replacements[names(replacements) %in% unique_x]
    if (is.null(default)){
        key <- setNames(unique_x,unique_x)
    }else{
        key <- setNames(rep(default,length(unique_x)),unique_x)
    }
    key[names(replacements)] <- replacements
    levels(x) <- key[levels(x)]
    return(x)
}

...and they are called a bit more in the style of switch, with name = value pairs in ..., where name is the old and value is the new.

@hadley hadley changed the title Simple substitution in mutate using a 'decode' function Vectorised switch (aka SQL DECODE) Mar 14, 2016
@hadley hadley closed this as completed in ee097ad Mar 14, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants
@hadley @datalove @joranE and others