-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A base R replacement for formatR::tidy_source() #562
Comments
To elaborate: this is drake:::standardize_command
#> function (x)
#> {
#> x <- ignore_ignore(x) %>% language_to_text
#> formatR::tidy_source(source = NULL, comment = FALSE, blank = FALSE,
#> arrow = TRUE, brace.newline = FALSE, indent = 4, output = FALSE,
#> text = as.character(x), width.cutoff = 119)$text.tidy %>%
#> paste(collapse = "\n") %>% braces
#> }
#> <bytecode: 0x4631438>
#> <environment: namespace:drake> Created on 2018-10-28 by the reprex package (v0.2.1) |
Also cc @lorenzwalthert. Do you have advice on turning assignment |
have you looked at base R`s parse()? that's one step further then just reformatting the code |
Base R's substring assignment won't layer in bigger width replacements but For folks who have put multiple |
Thank you so much, @hrbrmstr! Your demo of I actually prefer not to depend on Anyway, if we go through each line backwards, we can handle equals signs using only standardize_code <- function(x){
x <- as.character(x)
parsed <- parse(text = x, keep.source = TRUE)
info <- utils::getParseData(parsed, includeText = TRUE)
lines <- replace_equals(lines = strsplit(x, "\n")[[1]], info = info)
out <- parse(
text = paste0(lines, collapse="\n"),
keep.source = FALSE
)[[1]]
paste0(deparse(out), collapse = "\n")
}
replace_equals <- function(lines, info){
equals <- info[info$token == "EQ_ASSIGN", ]
for (line in unique(equals$line1)){
line_info <- equals[equals$line1 == line, ]
for (col in sort(line_info$col1, decreasing = TRUE)){
lines[line] <- paste0(
substr(x = lines[line], start = 0, stop = col - 1),
"<-",
substr(x = lines[line], start = col + 1, stop = nchar(lines[line]))
)
}
}
lines
}
y <- standardize_code("z = {f('#'); w = 123; char = \"str\" # comment
x = y = z
`=`(zz, yy)
x <- 5
5 -> x
y = 'test'
z <- function(a = 1){
sqrt(x = a)
}
function(b = mtcars){
lm(mpg ~ wt, data = b)
} -> f
}")
cat(y)
#> z <- {
#> f("#")
#> w <- 123
#> char <- "str"
#> x <- y <- z
#> zz = yy
#> x <- 5
#> x <- 5
#> y <- "test"
#> z <- function(a = 1) {
#> sqrt(x = a)
#> }
#> function(b = mtcars) f <- {
#> lm(mpg ~ wt, data = b)
#> }
#> } Created on 2018-10-28 by the reprex package (v0.2.1) |
So standardize_command <- function(x) {
ignore_ignore(x) %>%
language_to_text() %>%
standardize_code()
} |
From this comment:
Above, Another thing: the following code function(b = mtcars){
lm(mpg ~ wt, data = b)
} -> f becomes function(b = mtcars) f <- {
lm(mpg ~ wt, data = b)
} I am okay with that. For the purposes of |
I wonder if there is any way to utilise the byte compiler for this? something like compile the function, calculate the md5 sum of the byte code, see if it matches the md5 of the last known version. I tried playing around with this yesterday but I could find no way to retrieve the actual byte code (just its memory address) |
I am not sure I can follow, but thanks all involved people to work on this :-) First, formatR has zero dependencies, so the benefit of replacing functionality from a widely used package with an internal solution for a non-trivial task (problems mentioned throughout the thread) does not seem that beneficial to me. Second, I am not sure if styling is the way to go if the problem can be understood as caching. I.e. why not hashing the parse data? We already have library(magrittr)
assign_one <- "z = {f('#') # comment
x <- 5
y <- 'test'
z <- 4
x2 <- 'test2'
}"
assign_two <- "z = {f('#') # comment X
x = 5
y <- 'test'
z = 4
'test2' -> x2
}"
hash_code <- function(code) {
# reverse '->' to '<-' AND keep code valid
code <- deparse(parse(text = code)[[1]])
parse_data <- parse(text = code, keep.source = TRUE) %>%
getParseData(includeText = TRUE)
change_row <- parse_data$token == "LEFT_ASSIGN"
parse_data$text[change_row] <- "="
parse_data$token[change_row] <- "EQ_ASSIGN"
hash <- parse_data[parse_data$token != "COMMENT" & parse_data$terminal, c("token", "text")] %>%
c() %>%
digest::digest()
}
lapply(list(assign_one, assign_two), hash_code)
#> [[1]]
#> [1] "96c34606528d64f2d258731a79837c1b"
#>
#> [[2]]
#> [1] "96c34606528d64f2d258731a79837c1b" Created on 2018-10-29 by the reprex package (v0.2.1) Note that this turns Third, allowing any other assignment than drake::drake_plan(g <- x)
#> Warning: Use `=` instead of `<-` or `->` to assign targets to commands in `drake_plan()`. For example, write `drake_plan(a = 1)` instead of `drake_plan(a <- 1)`. Arrows were used to declare these commands:
#> g <- x
#> # A tibble: 1 x 2
#> target command
#> <chr> <chr>
#> 1 drake_target_1 g <- x Created on 2018-10-29 by the reprex package (v0.2.1) This is really an edge case and getting it right is probably quite hard. So I am not sure why we are trying to create a solution that can hash Also, the solution hashes |
I really I think this is unrelated to the API function drake::drake_plan(y = g <- x)
#> # A tibble: 1 x 2
#> target command
#> <chr> <chr>
#> 1 y g <- x |
Also, I think we can avoid serialization when we hash if we paste everything together first. Could save some time that we could reinvest in a long-ish hash algorithm. standardize_code <- function(x){
x <- deparse(parse(text = as.character(x), keep.source = FALSE)) %>%
paste(collapse = "\n")
info <- parse(text = x, keep.source = TRUE) %>%
utils::getParseData(includeText = TRUE)
change <- info$token == "LEFT_ASSIGN"
info$text[change] <- "="
info$token[change] <- "EQ_ASSIGN"
info[info$token != "COMMENT" & info$terminal, c("token", "text")] %>%
lapply(FUN = paste, collapse = " ") %>%
paste(collapse = " >> ") %>%
digest::digest(algo = "sha256", serialize = FALSE)
} |
But it still seems to have problems. library(magrittr)
standardize_code <- function(x){
x <- deparse(parse(text = as.character(x), keep.source = FALSE)) %>%
paste(collapse = "\n")
info <- parse(text = x, keep.source = TRUE) %>%
utils::getParseData(includeText = TRUE)
change <- info$token == "LEFT_ASSIGN"
info$text[change] <- "="
info$token[change] <- "EQ_ASSIGN"
info[info$token != "COMMENT" & info$terminal, c("token", "text")] %>%
lapply(FUN = paste, collapse = " ") %>%
paste(collapse = " >> ") %>%
digest::digest(algo = "sha256", serialize = FALSE)
}
standardize_code("y=sqrt(x=1)")
#> [1] "e133b7dbb0b12167bd18f31f82ecc79cdf9f6914c256eb5762d749502e5457a4"
standardize_code("y <- sqrt(x = 1)")
#> [1] "2043d80987b0787db97bc408d3bbad3b974e92e4106f3d2a258c8f2d46e5e014" Created on 2018-10-29 by the reprex package (v0.2.1) It appears the library(magrittr)
standardize_code <- function(x){
x <- deparse(parse(text = as.character(x), keep.source = FALSE)) %>%
paste(collapse = "\n")
info <- parse(text = x, keep.source = TRUE) %>%
utils::getParseData(includeText = TRUE)
change <- info$token == "LEFT_ASSIGN"
info$text[change] <- "="
info$token[change] <- "EQ_ASSIGN"
info[info$token != "COMMENT" & info$terminal, c("token", "text")]
}
standardize_code("y=sqrt(x=1)")
#> token text
#> 1 SYMBOL_FUNCTION_CALL expression
#> 2 '(' (
#> 4 SYMBOL_SUB y
#> 5 EQ_SUB =
#> 6 SYMBOL_FUNCTION_CALL sqrt
#> 7 '(' (
#> 9 SYMBOL_SUB x
#> 10 EQ_SUB =
#> 11 NUM_CONST 1
#> 13 ')' )
#> 17 ')' )
standardize_code("y <- sqrt(x = 1)")
#> token text
#> 1 SYMBOL_FUNCTION_CALL expression
#> 2 '(' (
#> 4 SYMBOL y
#> 5 EQ_ASSIGN =
#> 7 SYMBOL_FUNCTION_CALL sqrt
#> 8 '(' (
#> 10 SYMBOL_SUB x
#> 11 EQ_SUB =
#> 12 NUM_CONST 1
#> 14 ')' )
#> 18 ')' ) Created on 2018-10-29 by the reprex package (v0.2.1)
|
You forgot to get rid of the x <- deparse(parse(text = as.character(x), keep.source = FALSE)) %>% but you need x <- deparse(parse(text = as.character(x), keep.source = FALSE)[[1]]) %>% so library(magrittr)
standardize_code <- function(x) {
x <- deparse(parse(text = as.character(x), keep.source = FALSE)[[1]]) %>%
paste(collapse = "\n")
info <- parse(text = x, keep.source = TRUE) %>%
utils::getParseData(includeText = TRUE)
change <- info$token == "LEFT_ASSIGN"
info$text[change] <- "="
info$token[change] <- "EQ_ASSIGN"
info[info$token != "COMMENT" & info$terminal, c("token", "text")] %>%
lapply(FUN = paste, collapse = " ") %>%
paste(collapse = " >> ") %>%
digest::digest(algo = "sha256", serialize = FALSE)
}
standardize_code("y=sqrt(x=1)")
#> [1] "2f3905f7987e8b54c1cdad95475c8a754a591e5a124a43123df387337abb4866"
standardize_code("y <- sqrt(x = 1)")
#> [1] "2f3905f7987e8b54c1cdad95475c8a754a591e5a124a43123df387337abb4866" Created on 2018-10-30 by the reprex package (v0.2.0.9000) |
Awesome, thanks! With that, we also need to handle empty strings. standardize_code <- function(x){
if (!length(x)){
return(as.character(NA))
}
x <- deparse(parse(text = as.character(x), keep.source = FALSE)[[1]]) %>%
paste(collapse = "\n")
info <- parse(text = x, keep.source = TRUE) %>%
utils::getParseData(includeText = TRUE)
change <- info$token == "LEFT_ASSIGN"
info$text[change] <- "="
info$token[change] <- "EQ_ASSIGN"
info[info$token != "COMMENT" & info$terminal, c("token", "text")] %>%
lapply(FUN = paste, collapse = " ") %>%
paste(collapse = " >> ") %>%
digest::digest(algo = "sha256", serialize = FALSE)
} And now, it passes |
A potential speed issue: #563 (comment) |
Background
I am trying to trim down
drake
's package dependencies. I removed 7 just yesterday, bringing the count down from 22 to 15. I am not too concerned about base packages likeutils
, packages inr-lib
likewithr
, or tidyverse packages likedplyr
. And it seems infeasible to removestorr
,igraph
, orcodetools
. So as I continue on, I hope to find a base R replacement forformatR::tidy_source()
.formatR::tidy_source()
provides a means of standardizing a target's commands. This standardization removes comments, strange indentation, etc. beforedrake
decides if a command changed since lastmake()
. Clearly, if all we do is add spaces or comments, we don't wantdrake
to rebuild the target.Requirements
A base R replacement should
=
s and<-
's into->
.Challenges
R's default parser will take care of (1) and (2), but (3) is tricky.
Created on 2018-10-28 by the reprex package (v0.2.1)
Created on 2018-10-28 by the reprex package (v0.2.1)
Rollout
This change will invalidate all targets in all workflows, so I am postponing the release until
drake
7.0.0 (first half of 2019). We should also add a warning inassert_compatible_cache()
.The text was updated successfully, but these errors were encountered: