-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
styler executed via console and from within R does not always yield same result #187
Comments
I was able to reproduce the problem with a smaller code snippet #
#
#
r <- function(y, s, g = 10) {
b("", "")
#
q <- o(d(i), function(i) {
d(op(t[[p]]), n(i = i))
})
f(calls) <- f(g)
mb <- j(c(
q(a::b), r,
y(u = 1)
))
k(b)
}
#
# When removing any of the above tokens, the problem does not occur anymore (I tried a few). It's very strange. E.g. replacing |
Thanks. Can you confirm that this problem also occurs when starting R from the terminal and then running |
Yes, that is the case |
I think I managed to narrow down the problem further. The parse data is already different for both methods (at least the parents). The nested parse data is then different again since the nesting is done differently I guess. Looking into it further just now. This is the flat parse data ( > non_interactive
# A tibble: 147 x 12
line1 col1 line2 col2 id parent token terminal text short token_before token_after
<int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr> <chr> <chr>
1 1 1 1 1 1 0 COMMENT TRUE # # COMMENT
2 2 1 2 1 4 0 COMMENT TRUE # # COMMENT COMMENT
3 3 1 3 1 7 0 COMMENT TRUE # # COMMENT SYMBOL
4 4 1 18 1 227 235 expr FALSE <NA> <NA>
5 4 1 4 1 10 12 SYMBOL TRUE r r COMMENT LEFT_ASSIGN
6 4 1 4 1 12 227 expr FALSE <NA> <NA>
7 4 3 4 4 11 227 LEFT_ASSIGN TRUE <- <- SYMBOL FUNCTION
8 4 6 18 1 226 227 expr FALSE <NA> <NA>
9 4 6 4 13 13 226 FUNCTION TRUE function funct LEFT_ASSIGN '('
10 4 14 4 14 14 226 '(' TRUE ( ( FUNCTION SYMBOL_FORMALS
# ... with 137 more rows
> interactive
# A tibble: 147 x 12
line1 col1 line2 col2 id parent token terminal text short token_before token_after
<int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr> <chr> <chr>
1 1 1 1 1 1 -227 COMMENT TRUE # # COMMENT
2 2 1 2 1 4 -227 COMMENT TRUE # # COMMENT COMMENT
3 3 1 3 1 7 -227 COMMENT TRUE # # COMMENT SYMBOL
4 4 1 18 1 227 0 expr FALSE <NA> <NA>
5 4 1 4 1 10 12 SYMBOL TRUE r r COMMENT LEFT_ASSIGN
6 4 1 4 1 12 227 expr FALSE <NA> <NA>
7 4 3 4 4 11 227 LEFT_ASSIGN TRUE <- <- SYMBOL FUNCTION
8 4 6 18 1 226 227 expr FALSE <NA> <NA>
9 4 6 4 13 13 226 FUNCTION TRUE function funct LEFT_ASSIGN '('
10 4 14 4 14 14 226 '(' TRUE ( ( FUNCTION SYMBOL_FORMALS
# ... with 137 more rows |
This is the nested parse data > non_interactive
# A tibble: 5 x 14
line1 col1 line2 col2 id parent token terminal text short token_before token_after internal
<int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr> <chr> <chr> <lgl>
1 1 1 1 1 1 0 COMMENT TRUE # # COMMENT TRUE
2 2 1 2 1 4 0 COMMENT TRUE # # COMMENT COMMENT TRUE
3 3 1 3 1 7 0 COMMENT TRUE # # COMMENT SYMBOL TRUE
4 20 1 20 1 232 0 COMMENT TRUE # # '}' COMMENT TRUE
5 21 1 21 1 235 0 COMMENT TRUE # # COMMENT TRUE
# ... with 1 more variables: child <list>
> interactive
# A tibble: 6 x 14
line1 col1 line2 col2 id parent token terminal text short token_before token_after internal
<int> <int> <int> <int> <int> <int> <chr> <lgl> <chr> <chr> <chr> <chr> <lgl>
1 1 1 1 1 1 -227 COMMENT TRUE # # COMMENT TRUE
2 2 1 2 1 4 -227 COMMENT TRUE # # COMMENT COMMENT TRUE
3 3 1 3 1 7 -227 COMMENT TRUE # # COMMENT SYMBOL TRUE
4 4 1 18 1 227 0 expr FALSE <NA> <NA> TRUE
5 20 1 20 1 232 0 COMMENT TRUE # # '}' COMMENT TRUE
6 21 1 21 1 235 0 COMMENT TRUE # # COMMENT TRUE
# ... with 1 more variables: child <list>
|
id 227 in |
So this is triggered by |
Not really. Even with the option |
I think one (pragmatic) solution would be to make sure tokens have a parent that is not a terminal. If it's is a terminal, we can assign the parent of the parent to the token as the parent. That's may fit into the package for handling the inconsistency in the parse data (see #100). I just tried it and it works for the case at hand. Line / col values of the parse data are corrected with the approach outlined above. |
I created #188, please have a look. It solves the specific case introduced in this issue. |
Pretty sure this is a bug in the R parser, it is not initialized properly the first time parse is run https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16041. lintr works around this by always running parse twice https://github.com/jimhester/lintr/blob/aa70430282852afa15a61cc69771e71dd32e1aec/R/get_source_expressions.R#L121-L129 I am pretty sure you don't see this interactively because the parser has to run on the input expression, and this bug only happens the very first time something is parsed. If you have a small reproducible example consider including it in the bug report, if there is independent verification maybe the root cause could be sorted out. |
Thanks for looking at that @jimhester. Indeed, using two identical R -q -e 'styler::style_file("R/debug.r")' gives the same result as running styler::style_file("R/debug.r") For the problematic case introduced above. Regarding the reprex, if I get you right, I could save the above code snippet as R -q -e 'first <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); second <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); all.equal(first, second)'
> first <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); second <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); all.equal(first, second)
#> [1] "Attributes: < Component “srcfile”: Component “parseData”: Mean relative difference: 3.897872 >"
#> [2] "Component “parent”: Mean relative difference: 3.897872"
> And in particular, show the offending values of "parent". R -e 'first <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE))$parent; second <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE))$parent; data.frame(first, second)[first != second,]'
#> first second
#> 1 0 -227
#> 2 0 -227
#> 3 0 -227
#> 4 235 0 And submit a new bug report and make reference to Bug 16041. |
I wouldn't submit a new bug report, rather comment on the existing one, I am pretty sure it is the same bug. |
Ok, I will do it that way then. |
As far as styler goes, I suggest to adapt @jimhester's solution and parse twice. |
If Kirill already has access to the R bug reporting system, maybe he could add my example to Jim Hester's bug report since I just realized that I need an R Core member to manually add me to the bug reporting system in order to comment on an existing bug report. |
Thanks, catching up. Rather than posting to a four-year-old ticket I'd try to fix the error in R-core and submit a PR there. IIUC, I need to call #
#
#
r <- function(y, s, g = 10) {
b("", "")
#
q <- o(d(i), function(i) {
d(op(t[[p]]), n(i = i))
})
f(calls) <- f(g)
mb <- j(c(
q(a::b), r,
y(u = 1)
))
k(b)
}
#
# |
I've fixed 16041 in R-devel, 75178. Could you please check if this fixes also the problem you found using |
Ok, cool. Let's see. |
After having temporarily fixed a problem with our CI tool
https://travis-ci.org/lorenzwalthert/styler/builds/420091391 What is the best practice to deal with fixed bugs in base R? Circumvent them explicitly only depending on the R version? Or any time? |
Thanks, I am happy to hear it worked. How to deal with base R fixes, it depends. In this case I think using the workaround unnecessarily is undesirable due to performance overhead, so I think it would make sense to add a guard on specific R version. Even though not terribly pretty, you could for instead use the double parsing only when |
As discussed at krlmlr/dplyr@1c06efc, styler does not return the same prettified code when run from within R
and from the command line
R -q -e 'styler::style_file("R/bench-compare.r")'
For the majority of other files in this repo (dplyr package), the styling is identical.
The text was updated successfully, but these errors were encountered: