mismatch of UTF-8 characters between code and docs usage section #1186
Comments
Related issue: #748
roxygen2 needs to parse the function definition, to find the arguments, default values, etc. Then it uses

In a UTF-8 locale the escaped Unicode characters are not restored:

    ❯ deparse(parse(text = "function(x=\"\u2019\"){}"))
    [1] "structure(expression(function(x = \"’\") {"
    [2] "}), srcfile = <environment>, wholeSrcref = structure(c(1L, 0L, "
    [3] "2L, 0L, 0L, 0L, 1L, 2L), srcfile = <environment>, class = \"srcref\"))"

and in a non-UTF-8 locale they are escaped differently:

    ❯ Sys.setlocale(locale = "C")
    [1] "C/C/C/C/C/en_US.UTF-8"
    > deparse(parse(text = "function(x=\"\u2019\"){}"))
    [1] "structure(expression(function(x = \"<U+2019>\") {"
    [2] "}), srcfile = <environment>, wholeSrcref = structure(c(1L, 0L, "
    [3] "2L, 0L, 0L, 0L, 1L, 2L), srcfile = <environment>, class = \"srcref\"))"

So to fix this we'd need to change how roxygen2 parses/deparses the code. Maybe it is possible to restore the original code from the parse tree, but this is not trivial, and the parse tree also has bugs.

A workaround to your problem is to supply a
We could maybe just re-escape all non-ASCII characters on the theory that you probably need to do that to appease R CMD check anyway?
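The re-escaping idea can be sketched roughly as follows. This is only an illustration: `escape_non_ascii()` is a hypothetical helper, not a roxygen2 function, and it assumes a single UTF-8 string in which every non-ASCII character should become a `\uXXXX` (or `\UXXXXXXXX`) escape.

```r
# Hypothetical helper (not part of roxygen2): replace every non-ASCII
# character in a single string with an R-style Unicode escape.
escape_non_ascii <- function(x) {
  stopifnot(is.character(x), length(x) == 1L, !is.na(x))
  code_points <- utf8ToInt(enc2utf8(x))
  pieces <- vapply(code_points, function(cp) {
    if (cp < 128L) {
      intToUtf8(cp)                 # plain ASCII stays as-is
    } else if (cp <= 65535L) {
      sprintf("\\u%04x", cp)        # BMP code point: \uXXXX
    } else {
      sprintf("\\U%08x", cp)        # beyond the BMP: \UXXXXXXXX
    }
  }, character(1))
  paste(pieces, collapse = "")
}

usage <- "f(x = \"\u2019\")"
escaped <- escape_non_ascii(usage)
cat(escaped, "\n")
```

A real implementation would also have to decide what to do with non-ASCII characters outside string literals, which this sketch ignores.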
It seems that if we parse from a file, then the escaped form is kept in the parse data:

    cat("function(x=\"\\u2019\"){x}\n", file = tmp <- tempfile())
    getParseData(parse(tmp, keep.source = TRUE))
    #>    line1 col1 line2 col2 id parent          token terminal      text
    #> 19     1    1     1   23 19      0           expr    FALSE
    #> 1      1    1     1    8  1     19       FUNCTION     TRUE  function
    #> 2      1    9     1    9  2     19            '('     TRUE         (
    #> 3      1   10     1   10  3     19 SYMBOL_FORMALS     TRUE         x
    #> 4      1   11     1   11  4     19     EQ_FORMALS     TRUE         =
    #> 5      1   12     1   19  5      7      STR_CONST     TRUE "\\u2019"
    #> 7      1   12     1   19  7     19           expr    FALSE
    #> 6      1   20     1   20  6     19            ')'     TRUE         )
    #> 16     1   21     1   23 16     19           expr    FALSE
    #> 10     1   21     1   21 10     16            '{'     TRUE         {
    #> 11     1   22     1   22 11     13         SYMBOL     TRUE         x
    #> 13     1   22     1   22 13     16           expr    FALSE
    #> 12     1   23     1   23 12     16            '}'     TRUE         }

Created on 2021-04-16 by the reprex package (v2.0.0)
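To make the contrast concrete, here is a small sketch that pulls the string literal out of the parse data and compares it with the parsed value. The file contents and variable names are made up for illustration; only `parse()`, `getParseData()`, and base string functions are used.

```r
# Write a one-line source file whose default value uses the escaped form \u2019.
tmp <- tempfile(fileext = ".R")
writeLines('f <- function(x = "\\u2019") x', tmp)

exprs <- parse(tmp, keep.source = TRUE)
pd <- utils::getParseData(exprs)

# Text of the string constant exactly as written in the file (escape intact).
literal <- pd$text[pd$token == "STR_CONST"]

# Evaluating that literal yields the actual character U+2019, which is
# what deparse() would see and print unescaped in a UTF-8 locale.
value <- eval(parse(text = literal)[[1]])

nchar(literal)    # length of the literal as typed, including the quotes
utf8ToInt(value)  # code point of the single character it evaluates to
```

So the escaped spelling is available in the parse data even though it is lost once the expression is deparsed.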
WAT
Yeah, it is the same from |
It doesn't seem like there's much we can do about this, and it should naturally become less important as more Windows users switch to R 4.2.
i've had some escaped UTF-8 characters in function arguments for several years. today, for the first time, the win-builder package check for R-devel complains about a mismatch between the function code and the usage section of the docs generated by roxygen2 (7.1.1-1cran1.2004.0 ubuntu package):

in the function code, the respective characters are escaped as \u2019; in the .Rd files they appear unescaped. try:

shouldn't roxygen2 keep the function code as-is?