Problems with special characters #654
I reran with roxygen2 6.0.1.9000, which I hope is the latest dev version. A few more cases produced output, but there was no change to the final results. Attached are text files with detailed results from the two runs. The columns are pretty self-explanatory, I think, except perhaps these.
To see the differences that might affect the final results, I cut the first 9 columns and diff'ed them (attached). A new bug has crept in: roxescape.cran.txt
The problems with the HTML escapes:

roxygen2:::markdown("`a && b`")
#> [1] "`a && b`"
roxygen2:::markdown("`<>`")
#> [1] "`<>`"

Created on 2019-09-10 by the reprex package (v0.3.0)
To unpack what's going on here, here are a few helpers to convert markdown to Rd (using roxygen2), and then to convert the Rd to text and HTML (using the tools package):

library(purrr)
roxy_md <- function(x) {
  # markdown -> Rd, via roxygen2's internal converter
  roxygen2:::markdown(x)
}

parse_rd <- function(x) {
  con <- textConnection(x)
  on.exit(close(con), add = TRUE)
  # return NULL if parsing the Rd fragment raises a warning
  tryCatch(
    tools::parse_Rd(con, fragment = TRUE, encoding = "UTF-8"),
    warning = function(cnd) NULL
  )
}

rd_text <- function(x) {
  x <- parse_rd(x)
  if (is.null(x)) return(NA_character_)
  path <- tempfile()
  tools::Rd2txt(x, path, fragment = TRUE)
  gsub("\n$", "", readChar(path, 100))
}

rd_html <- function(x) {
  x <- parse_rd(x)
  if (is.null(x)) return(NA_character_)
  path <- tempfile()
  tools::Rd2HTML(x, path, fragment = TRUE)
  gsub("\n$", "", readChar(path, 1000))
}

rd_deparse <- function(x) {
  paste0(as.character(x, deparse = TRUE), collapse = "")
}

Let's first look at what happens when we surround the symbols with backticks:

symbols <- c(
  "&",    # needs escaping in HTML
  "%",    # Rd comment character
  "{}",   # matched braces
  "{",    # unmatched brace
  "\\",   # single backslash
  "\\\\"  # double backslash
)

tibble::tibble(
  code = paste0("`", symbols, "`"),
  rd   = map_chr(code, roxy_md),
  text = map_chr(rd, rd_text),
  html = map_chr(rd, rd_html)
)

This yields the following table (not using reprex here since the additional layer of quoting when processed via Rmd is giving me different results):
I think that's correct. It's a bit more informative to surround the symbols in quotes, because that should make a string that is always parseable:

tibble::tibble(
  code = paste0("`\"", symbols, "\"`"),
  rd   = map_chr(code, roxy_md),
  text = map_chr(rd, rd_text),
  html = map_chr(rd, rd_html)
)
I think row 5 is correct, given what it generates.
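For anyone reproducing this, row 5 is the quoted single backslash; it can be re-run on its own with the helpers defined above (output omitted here, since it depends on the roxygen2 version in use):

code5 <- paste0("`\"", symbols[5], "\"`")  # i.e. the markdown snippet `"\"`
rd5   <- roxy_md(code5)
rd_text(rd5)
rd_html(rd5)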
Maybe the problem is that we're translating inline code to \code{} when it should be \verb{}? (Making this change would also require adjusting pkgdown, since currently it only auto-links the contents of \code{}.)
Or maybe: if the code parses, generate \code{}, and otherwise generate \verb{}?
Impressive! It's been a long time since I looked at this issue, but it seems you've fixed it. I agree with your solution to conditionally generate \code{} or \verb{}.
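For concreteness, a minimal sketch of what such a conditional could look like; md_code_to_rd is a hypothetical helper illustrating the idea, not roxygen2's actual implementation, and it only handles the % escape:

# Hypothetical: emit \code{} when the snippet parses as R,
# otherwise fall back to \verb{} so arbitrary characters survive.
# Only % is escaped here; real Rd escaping is more involved.
md_code_to_rd <- function(text) {
  parses <- tryCatch({
    parse(text = text)
    TRUE
  }, error = function(e) FALSE)
  escaped <- gsub("%", "\\%", text, fixed = TRUE)
  macro <- if (parses) "\\code" else "\\verb"
  paste0(macro, "{", escaped, "}")
}

md_code_to_rd("a && b")  # valid R       -> \code{a && b}
md_code_to_rd("$")       # doesn't parse -> \verb{$}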
Greetings
After encountering problems with a few special characters, I undertook a comprehensive test to see what worked and what didn't. My test process involved generating @param tags with descriptions containing special characters in 4 contexts: normal text, quoted text, normal code, and quoted code. For each context, I attempted three ways to get the character to work: naked, escaped, and double-escaped. By way of example, the test lines for special character '$' (with apologies for the messed-up formatting of the 'code' cases) are:

#' @param param0003 text unescaped normal: $
#' @param param0021 text unescaped quoted: "$"
#' @param param0039 text escaped normal: \$
#' @param param0057 text escaped quoted: "\$"
#' @param param0075 text double normal: \\$
#' @param param0093 text double quoted: "\\$"
#' @param param0111 code unescaped normal: `$`
#' @param param0129 code unescaped quoted: `"$"`
#' @param param0147 code escaped normal: `\$`
#' @param param0165 code escaped quoted: `"\$"`
#' @param param0183 code double normal: `\\$`
#' @param param0201 code double quoted: `"\\$"`

I placed the test lines in a .R file (attached), converted the roxygen comments to Rd using devtools::document, and converted the Rd to HTML using tools::Rd2HTML. Every so often I also produced a PDF using R CMD Rd2pdf just to be safe, and I never saw a case where the conversion to HTML worked but the PDF conversion had problems.
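For reference, a sketch of that pipeline; the package and file names below are placeholders rather than the actual attached files:

devtools::document("mypkg")                    # roxygen comments -> man/*.Rd
tools::Rd2HTML("mypkg/man/roxescape.Rd",       # Rd -> HTML
               out = "roxescape.html")
# and occasionally, from the shell, for the PDF check:
#   R CMD Rd2pdf mypkg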
The special characters I tested were & % $ # _ { } ~ ^ \ @ [ ] ( ) {} [] (). I included the balanced pairs {}, [], and () since balanced and unbalanced pairs work differently in some contexts. These are the 10 LaTeX special characters, plus a few that I saw mentioned as special in roxygen or Rd, plus parentheses for good measure. The table below shows what needs to be typed to get each special character rendered correctly, or 'NONE' if none of my attempts worked.
Summary
$ & ( () ) @ [ [] ] ^ _ ~ work without drama.
# works without drama except at start-of-text, where (in a separate test) it triggers an error (“attempt to apply non-function”).
\ failures only occur when it is at the end of the text or code; it works without drama elsewhere.
% works when escaped in text, but as reported by others, it doesn't work in code.

Reasoning that the problems may be caused by Rd limitations, I redid the test generating Rd directly. The results were far better: everything worked except \ in quoted code. I'm happy to provide the Rd results should that be of interest. It goes without saying that I'm also happy to provide the Perl script I used to generate the test cases, or to modify the script to run additional test cases as you wish.
roxescape.R.zip