Unicode Remapping Causes Function-Documentation Mismatch #1121

Closed
billdenney opened this issue Jun 21, 2020 · 5 comments
Labels: bug (an unexpected problem or unintended behavior)

Comments

@billdenney
Contributor

This is related to #592.

I am trying to write a Unicode to ASCII simplifier which will map specific characters before doing a general mapping using stringi. One of the characters that often shows up in my work is the Greek character lowercase mu. There are two Unicode code points that typically have the same glyph but one is considered "Greek mu" and one is "micro sign". These are c("\u03bc", "\u00b5").
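To make the distinction concrete, here is a minimal base-R sketch showing that the two characters are different strings with different UTF-8 bytes. The variable names `greek_mu` and `micro_sign` are only illustrative, and `intToUtf8()` is used (rather than a string literal) so that the result does not depend on how the session's locale handles `\u` escapes.

```r
# Build each character directly from its code point; intToUtf8() returns a
# UTF-8 encoded string regardless of the session locale.
greek_mu   <- intToUtf8(0x03BC)  # U+03BC GREEK SMALL LETTER MU
micro_sign <- intToUtf8(0x00B5)  # U+00B5 MICRO SIGN

# The glyphs usually look the same, but the strings are not identical.
identical(greek_mu, micro_sign)
#> [1] FALSE

# The underlying UTF-8 bytes differ as well.
charToRaw(greek_mu)
#> [1] ce bc
charToRaw(micro_sign)
#> [1] c2 b5
```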

As I have learned, operating system and locale often matter for this kind of problem, so note that this is on Windows 10 with the English/United States locale (full session info is in the reprex).

When I generated the documentation for these with devtools::document(), both of these were mapped to "\u00b5". So, I got an error in Travis-CI:

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'unicode_to_ascii':
unicode_to_ascii.character
  Code: function(x, verbose = FALSE, pattern = c("μ", "µ"), replacement
                 = c("u", "u"), general_conversion = TRUE, ...)
  Docs: function(x, verbose = FALSE, pattern = c("µ", "µ"), replacement
                 = c("u", "u"), general_conversion = TRUE, ...)
  Mismatches in argument default values:
    Name: 'pattern' Code: c("μ", "µ") Docs: c("µ", "µ")

Is there any way to prevent this Unicode remapping?

In the reprex below, the last 4 lines are the problem. They should look like:

any(grepl(x=doc, pattern="\u03bc"))
#> [1] TRUE
any(grepl(x=doc, pattern="\u00b5"))
#> [1] FALSE

Reprex:

devtools::create("dummypack", list(License="GPL-2"), open=FALSE)
#> v Creating 'dummypack/'
#> v Setting active project to 'C:/Users/Bill Denney/AppData/Local/Temp/Rtmpsj40tU/reprex30b0128c2ad7/dummypack'
#> v Creating 'R/'
#> v Writing 'DESCRIPTION'
#> Package: dummypack
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.0.0.9000
#> Authors@R (parsed):
#>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: GPL-2
#> Encoding: UTF-8
#> LazyData: true
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.1.0
#> v Writing 'NAMESPACE'
#> v Setting active project to '<no active project>'

cat("
  #' Some function
  #' @importFrom graphics plot
  #' @param b Some label
  a <- function(b = '\\u03bc') {plot(1, main=b)}
", file="dummypack/R/a.R")

devtools::document("dummypack")
#> Updating dummypack documentation
#> Loading dummypack
#> Writing NAMESPACE
#> Writing a.Rd
doc <- readLines("dummypack/man/a.Rd")
any(grepl(x=doc, pattern="\u03bc"))
#> [1] FALSE
any(grepl(x=doc, pattern="\u00b5"))
#> [1] TRUE

Created on 2020-06-21 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.1 (2020-06-06)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2020-06-21                  
#> 
#> - Packages -------------------------------------------------------------------
#>  ! package     * version    date       lib source        
#>    assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
#>    backports     1.1.7      2020-05-13 [1] CRAN (R 4.0.0)
#>    callr         3.4.3      2020-03-28 [1] CRAN (R 4.0.0)
#>    cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)
#>    commonmark    1.7        2018-12-01 [1] CRAN (R 4.0.0)
#>    crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)
#>    desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)
#>    devtools      2.3.0      2020-04-10 [1] CRAN (R 4.0.0)
#>    digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)
#>  R dummypack   * 0.0.0.9000 <NA>       [?] <NA>          
#>    ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)
#>    evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)
#>    fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)
#>    fs            1.4.1      2020-04-04 [1] CRAN (R 4.0.0)
#>    git2r         0.27.1     2020-05-03 [1] CRAN (R 4.0.0)
#>    glue          1.4.1      2020-05-13 [1] CRAN (R 4.0.0)
#>    highr         0.8        2019-03-20 [1] CRAN (R 4.0.0)
#>    htmltools     0.4.0      2019-10-04 [1] CRAN (R 4.0.0)
#>    knitr         1.28       2020-02-06 [1] CRAN (R 4.0.0)
#>    magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.0)
#>    memoise       1.1.0      2017-04-21 [1] CRAN (R 4.0.0)
#>    pkgbuild      1.0.8      2020-05-07 [1] CRAN (R 4.0.0)
#>    pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.0)
#>    prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)
#>    processx      3.4.2      2020-02-09 [1] CRAN (R 4.0.0)
#>    ps            1.3.3      2020-05-08 [1] CRAN (R 4.0.0)
#>    purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
#>    R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)
#>    Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 4.0.0)
#>    remotes       2.1.1      2020-02-15 [1] CRAN (R 4.0.0)
#>    rlang         0.4.6      2020-05-02 [1] CRAN (R 4.0.0)
#>    rmarkdown     2.2        2020-05-31 [1] CRAN (R 4.0.0)
#>    roxygen2      7.1.0      2020-03-11 [1] CRAN (R 4.0.0)
#>    rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.0)
#>    rstudioapi    0.11       2020-02-07 [1] CRAN (R 4.0.0)
#>    sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
#>    stringi       1.4.6      2020-02-17 [1] CRAN (R 4.0.0)
#>    stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)
#>    testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.0)
#>    usethis       1.6.1      2020-04-29 [1] CRAN (R 4.0.0)
#>    withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.0)
#>    xfun          0.14       2020-05-20 [1] CRAN (R 4.0.0)
#>    xml2          1.3.2      2020-04-23 [1] CRAN (R 4.0.0)
#>    yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] C:/Users/Bill Denney/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.1/library
#> 
#>  R -- Package was removed from disk.
@billdenney
Contributor Author

As an addendum, I just tested this on Linux (Ubuntu 18.04), and the issue does not occur there. This appears to be Windows-specific (or at least present on Windows and absent on Linux).

@gaborcsardi
Member

My guess is that this is the parse() -> deparse() loop, though I am not entirely sure why it only happens on Windows.

x <- '"\\u03bc"'
charToRaw(x)
#>  [1] 22 5c 75 30 33 62 63 22
y <- deparse(eval(parse(text = x)))
charToRaw(y)
#> [1] 22 b5 22
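The same check can be wrapped in a small diagnostic to run on any platform. This is only a sketch: `survives_deparse()` is a hypothetical helper name, not part of roxygen2 or base R, and it assumes (as guessed above) that `deparse()` is where a lossy conversion to the native encoding would happen. The result for non-ASCII input depends on the session locale, so no output is shown for it.

```r
# Hypothetical helper (not from roxygen2): does a single string keep the same
# code points after a deparse() -> parse() -> eval() round trip?
survives_deparse <- function(s) {
  stopifnot(is.character(s), length(s) == 1L)
  s <- enc2utf8(s)
  # deparse() emits source text for the string; on a non-UTF-8 locale this is
  # the step suspected of substituting a look-alike character.
  back <- eval(parse(text = deparse(s)))
  # Compare code points rather than bytes so encoding marks do not matter.
  identical(utf8ToInt(s), utf8ToInt(enc2utf8(back)))
}

# ASCII is representable in every locale, so it should always survive.
survives_deparse("a")
#> [1] TRUE

# Locale-dependent: expected to differ between UTF-8 and CP1252 sessions.
survives_deparse(intToUtf8(0x03BC))
```

If the last call returns FALSE in a given session, that would be consistent with the remapping reported in this issue.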

@gaborcsardi added the "bug" label (an unexpected problem or unintended behavior) on Jul 23, 2020
@billdenney
Contributor Author

@gaborcsardi Thanks for the quick review.

To bidirectionally link the conversations about this issue: https://stat.ethz.ch/pipermail/r-package-devel/2020q3/005822.html

@gaborcsardi
Member

Reprex:

roxygen2::roc_proc_text(roxygen2::rd_roclet(), "
  #' Title
  #' Desc
  fun <- function(x = '\u03bc') { }
")[[1]]
#> % Generated by roxygen2: do not edit by hand
#> % Please edit documentation in ./<text>
#> \name{fun}
#> \alias{fun}
#> \title{Title
#> Desc}
#> \usage{
#> fun(x = "μ")
#> }
#> \description{
#> Title
#> Desc
#> }

Created on 2020-07-23 by the reprex package (v0.3.0)

@hadley
Member

hadley commented Apr 16, 2021

Duplicate of #1186
