Suggestion: Autolink urls on man generated for DESCRIPTION. #1265

dieghernan · 2021-10-29T09:53:49Z

Hi,

I generate the package-level Rd file for my packages with the following:

#' @keywords internal
"_PACKAGE"

This works fine. However, the text in the Description field of my DESCRIPTION file is not autolinked, as it is in the rest of my documentation (maybe it is not being parsed as Markdown?).

Would you be open to exploring this? I prepared a reprex showing how the same text is parsed differently depending on whether it is placed in a regular .R file or in the DESCRIPTION file:

desc_text <- paste(
  "Tools to extract information from the Intergovernmental Organizations",
  "('IGO') Database , version 3, provided by the Correlates of War Project",
  "<https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020), ",
  " <doi:10.1177/0022343319881175>. Version 3 includes information from ",
  " 1815 to 2014."
)

text_fun <- paste0(
  "#' igoR: Intergovernmental Organizations Database
        #'
        #' @description ",
  desc_text,
  "\n#' @md\nfoo <- function() {}"
)



out <- roxygen2::roc_proc_text(
  roxygen2::rd_roclet(),
  text_fun
)[[1]]

# Autolinking on urls and dois
out
#> % Generated by roxygen2: do not edit by hand
#> % Please edit documentation in ./<text>
#> \name{foo}
#> \alias{foo}
#> \title{igoR: Intergovernmental Organizations Database}
#> \usage{
#> foo()
#> }
#> \description{
#> Tools to extract information from the Intergovernmental Organizations ('IGO') Database , version 3, provided by the Correlates of War Project \url{https://correlatesofwar.org/}. See also Pevehouse, J. C. et al. (2020),   \url{doi:10.1177/0022343319881175}. Version 3 includes information from   1815 to 2014.
#> }

# url and doi as a link \url{ } ;)
# Now create a package and use "_PACKAGE" for documenting

temp_pkg <- file.path(tempdir(), "test")
usethis::create_package(temp_pkg, open = FALSE)
#> v Creating 'C:/Users/XXXX/AppData/Local/Temp/RtmpOUqitX/test/'
#> v Setting active project to 'C:/Users/XXXX/AppData/Local/Temp/RtmpOUqitX/test'
#> v Creating 'R/'
#> v Writing 'DESCRIPTION'
#> Package: test
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.0.0.9000
#> Authors@R (parsed):
#>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
#>     license
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.1.2
#> v Writing 'NAMESPACE'
#> v Setting active project to '<no active project>'

desc_text <- paste(
  "Tools to extract information from the Intergovernmental Organizations",
  "('IGO') Database , version 3, provided by the Correlates of War Project",
  "<https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020), ",
  " <doi:10.1177/0022343319881175>. Version 3 includes information from ",
  " 1815 to 2014."
)

desc::desc_set(
  Description = desc_text,
  file = file.path(tempdir(), "test", "DESCRIPTION")
)
#> Package: test
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.0.0.9000
#> Authors@R (parsed):
#>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: Tools to extract information from the Intergovernmental
#>     Organizations ('IGO') Database , version 3, provided by the Correlates
#>     of War Project <https://correlatesofwar.org/>. See also Pevehouse, J.
#>     C. et al. (2020), <doi:10.1177/0022343319881175>. Version 3 includes
#>     information from 1815 to 2014.
#> License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
#>     license
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.1.2


source <- "
  #' @keywords internal
  \"_PACKAGE\""


write(source, file.path(temp_pkg, "R", "test-package.R"))


roxygen2::roxygenise(temp_pkg)
#> i Loading test
#> Writing test-package.Rd

readLines(con = file.path(temp_pkg, "man", "test-package.Rd"))[8:10]
#> [1] "\\description{"                                                                                                                                                                                                                                                                                          
#> [2] "Tools to extract information from the Intergovernmental Organizations ('IGO') Database , version 3, provided by the Correlates of War Project <https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020), <doi:10.1177/0022343319881175>. Version 3 includes information from 1815 to 2014."
#> [3] "}"

# No links :(

Created on 2021-10-29 by the reprex package (v2.0.1)

gaborcsardi · 2021-10-29T10:08:25Z

Indeed it is not parsed as md, because it is not supposed to be md. IDK if there is a good solution here.

FWIW one workaround is to avoid using "_PACKAGE" and instead use @docType package and create the page of the package manually.
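
A minimal sketch of that workaround, assuming the package is called igoR (the name is taken from the reprex above and the topic name `igoR-package` is illustrative): the Description text is copied by hand into an ordinary roxygen block, so it goes through the same Markdown parsing as any other topic.

```r
#' igoR: Intergovernmental Organizations Database
#'
#' Tools to extract information from the Intergovernmental Organizations
#' ('IGO') Database, version 3, provided by the Correlates of War Project
#' <https://correlatesofwar.org/>.
#'
#' @md
#' @docType package
#' @name igoR-package
#' @keywords internal
NULL
```

The cost is that the text now lives in two places and has to be kept in sync with DESCRIPTION manually.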

Bisaloo · 2022-03-29T13:38:25Z

I understand the choice of not parsing as md but I think it would make sense to convert URLs and <doi:...> or <arxiv:...> tags since those are supported (and encouraged) by CRAN.

hadley · 2022-03-29T18:36:21Z

This would require some extra manipulation in object_defaults.package(). Do we have a list of these special urls? Neither is mentioned in Writing R extensions.

dieghernan · 2022-03-29T19:27:35Z

Hi, as per the Checklist for CRAN submissions, I think those special URLs are just <doi:...> and <arXiv:...>, aside from regular URLs.

Bisaloo · 2022-03-29T20:37:44Z

Yes, it seems to be all. Maëlle identified the source for this feature and I cannot see anything else: ropensci/roweb3#56 (comment)
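
To make the three cases concrete, here is a rough base R sketch of the kind of conversion being discussed. `autolink_description()` is a hypothetical helper written for illustration; it is not roxygen2's implementation, and the choice of `\doi{}` and `\href{}` as output markup is an assumption.

```r
# Hypothetical helper: turn <url>, <doi:...> and <arXiv:...> into Rd markup.
autolink_description <- function(x) {
  # <http://...> or <https://...> becomes \url{...}
  x <- gsub("<(https?://[^>[:space:]]+)>", "\\\\url{\\1}", x)
  # <doi:...> (any letter case) becomes \doi{...}
  x <- gsub("<doi:([^>[:space:]]+)>", "\\\\doi{\\1}", x, ignore.case = TRUE)
  # <arXiv:...> (any letter case) becomes a link to the arXiv abstract page
  x <- gsub(
    "<arxiv:([^>[:space:]]+)>",
    "\\\\href{https://arxiv.org/abs/\\1}{arXiv:\\1}",
    x,
    ignore.case = TRUE
  )
  x
}

autolink_description(
  "See <https://correlatesofwar.org/> and <doi:10.1177/0022343319881175>."
)
```

Because the patterns stop at the first `>`, this sketch would mishandle a DOI that itself contains `>`, which is the case raised in #1164.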

dieghernan · 2022-03-30T10:16:50Z

Out of curiosity, I ran a small analysis of the Description field of the DESCRIPTION files of all CRAN packages (based on this StackOverflow question). I don't want to overload the issue, so here is a quick summary:

8,148 CRAN packages (out of 19,073 as of 2022-03-30, i.e. 42.72%) have a string in the Description field that matches the pattern <text__numbers_and_symbols>. I used the regex "<(\\S*?)>", which still returns some false positives, but I just went along with it.
There are a total of 13,531 strings matching the pattern. The package lactcurves alone has 45 of them!
I then tried to extract the "domain" of each match by splitting at the first . or :. There are lots of false positives here, but the most common patterns are:

domain	n	porc	cumsum	cumporc
<doi:	7862	58.104	58.104	58.104
<https:	2902	21.447	79.551	79.551
<arXiv:	940	6.947	86.498	86.498
<DOI:	869	6.422	92.920	92.920
<http:	785	5.801	98.721	98.721
<ISBN:	38	0.281	99.002	99.002
<arxiv:	37	0.273	99.275	99.275
<isbn:	15	0.111	99.386	99.386
<10.	10	0.074	99.460	99.460
<doi.	10	0.074	99.534	99.534
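
As a small illustration of how the pattern and the prefix split behave, the following base R sketch runs them on a made-up Description string (the sentence about `<NA>` is invented to show a false positive). It uses `regmatches()` and `sub()` rather than the stringr calls of the full reprex.

```r
# Made-up Description text: two real links and one false positive (<NA>)
description <- paste(
  "Provided by the Correlates of War Project <https://correlatesofwar.org/>.",
  "See <doi:10.1177/0022343319881175>. Missing values <NA> are dropped."
)

# Everything between < and > that contains no whitespace
matches <- regmatches(
  description,
  gregexpr("<(\\S*?)>", description, perl = TRUE)
)[[1]]

# Keep the text up to and including the first "." or ":";
# strings without either delimiter are returned unchanged
prefixes <- sub("^(<[^.:]*[.:]).*$", "\\1", matches)

matches
prefixes
```

Here `<NA>` is captured by the pattern even though it is not a link, which is the kind of false positive mentioned above.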

Full reprex

library(stringr)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
library(tidyr, warn.conflicts = FALSE)
#> Warning: package 'tidyr' was built under R version 4.1.2

cran <- tools::CRAN_package_db()


cran_mod <- cran %>%
  mutate(date_pack = as.Date(str_split_fixed(Packaged, " ", 2)[, 1])) %>%
  select(Package, date_pack)


extract_urls <- str_extract_all(cran$Description,
  # Regex can be improved ...
  regex("<(\\S*?)>"),
  simplify = TRUE
) %>%
  as_tibble() %>%
  bind_cols(cran_mod, .) %>%
  filter(V1 != "")
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.



paste0(
  "Number of packages with <pattern> in Description: ",
  nrow(extract_urls),
  " out of ", nrow(cran), " (",
  round(100 * nrow(extract_urls) / nrow(cran), 2),
  "%)"
)
#> [1] "Number of packages with <pattern> in Description: 8148 out of 19073 (42.72%)"



# Analyse patterns
allurls <- extract_urls %>%
  pivot_longer(
    cols = -c(Package, date_pack),
    values_to = "url"
  ) %>%
  # Remove blanks. etc
  filter(url != "" & !is.na(url))

# Total urls enclosed by <>
nrow(allurls)
#> [1] 13531

# Split by pattern, I would use : and . for splitting

allurls <- allurls %>%
  mutate(split = gsub(".", ".|",
                      gsub(":", ":|", url, fixed = TRUE),
                      fixed = TRUE),
         domain = str_split_fixed(split, "\\|", n = 2)[, 1]
         )

alldomains <- allurls %>%
  group_by(domain) %>%
  summarise(
    max_date = max(date_pack, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(desc(n))

alldomains <- alldomains %>%
  mutate(
    porc = round(100 * n / sum(alldomains$n), 3),
    cumporc = cumsum(porc)
  )

head(alldomains, 10)
#> # A tibble: 10 x 5
#>    domain  max_date       n   porc cumporc
#>    <chr>   <date>     <int>  <dbl>   <dbl>
#>  1 <doi:   2022-03-30  7862 58.1      58.1
#>  2 <https: 2022-03-30  2902 21.4      79.6
#>  3 <arXiv: 2022-03-30   940  6.95     86.5
#>  4 <DOI:   2022-03-29   869  6.42     92.9
#>  5 <http:  2022-03-29   785  5.80     98.7
#>  6 <ISBN:  2022-03-15    38  0.281    99.0
#>  7 <arxiv: 2022-03-15    37  0.273    99.3
#>  8 <isbn:  2022-02-20    15  0.111    99.4
#>  9 <10.    2020-12-05    10  0.074    99.5
#> 10 <doi.   2020-07-28    10  0.074    99.5

Created on 2022-03-30 by the reprex package (v2.0.1)

hadley · 2022-03-30T12:37:51Z

Thanks for the investigation! Do you also want to do a PR? 😄

dieghernan mentioned this issue Oct 29, 2021

Submission: {cffr} Generate Citation File Format ('cff') Metadata for R Packages ropensci/software-review#463

Closed

27 tasks

hadley added the "feature" (a feature request or enhancement) and "rd ✍️" labels Mar 29, 2022

This was referenced Mar 31, 2022

How to deal with DOIs that include ">" #1164

Closed

Autolink url from package DESCRIPTION when using "_PACKAGE" #1315

Merged

hadley closed this as completed in #1315 Apr 6, 2022

hadley pushed a commit that referenced this issue Apr 6, 2022

Improving package description auto-linking (#1315)

0ecdcde

Fixes #1265. Fixes #1164.

dieghernan mentioned this issue Apr 6, 2022

DOI handling on the DESCRIPTION file docs #1321

Open
