-
Notifications
You must be signed in to change notification settings - Fork 192
New strings_as_factors_linter #1020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
9c2d41a
New strings_as_factors_linter
MichaelChirico cf30a09
Merge branch 'master' into strings_as_factors
MichaelChirico 11b652c
Merge branch 'master' into strings_as_factors
MichaelChirico fe151cc
improve lint message
MichaelChirico 3fd4de8
Merge branch 'master' into strings_as_factors
MichaelChirico d80fec7
more feedback
MichaelChirico b0a8097
Merge branch 'master' into strings_as_factors
MichaelChirico 576a9ff
Merge branch 'strings_as_factors' of github.com:r-lib/lintr into stri…
MichaelChirico 3b2c757
grammar in message
MichaelChirico 16b0c3e
condense tests with msg in variable
MichaelChirico 2f50ba4
tweak NEWS
MichaelChirico File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
#' Identify cases where stringsAsFactors should be supplied explicitly | ||
#' | ||
#' Designed for code bases written for versions of R before 4.0 seeking to upgrade to R >= 4.0, where | ||
#' one of the biggest pain points will surely be the flipping of the | ||
#' default value of `stringsAsFactors` from `TRUE` to `FALSE`. | ||
#' | ||
#' It's not always possible to tell statically whether the change will break | ||
#' existing code because R is dynamically typed -- e.g. in `data.frame(x)` | ||
#' if `x` is a string, this code will be affected, but if `x` is a number, | ||
#' this code will be unaffected. However, in `data.frame(x = 'a')`, the | ||
#' output will unambiguously be affected. We can instead supply | ||
#' `stringsAsFactors = TRUE`, which will make this code backwards-compatible. | ||
#' | ||
#' See <https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors>. | ||
#' | ||
#' @evalRd rd_tags("strings_as_factors_linter") | ||
#' @seealso [linters] for a complete list of linters available in lintr. | ||
#' @export | ||
strings_as_factors_linter <- function() { | ||
Linter(function(source_file) { | ||
if (length(source_file$xml_parsed_content) == 0L) { | ||
return(list()) | ||
} | ||
|
||
xml <- source_file$xml_parsed_content | ||
|
||
atomic_const_cond <- "STR_CONST" | ||
# like data.frame('a') and data.frame(c('a', 'b')) | ||
c_cond <- xp_and( | ||
"expr[SYMBOL_FUNCTION_CALL[text() = 'c']]", | ||
"expr[STR_CONST]", | ||
"not(expr[SYMBOL or expr])" | ||
) | ||
# save and re-use for rep_cond which satisfies the same condition | ||
atomic_or_c_cond <- sprintf("%s or (%s)", atomic_const_cond, c_cond) | ||
# like data.frame(rep(c('a', 'b'), 10L)) | ||
rep_cond <- xp_and( | ||
"SYMBOL_FUNCTION_CALL[text() = 'rep']", | ||
sprintf("following-sibling::expr[1][%s]", atomic_or_c_cond) | ||
) | ||
|
||
# like data.frame(character()) or data.frame(as.character(1)) | ||
known_character_funs <- c( | ||
"character", "as.character", "paste", "sprintf", | ||
"format", "formatC", "prettyNum", "toString", "encodeString" | ||
) | ||
character_cond <- sprintf( | ||
"expr[SYMBOL_FUNCTION_CALL[%s]]", | ||
xp_text_in_table(known_character_funs) | ||
) | ||
|
||
expr_cond <- xp_and( | ||
'expr[SYMBOL_FUNCTION_CALL[text() = "data.frame"]]', | ||
sprintf( | ||
"expr[((%s) or (expr[%s]) or (%s)) and not(%s)]", | ||
atomic_or_c_cond, | ||
rep_cond, | ||
character_cond, | ||
"preceding-sibling::SYMBOL_SUB[1][text() = 'row.names']" | ||
), | ||
'not(SYMBOL_SUB[text() = "stringsAsFactors"])' | ||
) | ||
bad_expr <- xml2::xml_find_all(xml, sprintf("//expr[%s]", expr_cond)) | ||
|
||
return(lapply( | ||
bad_expr, | ||
xml_nodes_to_lint, | ||
source_file = source_file, | ||
lint_message = paste( | ||
"This code relies on the default value of stringsAsFactors,", | ||
"which changed in version R 4.0. Please supply an explicit value for", | ||
"stringsAsFactors for this code to work with versions of R both before", | ||
"and after this switch." | ||
), | ||
type = "warning" | ||
)) | ||
}) | ||
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
test_that("strings_as_factors_linter skips allowed usages", { | ||
expect_lint("data.frame(1:3)", NULL, strings_as_factors_linter()) | ||
expect_lint("data.frame(x = 1:3)", NULL, strings_as_factors_linter()) | ||
|
||
expect_lint("data.frame(x = 'a', stringsAsFactors = TRUE)", NULL, strings_as_factors_linter()) | ||
expect_lint("data.frame(x = 'a', stringsAsFactors = FALSE)", NULL, strings_as_factors_linter()) | ||
expect_lint("data.frame(x = c('a', 'b'), stringsAsFactors = FALSE)", NULL, strings_as_factors_linter()) | ||
|
||
# strings in argument names to c() don't get linted | ||
# TODO(michaelchirico): fix this; currently generates a false positive. | ||
# expect_lint("data.frame(x = c('a b' = 1L, 'b c' = 2L))", NULL, strings_as_factors_linter()) | ||
|
||
# characters supplied to row.names are not affected | ||
expect_lint("data.frame(x = 1:3, row.names = c('a', 'b', 'c'))", NULL, strings_as_factors_linter()) | ||
|
||
# ambiguous cases passes | ||
expect_lint("data.frame(x = c(xx, 'a'))", NULL, strings_as_factors_linter()) | ||
expect_lint("data.frame(x = c(foo(y), 'a'))", NULL, strings_as_factors_linter()) | ||
}) | ||
|
||
test_that("strings_as_factors_linter blocks simple disallowed usages", { | ||
msg <- "This code relies on the default value of stringsAsFactors" | ||
|
||
expect_lint("data.frame('a')", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(c('a', 'b'))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(x = 1:5, y = 'b')", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(x = 1:5, y, z = 'b')", msg, strings_as_factors_linter()) | ||
|
||
# catch row.names when combined with a character vector | ||
expect_lint("data.frame(x = c('c', 'd', 'e'), row.names = c('a', 'b', 'c'))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(row.names = c('c', 'd', 'e'), x = c('a', 'b', 'c'))", msg, strings_as_factors_linter()) | ||
|
||
# when everything is a literal, type promotion means the column is character | ||
expect_lint("data.frame(x = c(TRUE, 'a'))", msg, strings_as_factors_linter()) | ||
}) | ||
|
||
test_that("strings_as_factors_linters catches rep(char) usages", { | ||
msg <- "This code relies on the default value of stringsAsFactors" | ||
|
||
expect_lint("data.frame(rep('a', 10L))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(rep(c('a', 'b'), 10L))", msg, strings_as_factors_linter()) | ||
|
||
# literal char, not mixed or non-char | ||
expect_lint("data.frame(rep(1L, 10L))", NULL, strings_as_factors_linter()) | ||
expect_lint("data.frame(rep(c(x, 'a'), 10L))", NULL, strings_as_factors_linter()) | ||
# however, type promotion of literals is caught | ||
expect_lint("data.frame(rep(c(TRUE, 'a'), 10L))", msg, strings_as_factors_linter()) | ||
}) | ||
|
||
test_that("strings_as_factors_linter catches character(), as.character() usages", { | ||
msg <- "This code relies on the default value of stringsAsFactors" | ||
|
||
expect_lint("data.frame(a = character())", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = character(1L))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = as.character(x))", msg, strings_as_factors_linter()) | ||
|
||
# but not for row.names | ||
expect_lint("data.frame(a = 1:10, row.names = as.character(1:10))", NULL, strings_as_factors_linter()) | ||
}) | ||
|
||
test_that("strings_as_factors_linter catches more functions with string output", { | ||
msg <- "This code relies on the default value of stringsAsFactors" | ||
|
||
expect_lint("data.frame(a = paste(1, 2, 3))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = sprintf('%d', 1:10))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = format(x, just = 'right'))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = formatC(x, format = '%d'))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = prettyNum(x, big.mark = ','))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = toString(x))", msg, strings_as_factors_linter()) | ||
expect_lint("data.frame(a = encodeString(x))", msg, strings_as_factors_linter()) | ||
# but not for row.names | ||
expect_lint("data.frame(a = 1:10, row.names = paste(1:10))", NULL, strings_as_factors_linter()) | ||
}) |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.