-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28
Conversation
R/.RData
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be removed from this PR. Additionally, let's add *.RData
to the .gitignore file for this repository.
R/lineage.R
Outdated
@@ -358,21 +358,21 @@ ipg2lin <- function(accessions, ipg_file, | |||
if (length(refseq_rows) != 0) { | |||
refseq_ipg_dt <- ipg_dt[refseq_rows, ] | |||
refseq_lins <- GCA2lin(refseq_ipg_dt, | |||
assembly_path = refseq_assembly_path, | |||
.data$assembly_path = refseq_assembly_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .data
prefix is not required in this instance, as assembly_path
is an argument to GCA2lin()
.
R/lineage.R
Outdated
genbank_lins <- GCA2lin(.data$gca_ipg_dt, | ||
.data$assembly_path = genbank_assembly_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .data
prefix is not required in this instance, as gca_ipg_dt
and assembly_path
are an arguments to GCA2lin()
.
R/lineage.R
Outdated
lineagelookup_path | ||
) | ||
} | ||
|
||
|
||
lins <- GCA2lin(prot_data = ipg_dt, assembly_path, lineagelookup_path) | ||
lins <- lins[!is.na(Lineage)] %>% unique() | ||
lins <- GCA2lin(prot_data = ipg_dt, .data$assembly_path, lineagelookup_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .data
prefix is not required in this instance, again we appear to be specifying function arguments
R/lineage.R
Outdated
@@ -323,7 +323,7 @@ ipg2lin <- function(accessions, ipg_file, | |||
ipg_dt <- fread(ipg_file, sep = "\t", fill = T) | |||
|
|||
accessions <- unique(accessions) | |||
ipg_dt <- ipg_dt[Protein %in% accessions] | |||
ipg_dt <- ipg_dt[.data$Protein %in% accessions] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, data.table -- this package requires a different solution to the "no visible binding for global variable" warning. .data
will not work in this context. When subsetting, as in this line, we should define "Protein" in a separate .R file, called globals.R
. The contents of this file will resemble the following:
#' @importFrom utils globalVariables
utils::globalVariables(c("Protein"))
We can use this file to put all data.table variables that aren't defined otherwise to clear the R-CMD warnings.
R/acc2lin.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of just a general comment. It looks like ipg2lin()
is present here, and also in lineage.R. Will need to disambiguate the functions to see which is in active use.
R/assign_job_queue.R
Outdated
@@ -94,7 +94,7 @@ get_proc_medians <- function(dir_job_results) { | |||
dplyr::summarise( | |||
dplyr::across( | |||
dplyr::everything(), | |||
\(x) median(x, na.rm = TRUE) | |||
\(x) .data$median(x, na.rm = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the error we see for this function is:
"no visible global function definition for ‘median’"
In this case, we need to add an Rd tag, specifying which median function to use. In this case adding:
#' @importFrom stats median
To the Roxygen2 skeleton above this function should clear the warning
R/assign_job_queue.R
Outdated
@@ -126,7 +126,7 @@ write_proc_medians_table <- function(dir_job_results, filepath) { | |||
names_to = "process", | |||
values_to = "median_seconds" | |||
) |> | |||
dplyr::arrange(dplyr::desc(median_seconds)) | |||
dplyr::arrange(dplyr::desc(.data$median_seconds)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, this is exactly how we would like to use the .data
prefix!
R/cleanup.R
Outdated
@@ -342,7 +342,7 @@ remove_tails <- function(prot, by_column = "DomArch", | |||
# !! Insert line to read domains_keep | |||
|
|||
# Contains all domains separated by "|" | |||
domains_for_grep <- paste(domains_keep$domains, collapse = "|") | |||
domains_for_grep <- paste(.data$domains_keep$domains, collapse = "|") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like NSE is used here, so the .data prefix is not needed. However, we do need to figure out how domains_keep
is defined. It may have originally been sourced from the Global Environment
R/cleanup.R
Outdated
@@ -693,35 +693,35 @@ cleanup_GeneDesc <- function(prot, column) { | |||
pick_longer_duplicate <- function(prot, column) { | |||
col <- sym(column) | |||
|
|||
prot$row.orig <- 1:nrow(prot) | |||
prot$.data$row.orig <- 1:nrow(prot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed here.
R/cleanup.R
Outdated
|
||
# dup_rows[which(nchar(pull(dup_rows,{{col}})) == max(nchar(pull(dup_rows,{{col}}))))[2:nrow(dup_rows)], "row.orig"] | ||
remove_rows <- c(remove_rows, to_remove) | ||
} | ||
|
||
# grab all the longest rows | ||
unique_dups <- prot[-remove_rows, ] %>% select(-row.orig) | ||
unique_dups <- prot[-remove_rows, ] %>% select(-.data$row.orig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely where the warning was generated from for this function and this is the correct application of the .data prefix
R/cleanup.R
Outdated
|
||
longest_rows <- c() | ||
remove_rows <- c() | ||
for (acc in dup_acc) { | ||
dup_rows <- dups %>% filter(AccNum == acc) | ||
dup_rows <- dups %>% filter(.data$AccNum == acc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a good fix for the warning generated for AccNum
R/cleanup.R
Outdated
|
||
# Get list of duplicates | ||
dups <- prot %>% | ||
group_by(AccNum) %>% | ||
group_by(.data$AccNum) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/cleanup.R
Outdated
summarize("count" = n()) %>% | ||
filter(count > 1) %>% | ||
arrange(-count) %>% | ||
merge(prot, by = "AccNum") | ||
|
||
dup_acc <- dups$AccNum | ||
dup_acc <- dups$.data$AccNum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.data is not needed here
R/ipr2viz.R
Outdated
ipr_out <- read_tsv(infile_ipr, col_names = T, col_types = iprscan_cols) | ||
ipr_out <- ipr_out %>% filter(Name %in% accessions) | ||
ipr_out <- read_tsv(infile_ipr, col_names = T, col_types = .data$iprscan_cols) | ||
ipr_out <- ipr_out %>% filter(.data$Name %in% accessions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/ipr2viz.R
Outdated
analysis_cols <- paste0("DomArch.", analysis) | ||
infile_full <- infile_full %>% select(analysis_cols, Lineage_short, QueryName, PcPositive, AccNum) | ||
infile_full <- infile_full %>% select(analysis_cols, .data$Lineage_short, .data$QueryName, .data$PcPositive, .data$AccNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/ipr2viz.R
Outdated
@@ -152,35 +152,35 @@ ipr2viz <- function(infile_ipr = NULL, infile_full = NULL, accessions = c(), | |||
# Filter by Top Accessions per Accession per DomArch and Lineage | |||
ipr_out <- subset( | |||
ipr_out, | |||
ipr_out$AccNum %in% top_acc | |||
ipr_out$.data$AccNum %in% top_acc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of .data is unnecessary. Here, we can already tell that AccNum
comes from ipr_out
as it was explicitly defined as ipr_out$AccNum
R/ipr2viz.R
Outdated
ipr_out <- ipr_out %>% arrange(.data$AccNum, .data$StartLoc, .data$StopLoc) | ||
ipr_out_sub <- filter( | ||
ipr_out, | ||
grepl(pattern = analysis, x = Analysis) | ||
grepl(pattern = analysis, x = .data$Analysis) | ||
) | ||
# dynamic analysis labeller | ||
analyses <- ipr_out_sub %>% | ||
select(Analysis) %>% | ||
select(.data$Analysis) %>% | ||
distinct() | ||
analysis_labeler <- analyses %>% | ||
pivot_wider(names_from = Analysis, values_from = Analysis) | ||
pivot_wider(names_from = .data$Analysis, values_from = .data$Analysis) | ||
|
||
lookup_tbl_path <- "/data/research/jravilab/common_data/cln_lookup_tbl.tsv" | ||
lookup_tbl <- read_tsv(lookup_tbl_path, col_names = T, col_types = lookup_table_cols) | ||
lookup_tbl <- read_tsv(lookup_tbl_path, col_names = T, col_types = .data$lookup_table_cols) | ||
|
||
lookup_tbl <- lookup_tbl %>% select(-ShortName) # Already has ShortName -- Just needs SignDesc | ||
# ipr_out_sub = ipr_out_sub %>% select(-ShortName) | ||
lookup_tbl <- lookup_tbl %>% select(-.data$ShortName) # Already has ShortName -- Just needs SignDesc | ||
# ipr_out_sub = ipr_out_sub %>% select(-.data$ShortName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/ipr2viz.R
Outdated
# TODO: Fix lookup table and uncomment below | ||
# ipr_out_sub <- merge(ipr_out_sub, lookup_tbl, by.x = "DB.ID", by.y = "DB.ID") | ||
|
||
## PLOTTING | ||
## domains as separate arrows | ||
# For odering with tree | ||
# ipr_out_sub$Name <- paste0(" ", ipr_out_sub$Name) | ||
# ipr_out_sub$.data$Name <- paste0(" ", ipr_out_sub$.data$Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.data is not needed here, as ipr_out_sub$Name
was explicit about where the variable came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SunSummoner,
Thank you so much for your contribution! There is a lot that works well in this PR, and I appreciate the effort you put into tackling some tricky situations in the files and functions you started with.
I have not commented on all code changes in this PR but have instead attempted to provide representative examples of where .data was used correctly and where it was not.
One thing that will help with identifying these issues more accurately is the liberal use of running devtools::check()
. With this PR, many of the tricky situations where .data was applied incorrectly could have been discovered locally. I'd recommend changing a few global variable definitions, running devtools::check()
and evaluating the result to see if the warning has disappeared.
Again, this is a strange issue because you will generally not notice it outside of creating an RPackage. Go ahead and attempt a few revisions and feel free to reach out with questions!
Thank you @the-mayer I'll make the required changes in the PR. I was having some issue with |
What kind of change(s) are included?
Checklist