Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28

Closed
wants to merge 0 commits into from

Conversation

SunSummoner
Copy link
Collaborator

What kind of change(s) are included?

  • Feature (adds or updates new capabilities)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have deleted all non-relevant text in this pull request template.
  • @the-mayer please take a look at this. Thank you!

@the-mayer the-mayer self-assigned this Oct 8, 2024
@the-mayer the-mayer added the enhancement New feature or request label Oct 8, 2024
R/.RData Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be removed from this PR. Additionally, let's add *.RData to the .gitignore file for this repository.

R/lineage.R Outdated
@@ -358,21 +358,21 @@ ipg2lin <- function(accessions, ipg_file,
if (length(refseq_rows) != 0) {
refseq_ipg_dt <- ipg_dt[refseq_rows, ]
refseq_lins <- GCA2lin(refseq_ipg_dt,
assembly_path = refseq_assembly_path,
.data$assembly_path = refseq_assembly_path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .data prefix is not required in this instance, as assembly_path is an argument to GCA2lin().

R/lineage.R Outdated
Comment on lines 367 to 368
genbank_lins <- GCA2lin(.data$gca_ipg_dt,
.data$assembly_path = genbank_assembly_path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .data prefix is not required in this instance, as gca_ipg_dt and assembly_path are an arguments to GCA2lin().

R/lineage.R Outdated
lineagelookup_path
)
}


lins <- GCA2lin(prot_data = ipg_dt, assembly_path, lineagelookup_path)
lins <- lins[!is.na(Lineage)] %>% unique()
lins <- GCA2lin(prot_data = ipg_dt, .data$assembly_path, lineagelookup_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .data prefix is not required in this instance, again we appear to be specifying function arguments

R/lineage.R Outdated
@@ -323,7 +323,7 @@ ipg2lin <- function(accessions, ipg_file,
ipg_dt <- fread(ipg_file, sep = "\t", fill = T)

accessions <- unique(accessions)
ipg_dt <- ipg_dt[Protein %in% accessions]
ipg_dt <- ipg_dt[.data$Protein %in% accessions]
Copy link
Collaborator

@the-mayer the-mayer Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, data.table -- this package requires a different solution to the "no visible binding for global variable" warning. .data will not work in this context. When subsetting, as in this line, we should define "Protein" in a separate .R file, called globals.R. The contents of this file will resemble the following:

#' @importFrom utils globalVariables
utils::globalVariables(c("Protein"))

We can use this file to put all data.table variables that aren't defined otherwise to clear the R-CMD warnings.

R/acc2lin.R Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of just a general comment. It looks like ipg2lin() is present here, and also in lineage.R. Will need to disambiguate the functions to see which is in active use.

@@ -94,7 +94,7 @@ get_proc_medians <- function(dir_job_results) {
dplyr::summarise(
dplyr::across(
dplyr::everything(),
\(x) median(x, na.rm = TRUE)
\(x) .data$median(x, na.rm = TRUE)
Copy link
Collaborator

@the-mayer the-mayer Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the error we see for this function is:
"no visible global function definition for ‘median’"

In this case, we need to add an Rd tag, specifying which median function to use. In this case adding:

#' @importFrom stats median

To the Roxygen2 skeleton above this function should clear the warning

@@ -126,7 +126,7 @@ write_proc_medians_table <- function(dir_job_results, filepath) {
names_to = "process",
values_to = "median_seconds"
) |>
dplyr::arrange(dplyr::desc(median_seconds))
dplyr::arrange(dplyr::desc(.data$median_seconds))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, this is exactly how we would like to use the .data prefix!

R/cleanup.R Outdated
@@ -342,7 +342,7 @@ remove_tails <- function(prot, by_column = "DomArch",
# !! Insert line to read domains_keep

# Contains all domains separated by "|"
domains_for_grep <- paste(domains_keep$domains, collapse = "|")
domains_for_grep <- paste(.data$domains_keep$domains, collapse = "|")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like NSE is used here, so the .data prefix is not needed. However, we do need to figure out how domains_keep is defined. It may have originally been sourced from the Global Environment

R/cleanup.R Outdated
@@ -693,35 +693,35 @@ cleanup_GeneDesc <- function(prot, column) {
pick_longer_duplicate <- function(prot, column) {
col <- sym(column)

prot$row.orig <- 1:nrow(prot)
prot$.data$row.orig <- 1:nrow(prot)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed here.

R/cleanup.R Outdated

# dup_rows[which(nchar(pull(dup_rows,{{col}})) == max(nchar(pull(dup_rows,{{col}}))))[2:nrow(dup_rows)], "row.orig"]
remove_rows <- c(remove_rows, to_remove)
}

# grab all the longest rows
unique_dups <- prot[-remove_rows, ] %>% select(-row.orig)
unique_dups <- prot[-remove_rows, ] %>% select(-.data$row.orig)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely where the warning was generated from for this function and this is the correct application of the .data prefix

R/cleanup.R Outdated

longest_rows <- c()
remove_rows <- c()
for (acc in dup_acc) {
dup_rows <- dups %>% filter(AccNum == acc)
dup_rows <- dups %>% filter(.data$AccNum == acc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a good fix for the warning generated for AccNum

R/cleanup.R Outdated

# Get list of duplicates
dups <- prot %>%
group_by(AccNum) %>%
group_by(.data$AccNum) %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

R/cleanup.R Outdated
summarize("count" = n()) %>%
filter(count > 1) %>%
arrange(-count) %>%
merge(prot, by = "AccNum")

dup_acc <- dups$AccNum
dup_acc <- dups$.data$AccNum
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.data is not needed here

R/ipr2viz.R Outdated
ipr_out <- read_tsv(infile_ipr, col_names = T, col_types = iprscan_cols)
ipr_out <- ipr_out %>% filter(Name %in% accessions)
ipr_out <- read_tsv(infile_ipr, col_names = T, col_types = .data$iprscan_cols)
ipr_out <- ipr_out %>% filter(.data$Name %in% accessions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

R/ipr2viz.R Outdated
analysis_cols <- paste0("DomArch.", analysis)
infile_full <- infile_full %>% select(analysis_cols, Lineage_short, QueryName, PcPositive, AccNum)
infile_full <- infile_full %>% select(analysis_cols, .data$Lineage_short, .data$QueryName, .data$PcPositive, .data$AccNum)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

R/ipr2viz.R Outdated
@@ -152,35 +152,35 @@ ipr2viz <- function(infile_ipr = NULL, infile_full = NULL, accessions = c(),
# Filter by Top Accessions per Accession per DomArch and Lineage
ipr_out <- subset(
ipr_out,
ipr_out$AccNum %in% top_acc
ipr_out$.data$AccNum %in% top_acc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of .data is unnecessary. Here, we can already tell that AccNum comes from ipr_out as it was explicitly defined as ipr_out$AccNum

R/ipr2viz.R Outdated
Comment on lines 160 to 176
ipr_out <- ipr_out %>% arrange(.data$AccNum, .data$StartLoc, .data$StopLoc)
ipr_out_sub <- filter(
ipr_out,
grepl(pattern = analysis, x = Analysis)
grepl(pattern = analysis, x = .data$Analysis)
)
# dynamic analysis labeller
analyses <- ipr_out_sub %>%
select(Analysis) %>%
select(.data$Analysis) %>%
distinct()
analysis_labeler <- analyses %>%
pivot_wider(names_from = Analysis, values_from = Analysis)
pivot_wider(names_from = .data$Analysis, values_from = .data$Analysis)

lookup_tbl_path <- "/data/research/jravilab/common_data/cln_lookup_tbl.tsv"
lookup_tbl <- read_tsv(lookup_tbl_path, col_names = T, col_types = lookup_table_cols)
lookup_tbl <- read_tsv(lookup_tbl_path, col_names = T, col_types = .data$lookup_table_cols)

lookup_tbl <- lookup_tbl %>% select(-ShortName) # Already has ShortName -- Just needs SignDesc
# ipr_out_sub = ipr_out_sub %>% select(-ShortName)
lookup_tbl <- lookup_tbl %>% select(-.data$ShortName) # Already has ShortName -- Just needs SignDesc
# ipr_out_sub = ipr_out_sub %>% select(-.data$ShortName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

R/ipr2viz.R Outdated
# TODO: Fix lookup table and uncomment below
# ipr_out_sub <- merge(ipr_out_sub, lookup_tbl, by.x = "DB.ID", by.y = "DB.ID")

## PLOTTING
## domains as separate arrows
# For odering with tree
# ipr_out_sub$Name <- paste0(" ", ipr_out_sub$Name)
# ipr_out_sub$.data$Name <- paste0(" ", ipr_out_sub$.data$Name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.data is not needed here, as ipr_out_sub$Name was explicit about where the variable came from.

Copy link
Collaborator

@the-mayer the-mayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SunSummoner,

Thank you so much for your contribution! There is a lot that works well in this PR, and I appreciate the effort you put into tackling some tricky situations in the files and functions you started with.

I have not commented on all code changes in this PR but have instead attempted to provide representative examples of where .data was used correctly and where it was not.

One thing that will help with identifying these issues more accurately is the liberal use of running devtools::check(). With this PR, many of the tricky situations where .data was applied incorrectly could have been discovered locally. I'd recommend changing a few global variable definitions, running devtools::check() and evaluating the result to see if the warning has disappeared.

Again, this is a strange issue because you will generally not notice it outside of creating an RPackage. Go ahead and attempt a few revisions and feel free to reach out with questions!

@SunSummoner
Copy link
Collaborator Author

Hi @SunSummoner,

Thank you so much for your contribution! There is a lot that works well in this PR, and I appreciate the effort you put into tackling some tricky situations in the files and functions you started with.

I have not commented on all code changes in this PR but have instead attempted to provide representative examples of where .data was used correctly and where it was not.

One thing that will help with identifying these issues more accurately is the liberal use of running devtools::check(). With this PR, many of the tricky situations where .data was applied incorrectly could have been discovered locally. I'd recommend changing a few global variable definitions, running devtools::check() and evaluating the result to see if the warning has disappeared.

Again, this is a strange issue because you will generally not notice it outside of creating an RPackage. Go ahead and attempt a few revisions and feel free to reach out with questions!

Thank you @the-mayer I'll make the required changes in the PR. I was having some issue with devtools::check() in my version of R so I'll try checking that up too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants