This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28

SunSummoner · 2024-10-03T20:56:51Z

What kind of change(s) are included?

Feature (adds or updates new capabilities)
Bug fix (fixes an issue).
Enhancement (adds functionality).
Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

I have read and followed the CONTRIBUTING.md guidelines.
I have searched for existing content to ensure this is not a duplicate.
I have performed a self-review of these additions (including spelling, grammar, and related).
I have deleted all non-relevant text in this pull request template.
@the-mayer please take a look at this. Thank you!

the-mayer · 2024-10-08T20:17:20Z

R/.RData

This file should be removed from this PR. Additionally, let's add *.RData to the .gitignore file for this repository.

the-mayer · 2024-10-08T20:37:57Z

R/lineage.R

@@ -358,21 +358,21 @@ ipg2lin <- function(accessions, ipg_file,
    if (length(refseq_rows) != 0) {
        refseq_ipg_dt <- ipg_dt[refseq_rows, ]
        refseq_lins <- GCA2lin(refseq_ipg_dt,
-            assembly_path = refseq_assembly_path,
+            .data$assembly_path = refseq_assembly_path,


The .data prefix is not required in this instance, as assembly_path is an argument to GCA2lin().

the-mayer · 2024-10-08T20:39:17Z

R/lineage.R

+        genbank_lins <- GCA2lin(.data$gca_ipg_dt,
+            .data$assembly_path = genbank_assembly_path,


The .data prefix is not required in this instance, as gca_ipg_dt and assembly_path are an arguments to GCA2lin().

the-mayer · 2024-10-08T21:02:31Z

R/lineage.R

            lineagelookup_path
        )
    }


-    lins <- GCA2lin(prot_data = ipg_dt, assembly_path, lineagelookup_path)
-    lins <- lins[!is.na(Lineage)] %>% unique()
+    lins <- GCA2lin(prot_data = ipg_dt, .data$assembly_path, lineagelookup_path)


The .data prefix is not required in this instance, again we appear to be specifying function arguments

the-mayer · 2024-10-08T22:12:04Z

R/lineage.R

@@ -323,7 +323,7 @@ ipg2lin <- function(accessions, ipg_file,
    ipg_dt <- fread(ipg_file, sep = "\t", fill = T)

    accessions <- unique(accessions)
-    ipg_dt <- ipg_dt[Protein %in% accessions]
+    ipg_dt <- ipg_dt[.data$Protein %in% accessions]


ah, data.table -- this package requires a different solution to the "no visible binding for global variable" warning. .data will not work in this context. When subsetting, as in this line, we should define "Protein" in a separate .R file, called globals.R. The contents of this file will resemble the following:

#' @importFrom utils globalVariables utils::globalVariables(c("Protein"))

We can use this file to put all data.table variables that aren't defined otherwise to clear the R-CMD warnings.

the-mayer · 2024-10-09T14:05:33Z

R/acc2lin.R

This is more of just a general comment. It looks like ipg2lin() is present here, and also in lineage.R. Will need to disambiguate the functions to see which is in active use.

the-mayer · 2024-10-09T14:09:33Z

R/assign_job_queue.R

@@ -94,7 +94,7 @@ get_proc_medians <- function(dir_job_results) {
        dplyr::summarise(
            dplyr::across(
                dplyr::everything(),
-                \(x) median(x, na.rm = TRUE)
+                \(x) .data$median(x, na.rm = TRUE)


Here, the error we see for this function is:
"no visible global function definition for ‘median’"

In this case, we need to add an Rd tag, specifying which median function to use. In this case adding:

#' @importFrom stats median

To the Roxygen2 skeleton above this function should clear the warning

the-mayer · 2024-10-09T14:11:28Z

R/assign_job_queue.R

@@ -126,7 +126,7 @@ write_proc_medians_table <- function(dir_job_results, filepath) {
            names_to = "process",
            values_to = "median_seconds"
        ) |>
-        dplyr::arrange(dplyr::desc(median_seconds))
+        dplyr::arrange(dplyr::desc(.data$median_seconds))


Excellent, this is exactly how we would like to use the .data prefix!

the-mayer · 2024-10-09T14:16:05Z

R/cleanup.R

@@ -342,7 +342,7 @@ remove_tails <- function(prot, by_column = "DomArch",
        # !! Insert line to read domains_keep

        # Contains all domains separated by "|"
-        domains_for_grep <- paste(domains_keep$domains, collapse = "|")
+        domains_for_grep <- paste(.data$domains_keep$domains, collapse = "|")


It doesn't look like NSE is used here, so the .data prefix is not needed. However, we do need to figure out how domains_keep is defined. It may have originally been sourced from the Global Environment

the-mayer · 2024-10-09T14:17:33Z

R/cleanup.R

@@ -693,35 +693,35 @@ cleanup_GeneDesc <- function(prot, column) {
 pick_longer_duplicate <- function(prot, column) {
    col <- sym(column)

-    prot$row.orig <- 1:nrow(prot)
+    prot$.data$row.orig <- 1:nrow(prot)


Not needed here.

the-mayer · 2024-10-09T14:18:36Z

R/cleanup.R


        # dup_rows[which(nchar(pull(dup_rows,{{col}})) == max(nchar(pull(dup_rows,{{col}}))))[2:nrow(dup_rows)], "row.orig"]
        remove_rows <- c(remove_rows, to_remove)
    }

    # grab all the longest rows
-    unique_dups <- prot[-remove_rows, ] %>% select(-row.orig)
+    unique_dups <- prot[-remove_rows, ] %>% select(-.data$row.orig)


This is likely where the warning was generated from for this function and this is the correct application of the .data prefix

the-mayer · 2024-10-09T14:19:01Z

R/cleanup.R


    longest_rows <- c()
    remove_rows <- c()
    for (acc in dup_acc) {
-        dup_rows <- dups %>% filter(AccNum == acc)
+        dup_rows <- dups %>% filter(.data$AccNum == acc)


This is also a good fix for the warning generated for AccNum

the-mayer · 2024-10-09T14:19:17Z

R/cleanup.R


    # Get list of duplicates
    dups <- prot %>%
-        group_by(AccNum) %>%
+        group_by(.data$AccNum) %>%


the-mayer · 2024-10-09T14:19:28Z

R/cleanup.R

        summarize("count" = n()) %>%
        filter(count > 1) %>%
        arrange(-count) %>%
        merge(prot, by = "AccNum")

-    dup_acc <- dups$AccNum
+    dup_acc <- dups$.data$AccNum


.data is not needed here

the-mayer · 2024-10-09T14:22:26Z

R/ipr2viz.R

-    ipr_out <- read_tsv(infile_ipr, col_names = T, col_types = iprscan_cols)
-    ipr_out <- ipr_out %>% filter(Name %in% accessions)
+    ipr_out <- read_tsv(infile_ipr, col_names = T, col_types = .data$iprscan_cols)
+    ipr_out <- ipr_out %>% filter(.data$Name %in% accessions)


the-mayer · 2024-10-09T14:22:36Z

R/ipr2viz.R

    analysis_cols <- paste0("DomArch.", analysis)
-    infile_full <- infile_full %>% select(analysis_cols, Lineage_short, QueryName, PcPositive, AccNum)
+    infile_full <- infile_full %>% select(analysis_cols, .data$Lineage_short, .data$QueryName, .data$PcPositive, .data$AccNum)


the-mayer · 2024-10-09T14:24:03Z

R/ipr2viz.R

@@ -152,35 +152,35 @@ ipr2viz <- function(infile_ipr = NULL, infile_full = NULL, accessions = c(),
    # Filter by Top Accessions per Accession per DomArch and Lineage
    ipr_out <- subset(
        ipr_out,
-        ipr_out$AccNum %in% top_acc
+        ipr_out$.data$AccNum %in% top_acc


This use of .data is unnecessary. Here, we can already tell that AccNum comes from ipr_out as it was explicitly defined as ipr_out$AccNum

the-mayer · 2024-10-09T14:25:08Z

R/ipr2viz.R

+    ipr_out <- ipr_out %>% arrange(.data$AccNum, .data$StartLoc, .data$StopLoc)
    ipr_out_sub <- filter(
        ipr_out,
-        grepl(pattern = analysis, x = Analysis)
+        grepl(pattern = analysis, x = .data$Analysis)
    )
    # dynamic analysis labeller
    analyses <- ipr_out_sub %>%
-        select(Analysis) %>%
+        select(.data$Analysis) %>%
        distinct()
    analysis_labeler <- analyses %>%
-        pivot_wider(names_from = Analysis, values_from = Analysis)
+        pivot_wider(names_from = .data$Analysis, values_from = .data$Analysis)

    lookup_tbl_path <- "/data/research/jravilab/common_data/cln_lookup_tbl.tsv"
-    lookup_tbl <- read_tsv(lookup_tbl_path, col_names = T, col_types = lookup_table_cols)
+    lookup_tbl <- read_tsv(lookup_tbl_path, col_names = T, col_types = .data$lookup_table_cols)

-    lookup_tbl <- lookup_tbl %>% select(-ShortName) # Already has ShortName -- Just needs SignDesc
-    # ipr_out_sub = ipr_out_sub %>% select(-ShortName)
+    lookup_tbl <- lookup_tbl %>% select(-.data$ShortName) # Already has ShortName -- Just needs SignDesc
+    # ipr_out_sub = ipr_out_sub %>% select(-.data$ShortName)


the-mayer · 2024-10-09T14:25:54Z

R/ipr2viz.R

    # TODO: Fix lookup table and uncomment below
    # ipr_out_sub <- merge(ipr_out_sub, lookup_tbl, by.x = "DB.ID", by.y = "DB.ID")

    ## PLOTTING
    ## domains as separate arrows
    # For odering with tree
-    # ipr_out_sub$Name <- paste0(" ", ipr_out_sub$Name)
+    # ipr_out_sub$.data$Name <- paste0(" ", ipr_out_sub$.data$Name)


.data is not needed here, as ipr_out_sub$Name was explicit about where the variable came from.

the-mayer

Hi @SunSummoner,

Thank you so much for your contribution! There is a lot that works well in this PR, and I appreciate the effort you put into tackling some tricky situations in the files and functions you started with.

I have not commented on all code changes in this PR but have instead attempted to provide representative examples of where .data was used correctly and where it was not.

One thing that will help with identifying these issues more accurately is the liberal use of running devtools::check(). With this PR, many of the tricky situations where .data was applied incorrectly could have been discovered locally. I'd recommend changing a few global variable definitions, running devtools::check() and evaluating the result to see if the warning has disappeared.

Again, this is a strange issue because you will generally not notice it outside of creating an RPackage. Go ahead and attempt a few revisions and feel free to reach out with questions!

SunSummoner · 2024-10-10T14:47:38Z

Hi @SunSummoner,

Thank you so much for your contribution! There is a lot that works well in this PR, and I appreciate the effort you put into tackling some tricky situations in the files and functions you started with.

I have not commented on all code changes in this PR but have instead attempted to provide representative examples of where .data was used correctly and where it was not.

One thing that will help with identifying these issues more accurately is the liberal use of running devtools::check(). With this PR, many of the tricky situations where .data was applied incorrectly could have been discovered locally. I'd recommend changing a few global variable definitions, running devtools::check() and evaluating the result to see if the warning has disappeared.

Again, this is a strange issue because you will generally not notice it outside of creating an RPackage. Go ahead and attempt a few revisions and feel free to reach out with questions!

Thank you @the-mayer I'll make the required changes in the PR. I was having some issue with devtools::check() in my version of R so I'll try checking that up too.

jananiravi requested review from the-mayer and falquaddoomi October 5, 2024 21:53

the-mayer self-assigned this Oct 8, 2024

the-mayer added the enhancement New feature or request label Oct 8, 2024

the-mayer reviewed Oct 8, 2024

View reviewed changes

the-mayer reviewed Oct 9, 2024

View reviewed changes

R/cleanup.R Outdated

# Get list of duplicates

dups <- prot %>%

group_by(AccNum) %>%

group_by(.data$AccNum) %>%

Copy link

Collaborator

the-mayer Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

the-mayer reviewed Oct 9, 2024

View reviewed changes

the-mayer requested changes Oct 9, 2024

View reviewed changes

SunSummoner closed this Oct 19, 2024

SunSummoner force-pushed the main branch from 6b03555 to 1c43150 Compare October 19, 2024 13:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28

This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28

SunSummoner commented Oct 3, 2024

the-mayer Oct 8, 2024

the-mayer Oct 8, 2024

the-mayer Oct 8, 2024

the-mayer Oct 8, 2024

the-mayer Oct 8, 2024 •

edited

Loading

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024 •

edited

Loading

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer Oct 9, 2024

the-mayer left a comment

SunSummoner commented Oct 10, 2024

		genbank_lins <- GCA2lin(.data$gca_ipg_dt,
		.data$assembly_path = genbank_assembly_path,

This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28

This PR addresses issue #4 (format_job_args: no visible binding for global variable ‘tags’ and ‘fields_metadata’) #28

Conversation

SunSummoner commented Oct 3, 2024

What kind of change(s) are included?

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

the-mayer Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

the-mayer Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

the-mayer left a comment

Choose a reason for hiding this comment

SunSummoner commented Oct 10, 2024

the-mayer Oct 8, 2024 •

edited

Loading

the-mayer Oct 9, 2024 •

edited

Loading