Skip to content
This repository has been archived by the owner on Dec 15, 2021. It is now read-only.

Prop classification on first element #207

Closed
MichelSpalburg opened this issue Feb 21, 2017 · 19 comments
Closed

Prop classification on first element #207

MichelSpalburg opened this issue Feb 21, 2017 · 19 comments
Labels

Comments

@MichelSpalburg
Copy link

I used a simple QueueRanked query with multiple elements and a single classification:
QueueTrended(reportsuite.id = "ag-adi-eu-prod",
date.from = "2016-02-01",
date.to = "2016-02-01",
top ="1000",
metrics = c("visits"),
elements = c("evar5"),
classification = "Country (c)",
date.granularity = "week", segment.id = "", data.current = TRUE,
expedite = FALSE, interval.seconds = 5, max.attempts = 10000,
validate = TRUE, enqueueOnly = FALSE)

However using multiple elements:
QueueTrended(reportsuite.id = "ag-adi-eu-prod",
date.from = "2016-02-01",
date.to = "2016-02-01",
top ="1000",
metrics = c("visits"),
elements = c("evar5","mobiledevicetype"),
classification = "Country (c)",
date.granularity = "week", segment.id = "", data.current = TRUE,
expedite = FALSE, interval.seconds = 5, max.attempts = 10000,
validate = TRUE, enqueueOnly = FALSE)

This will result in the following error:
Error in if (!is.null(elements[i, ]$classification) && nchar(elements[i, :
missing value where TRUE/FALSE needed

It seems using classification only works on a single element instead of the first element.

@randyzwitch
Copy link
Owner

Thanks for reporting this. At first glance, it looks like this might be the same error as #162. I'll dig into this some more and hopefully get this figured out.

@mfalcioni1
Copy link

mfalcioni1 commented Feb 23, 2017

As a heads up, I did discover a work around for this that involves editing the "BuildInnerBreakDownsRecursively" function, which appears to be the source of the problem.

for(i in 1:nrow(elements)) { if(!is.null(elements[i,]$classification) && nchar(elements[i,]$classification) > 0) { elements.named[i] <- paste0(elements[i,]$id,": ",elements[i,]$classification) } }
This is the part of the function that is breaking, and for whatever reason the contents of the compound if written here is not returning a logical. If you comment this entire block of code out, the report will return just fine with all the expect variables in the resulting data frame. I think commenting this out is low impact as it seems these few lines of code are just adding some additional columns to the report to indicate that a SAINT classification was used (hard to tell without getting in the weeds and testing the function).

So the steps to work around are:

  1. In your R console type trace(BuildInnerBreakdownsRecursively, edit = T)
  2. Find the lines of code I pasted above and comment them out.
  3. Save
  4. Run your report
    Note that you will need to repeat these steps every time you reload the RSiteCatalyst package since the edit in trace does not permanently change the package source code.

Hope this helps as a workaround for others encountering the error, and gives you some direction for a fix Randy. Thanks for all the work you've done on the package!

Edit: Forgot to make mention that this error only happens when length(elements) > length(classification), so hopefully that provides more direction in the debugging process.

@randyzwitch
Copy link
Owner

Thanks for digging into this, I'll see if I can shim this into the code without disrupting any of the other functions that rely on BuildInnterBreakdownsRecursively

@MichelSpalburg
Copy link
Author

Many thanks @mfalcioni1 this solution works great!

@randyzwitch Thanks for the quick reaction! Looking forward to the fix. Would even be better if a full solution for classifications was in place like suggested in #162 . As we use many element classifications for reporting .

@randyzwitch
Copy link
Owner

@mfalcioni1, I believe the code you reference is here:

https://github.com/randyzwitch/RSiteCatalyst/blob/master/R/BuildInnerBreakdownsRecursively.R#L84-L88

I've commented out these lines and am running the test suite now. If it doesn't break anything, then I'll try and incorporate a test from my account that fails in the manner as @MichelSpalburg's test does, and see if everything is still stable.

I have to believe this code was written for a specific purpose, so I'm hesitant to just comment it out and release a new version.

@mfalcioni1
Copy link

@randyzwitch Correct, that is the code I was referencing. I agree that commenting it out for a release without understanding the full purpose it serves would be an ill advised move. I was just hoping to offer a hack solution to serve as a bridge for those encountering the error until the code could be properly tested and debugged.

@randyzwitch
Copy link
Owner

Given this might be same as #162 and #172, will keep this open and close the other two.

@randyzwitch
Copy link
Owner

Of course, when I commented out those lines, none of the tests failed. Which means I don't have a test written to explain what those lines do :(

@slin30
Copy link
Contributor

slin30 commented Mar 1, 2017

Came across this myself today-- have not had a chance to test @mfalcioni1's workaround (nice find, btw), but just thinking about it, might it be as simple as swapping nchar() for e.g. is.na()?

For example:

x <- "A"
y <- NA_character_

# With nchar(), NA will return NA, but we do not see this as long as x is not NA
if(!is.null(x) && nchar(x) > 0) {
  message("OK")
}
if(!is.null(x) && !is.na(x)) {
  message("OK")
}

# Not so much if we have NA, although checking via is.na() should work in both scenarios
if(!is.null(y) && nchar(y) > 0) {
  message("OK")
}
if(!is.null(y) && !is.na(y)) {
  message("OK")
}

@randyzwitch
Copy link
Owner

Thanks for the code review @slin30!

Since I don't have a test written that triggers this code logic, only thing I can do is implement it and try it.

https://github.com/randyzwitch/RSiteCatalyst/tree/fix_na

If everyone could try out their API calls using this branch, it will get us closer to finding a solution. Devtools code should be:

library(devtools)
install_github("randyzwitch/RSiteCatalyst", ref = "fix_na")

@slin30
Copy link
Contributor

slin30 commented Mar 2, 2017

@randyzwitch, this seems quite promising so far. Since we are testing, I thought some context might be helpful.

This is a small experiment I ran a short time ago. The goal was to view products in a subset of subject areas (up to 18 targets, out of 639 total). There were two ways to approach this:

  1. Create a single segment containing the classification values, apply that segment, and then query to pull the top n values per each of the target subject areas.
  2. Create 18 segments on the basis of each of the products that are associated with each subject area.

The first option was by far preferable, since the second option would require me to create segments using the product names (due to how we have things set up), and each subject area has dozens, if not hundreds of product names. Then, I'd have to run a query for each segment.

I ended up taking option 2 since I encountered the bug (which now seems fixed!) when I tried to add a second element (for product name). Fortunately, option 2 ended up being a useful test of some of the Segment.Save functions I am working on, which made it quite easy to create the segments. And, when reporting, running a query per segment is no more difficult than running a single query, and gave me enough time to get (more) coffee.

Preliminaries:

library(devtools)
#install.packages("digest") # needed to install this for some reason ,no big deal
install_github("randyzwitch/RSiteCatalyst", ref = "fix_na")

# libs
library(magrittr)
library(RSiteCatalyst)
library(data.table)
library(jsonlite)
library(SCAdmin)
library(stringr)
# Baseline arglist --------------------------------------------------------

# Here is the reference (positive control) arglist. 
# Single element, classification with element
base_arglist <- list(
  reportsuite.id = targ_rsid, 
  date.from = date_start, 
  date.to = date_end, 
  elements = targ_evar, 
  classification = targ_class, 
  metrics = "uniquevisitors",
  top = 700
)

str(base_arglist)
# List of 7
# $ reportsuite.id: chr "elsevier-kn-prod"
# $ date.from     : chr "2016-01-01"
# $ date.to       : chr "2017-01-01"
# $ elements      : chr "evar28"
# $ classification: chr "Subject Areas [v28]"
# $ metrics       : chr "uniquevisitors"
# $ top           : num 700

# Make the call
QR_base <- do.call(QueueRanked, base_arglist)
setDT(QR_base)

# View some results
ncol(QR_base)
# 5
nrow(QR_base)
# 639
cbind(mapply(class, QR_base), mapply(length, QR_base))
#                [,1]        [,2] 
# name           "character" "639"
# url            "character" "639"
# uniquevisitors "numeric"   "639"
# segment.id     "factor"    "639" # <- This is a bit strange and unexpected, as an aside...
# segment.name   "factor"    "639" # <- ...we get a return type of "factor" when no segment.id provided


# New segment -------------------------------------------------------------

# Made a segment, as I want to pull only a subset of the subject areas in the set of 
# 639. Here is what the segment looks like
# Note: Using some stuff from my in-dev Segments package, since I thought it might be
# helpful to get a sense of the segment contents

# NB: Only 17 of the 18 subject areas will actually be found in the time frame used
# throughout (as you will see below)

seg_class_additional <- call.Get_Segments(selected = "s300000520_58abadb4e4b06910029e4ae3", 
                                          fields = c("definition", "description", "modified")) %>%
  SCAdmin:::flatten_container(.)

str(seg_class_additional)
# List of 2
# $ segment_meta:'data.frame':	1 obs. of  4 variables:
# ..$ id         : chr "s300000520_58abadb4e4b06910029e4ae3"
# ..$ name       : chr "Daniel_SA_hits"
# ..$ description: chr "18 subject areas"
# ..$ modified   : chr "2017-02-20 19:02:12"
# $ L0          :List of 2
# ..$ cont_meta:'data.frame':	1 obs. of  2 variables:
# .. ..$ type    : chr "hits"
# .. ..$ operator: chr "or"
# ..$ cont_rule:List of 1
# .. ..$ :'data.frame':	18 obs. of  5 variables:
# .. .. ..$ name          : chr [1:18] "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" ...
# .. .. ..$ element       : chr [1:18] "evar28" "evar28" "evar28" "evar28" ...
# .. .. ..$ classification: chr [1:18] "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" ...
# .. .. ..$ operator      : chr [1:18] "equals" "equals" "equals" "equals" ...
# .. .. ..$ value         : chr [1:18] "chemistry & chemical engineering" "biochemistry- biology & biotechnology" "food science" "pharmaceuticals- cosmetics & toiletries" ...


# Make a new arglist ------------------------------------------------------

# Now, let's try the test
# We want to use the classification and element to pull only the target subject areas
# Additionally, we want to pull, for each of the selected subject areas, the top 10 
# products within each subject area

new_arglist <- base_arglist
new_arglist$segment.id <- "s300000520_58abadb4e4b06910029e4ae3"
new_arglist$elements <- c(targ_evar, targ_elem)
new_arglist$top = c(50, 10) # we go a bit high on the top for subject area, just to be safe

str(new_arglist)
# List of 8
# $ reportsuite.id: chr "elsevier-kn-prod"
# $ date.from     : chr "2016-01-01"
# $ date.to       : chr "2017-01-01"
# $ elements      : chr [1:2] "evar28" "evar75"
# $ classification: chr "Subject Areas [v28]"
# $ metrics       : chr "uniquevisitors"
# $ top           : num [1:2] 50 10
# $ segment.id    : chr "s300000520_58abadb4e4b06910029e4ae3"

# Make the call
QR_new <- do.call(QueueRanked, new_arglist)
setDT(QR_new)

# View some results
ncol(QR_new)
# 5
nrow(QR_new)
# 170
cbind(mapply(class, QR_new), mapply(length, QR_new))
#                             [,1]        [,2] 
# evar28: Subject Areas [v28] "character" "170"
# evar75                      "character" "170"
# uniquevisitors              "numeric"   "170"
# segment.id                  "character" "170"
# segment.name                "character" "170"

# And see if we have 17 unique subject areas, each with 10 products
QR_new[, uniqueN(evar75), by = c("evar28: Subject Areas [v28]")][, .(n_row = nrow(.SD), N = unique(V1))]
#    n_row  N
# 1:    17 10 # all good!

# Now for the check -------------------------------------------------------

# Let's see if comparing apples to apples, in terms of subject area, 
# works out, via a simple sum of comparable subject areas

# Note that even though we ask for only the first 10 products in the new call, 
# the sums should still total up, as the products that are beyond a rank of 9
# are rolled up into an aggregate "::other::" item, so we really have 9 top products
# plus an "::other::", per subject area.

# Subset the full (first) call result set by the subject areas in the test call result set
QR_base_subset <- QR_base[name %in% QR_new$`evar28: Subject Areas [v28]`]

# Sum of test call result set
QR_new[, sum(uniquevisitors)] # 664587
# Sum of subset (first) result set
QR_base_subset[, sum(uniquevisitors)] # 664587

# and to double-confirm:
identical(QR_new[, sum(uniquevisitors)], QR_base_subset[, sum(uniquevisitors)]) # TRUE

##################

# Also does not seem to matter if we "fill" classification with e.g. "" or not
new_arglist_2 <- new_arglist
new_arglist_2$classification <- c(targ_class, "")
# str(new_arglist_2)
# List of 8
# $ reportsuite.id: chr "elsevier-kn-prod"
# $ date.from     : chr "2016-01-01"
# $ date.to       : chr "2017-01-01"
# $ elements      : chr [1:2] "evar28" "evar75"
# $ classification: chr [1:2] "Subject Areas [v28]" ""  # <- appending a "" here
# $ metrics       : chr "uniquevisitors"
# $ top           : num [1:2] 50 10
# $ segment.id    : chr "s300000520_58abadb4e4b06910029e4ae3"

QR_new_2 <- do.call(QueueRanked, new_arglist_2)

Map(all.equal, QR_new, QR_new_2)
# $`evar28: Subject Areas [v28]`
# [1] TRUE
# 
# $evar75
# [1] TRUE
# 
# $uniquevisitors
# [1] TRUE
# 
# $segment.id
# [1] TRUE
# 
# $segment.name
# [1] TRUE

Session Info:

devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  English_United States.1252  
 tz       America/New_York            
 date     2017-03-01                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------
 package       * version         date       source                                    
 assertthat      0.1             2013-12-06 CRAN (R 3.3.1)                            
 base64enc       0.1-3           2015-07-28 CRAN (R 3.3.0)                            
 curl            2.3             2016-11-24 CRAN (R 3.3.2)                            
 data.table    * 1.10.4          2017-02-01 CRAN (R 3.3.2)                            
 DBI             0.5-1           2016-09-10 CRAN (R 3.3.1)                            
 devtools      * 1.12.0          2016-06-24 CRAN (R 3.3.2)                            
 digest          0.6.12          2017-01-27 CRAN (R 3.3.2)                            
 dplyr           0.5.0           2016-06-24 CRAN (R 3.3.1)                            
 git2r           0.16.0          2016-11-20 CRAN (R 3.3.2)                            
 httr            1.2.1           2016-07-03 CRAN (R 3.3.1)                            
 jsonlite      * 1.3             2017-02-28 CRAN (R 3.3.2)                            
 magrittr      * 1.5             2014-11-22 CRAN (R 3.3.1)                            
 memoise         1.0.0           2016-01-29 CRAN (R 3.3.1)                            
 plyr            1.8.4           2016-06-08 CRAN (R 3.3.1)                            
 purrr           0.2.2           2016-06-18 CRAN (R 3.3.1)                            
 R6              2.2.0           2016-10-05 CRAN (R 3.3.2)                            
 Rcpp            0.12.8          2016-11-17 CRAN (R 3.3.2)                            
 readxl          0.1.1           2016-03-28 CRAN (R 3.3.0)                            
 RSiteCatalyst * 1.4.10.20170301 2017-03-02 Github (randyzwitch/RSiteCatalyst@565e2d5)
 SCAdmin       * 0.0.0.9000      2017-03-01 local (@ef9da04)                          
 stringi         1.1.2           2016-10-01 CRAN (R 3.3.2)                            
 stringr       * 1.2.0           2017-02-18 CRAN (R 3.3.2)                            
 tibble          1.2             2016-08-26 CRAN (R 3.3.1)                            
 withr           1.0.2           2016-06-20 CRAN (R 3.3.1)                            
 wzMisc          0.1.033         2017-02-15 Github (slin30/wzMisc@8f74649)            
 xml2            1.1.1           2017-01-24 CRAN (R 3.3.2)  

@randyzwitch
Copy link
Owner

There's a lot going on here @slin30....are you saying that the fix_na branch works in your scenario, where prior API calls failed?

If this is the fix, I'd love to get it merged in, as I'm going to post a new version to CRAN soon before Adobe Summit

@slin30
Copy link
Contributor

slin30 commented Mar 2, 2017

Yes, the fix_na branch works in my scenario, where the call using the master branch failed, with the same error others have reported.

I am cautiously optimistic that others will confirm in like manner once they've had a chance to test their scenarios.

And apologies for the super long, verbose response. A lot of this was documented on my side previously, and I felt that extra caution (and transparency) was warranted since I thought of the fix on the commute home...

@randyzwitch
Copy link
Owner

Not a problem on the length, I was more reacting to "quite promising"...thought you might have found a place where this new code doesn't work.

As I look at the logic, I'm wondering if this wasn't the original intent:

> nchar("")
[1] 0
> is.na("")
[1] FALSE

In this case, nchar is testing that the string isn't a zero-length string, which can be a convention in this library for a default argument. So is.na might not be enough, but rather a 3rd condition might need to be added to the IF statement to ensure the same behavior as before.

@WillemPaling, do you remember the history behind how this function got these arguments?

https://github.com/randyzwitch/RSiteCatalyst/blob/master/R/BuildInnerBreakdownsRecursively.R#L84-L88

@slin30
Copy link
Contributor

slin30 commented Mar 2, 2017

@randyzwitch, yep, re: "quite promising," I meant to imply that I could only speak for my own specific scenario now working, but no idea how/if this addresses other scenarios.

@randyzwitch
Copy link
Owner

The more I look at it, I'm starting to think the correct line should be switching the is.null to is.na and leaving the nchar line alone:

        for(i in 1:nrow(elements)) {
          if(!is.na(elements[i,]$classification) && nchar(elements[i,]$classification)>0) {
            elements.named[i] <- paste0(elements[i,]$id,": ",elements[i,]$classification)
          }
        }

In this scenario, the logic would be checking to see if the value is not NA, and if not, that the string is not "" (i.e. more than 0 characters). This makes more sense than the prior idea of adding a 3rd predicate to keep checking "nullness"

@slin30
Copy link
Contributor

slin30 commented Mar 3, 2017

That makes perfect sense-- had been thinking myself that there is no situation where we should encounter NULL in a data.frame, unless we're dealing with e.g. list-columns. So the missingness scenarios are pretty much NA and "".

randyzwitch added a commit that referenced this issue Mar 3, 2017
@randyzwitch
Copy link
Owner

I was able to replicate the problem, and more importantly, fix it using the following calls:

a <- QueueTrended(
             reportsuite.id = "zwitchdev",
             date.from = "2016-02-01",
             date.to = "2016-02-01",
             top ="1000",
             metrics = c("visits"),
             elements = c("page"),
             classification = "JuliaPages",
             date.granularity = "week", 
             segment.id = "", 
             data.current = TRUE,
             expedite = FALSE, 
             interval.seconds = 5, 
             max.attempts = 10000,
             validate = TRUE, 
             enqueueOnly = FALSE
             )

#error simulating test for GitHub #207
aa <- QueueTrended(
  reportsuite.id = "zwitchdev",
  date.from = "2016-02-01",
  date.to = "2016-02-01",
  top ="1000",
  metrics = c("visits"),
  elements = c("page", "mobiledevicetype"),
  classification = "JuliaPages",
  date.granularity = "week", 
  segment.id = "", 
  data.current = TRUE,
  expedite = FALSE, 
  interval.seconds = 5, 
  max.attempts = 10000,
  validate = TRUE, 
  enqueueOnly = FALSE
)

The fix is currently pushed to the fix_na branch. When it passes, I'll merge it into master, and then this will be the last thing I fix before pushing a new version to CRAN. So if people could test this out today, that would be great, so that I have one more chance to fix before submitting to CRAN.

@slin30
Copy link
Contributor

slin30 commented Mar 3, 2017

Just reran the full script from my test of the previous fix_na branch and can confirm that the results are identical with the new fix_na branch. All seems to be working on my end.

Session info -----------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  English_United States.1252  
 tz       America/New_York            
 date     2017-03-03                  

Packages ---------------------------------------------------------------------------
 package       * version         date       source                                    
 base64enc       0.1-3           2015-07-28 CRAN (R 3.3.0)                            
 curl            2.3             2016-11-24 CRAN (R 3.3.2)                            
 data.table    * 1.10.4          2017-02-01 CRAN (R 3.3.2)                            
 devtools      * 1.12.0          2016-06-24 CRAN (R 3.3.1)                            
 digest          0.6.12          2017-01-27 CRAN (R 3.3.2)                            
 httr            1.2.1           2016-07-03 CRAN (R 3.3.2)                            
 jsonlite      * 1.3             2017-02-28 CRAN (R 3.3.2)                            
 magrittr      * 1.5             2014-11-22 CRAN (R 3.3.0)                            
 memoise         1.0.0           2016-01-29 CRAN (R 3.3.0)                            
 plyr            1.8.4           2016-06-08 CRAN (R 3.3.1)                            
 R6              2.2.0           2016-10-05 CRAN (R 3.3.2)                            
 Rcpp            0.12.8          2016-11-17 CRAN (R 3.3.2)                            
 readxl          0.1.1           2016-03-28 CRAN (R 3.3.0)                            
 RSiteCatalyst * 1.4.10.20170303 2017-03-03 Github (randyzwitch/RSiteCatalyst@f8141e0)
 SCAdmin       * 0.0.0.9000      2017-03-03 git (@eb5d21f)                            
 stringi         1.1.2           2016-10-01 CRAN (R 3.3.2)                            
 stringr       * 1.2.0           2017-02-18 CRAN (R 3.3.2)                            
 withr           1.0.2           2016-06-20 CRAN (R 3.3.1)                            
 wzMisc          0.1.033         2017-02-15 Github (slin30/wzMisc@8f74649)            
 xml2            1.1.1           2017-01-24 CRAN (R 3.3.2)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants