-
-
Notifications
You must be signed in to change notification settings - Fork 39
Prop classification on first element #207
Comments
Thanks for reporting this. At first glance, it looks like this might be the same error as #162. I'll dig into this some more and hopefully get this figured out. |
As a heads up, I did discover a work around for this that involves editing the "BuildInnerBreakDownsRecursively" function, which appears to be the source of the problem.
So the steps to work around are:
Hope this helps as a workaround for others encountering the error, and gives you some direction for a fix Randy. Thanks for all the work you've done on the package! Edit: Forgot to make mention that this error only happens when length(elements) > length(classification), so hopefully that provides more direction in the debugging process. |
Thanks for digging into this, I'll see if I can shim this into the code without disrupting any of the other functions that rely on |
Many thanks @mfalcioni1 this solution works great! @randyzwitch Thanks for the quick reaction! Looking forward to the fix. Would even be better if a full solution for classifications was in place like suggested in #162 . As we use many element classifications for reporting . |
@mfalcioni1, I believe the code you reference is here: https://github.com/randyzwitch/RSiteCatalyst/blob/master/R/BuildInnerBreakdownsRecursively.R#L84-L88 I've commented out these lines and am running the test suite now. If it doesn't break anything, then I'll try and incorporate a test from my account that fails in the manner as @MichelSpalburg's test does, and see if everything is still stable. I have to believe this code was written for a specific purpose, so I'm hesitant to just comment it out and release a new version. |
@randyzwitch Correct, that is the code I was referencing. I agree that commenting it out for a release without understanding the full purpose it serves would be an ill advised move. I was just hoping to offer a hack solution to serve as a bridge for those encountering the error until the code could be properly tested and debugged. |
Of course, when I commented out those lines, none of the tests failed. Which means I don't have a test written to explain what those lines do :( |
Came across this myself today-- have not had a chance to test @mfalcioni1's workaround (nice find, btw), but just thinking about it, might it be as simple as swapping For example:
|
Thanks for the code review @slin30! Since I don't have a test written that triggers this code logic, only thing I can do is implement it and try it. https://github.com/randyzwitch/RSiteCatalyst/tree/fix_na If everyone could try out their API calls using this branch, it will get us closer to finding a solution. Devtools code should be: library(devtools)
install_github("randyzwitch/RSiteCatalyst", ref = "fix_na") |
@randyzwitch, this seems quite promising so far. Since we are testing, I thought some context might be helpful. This is a small experiment I ran a short time ago. The goal was to view products in a subset of subject areas (up to 18 targets, out of 639 total). There were two ways to approach this:
The first option was by far preferable, since the second option would require me to create segments using the product names (due to how we have things set up), and each subject area has dozens, if not hundreds of product names. Then, I'd have to run a query for each segment. I ended up taking option 2 since I encountered the bug (which now seems fixed!) when I tried to add a second element (for product name). Fortunately, option 2 ended up being a useful test of some of the Preliminaries: library(devtools)
#install.packages("digest") # needed to install this for some reason ,no big deal
install_github("randyzwitch/RSiteCatalyst", ref = "fix_na")
# libs
library(magrittr)
library(RSiteCatalyst)
library(data.table)
library(jsonlite)
library(SCAdmin)
library(stringr) # Baseline arglist --------------------------------------------------------
# Here is the reference (positive control) arglist.
# Single element, classification with element
base_arglist <- list(
reportsuite.id = targ_rsid,
date.from = date_start,
date.to = date_end,
elements = targ_evar,
classification = targ_class,
metrics = "uniquevisitors",
top = 700
)
str(base_arglist)
# List of 7
# $ reportsuite.id: chr "elsevier-kn-prod"
# $ date.from : chr "2016-01-01"
# $ date.to : chr "2017-01-01"
# $ elements : chr "evar28"
# $ classification: chr "Subject Areas [v28]"
# $ metrics : chr "uniquevisitors"
# $ top : num 700
# Make the call
QR_base <- do.call(QueueRanked, base_arglist)
setDT(QR_base)
# View some results
ncol(QR_base)
# 5
nrow(QR_base)
# 639
cbind(mapply(class, QR_base), mapply(length, QR_base))
# [,1] [,2]
# name "character" "639"
# url "character" "639"
# uniquevisitors "numeric" "639"
# segment.id "factor" "639" # <- This is a bit strange and unexpected, as an aside...
# segment.name "factor" "639" # <- ...we get a return type of "factor" when no segment.id provided
# New segment -------------------------------------------------------------
# Made a segment, as I want to pull only a subset of the subject areas in the set of
# 639. Here is what the segment looks like
# Note: Using some stuff from my in-dev Segments package, since I thought it might be
# helpful to get a sense of the segment contents
# NB: Only 17 of the 18 subject areas will actually be found in the time frame used
# throughout (as you will see below)
seg_class_additional <- call.Get_Segments(selected = "s300000520_58abadb4e4b06910029e4ae3",
fields = c("definition", "description", "modified")) %>%
SCAdmin:::flatten_container(.)
str(seg_class_additional)
# List of 2
# $ segment_meta:'data.frame': 1 obs. of 4 variables:
# ..$ id : chr "s300000520_58abadb4e4b06910029e4ae3"
# ..$ name : chr "Daniel_SA_hits"
# ..$ description: chr "18 subject areas"
# ..$ modified : chr "2017-02-20 19:02:12"
# $ L0 :List of 2
# ..$ cont_meta:'data.frame': 1 obs. of 2 variables:
# .. ..$ type : chr "hits"
# .. ..$ operator: chr "or"
# ..$ cont_rule:List of 1
# .. ..$ :'data.frame': 18 obs. of 5 variables:
# .. .. ..$ name : chr [1:18] "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" ...
# .. .. ..$ element : chr [1:18] "evar28" "evar28" "evar28" "evar28" ...
# .. .. ..$ classification: chr [1:18] "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" "Subject Areas [v28]" ...
# .. .. ..$ operator : chr [1:18] "equals" "equals" "equals" "equals" ...
# .. .. ..$ value : chr [1:18] "chemistry & chemical engineering" "biochemistry- biology & biotechnology" "food science" "pharmaceuticals- cosmetics & toiletries" ...
# Make a new arglist ------------------------------------------------------
# Now, let's try the test
# We want to use the classification and element to pull only the target subject areas
# Additionally, we want to pull, for each of the selected subject areas, the top 10
# products within each subject area
new_arglist <- base_arglist
new_arglist$segment.id <- "s300000520_58abadb4e4b06910029e4ae3"
new_arglist$elements <- c(targ_evar, targ_elem)
new_arglist$top = c(50, 10) # we go a bit high on the top for subject area, just to be safe
str(new_arglist)
# List of 8
# $ reportsuite.id: chr "elsevier-kn-prod"
# $ date.from : chr "2016-01-01"
# $ date.to : chr "2017-01-01"
# $ elements : chr [1:2] "evar28" "evar75"
# $ classification: chr "Subject Areas [v28]"
# $ metrics : chr "uniquevisitors"
# $ top : num [1:2] 50 10
# $ segment.id : chr "s300000520_58abadb4e4b06910029e4ae3"
# Make the call
QR_new <- do.call(QueueRanked, new_arglist)
setDT(QR_new)
# View some results
ncol(QR_new)
# 5
nrow(QR_new)
# 170
cbind(mapply(class, QR_new), mapply(length, QR_new))
# [,1] [,2]
# evar28: Subject Areas [v28] "character" "170"
# evar75 "character" "170"
# uniquevisitors "numeric" "170"
# segment.id "character" "170"
# segment.name "character" "170"
# And see if we have 17 unique subject areas, each with 10 products
QR_new[, uniqueN(evar75), by = c("evar28: Subject Areas [v28]")][, .(n_row = nrow(.SD), N = unique(V1))]
# n_row N
# 1: 17 10 # all good!
# Now for the check -------------------------------------------------------
# Let's see if comparing apples to apples, in terms of subject area,
# works out, via a simple sum of comparable subject areas
# Note that even though we ask for only the first 10 products in the new call,
# the sums should still total up, as the products that are beyond a rank of 9
# are rolled up into an aggregate "::other::" item, so we really have 9 top products
# plus an "::other::", per subject area.
# Subset the full (first) call result set by the subject areas in the test call result set
QR_base_subset <- QR_base[name %in% QR_new$`evar28: Subject Areas [v28]`]
# Sum of test call result set
QR_new[, sum(uniquevisitors)] # 664587
# Sum of subset (first) result set
QR_base_subset[, sum(uniquevisitors)] # 664587
# and to double-confirm:
identical(QR_new[, sum(uniquevisitors)], QR_base_subset[, sum(uniquevisitors)]) # TRUE
##################
# Also does not seem to matter if we "fill" classification with e.g. "" or not
new_arglist_2 <- new_arglist
new_arglist_2$classification <- c(targ_class, "")
# str(new_arglist_2)
# List of 8
# $ reportsuite.id: chr "elsevier-kn-prod"
# $ date.from : chr "2016-01-01"
# $ date.to : chr "2017-01-01"
# $ elements : chr [1:2] "evar28" "evar75"
# $ classification: chr [1:2] "Subject Areas [v28]" "" # <- appending a "" here
# $ metrics : chr "uniquevisitors"
# $ top : num [1:2] 50 10
# $ segment.id : chr "s300000520_58abadb4e4b06910029e4ae3"
QR_new_2 <- do.call(QueueRanked, new_arglist_2)
Map(all.equal, QR_new, QR_new_2)
# $`evar28: Subject Areas [v28]`
# [1] TRUE
#
# $evar75
# [1] TRUE
#
# $uniquevisitors
# [1] TRUE
#
# $segment.id
# [1] TRUE
#
# $segment.name
# [1] TRUE Session Info: devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, mingw32
ui RStudio (1.0.136)
language (EN)
collate English_United States.1252
tz America/New_York
date 2017-03-01
Packages ----------------------------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.1 2013-12-06 CRAN (R 3.3.1)
base64enc 0.1-3 2015-07-28 CRAN (R 3.3.0)
curl 2.3 2016-11-24 CRAN (R 3.3.2)
data.table * 1.10.4 2017-02-01 CRAN (R 3.3.2)
DBI 0.5-1 2016-09-10 CRAN (R 3.3.1)
devtools * 1.12.0 2016-06-24 CRAN (R 3.3.2)
digest 0.6.12 2017-01-27 CRAN (R 3.3.2)
dplyr 0.5.0 2016-06-24 CRAN (R 3.3.1)
git2r 0.16.0 2016-11-20 CRAN (R 3.3.2)
httr 1.2.1 2016-07-03 CRAN (R 3.3.1)
jsonlite * 1.3 2017-02-28 CRAN (R 3.3.2)
magrittr * 1.5 2014-11-22 CRAN (R 3.3.1)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.1)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.1)
purrr 0.2.2 2016-06-18 CRAN (R 3.3.1)
R6 2.2.0 2016-10-05 CRAN (R 3.3.2)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.2)
readxl 0.1.1 2016-03-28 CRAN (R 3.3.0)
RSiteCatalyst * 1.4.10.20170301 2017-03-02 Github (randyzwitch/RSiteCatalyst@565e2d5)
SCAdmin * 0.0.0.9000 2017-03-01 local (@ef9da04)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.2)
stringr * 1.2.0 2017-02-18 CRAN (R 3.3.2)
tibble 1.2 2016-08-26 CRAN (R 3.3.1)
withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
wzMisc 0.1.033 2017-02-15 Github (slin30/wzMisc@8f74649)
xml2 1.1.1 2017-01-24 CRAN (R 3.3.2) |
There's a lot going on here @slin30....are you saying that the fix_na branch works in your scenario, where prior API calls failed? If this is the fix, I'd love to get it merged in, as I'm going to post a new version to CRAN soon before Adobe Summit |
Yes, the fix_na branch works in my scenario, where the call using the master branch failed, with the same error others have reported. I am cautiously optimistic that others will confirm in like manner once they've had a chance to test their scenarios. And apologies for the super long, verbose response. A lot of this was documented on my side previously, and I felt that extra caution (and transparency) was warranted since I thought of the fix on the commute home... |
Not a problem on the length, I was more reacting to "quite promising"...thought you might have found a place where this new code doesn't work. As I look at the logic, I'm wondering if this wasn't the original intent: > nchar("")
[1] 0
> is.na("")
[1] FALSE In this case, nchar is testing that the string isn't a zero-length string, which can be a convention in this library for a default argument. So is.na might not be enough, but rather a 3rd condition might need to be added to the IF statement to ensure the same behavior as before. @WillemPaling, do you remember the history behind how this function got these arguments? https://github.com/randyzwitch/RSiteCatalyst/blob/master/R/BuildInnerBreakdownsRecursively.R#L84-L88 |
@randyzwitch, yep, re: "quite promising," I meant to imply that I could only speak for my own specific scenario now working, but no idea how/if this addresses other scenarios. |
The more I look at it, I'm starting to think the correct line should be switching the for(i in 1:nrow(elements)) {
if(!is.na(elements[i,]$classification) && nchar(elements[i,]$classification)>0) {
elements.named[i] <- paste0(elements[i,]$id,": ",elements[i,]$classification)
}
} In this scenario, the logic would be checking to see if the value is not NA, and if not, that the string is not "" (i.e. more than 0 characters). This makes more sense than the prior idea of adding a 3rd predicate to keep checking "nullness" |
That makes perfect sense-- had been thinking myself that there is no situation where we should encounter |
I was able to replicate the problem, and more importantly, fix it using the following calls:
The fix is currently pushed to the fix_na branch. When it passes, I'll merge it into master, and then this will be the last thing I fix before pushing a new version to CRAN. So if people could test this out today, that would be great, so that I have one more chance to fix before submitting to CRAN. |
Just reran the full script from my test of the previous fix_na branch and can confirm that the results are identical with the new fix_na branch. All seems to be working on my end. Session info -----------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, mingw32
ui RStudio (1.0.136)
language (EN)
collate English_United States.1252
tz America/New_York
date 2017-03-03
Packages ---------------------------------------------------------------------------
package * version date source
base64enc 0.1-3 2015-07-28 CRAN (R 3.3.0)
curl 2.3 2016-11-24 CRAN (R 3.3.2)
data.table * 1.10.4 2017-02-01 CRAN (R 3.3.2)
devtools * 1.12.0 2016-06-24 CRAN (R 3.3.1)
digest 0.6.12 2017-01-27 CRAN (R 3.3.2)
httr 1.2.1 2016-07-03 CRAN (R 3.3.2)
jsonlite * 1.3 2017-02-28 CRAN (R 3.3.2)
magrittr * 1.5 2014-11-22 CRAN (R 3.3.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.1)
R6 2.2.0 2016-10-05 CRAN (R 3.3.2)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.2)
readxl 0.1.1 2016-03-28 CRAN (R 3.3.0)
RSiteCatalyst * 1.4.10.20170303 2017-03-03 Github (randyzwitch/RSiteCatalyst@f8141e0)
SCAdmin * 0.0.0.9000 2017-03-03 git (@eb5d21f)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.2)
stringr * 1.2.0 2017-02-18 CRAN (R 3.3.2)
withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
wzMisc 0.1.033 2017-02-15 Github (slin30/wzMisc@8f74649)
xml2 1.1.1 2017-01-24 CRAN (R 3.3.2) |
I used a simple QueueRanked query with multiple elements and a single classification:
QueueTrended(reportsuite.id = "ag-adi-eu-prod",
date.from = "2016-02-01",
date.to = "2016-02-01",
top ="1000",
metrics = c("visits"),
elements = c("evar5"),
classification = "Country (c)",
date.granularity = "week", segment.id = "", data.current = TRUE,
expedite = FALSE, interval.seconds = 5, max.attempts = 10000,
validate = TRUE, enqueueOnly = FALSE)
However using multiple elements:
QueueTrended(reportsuite.id = "ag-adi-eu-prod",
date.from = "2016-02-01",
date.to = "2016-02-01",
top ="1000",
metrics = c("visits"),
elements = c("evar5","mobiledevicetype"),
classification = "Country (c)",
date.granularity = "week", segment.id = "", data.current = TRUE,
expedite = FALSE, interval.seconds = 5, max.attempts = 10000,
validate = TRUE, enqueueOnly = FALSE)
This will result in the following error:
Error in if (!is.null(elements[i, ]$classification) && nchar(elements[i, :
missing value where TRUE/FALSE needed
It seems using classification only works on a single element instead of the first element.
The text was updated successfully, but these errors were encountered: