bug in tukey_hsd: wrong grouping labels when - present #19

Closed
IndrajeetPatil opened this issue Dec 23, 2019 · 5 comments

@IndrajeetPatil

The grouping levels here are a-0 and b-1, but that's not what is returned:

library(rstatix)

g <- c(rep("a-0", 50), rep("b-1", 50))
x <- rnorm(100)
(df <- tibble::as_tibble(cbind.data.frame(g, x)))
#> # A tibble: 100 x 2
#>    g          x
#>    <fct>  <dbl>
#>  1 a-0   -2.08 
#>  2 a-0   -0.527
#>  3 a-0    2.44 
#>  4 a-0   -0.556
#>  5 a-0    0.886
#>  6 a-0    0.655
#>  7 a-0    0.882
#>  8 a-0   -0.262
#>  9 a-0    0.223
#> 10 a-0    0.117
#> # ... with 90 more rows

rstatix::tukey_hsd(aov(x ~ g, data = df))
#> Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [1].
#> # A tibble: 1 x 8
#>   term  group1 group2 estimate conf.low conf.high p.adj p.adj.signif
#> * <chr> <chr>  <chr>     <dbl>    <dbl>     <dbl> <dbl> <chr>       
#> 1 g     1      b        -0.127   -0.521     0.267 0.524 ns

Created on 2019-12-23 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2019-12-23                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date       lib source                          
#>  abind         1.4-5      2016-07-21 [1] CRAN (R 3.5.0)                  
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                  
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)                  
#>  broom         0.5.3.9000 2019-12-15 [1] local                           
#>  callr         3.4.0      2019-12-09 [1] CRAN (R 3.6.1)                  
#>  car           3.0-6      2019-12-23 [1] CRAN (R 3.6.1)                  
#>  carData       3.0-3      2019-11-16 [1] CRAN (R 3.6.1)                  
#>  cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.5.1)                  
#>  cli           2.0.0.9000 2019-12-23 [1] Github (r-lib/cli@0293ae7)      
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)                  
#>  curl          4.3        2019-12-02 [1] CRAN (R 3.6.1)                  
#>  data.table    1.12.8     2019-12-09 [1] CRAN (R 3.6.1)                  
#>  desc          1.2.0      2019-11-11 [1] Github (r-lib/desc@61205f6)     
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.1)                  
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)                  
#>  dplyr         0.8.3.9000 2019-10-10 [1] Github (tidyverse/dplyr@dcfc1d1)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                  
#>  fansi         0.4.0      2018-11-05 [1] Github (brodieG/fansi@ab11e9c)  
#>  forcats       0.4.0      2019-02-17 [1] CRAN (R 3.5.2)                  
#>  foreign       0.8-71     2018-07-20 [2] CRAN (R 3.6.1)                  
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                  
#>  generics      0.0.2      2019-03-05 [1] Github (r-lib/generics@c15ac43) 
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  haven         2.2.0      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                  
#>  hms           0.5.2      2019-10-30 [1] CRAN (R 3.6.1)                  
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)                  
#>  knitr         1.26       2019-11-12 [1] CRAN (R 3.6.1)                  
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.1)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                  
#>  openxlsx      4.1.4      2019-12-06 [1] CRAN (R 3.6.1)                  
#>  pillar        1.4.3      2019-12-20 [1] CRAN (R 3.6.1)                  
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                  
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                  
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)                  
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)                  
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                  
#>  purrr         0.3.3      2019-10-18 [1] CRAN (R 3.6.1)                  
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)                  
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.0)                  
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)                  
#>  rio           0.5.16     2018-11-26 [1] CRAN (R 3.6.0)                  
#>  rlang         0.4.2      2019-11-23 [1] CRAN (R 3.6.1)                  
#>  rmarkdown     2.0        2019-12-12 [1] CRAN (R 3.6.1)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)                  
#>  rstatix     * 0.3.1      2019-12-16 [1] CRAN (R 3.6.1)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                  
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                  
#>  testthat      2.3.1      2019-12-01 [1] CRAN (R 3.6.1)                  
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.6.1)                  
#>  tidyr         1.0.0      2019-09-11 [1] CRAN (R 3.6.1)                  
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.1)                  
#>  usethis       1.5.1.9000 2019-12-12 [1] Github (r-lib/usethis@23dd62c)  
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.1)                  
#>  vctrs         0.2.1      2019-12-17 [1] CRAN (R 3.6.1)                  
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)                  
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.1)                  
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                  
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.5.1)                  
#>  zip           2.0.4      2019-09-01 [1] CRAN (R 3.6.1)                  
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library
@kassambara
Owner

Internally, the tukey_hsd() function uses the following R code to split each comparison label into two groups:

separate(comparison, into = c("group2", "group1"), sep = "-")

Consequently, if the group levels contain special characters such as "-", a wrong result is returned.
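A minimal base-R sketch of why the labels come out as "1" and "b": TukeyHSD() names each comparison by joining the two level names with "-", so the only comparison here is labelled "b-1-a-0". Splitting on every "-" gives four pieces, and keeping just the first two (hence the warning about discarded pieces) yields group2 = "b" and group1 = "1", as in the output above. strsplit() stands in for separate() here; this illustrates the splitting logic only, not rstatix's internal code.

```r
# Comparison label as TukeyHSD() builds it: "<later level>-<earlier level>"
label <- paste("b-1", "a-0", sep = "-")
label
#> [1] "b-1-a-0"

# Splitting on every "-" also breaks the level names themselves apart
pieces <- strsplit(label, "-", fixed = TRUE)[[1]]
pieces
#> [1] "b" "1" "a" "0"

# Keeping only the first two pieces, as into = c("group2", "group1") does
group2 <- pieces[1]  # "b" instead of "b-1"
group1 <- pieces[2]  # "1" instead of "a-0"
```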

A quick fix that comes to mind is to systematically apply make.names() to the grouping variable before calling TukeyHSD().

Any better suggestions?

@IndrajeetPatil
Author

This is a super-hacky solution I have adopted in my workflow, but I feel there might be a smarter way to do this:

https://github.com/IndrajeetPatil/pairwiseComparisons/blob/7665dffdbde6354232febafdf7825d05a73b183e/R/pairwise_comparisons.R#L184-L224

I am not sure how make.names would work since I am not familiar with that function.

@kassambara
Owner

With make.names(), all invalid characters are translated to ".", and "X" is prepended when necessary, e.g. to names that start with a digit.

Examples:

make.names(c("a b", "a-b", "10", "a;b"))

Output:

[1] "a.b" "a.b" "X10" "a.b"
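One caveat worth checking with this quick fix: in the example above, three different inputs ("a b", "a-b" and "a;b") all become "a.b", so distinct groups could be merged. make.names() has a documented unique = TRUE argument that de-duplicates the result via make.unique(), at the cost of changing the names even further. A small sketch:

```r
lvls <- c("a b", "a-b", "10", "a;b")

# Default: invalid characters become ".", so three distinct levels collide
make.names(lvls)
#> [1] "a.b" "a.b" "X10" "a.b"
anyDuplicated(make.names(lvls)) > 0
#> [1] TRUE

# unique = TRUE de-duplicates with make.unique(), appending .1, .2, ...
make.names(lvls, unique = TRUE)
#> [1] "a.b"   "a.b.1" "X10"   "a.b.2"
```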

@IndrajeetPatil
Author

Hmm, that will solve the issue, but as a user I would not be happy that the function changed the names. That is rarely a good idea, since people may rely on their chosen names downstream in a script that uses this function.
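One way to leave the level names untouched is to avoid splitting the label at all: build every candidate label from the known factor levels and look each comparison up with match(). The sketch below assumes TukeyHSD() joins two level names with "-" (as its row names show); the helper split_comparisons() is hypothetical and is not necessarily how rstatix fixed the issue. If a comparison label could come from more than one pair of levels (e.g. levels "a", "b-c", "a-b" and "c" can form "a-b-c" in two ways), the split is genuinely ambiguous, so the sketch stops with an error rather than guessing.

```r
# Hypothetical helper: map TukeyHSD() comparison labels back to the
# original factor levels without renaming anything.
split_comparisons <- function(comparisons, levels) {
  # All ordered pairs of distinct levels
  pairs <- expand.grid(group2 = levels, group1 = levels,
                       stringsAsFactors = FALSE)
  pairs <- pairs[pairs$group2 != pairs$group1, ]
  # Candidate labels, joined with "-" like TukeyHSD() row names
  labels <- paste(pairs$group2, pairs$group1, sep = "-")

  # Refuse to guess when a label can be produced by more than one pair
  ambiguous <- comparisons %in% labels[duplicated(labels)]
  if (any(ambiguous)) {
    stop("ambiguous comparison label(s): ",
         paste(comparisons[ambiguous], collapse = ", "))
  }

  idx <- match(comparisons, labels)
  if (anyNA(idx)) stop("some comparison labels match no pair of levels")

  data.frame(group1 = pairs$group1[idx], group2 = pairs$group2[idx],
             stringsAsFactors = FALSE)
}

# Usage on data shaped like the reprex above
set.seed(123)
df <- data.frame(g = factor(rep(c("a-0", "b-1"), each = 50)), x = rnorm(100))
tk <- TukeyHSD(aov(x ~ g, data = df))
rownames(tk$g)
#> [1] "b-1-a-0"
split_comparisons(rownames(tk$g), levels(df$g))
#>   group1 group2
#> 1    a-0    b-1
```

Because the lookup considers both orientations of every pair, the ambiguity check is conservative: it may reject a label that TukeyHSD() could only have produced one way.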

@kassambara
Owner

Fixed now, thanks!
