Add several methods to use string cache #361

Merged 5 commits on Aug 28, 2023

sorhawell
Collaborator

@sorhawell sorhawell commented Aug 10, 2023

Close #350, close #234

@sorhawell sorhawell linked an issue Aug 10, 2023 that may be closed by this pull request
@sorhawell sorhawell mentioned this pull request Aug 12, 2023
@etiennebacher etiennebacher changed the title from "fast fix of cat global srting cache" to "Add several methods to use string cache" Aug 17, 2023
@sorhawell
Collaborator Author

@etiennebacher I think we can just merge this one in as is.

Collaborator

@etiennebacher etiennebacher left a comment

@sorhawell thanks! I tweaked the docs a bit, nothing major. I have one question regarding #234: it looks like I can make it work if I use pl$enable_string_cache() before creating the DataFrame:

pl$enable_string_cache(TRUE)
pl_letters_cat <- pl$DataFrame(list(a = factor(letters[1:3])))
pl_letters_cat$filter(
  pl$col("a")$is_in(pl$lit("a"))
)

shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ cat │
╞═════╡
│ a   │
└─────┘

but after resetting the cache to FALSE, creating the DataFrame inside pl$with_string_cache() doesn't seem to work:

pl$with_string_cache({
  pl_letters_cat <- pl$DataFrame(list(a = factor(letters[1:3])))
})
pl_letters_cat$filter(
  pl$col("a")$is_in(pl$lit("a"))
)
Error: Execution halted with the following contexts
0: In R: in $collect():
0: During function call [pl_letters_cat$filter(pl$col("a")$is_in(pl$lit("a")))]
1: Encountered the following error in Rust-Polars:
joins/or comparisons on categoricals can only happen if they were created under the same global string cache

How should I use pl$with_string_cache()?

@etiennebacher
Collaborator

My bad, I just saw that the whole thing should be in pl$with_string_cache():

pl$with_string_cache({
  pl_letters_cat <- pl$DataFrame(list(a = factor(letters[1:3])))
  pl_letters_cat$filter(
    pl$col("a")$is_in(pl$lit("a"))
  )
})

shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ cat │
╞═════╡
│ a   │
└─────┘

@etiennebacher etiennebacher merged commit f73fb86 into main Aug 28, 2023
9 of 11 checks passed
@etiennebacher etiennebacher deleted the fix_cat_global_cache branch August 28, 2023 16:34
Successfully merging this pull request may close these issues:

pl$concat(how = "diagonal") fails with factor variable
is_in() doesn't work with categorical variables