dplyr data_frame breaks performance #253
assertDataFrame is a function from @mllg's very nice checkmate package.
The problem actually occurs earlier, in mlr::predict. We always name the column "truth", which, as one can see in your example, already fails: the column is called "Fertility" in pred$data. The reason is in mlr::predict, where "truth" is not a vector but a data.frame.
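To make the naming issue concrete, here is a minimal base R sketch. It assumes that single-column `[` indexing on a tbl_df keeps the data.frame shape instead of dropping to a vector, and emulates that with `drop = FALSE` so that dplyr is not required. It illustrates the mechanism only and is not the actual mlr::predict code.

```r
# Plain data.frame: selecting one column drops to a numeric vector.
truth_vec <- swiss[, "Fertility"]

# Assumption: a tbl_df keeps a one-column data.frame here.
# drop = FALSE emulates that behaviour without loading dplyr.
truth_df <- swiss[, "Fertility", drop = FALSE]

# data.frame() honours the argument name only for the vector case.
names_from_vec <- names(data.frame(truth = truth_vec))
names_from_df  <- names(data.frame(truth = truth_df))

print(names_from_vec)  # "truth"
print(names_from_df)   # "Fertility"
```

When the argument is itself a one-column data.frame, `data.frame()` takes the inner column name, so the requested name "truth" is silently lost.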
Actually, dplyr warns about this.
Now we get into a "nice" argument: in the construction call, do you actually pass a valid data.frame to makeRegrTask("swiss", as_data_frame(swiss), "Fertility")? My argument could be:
PS:
@zmjones
@berndbischl
@dickoa: Many thanks for the hint. I will look this up later. I guess I would need to set options(warn = 2L), then see if all our unit tests run with a dplyr tbl_df.
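For reference, a small sketch of what options(warn = 2L) does: it promotes every warning to an error, so a test that triggers a warning would fail instead of passing silently. The `warning()` call below is only a stand-in for whatever dplyr would emit; nothing here is taken from the mlr test suite.

```r
# Temporarily promote warnings to errors and remember the old setting.
old <- options(warn = 2L)

result <- tryCatch({
  warning("demo warning")  # stand-in for a warning raised inside a unit test
  "finished without error"
}, error = function(e) "warning was promoted to an error")

# Restore the previous warning level.
options(old)

print(result)
```

Restoring the old value afterwards keeps the stricter setting from leaking into the rest of the session.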
Code to reproduce the bug is below. I only tested it with cforest, so I could be wrong that dplyr is the only cause. I think a better way to handle this would be to strip the attributes when making the task, as this could be introducing other bugs. Is assertDataFrame supposed to check for this sort of thing? I looked for it but couldn't find it.
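A rough sketch of the "strip the attributes when making the task" idea, using base R only. The helpers `strip_to_plain_df` and `is_plain_df` are hypothetical names for illustration and are not part of mlr or checkmate. The tbl_df is faked by setting the class vector by hand, on the assumption that a dplyr tbl_df carries the classes "tbl_df", "tbl" and "data.frame".

```r
# Hypothetical helper: coerce any data.frame subclass to a plain data.frame.
strip_to_plain_df <- function(data) {
  stopifnot(is.data.frame(data))
  as.data.frame(data)  # drops the extra classes in front of "data.frame"
}

# Hypothetical strict check: TRUE only for a data.frame with no extra classes.
is_plain_df <- function(x) identical(class(x), "data.frame")

# Fake a tbl_df by hand (assumed class vector) so dplyr is not required.
fake_tbl <- structure(swiss, class = c("tbl_df", "tbl", "data.frame"))

plain <- strip_to_plain_df(fake_tbl)

print(is_plain_df(fake_tbl))  # FALSE
print(is_plain_df(plain))     # TRUE

# After coercion, single-column indexing drops to a vector again.
print(is.numeric(plain[, "Fertility"]))  # TRUE
```

A check based on `is.data.frame()` or inheritance alone would accept the subclass, because a tbl_df still inherits from "data.frame"; a strict check has to compare the full class vector.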