Restrict transform to columns? #81
Comments
Can you elaborate on why the large cache is an issue in this scenario? Is it because you expected better performance, or because the memory footprint is too high for the application? Assuming that it is indeed an issue, would it make sense to "materialize" the pipeline at a specific step? For example, you can always drop the cache by evaluating the pipeline with

I am asking these questions because we could consider adding a column selection feature to all
It is not really an issue for my application, since the amount of data is not large. In my use case I save the cache of the pipeline to later use it on new data samples. Saving/using the cache of

Thus, one needs to carefully split up the pipeline and only save/use the correct cache, which is not very convenient.

Problem with

```julia
julia> p = Select(:a) → MinMax()
Sequential(TableTransforms.Transform[Select{Tuple{Symbol}}((:a,)), Scale{Int64}(0, 1)])

julia> t = (a=rand(10), c=rand(10));

julia> _, cache = apply(p, t);

julia> z = reapply(p, (a=rand(3), c=rand(3)), cache);

julia> length(revert(p, z, cache).c)
10
```
I think the issue we are facing here is more profound. It has to do with the fact that `Select`'s revertibility is tied to a specific input table. When you reapply the pipeline to a new table, the cache isn't modified, so you cannot "unselect" the columns of the new table that were never stored. I wonder what could be done differently. What is your proposal for this new
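To make this mechanism concrete, here is a toy model in plain Julia, using only `NamedTuple`s, of a `Select`-like transform whose cache keeps the non-selected columns of the table it was first applied to. The names `toy_select_apply`, `toy_select_reapply` and `toy_select_revert` are hypothetical, and this is not how TableTransforms.jl implements `Select`; it is only a sketch that reproduces the behaviour seen in the REPL session, where reverting after `reapply` brings back the 10-row column `c` from the original table.

```julia
# Toy model (hypothetical names, not TableTransforms.jl's implementation) of a
# Select-like transform that stores the non-selected columns in its cache.

# Keep only `cols`; the cache stores every other column of *this* table.
function toy_select_apply(cols::Tuple{Vararg{Symbol}}, table::NamedTuple)
    rest = Tuple(k for k in keys(table) if k ∉ cols)
    selected = NamedTuple{cols}(Tuple(table[k] for k in cols))
    cache = NamedTuple{rest}(Tuple(table[k] for k in rest))
    return selected, cache
end

# Reapplying selects from the new table, but the cache is left unchanged,
# so the new table's non-selected columns are never stored.
function toy_select_reapply(cols::Tuple{Vararg{Symbol}}, table::NamedTuple, cache)
    return NamedTuple{cols}(Tuple(table[k] for k in cols))
end

# Reverting merges the cached columns back in; they come from the first table.
toy_select_revert(selected::NamedTuple, cache::NamedTuple) = merge(selected, cache)

t = (a = rand(10), c = rand(10))
_, cache = toy_select_apply((:a,), t)
z = toy_select_reapply((:a,), (a = rand(3), c = rand(3)), cache)
restored = toy_select_revert(z, cache)
println(length(restored.a))  # 3
println(length(restored.c))  # 10, the stale column from `t`
```

In this model the stale column is unavoidable: the cache is built once from the first table, and nothing in `reapply` gives it a chance to record the new table's `c`.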
@antholzer I will close this issue, but feel free to reopen it if you have a proposal that we could brainstorm further.
At the moment, in order to apply a `Colwise` transformation to selected column(s), one has to use `Select`, or is there another way? I am finding that `Select` has some drawbacks when used this way, since it puts all the non-selected columns into the cache, although for e.g. the following pipeline this would not be necessary

If I have a large pipeline/table, getting a large cache could be annoying. There is also an issue with `revert` (#80).

So would it make sense to introduce a wrapper transform (or something else) that takes a subset of columns and applies the (`Colwise`) transform only to that subset? For the pipeline above, this could for example look like the following
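As a separate illustration of the idea (not the snippet from the original post), here is a minimal sketch of such a wrapper in plain Julia, assuming a column-wise min-max scaling, similar in spirit to `MinMax()`, as the wrapped transform. Everything here is hypothetical: `OnColumns`, `apply_on`, `reapply_on`, `revert_on` and `map_cols` are illustrative names, not TableTransforms.jl API. What it shows is that the cache can hold only per-column parameters for the wrapped columns, while the other columns pass through and are never stored.

```julia
# Hypothetical sketch of a column-restricted wrapper: min-max scale only the
# columns in `cols`. `OnColumns`, `apply_on`, `reapply_on` and `revert_on` are
# illustrative names, not part of TableTransforms.jl.

struct OnColumns
    cols::Vector{Symbol}
end

# Min-max scaling with stored bounds; a constant column maps to zeros.
scale(x, lo, hi) = hi > lo ? (x .- lo) ./ (hi - lo) : zero.(x)
unscale(y, lo, hi) = y .* (hi - lo) .+ lo

# Rebuild the table, transforming only the wrapped columns and passing the
# others through untouched (and uncached).
function map_cols(f, w::OnColumns, table::NamedTuple)
    ks = keys(table)
    vals = Tuple((k in w.cols ? f(k, table[k]) : table[k]) for k in ks)
    return NamedTuple{ks}(vals)
end

# The cache holds only (min, max) per wrapped column.
function apply_on(w::OnColumns, table::NamedTuple)
    cache = Dict(k => (minimum(table[k]), maximum(table[k])) for k in w.cols)
    return map_cols((k, x) -> scale(x, cache[k]...), w, table), cache
end

reapply_on(w::OnColumns, table::NamedTuple, cache) =
    map_cols((k, x) -> scale(x, cache[k]...), w, table)

revert_on(w::OnColumns, table::NamedTuple, cache) =
    map_cols((k, x) -> unscale(x, cache[k]...), w, table)

w = OnColumns([:a])
out, cache = apply_on(w, (a = [0.0, 5.0, 10.0], c = [1.0, 2.0, 3.0]))
z = reapply_on(w, (a = [5.0], c = [7.0]), cache)
println(z.a)                       # [0.5]
println(revert_on(w, z, cache).c)  # [7.0], the new table's own column
```

Because the cache no longer depends on the non-selected columns, it stays small regardless of table width, and reverting after a reapply returns the new table's own `c` rather than a stale copy like the one in the REPL session.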