-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement permutedims #2447
Implement permutedims #2447
Conversation
Thank you for working on it. To get a safe eltype of the resulting columns using The issues you raise are exactly the things that I expected/feared (i.e. that it would be hard to find the optimal solution) |
Aright, doing a bit more on this.
Results are for This is basically the same as before.
A couple of new observations:
With both copycols:
Both give columns of I'm not totally clear on where to add the
I had some ideas with
Very possible I'm missing something. |
Yes. To avoid promotion you would have to write something like So the conclusion is that In general an input from @nalimilan would be welcome on naming kwargs in this function (he is far superior than me in such things). |
Yes, by a fair amount.
This makes sense to me. Do you have a sense of whether the default for promotion should be |
Not sure - @nalimilan seemed to prefer Having said that I would go for |
Open questions
I wrote some docs and tests, but will wait to finalize until the above are answered. Any additional feedback appreciated |
I'm getting some weird segfaults trying to test locally that don't appear to be happening on CI. Is it possible that Revise is causing issues? |
It is necessary as the name of
Not sure - it should not be an issue. |
Great points, thanks for review! I will try to address early next week 👍 |
Note that in theory
I'm not sure how useful the second option could be in practice, but we could consider supporting it too, for example by allowing to pass any function to compute the type to an argument called |
So maybe let us make a |
This is what I thought you would prefer (that is why I have raised it). Let us then switch to |
Agree it's a pity, but makes sense. Main downside from my perspective is discoverability - what do we think about defining something like:
|
I think it is a good idea. |
src/abstractdataframe/reshape.jl
Outdated
m = permutedims(Matrix(df_notsrc)) | ||
df_tmp = rename!(DataFrame(Tables.table(m)), df[!, src_namescol], makeunique=makeunique) | ||
end | ||
return hcat!(df_permuted, df_tmp, copycols=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds OK. Though people may be surprised if they do e.g. permutedims(DataFrame(x=[1, 2], y=[2, 3]))
and get an error about src_colnames
as they didn't specify it. Could be worth having a specific error message for that method.
On a second thought maybe requiring to pass it (like the proposal Especially that it now occurred to me that in the future we might want |
You mean like
Or should the column names also become a new column? |
I am not sure - this is exactly what I was thinking of and both options seemed plausible. The point is that we might change our minds what is good in the future so it is better to be conservative now (have a look at #2464 where we started discussing things that worked for years and no one complained about them). |
TBH I don't 100% follow the relevance of that PR. Are you just saying that in the PR you're working on "fixing" things that have been around for years that people didn't complain about? Regardless, I definitely agree that being conservative is the right approach. But I'm not sure what the conservative version is in this case. Should we include a Just so I'm clear, the new behavior being proposed is:
|
No, I am saying that it is hard to get consensus what should be the behaviour of some functions. |
What I am considering if we should allow |
Ahh, got it. That makes sense. Ok, will take another look tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except for three places in the documentation where I left comments. Thank you!
@nalimilan - I am OK to merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good apart from a few small suggestions.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
I will merge the PR once the CI passes. Thank you! |
Woo! Thank you both for thoughtful and detailed review! Really appreciate it :-) |
Thank you! |
Figured I'd get a jumpstart on Hacktoberfest, and this is something I've wanted in DataFrames for a longtime. This is an attempt to address #2420
todo
So far
I implemented 3 versions of
transpose
:transpose1
copies @tbeason 's stack/unstack method from the issue abovetranspose2
creates aMatrix
from everything except the indexing column, and doespermutedims
on ittranspose3
builds column-by-column by iterating rowsBenchmarking
First impressions
transpose1
(stack/unstack) is slower on smallDataFrame
s but scales well.transpose2
is the fastest, but might be harder to fix type issues (see below)transpose3
is pretty fast for smallDataFrame
s but doesn't scale well at allOther thoughts
All 3 versions end up with
Any
typed columns, regardless of what's in the resulting column (unless all the columns are the same type). This seems like it might be more easily fixed with 1 and 3. The behavior is also not quite what I was expecting if all of the columns are numerical:Still lots to do here, but I thought it would be worth it to get the ball rolling and see if people have any thoughts. I will probably get back to this in earnest next week.