Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop supporting broadcasting + against whole DataFrames #2483

Closed
oxinabox opened this issue Oct 13, 2020 · 2 comments
Closed

Stop supporting broadcasting + against whole DataFrames #2483

oxinabox opened this issue Oct 13, 2020 · 2 comments

Comments

@oxinabox
Copy link
Contributor

@nalimilan makes a good point in #2464 (comment)_

What I find problematic is that by supporting ... df .+ mat, we consider that data frames and matrices are objects with compatible shapes.

(Cut discussion of Matrix(df) as that is a slightly different issue).

Supporting the addition of a Matrix and a DataFrame is pretty weird.
Infact even supporting broadcasted addition of two DataFrames is pretty weird.
Supporting DataFrame .+ Number is kind of OK, but still a bit odd.

These are weird for a few reasons.

  1. DataFrames are not vector spaces, or even monads over addition. They don't define unbroadcasted +, and they don't define zero. Not a hard deal breaking, nor at Tuples or NamedTuples. But at least the elements of a Tuple or NamedTuple are garneteed to either be monads over addition at the type level, or they are not. Still not too bad, since DataFrames are dynamic and so nothing is represented at type-level
  2. DataFrame columns are distinct in content (and ofen type), its pretty rare to want to do the same thing to all of them. You want to do it against a column (this is particularly against the scalar case)
  3. DataFrames are dynamic in Size and shape, Matrixes are generally not. It feels odd to add to something that just for now is the right size. Again not a hard rule, since we also add Vectors to Tuples. (particularly againt the case of DataFrame .+ Matrix)
  4. order often matters a lot for a DataFrame, often you will has a column that is effectively an index. A more natural operation to me feels like it should be expressed as a kind of innerjoin. Though there is a performance aspsect here: like we have hcat and innerjoin.
  5. Its breaking to remove this behavour, but nonbreaking to add it, so we should consider dropping it before 1.0 if all other things being equal. Is anyone using this?

I assume we ended up with this behavour not with intent for +, but as a side effect of supporting broadcasting for something that does make sense.
Not sure what, maybe string.(df)
I suspect because of this use case (what ever it is), its actually really annoying to remove broadcasting DataFrames.
If so maybe we should at least as a design tenant consider it not to be a relevant feature.
And make sure we don't have any weird examples in the docs that use it.

As whole object, I am just not sold on broadcasting.
On columns for sure that makes sense.

@oxinabox
Copy link
Contributor Author

I guess this comes down to "Broadcasting is about shape"
you have two things of same shape and you can broadcast them against eachother.
Regardless of anything else about the types involved

@bkamins
Copy link
Member

bkamins commented Oct 13, 2020

I guess this comes down to "Broadcasting is about shape"
you have two things of same shape and you can broadcast them against eachother.
Regardless of anything else about the types involved

This is the reasoning. Actually the tension was if DataFrame should have 2-dimensional shape (as it has now) or 1-dimensional shape (collection of rows).

We have settled to make DataFrame a 2-dimensional object which, in particular, has a consequence in broadcasting.

Also I believe that broadcasting for data frames is implemented consistently as for any container having 2 dimensions so even if someone does not find it very useful at least it should not lead to confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants