Stop supporting broadcasting + against whole DataFrames #2483

oxinabox · 2020-10-13T13:25:11Z

@nalimilan makes a good point in #2464 (comment)_

What I find problematic is that by supporting ... df .+ mat, we consider that data frames and matrices are objects with compatible shapes.

(Cut discussion of Matrix(df) as that is a slightly different issue).

Supporting the addition of a Matrix and a DataFrame is pretty weird.
Infact even supporting broadcasted addition of two DataFrames is pretty weird.
Supporting DataFrame .+ Number is kind of OK, but still a bit odd.

These are weird for a few reasons.

DataFrames are not vector spaces, or even monads over addition. They don't define unbroadcasted +, and they don't define zero. Not a hard deal breaking, nor at Tuples or NamedTuples. But at least the elements of a Tuple or NamedTuple are garneteed to either be monads over addition at the type level, or they are not. Still not too bad, since DataFrames are dynamic and so nothing is represented at type-level
DataFrame columns are distinct in content (and ofen type), its pretty rare to want to do the same thing to all of them. You want to do it against a column (this is particularly against the scalar case)
DataFrames are dynamic in Size and shape, Matrixes are generally not. It feels odd to add to something that just for now is the right size. Again not a hard rule, since we also add Vectors to Tuples. (particularly againt the case of DataFrame .+ Matrix)
order often matters a lot for a DataFrame, often you will has a column that is effectively an index. A more natural operation to me feels like it should be expressed as a kind of innerjoin. Though there is a performance aspsect here: like we have hcat and innerjoin.
Its breaking to remove this behavour, but nonbreaking to add it, so we should consider dropping it before 1.0 if all other things being equal. Is anyone using this?

I assume we ended up with this behavour not with intent for +, but as a side effect of supporting broadcasting for something that does make sense.
Not sure what, maybe string.(df)
I suspect because of this use case (what ever it is), its actually really annoying to remove broadcasting DataFrames.
If so maybe we should at least as a design tenant consider it not to be a relevant feature.
And make sure we don't have any weird examples in the docs that use it.

As whole object, I am just not sold on broadcasting.
On columns for sure that makes sense.

The text was updated successfully, but these errors were encountered:

oxinabox · 2020-10-13T13:35:22Z

I guess this comes down to "Broadcasting is about shape"
you have two things of same shape and you can broadcast them against eachother.
Regardless of anything else about the types involved

bkamins · 2020-10-13T13:48:11Z

I guess this comes down to "Broadcasting is about shape"
you have two things of same shape and you can broadcast them against eachother.
Regardless of anything else about the types involved

This is the reasoning. Actually the tension was if DataFrame should have 2-dimensional shape (as it has now) or 1-dimensional shape (collection of rows).

We have settled to make DataFrame a 2-dimensional object which, in particular, has a consequence in broadcasting.

Also I believe that broadcasting for data frames is implemented consistently as for any container having 2 dimensions so even if someone does not find it very useful at least it should not lead to confusion.

oxinabox closed this as completed Oct 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop supporting broadcasting + against whole DataFrames #2483

Stop supporting broadcasting + against whole DataFrames #2483

oxinabox commented Oct 13, 2020

oxinabox commented Oct 13, 2020

bkamins commented Oct 13, 2020

Stop supporting broadcasting + against whole DataFrames #2483

Stop supporting broadcasting + against whole DataFrames #2483

Comments

oxinabox commented Oct 13, 2020

oxinabox commented Oct 13, 2020

bkamins commented Oct 13, 2020