Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map over several tuples demands equal length #42216

Closed
mcabbott opened this issue Sep 11, 2021 · 14 comments · Fixed by #49336
Closed

map over several tuples demands equal length #42216

mcabbott opened this issue Sep 11, 2021 · 14 comments · Fixed by #49336
Labels
breaking This change will break code

Comments

@mcabbott
Copy link
Contributor

Instead of stopping early like zip, this is an error:

julia> map(+, (1,2,3), (4,5))
ERROR: BoundsError: attempt to access Tuple{} at index [1]
Stacktrace:
 [1] getindex(t::Tuple, i::Int64)
   @ Base ./tuple.jl:29
 [2] map (repeats 3 times)
   @ ./tuple.jl:250 [inlined]
 [3] top-level scope
   @ REPL[32]:1

julia> map(+, (1,2,3), [4,5])  # what I expected
2-element Vector{Int64}:
 5
 7

Is this a bug? It seems to be present at least since 1.0. The docstring for map says

For multiple collection arguments, apply f elementwise, and stop when when any of them is exhausted.

but I'm not sure I can quote it as authoratative, since I think I wrote it.

@Seelengrab
Copy link
Contributor

Seelengrab commented Sep 12, 2021

Copying from slack:

If it is, it could be made consistent by adding these:

julia> @eval Base map(f, ::Tuple{}, ::Tuple{Any}) = ()
map (generic function with 61 methods)                
                                                      
julia> @eval Base map(f, ::Tuple{Any}, ::Tuple{}) = ()
map (generic function with 62 methods)                
                                                      
julia> map(+, (1,2,3), (4,5))                         
(5, 7)                                                

though that won't fix the 3+ argument case. Might just need a any(isempty, tails) ? () : map(f, tails...) in the right place for that, unsure how much inference likes this.

@timholy
Copy link
Member

timholy commented Sep 12, 2021

I love that it throws an error, and I wish zip did too.

@jakobnissen
Copy link
Contributor

For usability, I think it's great that it throws an error, and it would be nice if map did that on arrays as well. Better to be strict by default - and when you're mapping over N containers, it's presumably the intention that every element of every container participates in the mapping. It's much nicer to have map error because you violated an assumption about your data instead of tracking down an error caused by map silently only mapping some of the elements of the collection.

The docstring was changed after the release of 1.6, so haven't appeared in any released Julia versions, so I don't think it's breaking to change it back.

Arguably, it's the correct behaviour to error, since the canonical docstring says "[ ... ] by applying f to each element". If it only applies to the first elements, it doesn't apply each element.

@mcabbott
Copy link
Contributor Author

mcabbott commented Sep 12, 2021

For once I actually wanted the early-stopping behaviour on tuples, yesterday. The fact that apparently nobody else did, for years, does make it seem like this isn't widely used. (And it's obviously easy to roll your own.)

Agree that the stricter behaviour would avoid some surprises. But changing it for arrays (and iterators) sounds to me like it would be breaking. It just collects Base.Generator(+, [1,2,3], [4,5]), which contains zip; I'm not sure that the fact that nobody got around to updating the docstring until recently means all that much.

For arrays note also that map(+, ones(2,2), ones(2,1)) is an error, while map(+, ones(2,2), ones(2,1,1)) is a zip. (Same since 1.0, but also no docstring until recently.)

@mcabbott
Copy link
Contributor Author

The present behaviour for map on Tuples is tested to give an error here:
https://github.com/JuliaLang/julia/blob/master/test/tuple.jl#L265

I don't actually see a test for map on arrays / iterators stopping early. (Although I have not run all tests.) Would it be crazy to change that to demand they all have the same length, or same size even? That would certainly be simpler to think about.

I doubt that zip could change its default, but it could gain an option. Then something like map(f, iters...) = collect(Generator(splat(f), zip(iters...; strict=true))) would be a generic map which behaves more like the Tuple case does now.

@stevengj
Copy link
Member

For zip, see the discussion at #20499

@o314
Copy link
Contributor

o314 commented Sep 22, 2021

I love that it throws an error, and I wish zip did too.

python, ruby, haskell, rust ... do no error when lengths mismatch on zip call, so i think this is not a good reco.

@timholy
Copy link
Member

timholy commented Sep 23, 2021

zip definitely can't change its default, at least not until Julia 2.0. Gaining a strict=true option seems like the way to go here.

@oxinabox
Copy link
Contributor

For map we might want to introduce a map_shortest that does the truncating behavior.
I likewise wish that zip errored, and we had a zip_shortest which has the current behavior.

@mcabbott
Copy link
Contributor Author

mcabbott commented Oct 11, 2021

I agree that two versions of zip would be nice. Doesn't matter much what they are called.

But the question of this issue is map, whose use of zip as an implementation detail. On tuples it is strict, but on other things it stops early:

julia> map(+, 1:2, 3:5000)
2-element Vector{Int64}:
 4
 6

That difference of behaviour seems really surprising to me. Whichever you prefer, they should surely agree. To fix it, we can change map(f, Tuple...) to stop early since that's an error now. But it sounds like nobody wants that. The other possible fix is to change map(f, iters...) not to stop early. Can we do this? After running PkgEval? Or is it too obviously breaking?

Note also that map on multi-dimensional arrays doesn't stop early. More surprisingly, zip also doesn't:

julia> map(+, [1 2], [3 4; 5 6])
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(1), Base.OneTo(2)), b has dims (Base.OneTo(2), Base.OneTo(2)), mismatch at 1")

julia> collect(zip([1 2], [3 4; 5 6]))
ERROR: DimensionMismatch

If we add an option to zip like zip(...; strict=true), we could surely make a similar option work for map too. But it still shouldn't have different defaults on different argument types.

@mcabbott
Copy link
Contributor Author

It seems that map on two vectors only gained this early stopping behaviour in Julia 1.5. Before that, it was an error -- although the case of two iterators still allowed it:

julia> map(+, 1:2, 3:5000)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2),), b has dims (Base.OneTo(4998),), mismatch at 1")

julia> map(+, 1:2, (3,4,5,6,7))
2-element Array{Int64,1}:
 4
 6

julia> VERSION
v"1.4.2"

Is that evidence that we could change it to be an error?

@JeffBezanson
Copy link
Member

I have always been a fan of allowing different lengths. But here's a point from triage: it might make sense to check up front and require length/shape matches for anything that has a definitive shape like an array, but for iterators allow mismatches. The reasoning is that with iterators, you have no choice but to start mapping, and only later find out whether the lengths matched, and it seems silly to "throw away" all the work.

@JeffBezanson
Copy link
Member

Also from triage: since we can't make this fully consistent now in a non-breaking way, we should probably change the tuple case to match other cases of map and zip like the docs say.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 21, 2022

Slight correction (I think): we can make it consistent, but only in the direction of making it more permissive. We already allow mismatched lengths for arrays and iterators in both map and zip. We also allow mismatched tuple lengths in zip but require matching tuple lengths in map only, i.e. this is an error: map(tuple, (1, 2), (3, 4, 5)). And an inscrutable, probably unintentional one at that. We can make this consistent across the board by just fixing map with tuples to allow different lengths.

If we want a path to stricter length checking, we can do that by adding a strict::Bool keyword option to both map and zip that forces lengths to be the same. In Julia 1.x, the default would have to be false everywhere; in Julia 2.0, we could change the default to be true when all of the arguments have known length and false when some of the arguments had infinite or unknown length. I think that if strict is explicitly passed as true we should error if the lengths don't match even if we can only discover that after the zipping/mapping is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants