Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map(f, c) return type #15342

Closed
lmshk opened this issue Mar 3, 2016 · 26 comments
Closed

map(f, c) return type #15342

lmshk opened this issue Mar 3, 2016 · 26 comments
Labels
types and dispatch Types, subtyping and method dispatch

Comments

@lmshk
Copy link

lmshk commented Mar 3, 2016

I was surprised to find that map(identity, Set([1, 2])) returns an Array. Further, even if I write map(identity, Set{Int}([1, 2])), the result is an Array{Any, 1}.

The documentation for map states:

Transform collection c by applying f to each element.

I expected "transform collection" to mean that the result is a collection of the same outer type as c. Of course, since "transform collection" is not defined exactly, this is not promised here. But from the documentation, it is a bit difficult to understand what to expect of maps return value.

To find out what exactly is going to happen when I call map I have to do something like @which map(f, c) to find out which method will be called, and then I actually need to read the source. In my case (map(identity, Set())) that is map(f, iters...) at abstractarray.jl:1092, which creates a result = [] and returns it later on. The comment here says "generic map on any iterator" (shouldn't that say "iterable with defined length"?), and this seems to be the fallback case for map.

Proposals:

  1. I think that the map documentation should be changed to something along the lines of "Returns an iterable over the values obtained by applying f to each element in collection c." (If anyone relies on the result being an Array in the general case, they should wrap the call with collect.)
  2. If the result is more specific than that (e.g. for isa(c, Array{T, N})), these special cases should be listed in the documentation.
  3. There should be such a special case for Sets.

I would be willing to do this myself, but wanted to ask for opinions first.

@pkofod
Copy link
Contributor

pkofod commented Mar 3, 2016

fwiw, I would expect map(f, Set{T}([a,b, .., z])) to return Set{T}([f(a), f(b), ...f(z)]), but there may be special considerations I am not aware of. Hmm, on the other hand, with the current behaviour, you know how many elements you'll get out, and it's not really too much trouble to do

S = Set([1,2,3])
mapS = map(x-> x^2, S)
Z = Set{Int64}(mapS) # or whatever you want
# or more compactly
Z = Set{Int64}(map(x-> x^2, S))

@carnaval
Copy link
Contributor

carnaval commented Mar 3, 2016

I agree that we should use the same logic for the element type of the output container that we do for Arrays (ie, use the tightest type that can fit all the elements).

For the other point, I think I agree that you want map to send sets to sets, even though there is no formal reason for that. That's more up for discussion I guess.

@malmaud
Copy link
Contributor

malmaud commented Mar 3, 2016

Insofar as map transforms strings to strings and tuples to tuples, it seems pretty clear that it should map sets to sets.

@malmaud
Copy link
Contributor

malmaud commented Mar 3, 2016

Well actually, you would end up losing the information of how many elements in the original set were mapped to a given element in the transformed set.

@JeffBezanson
Copy link
Member

Some references:

#6844

Proposed new framework for map:
#15146

Need to decide how to generalize it to non-arrays.

@nalimilan
Copy link
Member

And a similar issue regarding dicts: #5794

@evanfields
Copy link

I'll echo others: I think it's desirable for map to take sets to sets. Certainly this was the behavior I intuitively expected. But more than that, in math notation like {f(x) | x \in S} is very common and represents a set; the most natural way to express this in Julia code would seem to be map(f, S).

Edit: below Stefan points out the new generator syntax, which is perhaps a more natural way of expressing the set comprehensions and thus weakens my argument that map should obviously take sets to sets.

It would also be nice to get detailed typed sets out, not Set{Any}. Naively I would think that if map applied to an array doesn't lose type information, neither should map applied to a set. See this recent post (mine) on julia-users for an example:

https://groups.google.com/forum/#!topic/julia-users/qyDWkutI9AY

@StefanKarpinski
Copy link
Member

Note that with the recently introduced generator syntax, you can express set comprehensions like so:

Set(f(x) for x in S)

That is very close to the mathematical set comprehension notation. (Historical note: we originally used | instead of for in set comprehension syntax, but since | is an operator, that was a bit of a parsing nightmare.) This isn't an argument against making map(f, S) produce a set when S is a set, just an observation.

@lmshk
Copy link
Author

lmshk commented Mar 4, 2016

There seems to be an agreement that map on Set{T} should result in a Set{T}. But what about my other points? My understanding is that whatever is being worked on right now (i.e. the new framework for map) will not be integrated into 0.4; since that is the stable version for some time to come, should its documentation be updated to clarify the current expected behavior?

@nalimilan
Copy link
Member

Feel free to make a pull request with suggestions to improve the docs. But it's not that easy to do since the behavior is currently inconsistent: sometimes the type is preserved (like for tuples, strings, bit arrays), but the fallback is to return an Array, and several core types do that. I guess you could simply mention that fallback, without listing the types for which it applies.

@StefanKarpinski
Copy link
Member

This should definitely be documented for 0.4. For the future, having map on set produce a set seems like reasonable behavior, but I'm much more concerned about having a coherent overall policy. So the question is what is the general behavior of map on all kinds of collections and how can we implement it generically. The issues @JeffBezanson referenced are working towards such a framework.

@lmshk
Copy link
Author

lmshk commented Mar 4, 2016

Ok, I will make a pull request for the documentation change when I get back from vacation next weekend.

Regarding the general behavior of map: I think it is difficult to have it work on any iterable. For instance, what should be the result of map(x -> x * x, 1:10) where the result cannot possibly be of the same (outer) type? As long as map does not always return a generic iterable, I would argue that it is better to raise an error in this case, because silently changing the type can lead to non-obvious problems.

@nalimilan
Copy link
Member

As long as map does not always return a generic iterable, I would argue that it is better to raise an error in this case, because silently changing the type can lead to non-obvious problems.

That would be terribly annoying. Better continue returning an array in that case. I don't see it as a problem since this behavior is quite predictable: it only depends on the type of the input, so it's unlikely to go unnoticed during development/testing.

@JeffBezanson
Copy link
Member

Disallowing map(f, 1:10) is not great, since to get around that you'd have to expand the range to a dense vector, which is unnecessarily inefficient.

@tkelman
Copy link
Contributor

tkelman commented Mar 5, 2016

collect(f(x) for x in 1:10) is another way around that

@toivoh
Copy link
Contributor

toivoh commented Mar 5, 2016

It does make some kind of sense to map sets to sets, but I too wonder which kind of collection types would get this special treatment and which would continue to produce array results. It would be nice to say that we currently have a simple rule that map always produces an Array, but there's actually already some exceptions, e.g.:

  • map on tuples returns tuples
  • map(f, s) on strings returns strings and errors out if f does not return chars - not so sure if this is desirable though
  • map(f, as...) on BitArrays returns a BitArray if f returns boolean values

But sets are even more different in that they are not ordered or indexable. Maybe there should be another function for mapping an iterable into a set.

It would have been nice to have a version of map where one could specify the output container type. If it would be filled with push! then it should work for sets as well.

@nalimilan
Copy link
Member

It could be useful to allow specifying the return container type to map, but by default I think it makes sense to preserve the type, will Array as a fallback when it's not possible (e.g. ranges). We could also make the behavior for strings more consistent with that for BitArray: when the function does not return a Char, map should just return an array.

@eschnett
Copy link
Contributor

eschnett commented Mar 5, 2016

It's important to have a map function (or equivalent) that preserves the container type. There are containers that hold additional information, not just the elements they contains, and returning an array loses this information. For example, a tree has a particular structure, or there could be a wrapper for an array that keeps access statistics, or remembers the minimum and maximum of its elements. Regenerating this additional information might be expensive or impossible.

Finally, there's the obvious case to be made for map! that doesn't change the container type. Introducing a difference between the return types of map and map! seems strange. If there's a need for a map-like function that always returns an array, then it should be called mapcollect.

@toivoh
Copy link
Contributor

toivoh commented Mar 5, 2016

I'm just wary of all the trouble we've had with trying to preserve array types for array operations using similar, which has turned out to be a not very well defined operation. But since map always maps elements to elements I suppose it's a bit simpler.

@eschnett
Copy link
Contributor

eschnett commented Mar 5, 2016

I'm surprised to hear that similar is difficult to implement. I'm curious. Can you elaborate, or point me to a discussion or PR?

@nalimilan
Copy link
Member

@eschnett You missed that debate? See #11574 and linked issues. Have a good read! :-)

@eschnett
Copy link
Contributor

eschnett commented Mar 5, 2016

I see. I've encountered the same problem at other occasions. My solution
was to define, for each collection, the inverse of eltype. C++ containers
also have this functionality. I sometimes call this function tycon, which
is haskellese for "type constructor": It takes the element type as input
and returns the collection type.

So tycon(Array, Nullable{Int}) would return NullableArray{Int} etc.

One could wrap this into the definition of similar, which would mean that
each collection type might need to overload it.

-erik

On Saturday, March 5, 2016, Milan Bouchet-Valat notifications@github.com
wrote:

@eschnett You missed that debate? See #11574 and linked issues. Have a
good read! :-)


Reply to this email directly or view it on GitHub.<
https://ci5.googleusercontent.com/proxy/7y6i1vM2wVgpqHvnw7ZAq7LX403acda7R-7Pxu0CW_vHssrjbGmnCeb4SgVcMZZvxtU6uN8wmGGaOmJcM5pHsxyQwMV7HnkZ3j9Szzy-H-KzTZp7I7gz6a8ksoHCNGiMiWIETa7yY53lab8toH7S9G7FYmuCcA=s0-d-e1-ft#https://github.com/notifications/beacon/AANCCqd7YmIoMPHPIvAFhGkiUtA27dsVks5pqbadgaJpZM4HoeJv.gif

Erik Schnetter schnetter@gmail.com
http://www.perimeterinstitute.ca/personal/eschnetter/

@toivoh
Copy link
Contributor

toivoh commented Mar 6, 2016

Another thing to consider is how to handle map(f,a,b) where a and b are of different container types. It seems to me that this should be disallowed completely for sets, since they are unordered. Otherwise, I guess we could promote the container types together, if we first remove the element type parameters from them. Hmm, feels like this is getting complicated pretty fast...

I'm not sure it will be easier with map's dynamic approach to determining the output element type either.

@eschnett
Copy link
Contributor

eschnett commented Mar 6, 2016

The simplest case of a map with multiple arguments is zip. It should be possible to implement zip(a,b) as map((x,y) -> (x,y), a, b), or map(f, a,b) via zip (as map(f,a,b) = map(xy->f(xy[1],xy[2]), zip(a,b)).

Personally, I favour the restriction that all collections passed to map must have the same type and same structure (i.e. same array shapes etc.). However, zip is already more generous than that.

Yes, passing multiple sets to map does not make sense.

JeffBezanson added a commit that referenced this issue Mar 18, 2016
this is a possible fix for #15342

type-accumulating Dict constructor
JeffBezanson added a commit that referenced this issue Mar 18, 2016
this is a possible fix for #15342

type-accumulating Dict constructor
JeffBezanson added a commit that referenced this issue Mar 23, 2016
this is a possible fix for #15342

type-accumulating Dict constructor
JeffBezanson added a commit that referenced this issue Mar 24, 2016
this is a possible fix for #15342

type-accumulating Dict constructor
JeffBezanson added a commit that referenced this issue Mar 24, 2016
this is a possible fix for #15342

type-accumulating Dict constructor
JeffBezanson added a commit that referenced this issue Mar 24, 2016
…traits

make HasEltype and HasLength the default

this is a possible fix for #15342

type-accumulating Dict constructor

some tests for type accumulation in `map` and `Dict`
JeffBezanson added a commit that referenced this issue Mar 24, 2016
…traits

make HasEltype and HasLength the default

this is a possible fix for #15342

type-accumulating Dict constructor

some tests for type accumulation in `map` and `Dict`
JeffBezanson added a commit that referenced this issue Mar 28, 2016
…traits

make HasEltype and HasLength the default

this is a possible fix for #15342

type-accumulating Dict constructor

some tests for type accumulation in `map` and `Dict`
@yuyichao
Copy link
Contributor

yuyichao commented Apr 7, 2016

This issue seems "fixed"? Although the new behavior have caused other issues. Ref #15789.

@kshyatt kshyatt added the types and dispatch Types, subtyping and method dispatch label Jul 28, 2016
@pabloferz
Copy link
Contributor

I believe this is fixed. And the issues with mapping over Dict should be fixed by #15789 and #17968, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests