Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find* functions and pairs()/keys() #25999

Closed
nalimilan opened this issue Feb 11, 2018 · 7 comments · Fixed by #26095
Closed

find* functions and pairs()/keys() #25999

nalimilan opened this issue Feb 11, 2018 · 7 comments · Fixed by #26095
Labels
collections Data structures holding multiple items, e.g. sets search & find The find* family of functions
Milestone

Comments

@nalimilan
Copy link
Member

find* functions have been made more general in #24673, and now use pairs (and therefore keys) to get indices for AbstractArray, AbstractDict, Tuple, NamedTuple and AbstractString types. For HasShape iterators, they use cartesian indices, and for other iterators they use linear indices.

julia/base/array.jl

Lines 1500 to 1505 in 11c08ad

_pairs(A::Union{AbstractArray, AbstractDict, AbstractString, Tuple, NamedTuple}) = pairs(A)
_pairs(iter) = _pairs(IteratorSize(iter), iter)
# includes HasShape{1} for consistency with keys(::AbstractVector)
_pairs(::Union{HasLength, HasShape{1}}, iter) = zip(1:length(iter), iter)
_pairs(::HasShape, iter) = zip(CartesianIndices(size(iter)), iter)
_pairs(::Union{SizeUnknown, IsInfinite}, iter) = zip(Iterators.countfrom(1), iter)

However, this scheme suffers from a significant limitation: even if a custom indexable type other than the ones listed above implements pairs/keys, these functions are ignored and the generic iterator fallback will be used instead. This is because there is currently no trait to identify indexable types, so the internal _pairs function has to be defined on a hardcoded set of types so that we don't try to call keys on iterators which may not implement it.

Changing this after 1.0 would be technically breaking pairs/keys would suddenly be used for types which implement these functions, and they could return different types from the fallback.

Possible solutions to fix this before 1.0:

  • Use method_exists(pairs, itr). This would work but AFAIK the compiler does not handle this efficiently at this point.
  • Add an Indexable trait, and call pairs/keys for types which implement it. This trait could also be useful in other cases (e.g. broadcast, see Broadcast had one job (e.g. broadcasting over iterators and generator) #18618).
  • Rename the internal _pairs function to something else, and recommend that types override it. This doesn't sound like a good approach since it would be mostly redundant with pairs/keys.
@nalimilan nalimilan added collections Data structures holding multiple items, e.g. sets triage This should be discussed on a triage call search & find The find* family of functions labels Feb 11, 2018
@JeffBezanson
Copy link
Member

Very good points. I think functions like find* that refer to indices should require their argument to implement pairs or keys/indices. It seems dubious to me to use linear indices 1:n for arbitrary iterators. To make that work, we'd need something like with_indices(idxs, itr) that creates an association between some indices and an iterator. You could pass with_indices(countfrom(1), itr) to get the current behavior. Of course we'd need to decide what that's called and what it returns.

@nalimilan
Copy link
Member Author

Well, it sounds useful to be able to pass non-indexable iterators and get the position of the match, even if that's not really an index which can be used with that iterator. For example, you could use zip to combine vectors of the same length and test for a condition on several elements. Or you could use eachline to find the first line matching a condition. Telling people to use a with_indices wrapper doesn't sound very user-friendly.

Overall it seems to me that we lack collection traits, which would make things much clearer and would allow providing reasonably useful fallbacks for non-indexable iterators.

@JeffBezanson
Copy link
Member

I don't agree that traits make things much clearer. With traits, in order to make X work, you need to define Y, which is difficult to discover. Traits also allow behaviors to change behind your back as packages evolve, e.g. if something becomes iterable or indexable that wasn't before.

@StefanKarpinski
Copy link
Member

I agree with Jeff here that it doesn't really make sense to implicitly use 1:n "indices" for things that don't have indices. If you return something as an index, then it should actually be an index. If you have something iterable you want to use 1:n as its indices, then we can provide a wrapper that gives it that behavior.

@JeffBezanson
Copy link
Member

We can deprecate the current behavior of arbitrary iterators to something like

first(i for (i, x) in zip(countfrom(1), itr) if f(x))

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Feb 15, 2018
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Feb 15, 2018
@vtjnash
Copy link
Member

vtjnash commented Feb 15, 2018

Or to:

for (i, x) in enumerate(itr)
    f(x) && return i
end

if we want to preserve the not-found behavior of returning nothing

@nalimilan
Copy link
Member Author

Note that while on 0.6 the methods accepted any iterable, in practice they relied on getindex being defined. So we probably don't need any deprecation, people will just have to implement keys/pairs for their custom collections if they don't already, and the error will tell them exactly that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets search & find The find* family of functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants