Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findmin with filtered iterators fail #47124

Open
lmiq opened this issue Oct 11, 2022 · 9 comments
Open

findmin with filtered iterators fail #47124

lmiq opened this issue Oct 11, 2022 · 9 comments
Labels
domain:iteration Involves iteration or the iteration protocol domain:search & find The find* family of functions

Comments

@lmiq
Copy link
Contributor

lmiq commented Oct 11, 2022

See: https://discourse.julialang.org/t/findmin-max-and-generators/88473?u=lmiq

Basically, this does not work:

julia> findmin(Iterators.filter(!=(1), [3,2,1]))
ERROR: MethodError: no method matching keys(::Base.Iterators.Filter{Base.Fix2{typeof(!=), Int64}, Vector{Int64}})
@jakobnissen
Copy link
Contributor

I'm not sure this is a bug. What would the index of an Iterators.Filter be? We could arbitrarily decide it was a Base.OneTo, but that would be very confusing, since then findmin(Iterators.filter(!=(1), [1,2])) == (2, 1).

@lmiq
Copy link
Contributor Author

lmiq commented Oct 11, 2022

My general expectation is that functions applied on iterators return the same as if applied to the collected versions of the same iterators.

At least it would be nice to have a better error message there, if one does not want to introduce that possible source of confusion.

@LilithHafner
Copy link
Member

    findmin(itr) -> (x, index)

Return the minimal element of the collection `itr` and its index or key.

I interpret this to mean that findmin should be defined for all iterators. If the iterator has no keys, then it should use its indices (Base.OneTo by default).

If we choose not to fix this issue, then we should adjust the docstring to include a similar caveat as pairs "...for any collection that maps a set of keys to a set of values".

@LilithHafner LilithHafner added domain:iteration Involves iteration or the iteration protocol domain:search & find The find* family of functions labels Oct 13, 2022
@aplavin
Copy link
Contributor

aplavin commented Oct 13, 2022

If the iterator has no keys, then it should use its indices (Base.OneTo by default).

Doesn't look like iterators have any indices by default:

julia> it = Iterators.filter(!=(1), [3,2,1])
Base.Iterators.Filter{Base.Fix2{typeof(!=), Int64}, Vector{Int64}}(Base.Fix2{typeof(!=), Int64}(!=, 1), [3, 2, 1])

julia> eachindex(it)
ERROR: MethodError: no method matching keys(::Base.Iterators.Filter{Base.Fix2{typeof(!=), Int64}, Vector{Int64}})

@knuesel
Copy link
Member

knuesel commented Jan 17, 2023

As I understand this was a conscious design decision, see #25999 and #26095 (these were for other find* functions but it seems logical to do the same with findmin and findmax).

Maybe what is missing is an easy way to wrap an iterator in a type that supports pairs? For example we could define pairs for Iterators.Enumerate:

julia> Base.pairs(it::Iterators.Enumerate) = Iterators.map(Base.splat(=>), it);

julia> findmin(enumerate(Iterators.filter(!=(1), [3,2,1])))
(2, 2)

@jlapeyre
Copy link
Contributor

jlapeyre commented Jan 19, 2023

See also #34851
The PR linked above also has a comment with further information.

@jlapeyre
Copy link
Contributor

jlapeyre commented Jan 19, 2023

My general expectation is that functions applied on iterators return the same as if applied to the collected versions of the same iterators.

Yes. Often itr and collect(itr) are considered semantically equivalent. In fact, Base defines collect(itr::Any), which tries to collect an iterator, in one case assuming the container created "implements" push!. Sometimes, in response to requests like this, you see. "Well, you should do findthingy(collect(itr), blah). " Why require this extra, allocating, step?

It doesn't make sense to me to implement semantics for default keys in some places, but not here.

@knuesel
Copy link
Member

knuesel commented Jan 19, 2023

I don't think it's right to aim for equivalence of itr and collect(itr). The collect function is defined to return an Array. That's a concrete type with a lot of methods (including linear algebra) that I wouldn't expect to work on arbitrary iterators.

@jlapeyre
Copy link
Contributor

I agree. I don't have a clear view of when they are sort of the same. I used "Often" to indicate that I don't claim they are unconditionally equivalent. There are many places, all over the ecosystem, where passing an iterator fails, while passing collect(itr) succeeds, even thought the elements are just processed sequentially. Suppose such function were to accept either itr and collect(itr) and give the same result. The simplest case is if it just does for x in a. Or, if we arrange for the keys of the former to be the same as the keys of the latter, there would be no semantic difference, in fact no detectable difference, inside the function.

I haven't seen a concrete example of problems that would arise from being able to do something like pairs or keys. But, I do understand that the consequences would have to be considered carefully. One objection is that perhaps some iterator naturally has keys other than 1, 2, .... An answer is, define a method, or use a trait or something to give this iterator other keys. Maybe a method for enumerate or pairs or keys.

Another idea is an entirely new function ienumerate or ipairs that gives the behavior we want.

Hmmm. I'm confused about the current status. If I do z = (x for x in 1:3), then keys(z) and pairs(z) do work with 1.7, but not 1.6. At least for Generators, the iterator is pulled out. I suppose this is uncontroversial, because these are they only keys that make sense in this case. What we are asking for is a way to give an iterator with no concept of keys the same keys that collect gives it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:iteration Involves iteration or the iteration protocol domain:search & find The find* family of functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants