Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement keys and pairs for Enumerate #48318

Closed
wants to merge 1 commit into from

Conversation

knuesel
Copy link
Member

@knuesel knuesel commented Jan 17, 2023

This implements keys and pairs for iterators returned by enumerate.

This addresses #47124, enabling e.g.

julia> findmin(enumerate(Iterators.filter(!=(1), [3,2,1])))
(2, 2)

The definition of pairs (in addition to keys) is required for iterators that have no length defined, as in the above example.

@knuesel knuesel force-pushed the pairs-of-enumerate branch from 23c1f45 to 6ffa806 Compare January 17, 2023 16:46
@brenhinkeller brenhinkeller added iteration Involves iteration or the iteration protocol feature Indicates new feature / enhancement requests labels Jan 18, 2023
@jlapeyre
Copy link
Contributor

jlapeyre commented Jan 19, 2023

See also #34851

This Related topics have come up several times in various forums.

In particular, the example in the issue linked above can also be fixed like this:

julia> Base.pairs(A) = enumerate(A)

julia> findmin(Iterators.filter(!=(1), [3,2,1]))
(2, 2)

@knuesel
Copy link
Member Author

knuesel commented Jan 19, 2023

This PR makes it possible to get pairs for any iterator, by wrapping it with enumerate. It think it's good to require this explicit step: for some iterators the indices from enumerate might not make much sense. If that was done by default, then if later someone wants to implement the keys that really make sense, it would become a breaking change.

@aplavin
Copy link
Contributor

aplavin commented Jan 19, 2023

This PR would lead to weird and confusing behavior:

julia> it = enumerate(10:15)

julia> pairs(it)
6-element Vector{Pair{Int64, Int64}}:
 1 => 10
 2 => 11
 3 => 12
 4 => 13
 5 => 14
 6 => 15

julia> pairs(collect(it))
pairs(::Vector{Tuple{Int64, Int64}})(...):
  1 => (1, 10)
  2 => (2, 11)
  3 => (3, 12)
  4 => (4, 13)
  5 => (5, 14)
  6 => (6, 15)

One expects that map(last, pairs(x)) are always values of x, but that would be broken here.

@knuesel
Copy link
Member Author

knuesel commented Jan 19, 2023

I don't think it's so surprising:

julia> dict = Dict(:a => 1, :b => 2);

julia> pairs(dict)
Dict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> pairs(collect(dict))
pairs(::Vector{Pair{Symbol, Int64}})(...):
  1 => :a=>1
  2 => :b=>2

Each iterator type is free to decide if the keys should be included in the iteration result (collected by collect).

But it's true that something is weird:

julia> values(enumerate(10:11)) |> collect
2-element Vector{Tuple{Int64, Int64}}:
 (1, 10)
 (2, 11)

julia> values(Dict(:a => 1, :b => 2))
ValueIterator for a Dict{Symbol, Int64} with 2 entries. Values:
  1
  2

Ideally, values would be overloaded for Enumerate:

Base.values(it::Iterators.Enumerate) = values(it.itr)

but that would be breaking...

@aplavin
Copy link
Contributor

aplavin commented Jan 19, 2023

Indeed, thanks for pointing the actual issue better!

@jakobnissen
Copy link
Contributor

jakobnissen commented Jan 19, 2023

There is a deeper inconsistency in Julia that most iterators iterate only its values, except Dicts, which iterate key/value pairs.

I think it'd be best if we, as far as possible, stick with only iterating the values. Hence, I suggest this behavior:

julia> println(collect(enumerate("abc")))
[(1, 'a'), (2, 'b'), (3, 'c')]

julia> println(collect(pairs(enumerate("abc"))))
[1 => (1, 'a'), 2 => (2, 'b'), 3 => (3, 'c')]

It may seem weird that the latter includes the keys twice - but it really doesn't. The values of an Enumerate are 2-tuples. It just so happens that this PR also proposes adding keys to Enumerate, which would be the same as the first element of the 2-tuples.

@knuesel
Copy link
Member Author

knuesel commented Jan 19, 2023

That would be more consistent indeed (though note that the special behavior affects all AbstractDict including Base.Pairs).

If we really interpret (1, 'a') as just a value, then there's no reason to define keys for this iterator: it's just another iterator that doesn't have meaningful keys.

Still, it would be nice to have a wrapper that takes any iterator, and implements keys/pairs based on the element positions. If enumerate is not appropriate, what about a Numbered type, and numbered(itr; start=1) function, that iterates the itr values but also provides keys/pairs by counting from start?

@jlapeyre
Copy link
Contributor

I agree with @jakobnissen . The fact that Dict iterates key/value pairs is really unfortunate. That Dictionaries.jl does not do this is a big advantage in writing generic code. This was an explicit and important goal for the author of that package. I think limiting the odd behavior to Dict is best.

Furthermore, pairs should return Pairs. Neither this PR, nor the suggestion in #34851 get all of this quite right.

@knuesel
Copy link
Member Author

knuesel commented Jan 19, 2023

Furthermore, pairs should return Pairs. Neither this PR, nor the suggestion in #34851 get all of this quite right.

I'm not sure about that: Pairs requires that keys can be decoupled from the data, while the documentation of pairs is more general, it describes the result as "an iterator over key => value pairs for any collection that maps a set of keys to a set of values".

This gives some freedom, for example to implement pairs for iterators that have no known length (think the result of Iterators.filter or eachline(filename)). This is why this PR doesn't use Pairs.

(For example Base defines pairs(a::AbstractDict) = a.)

@knuesel
Copy link
Member Author

knuesel commented Jan 20, 2023

Closing this since Enumerate doesn't align cleanly with the keys/values/pairs interface.

Probably a better approach is to provide a new type that can wrap any iterator and associate arbitrary keys or pairs (with keys counting from 1 as a special case, maybe default). Similar to Pairs but more aligned with iterators, while Pairs implements AbstractDict (which suggests random access).

@knuesel knuesel closed this Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests iteration Involves iteration or the iteration protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants