Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define pairs(A) = enumerate(A) #34851

Closed
jlapeyre opened this issue Feb 23, 2020 · 9 comments · May be fixed by #36175
Closed

Define pairs(A) = enumerate(A) #34851

jlapeyre opened this issue Feb 23, 2020 · 9 comments · May be fixed by #36175

Comments

@jlapeyre
Copy link
Contributor

The method for findall with no specialization on the collection type is

findall(testf::Function, A) = collect(first(p) for p in pairs(A) if testf(last(p)))

For a singly-linked list, like DataStructures.Cons, findall fails because there is no method keys(::Cons). Constructing, say linear indices for Cons requires traversing the list to find it's length; more precisely, I don't see a way around traversal. Writing a fallback method

pairs(A) = enumerate(A)

would solve the problem for any iterable, including Cons. This assumes that
by default the keys for a collection are 1, 2, ....
In the case of Cons, it's possible that traversing twice, preallocating output after the first pass, would be more efficient. I did not try this, but I doubt it is more efficient.

See JuliaCollections/DataStructures.jl#580 .

A somewhat related issue: I think the single argument method for findall, which is currently

findall(A) = collect(first(p) for p in pairs(A) if last(p))

would be better refactored as

findall(A) = findall(isequal(true), A)
@bramtayl
Copy link
Contributor

What would be lost it we deprecated enumerate to pairs?

@jlapeyre
Copy link
Contributor Author

I don't think that will work. pairs is meant to provide an interface to anything that has the structure of a map. In particular it provides a uniform way to iterate over key(or index) and value pairs for both Arrays and Dicts.

Return an iterator over key => value pairs for any collection that maps a set of keys to a set of values. This includes arrays, where the keys are the array indices.

enumerate always associates counting numbers with values. In particular enumerate(::Dict) does something very different than pairs. In fact enumerate just wraps its argument in Enumerate without caring at all what that argument is. But, the assumption is that the argument is an iterator. That is, it's elements can be ordered and counted.

pairs on the other hand has no fall-back method. It does something particular with Dict and Array knowing the structure of them. I am suggesting that if pairs is not defined for some type, then no one has thought about giving it an explicit map structure. But if that type can be iterated over, then that type already has a notion of associating counting numbers as keys.

Furthermore pairs associates CartesianIndexs with multidimensional Arrays by default, while enumerate again uses counting numbers. So you can't identify enumerate and pairs for Arrays.

The bottom line is that the semantics of enumerate and pairs don't always coincide. But, identifying them is a good guess in lieu of an explicit method for pairs.

In the case of Cons an easy solution is pairs(x::Cons) = enumerate(x). If this situation has not come up for other types, then it's probably not worth writing a definition for pairs(::Any). I asked here just to see if anyone had a similar experience or had thought about it.

@jlapeyre
Copy link
Contributor Author

The issue address the same problem as #34674.
It looks like the solution here (due to @oxinabox ) is more general because it applies to more than Generators.

@tkf
Copy link
Member

tkf commented May 25, 2020

I think pairs(A) = enumerate(A) as a generic fallback is a bad idea. We should define and understand the API of pairs before expanding its implementation. I think this should hold:

map(first, paris(A)) == keys(A)
map(last, paris(A)) == values(A)

keys and values should be coupled together with getindex:

collect(values(A)) == [A[k] for k in keys(A)]

Clearly, generic iterables cannot support these interfaces. So, I think it's better to define pairs for each case (maybe by defining keys and values so that the default pairs implementation work).

@jlapeyre
Copy link
Contributor Author

I agree that expanding its implementation this way would (partly) define the API. It says that by default, the keys of a set are taken from its property of being ordered. Dict, non-one-based arrays, etc. override this. (By the way collect(enumerate(Set([1,2,3]))) works. It should perhaps rather throw an error.) Making pairs(A) = enumerate(A) is then an implementation detail. But, it's one that may be useful for efficiency:

For a singly linked list, as in JuliaCollections/DataStructures.jl#580, collecting the keys and values are each O(n). You might define keys and values for Cons, but you want to avoid composing them in a way that traverses the list repeatedly. I realize this argument is sort of mixing defining semantics with properties of a data structure. But, pairs(A) = enumerate(A), doesn't contradict seeing pairs as keys and values. It's, in part, about implementation.

On the other hand, one could special-case Cons. What are the other cases we aren't thinking of? (Not a rhetorical question).

On the other, other hand, it would be interesting to make pairs(A) = enumerate(A) and see what the outcome is in the wild, reverting if it's not obviously a good idea.

Finally, I actually favor my original suggestion here of Base.pairs(x) = (first(p) => last(p) for p in enumerate(x)) in case funtions rely on pairs actually returning Pairs.

@tkf
Copy link
Member

tkf commented May 25, 2020

As long as the implementation satisfies the API, it is OK to overload the method. So, it is perfectly fine (and arguably preferred) to overload pairs(::Cons) so that the overhead of iterate is smaller. However, Cons needs to implement keys, values, and getindex separately to satisfies the API. Note that getindex is required for satisfying the API for findall:

found = findall(f, A)
@assert all(f(A[i]) for i in found)
@assert all(!f(A[i]) for i in setdiff(keys(A), found))

@jlapeyre
Copy link
Contributor Author

Sorry for necroposting, but is there a logic behind this behavior?

This question is independent of the subject of this post. It would better be asked elsewhere. (The behavior you see is documented. Also, try Nan > 3, NaN < 3.)

@knuesel
Copy link
Member

knuesel commented Jan 19, 2023

Note that Base.pairs(A) = enumerate(A) doesn't implement pairs correctly: it should return key => value pairs while enumerate returns tuples (which happen to work in some cases, but that depends on how the caller uses the values).

@jlapeyre
Copy link
Contributor Author

Yes. This PR is not the correct solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants