-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inference of collect #13
Comments
That's because I'm using inference API too much. I have a few ideas to avoid relying on inference too mcuh and want to do it at some point but it would be a rather big surgery. |
Its great, that you could fix the above example! There are however other broken examples: julia> using Transducers, Test
julia> @inferred collect(Take(10), 1:100)
ERROR: return type Array{Int64,1} does not match inferred return type Any
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] top-level scope at REPL[9]:1 |
Ah, thanks for catching this. So, not all transducers can be inferenced at the moment. Any (combinations of) transducers with
(BTW, fixing But actually
I thought something like the following with diff --git a/src/library.jl b/src/library.jl
index 38e4efc9..b5134e51 100644
--- a/src/library.jl
+++ b/src/library.jl
@@ -373,18 +373,23 @@ end
start(rf::R_{Take}, result) = wrap(rf, xform(rf).n, start(inner(rf), result))
-next(rf::R_{Take}, result, input) =
+@inline next(rf::R_{Take}, result, input) =
wrapping(rf, result) do n, iresult
+ Base.@_inline_meta
if n > 0
- iresult = next(inner(rf), iresult, input)
+ iresult′ = next(inner(rf), iresult, input)
n -= 1
+ if n <= 0
+ return n, _complete_take(rf, iresult′)
+ end
+ return n, iresult′
+ else
+ return n, _complete_take(rf, iresult)
end
- if n <= 0
- iresult = reduced(complete(inner(rf), iresult))
- end
- return n, iresult
end
+@inline _complete_take(rf, iresult) = reduced(complete(inner(rf), iresult))
+
"""
TakeLast(n) |
Thanks for the explanations! It is counter intuitive, that |
I guess that's kind of a duality principle. Everything that is easy in iterators is "co-easy" in transducers (and vice versa). I mean, and look at the |
That's a cool explanation. A bit off topic: Do you know if there is a precise sense in which |
I'm actually very interested in this, too. I used the notion of duality in a totally non-rigorous sense above (i.e., iterators are "pull"-based API and transducers are "push"-based API). But I cannot help wondering if there is a way to actually formalise that. Unfortunately, my category theory knowledge is still too elementary to find a good formalism... |
Here are some observations that might be a starting point. What I think needs to be done is to define the following things:
Once they are defined we need to show that I am unsure what the right notion of morphism these things is. My feeling is that Model 1: No start no stopHere is the most simple model of an iterator I could come up with: If you want to run it, it is up to the user to provide an initial state. Turning around the arrows and renaming
This is a bare bones reducing function. It has no Model 2: No start, but stop
If you want to run it, you still need to provide an initial state. Turning around the arrows and renaming
Now a map Model 3: start and stop
So this is data is equivalent to
So this is a full blown iterator. Turning around the arrows and renaming
This is equivalent to
I think we can interpret this a reduction function with early stopping. ProblemsHere are some odd things / issues with this:
|
Actually, I just remembered that I've heard related concepts: Catamorphism and Anamorphism. Reading the Wikipedia pages (at least the parts that I can guess the meaning) and also watching this Lambda World talk by Zainab Ali, it seems like the relationship is something like:
It looks like that they are pretty much like what you suggested. For example, this (from Wikipedia) does exactly look like your Model 2:
So, I guess the good news is that duality of catamorphism and anamorphism is already there. But does it give us the duality of the transducers and "iterator transformations" automatically? I mean, as you said,
sounds like a pretty hard problem.
Rich Hickey's transducers (and so Transducers.jl) use " |
(BTW, going back to the original topic, many more things are inferable now in It's not yet merged into |
It is really cool that this library becomes more and more inferable. Thanks for the great work! Thanks also for sharing the anamorphism stuff and the table! I will need to think a bit about it. As for defining iterator transforms / transducers / duality, here is another idea:
Under this definition it should be easy to see that |
Thanks! Yeah, I think Transducers.jl finally got to a beta-ish-level quality so that it's ready for non-toy examples. Your observation about adjoint is very interesting.
So I guess this corresponds to the rule in transducers that you shouldn't touch This definition is quite satisfactory as it seems to correspond to the reason why |
So I should be doing |
Yeah I think
At first I thought this is a great idea. But now I am not sure about it. In a perfect world we would have foldl(adjoint(f)(rf), itr) == foldl(rf, f(itr)) This would allow for instance to write powerful + elegant unit tests: for F in [Map(sin), Take(3), Enumerate()]
f = adjoint(F)
@test foldl(adjoint(f)(rf), itr) == foldl(rf, f(itr))
end Here this fails for two reasons. The proposal is at the type level and missing currying. |
Ah, right. I need to do |
That would be more correct. But then returning lambdas is painful. They don't display nicely, give strange error messages and you can't dispatch on them, which is needed for the other direction |
Yeah, I agree it won't be practically useful (except for the unit test you mentioned which is already a good motivation for defining it, at least with a private name). |
Yeah I think defining the lambda returning version under the name |
So I implemented the iterator transform-to-transducer mapping It looks like it's working julia> collect(Transduction(itr -> Base.Generator(x -> x + 1, itr)), 1:2)
2-element Array{Any,1}:
2
3
julia> collect(Transduction(itr -> Iterators.filter(isodd, itr)), 1:2)
1-element Array{Any,1}:
1 As I thought, it's pretty ugly. For example, next(rf::R_{Transduction}, result, input) =
wrapping(rf, result) do (itr, itstate0, buffer), result
push!(buffer, input)
if itstate0 isa Unseen
ret = iterate(itr)
else
ret = iterate(itr, itstate0)
end
while true
if ret === nothing
return (itr, itstate0, buffer), reduced(complete(inner(rf), result))
end
itout, itstate = ret
result = next(inner(rf), result, itout)
if result isa Reduced || isempty(buffer)
return (itr, itstate, buffer), result
end
ret = iterate(itr, itstate)
end
end I probably didn't need to this but it's nice to have a "constructive proof" showing that iterator transforms and transducers are equivalent in terms of the semantics (but (I believe) not the performance characteristics). Anyway, I appreciate all the ideas! I now feel like I understand iterators and transducers more. |
Thanks for sharing this. I learn a lot from this discussion. Here are some comments, some of them may be already obvious to you or slight reformulations of things you already said.
using Test
using Transducers
function rightadjoint(xf)
function itf(itr)
eduction(xf, itr)
end
end
xf = Map(cos) |> Take(4)
itr = randn(10)
rf = +
@test foldl(rf, xf, itr) == Base.foldl(rf, rightadjoint(xf)(itr))
|
It seems that this is just because repeatfirst(rf) =
function(result, input)
while true
result = rf(result, input)
result isa Reduced && return result
end
end has an obvious adjoint iterator transform. If we are OK with dealing with equality of infinite iterators (as done in math with infinite sets), I guess I can generalize your definition of adjoint by saying that transducer xs = Channel() do xs
foldl(F(put!), itr; init=xs)
end
ys = Channel() do ys
foldl(put!, f(itr); init=ys)
end
while true
isclosed(xs) && isclosed(ys) && break
@assert take!(xs) == take!(ys)
end where |
...but what I wrote with Interestingly, even though |
Ah nice. One way to think about the last few posts is like this: There are two approaches to guarantee termination. One is on the iterator side and one on the reduction function side.
|
Actually, I think we only need to show it for "basis" reducing functions. By that, I mean the reducing functions generated by following helper function: reduce_on_nth(n) =
function (i, input)
if i == n
return Reduced(input)
else
return i + 1
end
end The reducing function foldl(F(reduce_on_nth(n)), itr; init=1) == foldl(reduce_on_nth(n), f(itr); init=1) holds. (More precisely, we need to use |
This sounds very challenging to me. I mean, there are iterator transforms that turn finite iterator to an infinite one (e.g., an iterator transform adjoint to the transducer |
That is a nice idea, too. Like
This was about exploring the theory. In order to prove something you need to impose finiteness conditions somewhere. For instance |
This is my interest too. At this point I don't care much about direct "benefit" of this exploration in the practical coding (though obviously it would be great if there is any; the unit testing example you brought up above was such an example). So, it's purely for fun and I appreciate the entertainment! What I meant by "challenging" was it seemed to me that formalising it via the "finite itr" route was difficult. I mean, it's hard to imagine for me that there is any "special effect" due to the finiteness of iterator compared to the finiteness of the reducing function.
Isn't it implicitly there already as we are comparing the result of Definition. A pair of transducer
holds. Directly showing it is hard, but I think we have a: Theorem. Proof. For If We now consider another case
Note that the value |
Okay here is what I had in mind:
Comments
Theorem Now we have our adjunction. Observe that so far everything is abstract nonsense. We did not use finiteness of your iterators or that they are lists. Theorem
|
Cool. Yoneda lemma has been in my things-to-learn list for a long time. I really should do it... Going back to the discussion about the "finite itr" route, as you said, you don't really need it during the category theoretic construction. So I suppose you defer mentioning finiteness until you are about to connect to the classic foldl? You can then say that |
Yeah you are completly right, finite iterators is only needed for the classical interpretation of On the reduction function side I would like to see the classical notion with early stopping. Above you said that you believe that transducers and itertransforms have different performance characteristics. Can you expand on that? |
The trick I was trying to use so far is to say that, once the reduction is finished, we don't need to care about infinity and the theory about finite iterators work fine. That is to say, we can model iterator transforms on "infinite iterators" as the transforms that works with finite iterators of arbitrary length (OK, I guess maybe it's kind of obvious but I thought this notion was enough.) BTW, I think hiding "element type" of the iterator like
Why not? Set-theoretic maps suppose to terminate so I guess there is no difficulty there? I think allowing potentially non-total function is bad, though.
This will be my main point in JuliaCon talk so it's nice that you asked now! Actually my focus will be that function __foldl__(rf, acc, coll::LazyArrays.Vcat)
for x in coll.arrays[1]
acc = rf(acc, x)
end
for x in coll.arrays[2]
acc = rf(acc, x)
end
...
for x in coll.arrays[end]
acc = rf(acc, x)
end
return acc
end (The "unrolling" is actually done via recursion and use
On the other hand, keeping track the state with Also, I think As an another example, here is my (re-)implementation of The big message is that, if you implement a collection, you are the best person to write the loop because you know all of its properties (memory layout, types, etc.) and can encode them into the looping strategy. However, classic
The first part is solved via Another benefit of transducer is that it's easy to do fan-out type reduction. I don't think it's possible with iterator transforms (with the same memory requirement). See the example of Also, there is a crazy example where To be fair, let me think about some benefits of the iterators:
What do you think about the whole logic of foldl/transducers -vs- iterators? I'd appreciate your comments on this! |
See also JuliaLang/julia#15648 (comment) |
The argument that the author of an iterator type is usually the best person to implement the reduction loop is very compelling to me. In the past I sometimes implemented specialized I can maybe explain a bit why I am excited about this library. I work with phase space files. These are basically gigantic lists of particles. Too large to fit into memory. And I am interested in questions like:
Most (all?) of these questions can be build up from basic operations like So I need a framework to build up complex operations from these primitives. Ideally that framework should have the following properties:
The only composition mechanism other then
Often I end up using
|
Thank you very much for very detailed inputs! It is encouraging that you find Transducers.jl promising.
Good to know. Maybe you'd find
FYI,
Ah yes, I guess that's important point to list as a con of transducers (a pro for iterators). This happens typically because transducers have to handle I think solving the inferrability problem with current Julia compiler requires to rely on the compiler API. I was relying on it too much before. Now, I do the complete opposite and try not to rely on the inference at all to decouple the semantics from the compiler. But I think opt-in compiler dependency is OK. It partially is implemented and you can use it with Another approach would be to forget about inferrability of the outer most call and let function-barrier type-stabilize the loop. As most of the type instabilities are coming from the "not yet started" marker object, the majority of the loop can be done in the type stable way. This would require what I call "tail-call function-barrier" technique but it makes the implement really ugly (see also: https://discourse.julialang.org/t/25831). I'm doing this only for generic iterator fallback because doing this with SIMD support is going to be really hard (although this may be the only reasonable option for now...). Also, I haven't experimented how this interacts with nested for loop case. It may require for you to wait until the type-stabilization bubble up to the outer most loop. [1] |
I take this back. If you have, e.g., |
I am closing it as I opened #18 to track the actual inference issue. But please feel free to keep posting (or open a new issue). This thread was fun! |
Yeah, learned a lot in this thread! |
There are problems with the inferred return type of collect:
The text was updated successfully, but these errors were encountered: