Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: A traits-based user-extensible @inbounds #11867

Closed
wants to merge 1 commit into from
Closed

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Jun 25, 2015

@simonster and @johnmyleswhite and I were talking about #7799 more yesterday. After the discussion, I went back to my proposal in #8227 (comment) for a traits based mechanism. The crux of the issue is that we want the user to be able to define either a simple method f(A) that is assumed to be safe or a more complicated method f(::BoundsCheck, A) that allows them to use dispatch to selectively change behavior. The problem is that we need to implement fallbacks that try either f(A) or f(BoundsCheckOn(), A) without incurring a stack overflow. I had been thinking that this lived in the same design space as a linear indexing trait dispatch, but I now think that it is a separate problem that, while tricky to compose with other traits, can be solved independently. It just requires a little bit of tip-toe-ing around:

  • getindex(...) falls back to getindex(BoundsCheckOn(), ...), which will error if neither was defined.
  • getindex(BoundsCheckOff(), ...) will fall back to getindex(...) if not defined.

This works okay, as long as you never explicitly call f(BoundsCheckOn(), A), since that can't fall-back to f(A) without a stack overflow. The trouble is that it's very tempting to write a method of f that "propagates" its bound-checking state: f(b::BoundsCheck, A) = f(b, A.data). This won't work unless we remove the error above in f(::BoundsCheckOn, A). And if we do that I don't think there's any straightforward way to avoid getting a non-obvious stack overflow in a place where we really need a sensible "you forgot to define this method" error.

Potential issues:

  • Currently only works with AbstractArray, but it may be possible to add fallbacks for all types
  • It's verboten to ever explicitly call f(BoundsCheckOn(), A), but defining functions in the form f(::BoundsCheckOn, ::Any) is totally supported and just fine. That's a little wonky.
  • This now takes up two extra arguments, so we hit setindex! no longer inlined for rank N>6 #9622 (and MAX_TUPLETYPE_LEN) two indices sooner.

I think that this may be a viable path towards a user-extensible @inbounds. To move forward, @simonster: I need your help in updating your @inbounds macro and expand-ref changes from #8227 to simply insert a BoundsCheckOff() within all @inbounds'ed getindex and setindex! calls.

This removes `unsafe_getindex` and `unsafe_setindex!` in favor of `getindex(::BoundsCheckOff, ...)` and `setindex!(::BoundsCheckOff, ...).

`getindex(...)` falls back to `getindex(BoundsCheckOn(), ...)`, which will error if neither was defined.

`getindex(BoundsCheckOff(), ...)` will fall back to `getindex(...)` if not defined.
@ScottPJones
Copy link
Contributor

Great direction! 👍

@timholy
Copy link
Member

timholy commented Jun 25, 2015

Very interested to see this. A serious look might be a few days, unfortunately.

@johnmyleswhite
Copy link
Member

Nice!

@simonster
Copy link
Member

The case that I keep worrying about is something like:

type MyType <: AbstractVector{Int} end
getindex(::MyType, ind) = 1
getindex(::MyType, ind1, ind2) = 2

With this PR, I get the output:

julia> getindex(MyType(), 1)
1

julia> getindex(MyType(), 1, 1)
2

julia> getindex(BoundsCheckOff(), MyType(), 1)
1

julia> getindex(BoundsCheckOff(), MyType(), 1, 1)
1

This is pathological since it's non-sensical for getindex to return two different results with 1 or 2 parameters, but I think it's problematic that @inbounds is changing the method that's getting called.

@johnmyleswhite
Copy link
Member

I think it's problematic that @inbounds is changing the method that's getting called.

This seems like the crux of the problem (insofar as I understand it): how do you make sure you skip doing run-time dispatch without using different methods for checked indexing and unchecked indexing?

@mbauman
Copy link
Member Author

mbauman commented Jun 26, 2015

I should have trusted the @mbauman from three months ago. Dangit.

@mbauman
Copy link
Member Author

mbauman commented Jun 26, 2015

Actually, that's an issue on master right now, too:

julia> type MyType <: AbstractVector{Int} end

julia> getindex(::MyType, ind) = 1
getindex (generic function with 121 methods)

julia> getindex(::MyType, ind1, ind2) = 2
getindex (generic function with 122 methods)

julia> getindex(MyType(), 1)
1

julia> getindex(MyType(), 1, 1)
2

julia> Base.unsafe_getindex(MyType(), 1)
1

julia> Base.unsafe_getindex(MyType(), 1, 1)
1

:-\

@johnmyleswhite
Copy link
Member

I don't see how things could work any other way when you're exploiting multiple dispatch to change behaviors. It's always possible to disassociate those four methods since the only thing holding them together is the no-punning honor code in the community.

@simonster
Copy link
Member

My proposal from the other day for solving that part of the problem was:

@generated function good_unsafe_getindex(args...)
    safesig = methods(getindex, args)[1].sig
    unsafesig = methods(Base.unsafe_getindex, args)[1].sig
    :($(unsafesig <: safesig ? (:(Base.unsafe_getindex)) : :getindex)(args...))
end

For:

type MyType <: AbstractVector{Int} end
getindex(::MyType, ind) = 1
getindex(::MyType, ind1, ind2) = 2

this gives:

julia> good_unsafe_getindex(MyType(), 1)
1

julia> good_unsafe_getindex(MyType(), 1, 2)
2

and for:

type MyUnsafeType <: AbstractVector{Int} end
getindex(::MyUnsafeType, ind) = 1
getindex(::MyUnsafeType, ind1, ind2) = 2
Base.unsafe_getindex(::MyUnsafeType, ind) = 3
Base.unsafe_getindex(::MyUnsafeType, ind1, ind2) = 4

this gives:

julia> good_unsafe_getindex(MyUnsafeType(), 1)
3

julia> good_unsafe_getindex(MyUnsafeType(), 1, 1)
4

@StefanKarpinski
Copy link
Member

the no-punning honor code in the community.

I love this phrase.

@ScottPJones
Copy link
Contributor

Umm, I'm the newbie in the crowd, can you please explain the "no-punning honor code"? (Hopefully, before I break the code and everybody yells at me!) (also, I see type punning all over the C code that's been driving me crazy!) Thanks!

@StefanKarpinski
Copy link
Member

What about factoring getindex like this:

function getindex(a, i)
    check_indices(a, i)
    unsafe_getindex(a, i)
end

Then you could override just unsafe_getindex and provide default index checking for broad classes of containers (this is often pretty generic). Kind of annoying that you extend unsafe_getindex, but I guess that's not the end of the world.

@StefanKarpinski
Copy link
Member

The no-punning honor code is basically don't do anything like this:

http://stackoverflow.com/questions/4854248/why-are-bitwise-shifts-and-used-for-cout-and-cin

@quinnj
Copy link
Member

quinnj commented Jun 26, 2015

The "no punning" means that generic functions, operators included, should have a single, canonical "meaning", and different implementations of the function (methods) shouldn't try to bend that meaning in weird or "pun"ning ways. Generic functions should be easy to reason about in terms of what they do without being "fancy" or funny.

@ScottPJones
Copy link
Contributor

Hmmm. *, ^, $? Sorry, I couldn't help myself! Don't shoot me!

@simonster
Copy link
Member

@StefanKarpinski I think this is basically @JeffBezanson's proposal from #7799 (comment). If we can get it to perform well, and if we don't mind the awkwardness of making everyone extend unsafe_getindex, maybe it's the way to go.

@ScottPJones
Copy link
Contributor

I prefer it to be explicit whether or not my method "responds" to @inbounds, it really worries me that currently @inbounds affects code that gets inlined (and may not be safe), so I like the idea of extending unsafe_getindex (I added unsafe_checkstring for precisely this reason!)

@StefanKarpinski
Copy link
Member

Note that you can still make indexing with and without bounds checking disagree by overloading both in ways that don't match. Of course, you wouldn't usually extend getindex with this approach.

@ScottPJones
Copy link
Contributor

What surprised me, was that something I wrote in method A (adding @inbounds) changed the code in method B (because it got inlined), that smelled like dynamic scoping to me!

@StefanKarpinski
Copy link
Member

Yes, this feature requires a limited form of dynamic scoping.

@ScottPJones
Copy link
Contributor

But that wouldn't be true anymore, right, with @JeffBezanson #7799? Gerry Sussman taught me to hate dynamic scoping 😀

@simonster
Copy link
Member

None of the proposals we're currently discussing will auto-propagate @inbounds.

@ScottPJones
Copy link
Contributor

Yeah! 👍 💯 🎉 👏

@mbauman
Copy link
Member Author

mbauman commented Jun 26, 2015

In response to #11867 (comment):

The basic trouble stems from wanting to call either getindex or unsafe_getindex, and also allowing types to define getindex and optionally unsafe_getindex if they want to opt-in to skipping bounds checks.

If we have automatic bounds checking in base, then you really shouldn't define getindex at all — any library code that uses @inbounds then won't support your type unless you define unsafe_getindex. Which I find ugly and unintuitive. The trick I play in #10525 is that if we end up in the library fallbacks for getindex, it means that the user hasn't defined getindex for those types. So as I convert the indices to the proper scalar access, I can check bounds and subsequently call unsafe_getindex, which can fall back to getindex. And the fallback getindex for proper scalar access is an error since the user must define that for things to work.

@ScottPJones
Copy link
Contributor

@mbauman Good points. Now I don't know what I want! (except that I don't want @inbounds to propagate)

@mbauman
Copy link
Member Author

mbauman commented Jun 26, 2015

Perhaps having these functions fall back on each other by default wouldn't be all that bad. Sure, there's a stack overflow if your type is broken and doesn't define any indexing methods. But we could detect this and add a message to the REPL error display about why there was a stack overflow. Just so long as it happens instantly (and doesn't hang in thousands of calls check_indices before hitting the overflow).

@toivoh
Copy link
Contributor

toivoh commented Jun 26, 2015

I'm still not convinced that having getindex fall back to unsafe_getindex is ever safe. With the latter you assert that the indices you have supplied are within bounds, but do we have a formal definition of what it means for an index to be within bounds for a given type? I think it might have to vary on a case by case basis, and then there is no way that we can write generic code that checks bounds (unless you write a call to checkbounds, if everyone would be required to implement that).

@simonster
Copy link
Member

getindex should only fall back to unsafe_getindex after calling checkbounds, so if no checkbounds is defined for a type, it will just throw. We can define checkbounds for AbstractArrays in Base (I don't think the correct behavior here is ambiguous), and leave checkbounds for other types to the implementers. (Or, if we define the mutually recursive fallbacks, they can just define getindex and unsafe_getindex will fall back on that.)

@timholy
Copy link
Member

timholy commented Jun 27, 2015

As an example of the complexities, I offer the case of SubArrays, where there are bounds-checks both on the indexes and the parent array. That is to say, for a SubArray S, S[3] gets translated into P[ind[3]], where P = parent(S) and ind = S.indexes[1]. If @inbounds only turns off the outer bounds check (the P[i], where i=ind[3]) while preserving the ind[3] bounds-check, there will be significant performance implications (see #11641, #11625).

This is a frustratingly difficult problem to reason about.

@mbauman
Copy link
Member Author

mbauman commented Jun 28, 2015

Closing this in favor of the proposal in #7799 (comment). I think this is trying to hard to deal with too many use-cases, which ends up making this complicated and confusing. Let's just do it the obvious way: check bounds in the base, and recommend that users define the unsafe method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants