-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
codegen support for efficient union representations #20593
Conversation
Cool! First request: copy that commit message into the devdocs somewhere :) |
So excited to see this being worked on. Am I right to think there are still performance optimizations that can be made? For now, I'm using the following snippet to check the performance of union types:
Currently this branch produces very similar behavior as Julia 0.5 did: the union type of |
Very cool, Jameson. I'm guessing from the OP and John's comment that the remaining half of work is to make unions of two isbits types isbits themselves? Would these be laid out be like a variant/ |
That's a boring example. I can cut down the allocation count, but they don't really cost any time relative to
A slightly more interesting version is:
Although even here it is very difficult to craft a meaningful benchmark since most of the cost is in using a floating point representation, and the cost of allocation is relatively just noise:
|
I'm confused: if 100% of the cost of evaluating both |
To be honest, I am also a little unclear what those benchmarks are showing... (also - what is "altered version of PR"? is that work on the isbits case - you say "I can cut down the allocation count"?) |
That cost model only works for macro scale. In this case, the cost model is basically (yeah, the altered case is some test code I put together that does some inference re-arrangements to prevent allocation in this case in exchange for worse code in the general case. Since it's just inference work, it doesn't really matter for this PR – I could have gotten the same result just by writing out some of the extra branches in those function manually). |
Interesting. Is there some more generic cases that we can test with this branch? Maybe |
On 32bit windows:
I could spend a few cycles looking into this if we are pushing this for v0.6. |
Hmm... Is this a regression? My benchmark was a bit strange. Let's call this test "DataArrays lite": immutable NA; end
Base.:+(::NA, ::Int) = NA()
Base.:+(::NA, ::NA) = NA()
Base.:+(::Int, ::NA) = NA()
n = 10_000_000
v1 = Vector{Union{Int,NA}}(n)
v2 = Vector{Union{Int,NA}}(n)
for i = 1:n
if rand() < 0.6666
v1[i] = rand(1:100)
else
v1[i] = NA()
end
if rand() < 0.6666
v2[i] = rand(1:100)
else
v2[i] = NA()
end
end
v1 + v2 And the benchmark results: julia> @time v1 + v2; # v0.5.0
0.439152 seconds (10.00 M allocations: 228.874 MB, 24.30% gc time)
julia> @time v1 + v2; # master
0.510497 seconds (10.00 M allocations: 305.180 MiB, 23.54% gc time)
julia> @time v1 + v2; # this PR
0.747177 seconds (10.00 M allocations: 228.887 MiB, 17.05% gc time) Does this make sense? When the isbits stuff is done, I was expecting this to be within a factor of two (edit: 6?) of the EDIT: pre-allocating the output array julia> @time map!(+, v3, v1, v2); # v0.5.0
0.321886 seconds (10.00 M allocations: 152.580 MB, 2.38% gc time)
julia> @time map!(+, v3, v1, v2); # master
0.336431 seconds (10.00 M allocations: 152.580 MiB, 0.92% gc time)
julia> @time map!(+, v3, v1, v2); # this PR
0.394033 seconds (10.00 M allocations: 152.580 MiB, 1.68% gc time)
julia> @time map!(+, v3, v1, v2); # Vector{Int} case, for comparison
0.026852 seconds (4 allocations: 160 bytes) |
And finally, by using/extending nullables on v0.5: function Base.:+(i1::Nullable{Int}, i2::Nullable{Int})
if isnull(i1) || isnull(i2)
return Nullable{Int}()
else
return Nullable(get(i1) + get(i2))
end
end
v1 = Vector{Nullable{Int}}(n)
v2 = Vector{Nullable{Int}}(n)
v3 = Vector{Nullable{Int}}(n)
for i = 1:n
if rand() < 0.6666
v1[i] = Nullable(rand(1:100))
else
v1[i] = Nullable{Int}()
end
if rand() < 0.6666
v2[i] = Nullable(rand(1:100))
else
v2[i] = Nullable{Int}()
end
end yields julia> @time map!(+, v3, v1, v2);
0.132412 seconds (4 allocations: 160 bytes) which is about 6 times slower than the |
You can try this PR JuliaStats/DataArrays.jl#235 |
@vchuravy I already looked at it – the assertion is wrong. I just need to push a fix to avoid calling it there. @andyferris This PR is only for the implementation of the codegen support. It doesn't implement support for representing unions efficiently in vectors. |
Right - I'm getting ahead of you (because this is exciting 😄 ) |
8578b20
to
035f313
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Wow cool, I've never created a 14552.04% slowdown before. I didn't even have to allocate anything! |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
src/codegen.cpp
Outdated
jl_cgval_t value; // a value, if the var is unboxed or SSA (and thus boxroot == NULL) | ||
Instruction *boxroot; // an address, if the var might be in a jl_value_t** stack slot (marked tbaa_const, if appropriate) | ||
jl_cgval_t value; // a stack slot or constant value | ||
Value *pTIndex; // where the current value is stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has the wrong comment
be88ee9
to
4109326
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Based on a discussion with @vtjnash yesterday, my understanding was that there was a ~10% slowdown in the raytracer benchmark (in a marginal case involving type-unstable code) due to the new calling convention for functions with Union return types. However it looks like the latest commit has fixed that. If so, that's awesome and this LGTM. |
Not related to the changes here, but I'm confused by the comment on the
The comment says non-NULL if this is not a pointer, but when it's non-null |
I think there's just an extra |
4109326
to
702e20d
Compare
with names, via the usual symbol resolution mechanism in the linker. | ||
|
||
Note too that ccall functions are also handled separately, | ||
via a manual GOT + PLT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spell out acronyms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
standard jargon exists for a reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what, to confuse people? this is documentation, very welcome to have it at all, just asking for it to be made slightly clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because Global Offset Table is not actually clearer, just longer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is clearer - I can look that up and go find out more about that if it's clear that's what's being referred to, unlike GOT on its own without any explanation, where first the reader needs to figure out what the acronym stands for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's completely standard practice to spell acronyms out the first time you use them, then use the acronym from then on if you must http://blog.apastyle.org/apastyle/abbreviations/#Q1
otherwise the term serves no purpose to anyone who doesn't know what you're referring to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also folks like me who don't have a solid background with this stuff; I didn't know that GOT was Global Offset Table, and I still don't know what PLT is. Big +1 for spelling out acronyms, at least the first time they're used. I think that would help a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash: In any writing, you should spell out the first usage, and put the abbreviation in parens. From then on in the same document you may refer to it by abbreviation. This is a quite universal standard in writing, and claiming otherwise is simply wrong. If you're feeling extra helpful to your reader, you can also link to the wikipedia article on the topic from the first (spelled out) usage of the term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, it's better to have this documentation and others can fix up the formatting and writing as long as we know what it means. I would not have known what GOT stood for without asking you, however. PLT I happen to know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GOT is a pretty standard acronym though... Game of Thrones of course.
doc/src/devdocs/compiler.md
Outdated
`mark_julia_type` (for immediate values) and `mark_julia_slot` (for pointers to values). | ||
|
||
The function `convert_julia_type` can transform between any two types. | ||
When it returns and returns an `cgval.typ` set to `typ`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear - reword?
doc/src/devdocs/compiler.md
Outdated
primitives to implement union-splitting. | ||
|
||
The representation of the tagged-union is as a pair | ||
of < void* union, byte selector >. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code highlight ?
doc/src/devdocs/compiler.md
Outdated
It records the one-based depth-first count into the type-union of the | ||
isbits objects inside. An index of zero indicates that the `union*` is | ||
actually a tagged heap-allocated `jl_value_t*`, | ||
and needs to treated as normal for a boxed object rather than as a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be treated as normal
doc/src/devdocs/compiler.md
Outdated
tagged union. | ||
|
||
The high bit of the selector (`byte & 0x80`) can be tested to determine if the | ||
`void*` if the `void*` is actually a heap-allocated box, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate "if the void*
"
doc/src/devdocs/compiler.md
Outdated
- Tuples of VecElement types get passed in vector registers. | ||
- Structs get passed on the stack. | ||
- Return values are handle similarly to arguments, | ||
with a size-cutoff at which they will instead by returned via a hidden sret argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will instead be returned
Thanks for adding the devdocs. That is very helpful. |
Note that extern functions are handled separately, | ||
with names, via the usual symbol resolution mechanism in the linker. | ||
|
||
Note too that ccall functions are also handled separately, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps code format ccall, i.e. ccall
?
doc/src/devdocs/compiler.md
Outdated
When it returns and returns an `cgval.typ` set to `typ`. | ||
It'll cast the object to the requested representation. | ||
It'll make boxes, allocate stack copies, and compute tagged unions as | ||
needed to perform the request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sentences could be combined, e.g. "It will cast the object to the requested representation, making boxes, allocating stack copies, and computing tagged unions [in the process] as needed."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that necessarily improves readability – it just seems like a even trade, imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these later "it will" sentences are supposed to be connected to the "when it returns" fragment, that could be made clearer either in sentence structure or markdown formatting
- emit_sizeof | ||
- boxed | ||
- unbox | ||
- specialized cc-ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice if you could code format these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally reluctant to code-format every word that also appears in code. In part because it is not really standard grammar, and in part because I try to avoid having every other word be bracketed :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just mean the things that only refer to functions or other code-specific thingamabobs that aren't part of normal prose. But if you're happy with it as it is, that's fine with me.
doc/src/devdocs/compiler.md
Outdated
thus avoiding the cost of re-allocating a box, | ||
while maintaining the ability to efficiently handle union-splitting based on the low bits. | ||
|
||
It is guaranteed that `byte & 0x7f` is an exact test for the type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either "since" should follow the comma, or the comma should be replaced by a semicolon.
doc/src/index.md
Outdated
@@ -72,6 +72,7 @@ | |||
* Documentation of Julia's Internals | |||
* [Initialization of the Julia runtime](@ref) | |||
* [Eval of Julia code](@ref) | |||
* [High-level Overview of the Native-Code Generation Process](@ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might put this after the memory layout section, since that and the preceding sections are helpful to understand prior to reading this (or at least it seems that way to me)
starting a document describing the overall code-generator structure
702e20d
to
eac043a
Compare
with names, via the usual symbol resolution mechanism in the linker. | ||
|
||
Note too that ccall functions are also handled separately, | ||
via a manual GOT + PLT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash: In any writing, you should spell out the first usage, and put the abbreviation in parens. From then on in the same document you may refer to it by abbreviation. This is a quite universal standard in writing, and claiming otherwise is simply wrong. If you're feeling extra helpful to your reader, you can also link to the wikipedia article on the topic from the first (spelled out) usage of the term.
I've changed my review to "approve" on the premise that these docs are incomplete and when this is all done, @vtjnash will spell out the first instance of each acronym and link to the wikipedia page. |
# High-level Overview of the Native-Code Generation Process | ||
|
||
|
||
<placeholder> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to include something here in this PR or in a subsequent one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subsequent one it is 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I put it there so that nobody could complain that the document was immensely incomplete at describing codegen, and just jumps to explain a few small areas. It almost worked :)
@vtjnash: what needs to happen for that I'm debugging an issue where a hot @inline foo(t) = 1 < 2 ? t[1] : 1 # might seem idiotic, but resembles Base.size
bar(t) = foo(t)
code_llvm(bar, (NTuple{1,Int},))
%2 = load i64, i64* %1, align 1 (funny how this triggers different code paths, but that's in part due to #17880 I guess) |
Likely need to track where the pointer came from in the jl_cgval_t, or make some conservative estimates like we do everywhere else. I didn't do that already mostly because x86 doesn't care. |
This implements the codegen primitives and return-type calling convention to make moving around (generating, storing, inspecting, returning) inferred union types more efficient.
The representation of the tagged-union is as a pair consisting of a memory pointer of unspecified size and a byte describing its type. The memory pointer may also be a
jl_value_t*
box, which will be reflected in it's type tag. For more extensive details, see the commit message.