Allocate objects in pools with correct alignment #21959
src/julia_internal.h (outdated)
STATIC_INLINE jl_value_t *jl_gc_alloc_(jl_ptls_t ptls, size_t sz, void *ty)
{
const size_t allocsz = sz + sizeof(jl_taggedvalue_t);
if (allocsz < sz) // overflow in adding offs, size was "negative"
jl_throw(jl_memory_exception);
size_t alignment = JL_SMALL_BYTE_ALIGNMENT;
if (ty && ((uintptr_t)ty != jl_buff_tag) &&
This function shouldn't read ty.
Why? I need access to the alignment at this point, and ty is used in jl_set_typeof(v, ty) later in the function. Otherwise I will need to pass in the alignment from the outside.
The size is also passed in explicitly. I think we can just have an aligned version of this function that takes the alignment as an argument.
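A minimal sketch of what such an aligned variant could compute, assuming the caller passes the alignment explicitly so that the type object is never read. The helper name aligned_alloc_size is hypothetical, and a pointer-sized tag stands in for sizeof(jl_taggedvalue_t). It keeps the overflow check from the excerpt above and rounds the slot size (tag plus payload) up to a multiple of the alignment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(jl_taggedvalue_t): one pointer-sized tag. */
#define TAG_SIZE sizeof(void *)

/* Hypothetical helper: bytes to request from a pool for a payload of sz bytes
 * that must be aligned to align (0 or 1 = no requirement, else a power of two).
 * The alignment is passed in, so no type object is inspected.
 * Returns false if the size computation overflows. */
static bool aligned_alloc_size(size_t sz, size_t align, size_t *out)
{
    size_t allocsz = sz + TAG_SIZE;
    if (allocsz < sz)               /* overflow in adding the tag: size was "negative" */
        return false;
    if (align > 1) {
        /* Round the slot (tag + payload) up to a multiple of align, so that if the
         * first payload in a page is aligned, every following one is as well. */
        size_t rounded = (allocsz + align - 1) & ~(align - 1);
        if (rounded < allocsz)      /* overflow while rounding up */
            return false;
        allocsz = rounded;
    }
    *out = allocsz;
    return true;
}
```

Keeping the alignment as a parameter mirrors how the size is already passed in, which is the point of the review comment.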
base/atomics.jl (outdated)
@@ -322,7 +322,7 @@ inttype(::Type{Float32}) = Int32
inttype(::Type{Float64}) = Int64

alignment(::Type{T}) where {T} = ccall(:jl_alignment, Cint, (Csize_t,), sizeof(T))
alignment(::Type{T}) where {T} = ccall(:jl_alignment, Cint, (Any,), T)
This can be removed.
(It's just datatype_alignment now.)
src/julia_internal.h (outdated)
@@ -170,30 +176,25 @@ static const int jl_gc_sizeclasses[JL_GC_N_POOLS] = {
// 64, 32, 160, 64, 16, 64, 112, 128, bytes lost
And we should change this list as part of this PR.
src/julia_internal.h (outdated)
// szclass 16+
return 16;
#endif
// The pools are aligned wit JL_CACHE_BYTE_ALIGNMENT (typically 64)
Typo: "wit" should be "with".
#define JL_CACHE_BYTE_ALIGNMENT 64
// JL_HEAP_ALIGNMENT is the maximum alignment that the GC can provide
#define JL_HEAP_ALIGNMENT JL_SMALL_BYTE_ALIGNMENT
#define GC_MAX_SZCLASS (2032-sizeof(void*))
what's the significance of the 2032?
It is the largest object size that we use pools for. Also, see the last entry in jl_gc_sizeclasses
what are the chances of it changing in either place?
I'm not sure what the sizeof(void*) comes from, but this PR is supposed to be changing this list (#21959 (comment)).
src/ccall.cpp (outdated)
@@ -1214,8 +1214,9 @@ static jl_cgval_t mark_or_box_ccall_result(Value *result, bool isboxed, jl_value
const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
#endif
unsigned nb = DL.getTypeStoreSize(result->getType());
unsigned alignment = DL.getPrefTypeAlignment(result->getType());
This could also be DL.getABITypeAlignment(result->getType())
src/julia_internal.h (outdated)
return 16 - 16376 / 2 / LLT_ALIGN(sz, 16 * 2) + 24 + N;
return 16 - 16376 / 1 / LLT_ALIGN(sz, 16 * 1) + 32 + N;
size_t klass = 0;
while (klass < JL_GC_N_POOLS) {
this seems way more expensive (also given that this function is already a pretty significant percentage of the time to allocate an object)
Yeah, I know. I wanted a fast way to check correctness before going back to the jump table.
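A minimal sketch of the kind of linear scan the excerpt starts (while (klass < JL_GC_N_POOLS)), extended with the alignment condition. The size-class table and the function name toy_select_szclass are hypothetical and are not Julia's jl_gc_sizeclasses. The cost is one comparison per pool on every allocation, which is the reviewer's concern.

```c
#include <stddef.h>

/* Hypothetical size-class table (slot sizes in bytes, tag included).
 * This is NOT Julia's jl_gc_sizeclasses list; it only illustrates the search. */
static const size_t toy_sizeclasses[] = {8, 16, 24, 32, 48, 64, 96, 128, 192, 256};
#define TOY_N_POOLS (sizeof(toy_sizeclasses) / sizeof(toy_sizeclasses[0]))

/* Linear scan for the smallest class that fits sz bytes and whose slot size is
 * a multiple of align (align is 0, 1, or a power of two). Returns -1 when no
 * pool qualifies, i.e. the object would go to the big-object allocator. */
static int toy_select_szclass(size_t sz, size_t align)
{
    for (size_t klass = 0; klass < TOY_N_POOLS; klass++) {
        size_t osize = toy_sizeclasses[klass];
        if (osize >= sz && (align <= 1 || osize % align == 0))
            return (int)klass;
    }
    return -1;
}
```

With this table, a 20-byte request with no alignment lands in the 24-byte class, while the same request with 16-byte alignment skips 24 and lands in 32. A closed-form computation or a lookup precomputed per alignment, in the spirit of the original jump table, would presumably be needed to get back to constant time.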
@jlbuild !filter=linux
AppVeyor failure is #20152 (comment) ...
Why did this stall out? Are we close on it?
I got sidetracked by my relocation, and I won't have time to continue working on this until early August.
As far as I know, the missing pieces are: the pool sizes need to be optimised, and the function that selects the correct pool under alignment constraints needs to be rewritten and optimised.
If anybody would like to take a stab, feel free to do so!
On Sat, 15 Jul 2017, 11:00 Jameson Nash wrote:
> Why did this stall out? Are we close on it?
Bump. This is becoming a bit urgent, as we have frequent CI failures due to this issue.
OK, I rebased this so it is in the same state as earlier this year. I will need help with making sure that the logic for … Edit: I hope I got the rebase right and complete, so a review would be appreciated.
FreeBSD CI is a wonderful …
AppVeyor is a lovely: …
which shouldn't have segfaulted, but is probably a wrong alignment as well. And CircleCI seems unrelated.
That code is not turned on for FreeBSD.
One new test failure on Travis Mac 64-bit, and #23371 on Travis Linux 32-bit.
As far as I am aware, one wants to align atomic types by their natural alignment, e.g. … On current master this invariant is correct: …
On this branch there is no longer a special …
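For comparison, the same natural-alignment invariant (alignment equals size) can be checked for C11 atomic integers. This is only an analogue of the Julia-side check: the C standard leaves the alignment of _Atomic types implementation-defined, so the sketch assumes a mainstream GCC or Clang target where they are naturally aligned, and it stops at 64 bits. The names NATURALLY_ALIGNED and atomics_naturally_aligned are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Natural alignment: an object's alignment equals its size. */
#define NATURALLY_ALIGNED(T) (_Alignof(_Atomic T) == sizeof(T))

/* Mirrors the Julia-side check datatype_alignment(T) == sizeof(T) for the
 * atomic integer types up to 64 bits. Relies on implementation-defined
 * behaviour that holds on common GCC/Clang targets (an assumption). */
static bool atomics_naturally_aligned(void)
{
    return NATURALLY_ALIGNED(int8_t) && NATURALLY_ALIGNED(int16_t) &&
           NATURALLY_ALIGNED(int32_t) && NATURALLY_ALIGNED(int64_t);
}
```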
src/julia.h (outdated)
@@ -155,6 +155,7 @@ JL_EXTENSION typedef struct {
#endif
jl_array_flags_t flags;
uint16_t elsize;
uint16_t elalign;
why is this needed? we don't have any spare bits right here, but you can steal some from offset
It is used in multiple places, like jl_array_copy and during deserialization, to create a new array with the right alignment.
Sure, but those aren't performance-sensitive. How does this differ from el->isptr ? sizeof(void*) : jl_gc_align(jl_tparam0(jl_typeof(el)))?
Sure, that should work. I was mostly mirroring what elsize is doing.
#define JL_CACHE_BYTE_ALIGNMENT 64
// JL_HEAP_ALIGNMENT is the maximum alignment that the GC can provide
#define JL_HEAP_ALIGNMENT JL_CACHE_BYTE_ALIGNMENT
#define GC_MAX_SZCLASS (2032-sizeof(void*))
2032 has no significance. It is simply:
(((GC_PAGE_SZ - ALIGN + sizeof(jl_taggedvalue_t)) ÷ 8) ÷ ALIGN) * ALIGN
per the formula below (the 8 is arbitrary)
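As a worked check of that formula, here is a small sketch assuming a 64-bit build with a 16 KiB GC page (GC_PAGE_SZ = 16384) and an 8-byte tag (sizeof(jl_taggedvalue_t) = 8). Both constants are assumptions, and ÷ is read as integer division. With ALIGN = 16 it reproduces 2032: (16384 - 16 + 8) ÷ 8 = 2047, 2047 ÷ 16 = 127, and 127 * 16 = 2032. With ALIGN = 64 the same formula gives 1984.

```c
#include <stddef.h>

/* Assumed constants for a 64-bit build (not taken from the Julia source):
 * a 16 KiB GC page and an 8-byte object tag. */
#define ASSUMED_GC_PAGE_SZ ((size_t)16384)
#define ASSUMED_TAG_SZ     ((size_t)8)

/* (((GC_PAGE_SZ - ALIGN + tag) / 8) / ALIGN) * ALIGN with integer division:
 * roughly one eighth of a page, rounded down to a multiple of ALIGN. */
static size_t max_pool_object_size(size_t align)
{
    return (((ASSUMED_GC_PAGE_SZ - align + ASSUMED_TAG_SZ) / 8) / align) * align;
}
```

Under these assumptions, raising ALIGN to 64 would lower the largest pooled object size, so GC_MAX_SZCLASS and the last entry of jl_gc_sizeclasses would presumably need to follow.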
src/julia_internal.h (outdated)
// An alignment of 0 or 1 means unaligned and we can use sz directly.
if (alignment != 0 && alignment != 1 && alignment != sz)
sz = ((sz / alignment) + 1) * alignment;
return sz;
size_t alsz = LLT_ALIGN(sz, alignment);
return alignment ? alsz : sz;
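A small sketch comparing the rounding in the diff with the suggested replacement. LLT_ALIGN is redefined locally here, assumed to be equivalent to the helper the suggestion refers to (round up to a multiple of a power of two). The two function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local definition, assumed equivalent to the LLT_ALIGN macro used in the
 * suggestion: round x up to a multiple of the power of two a. */
#define LLT_ALIGN(x, a) (((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

/* Rounding from the diff under review: moves sz to the next multiple of
 * alignment even when sz is already a multiple (unless sz == alignment). */
static size_t align_size_original(size_t sz, size_t alignment)
{
    if (alignment != 0 && alignment != 1 && alignment != sz)
        sz = ((sz / alignment) + 1) * alignment;
    return sz;
}

/* Suggested version: round up only when needed. alignment == 0 must be
 * special-cased because LLT_ALIGN(sz, 0) masks every bit away and yields 0. */
static size_t align_size_suggested(size_t sz, size_t alignment)
{
    size_t alsz = LLT_ALIGN(sz, alignment);
    return alignment ? alsz : sz;
}
```

For sz = 32 and alignment = 16, the original formula returns 48 (a wasted 16-byte step), while the LLT_ALIGN form returns 32. Alignment 1 needs no special case, since LLT_ALIGN(sz, 1) is sz.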
src/julia_internal.h (outdated)
jl_value_t *v;
if (allocsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t)) {
int pool_id = jl_gc_szclass(allocsz);
if (klass != -1 && alignsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t)) {
size test seems unnecessary (redundant with klass test)
src/julia_internal.h (outdated)
osize = jl_gc_sizeclasses[pool_id];
}
else {
osize = p->osize;
}
assert((size_t)osize >= alignment &&
(alignment == 0 || alignment == 1 || osize % alignment == 0));
alignment == 0 || (osize & (alignment - 1)) == 0
(if you want, you can also add && ((alignment & (alignment - 1)) == 0)
to assert that alignment is a power of two)
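A short sketch of both checks as standalone C functions (is_pow2 and osize_fits_alignment are hypothetical names). The extra parentheses matter: == binds tighter than & in C, so an unparenthesised a & (a - 1) == 0 would parse as a & ((a - 1) == 0).

```c
#include <stdbool.h>
#include <stddef.h>

/* True if a is a power of two (0 excluded). Note the parentheses:
 * `a & (a - 1) == 0` would parse as `a & ((a - 1) == 0)`. */
static bool is_pow2(size_t a)
{
    return a != 0 && (a & (a - 1)) == 0;
}

/* For a power-of-two alignment, osize % alignment == 0 is equivalent to
 * testing that the low bits are clear; alignment 0 means "no requirement". */
static bool osize_fits_alignment(size_t osize, size_t alignment)
{
    return alignment == 0 || (osize & (alignment - 1)) == 0;
}
```

The mask form also covers alignment 1 (the mask is 0), so that case no longer needs its own clause in the assert.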
#else
# define jl_gc_alloc(ptls, sz, ty) jl_gc_alloc_(ptls, sz, ty)
# define jl_gc_alloc(ptls, sz, align, ty) jl_gc_alloc_(ptls, sz, align, ty)
should keep the same names on both sides of the conditional
test/threads.jl (outdated)
@@ -280,6 +280,10 @@ let atomic_types = [Int8, Int16, Int32, Int64, Int128,
filter!(T -> sizeof(T)<=8, atomic_types)
end
for T in atomic_types
# Check that alignment is natural alignment
@test Base.datatype_alignment(T) == Base.sizeof(T)
This test is currently broken for Atomic{(U)Int128}
src/dump.c (outdated)
@@ -1349,7 +1355,7 @@ static jl_value_t *jl_deserialize_value_array(jl_serializer_state *s, jl_value_t
for (i = 0; i < ndims; i++) {
dims[i] = jl_unbox_long(jl_deserialize_value(s, NULL));
}
jl_array_t *a = jl_new_array_for_deserialization((jl_value_t*)NULL, ndims, dims, isunboxed, elsize);
jl_array_t *a = jl_new_array_for_deserialization((jl_value_t*)NULL, ndims, dims, isunboxed, elsize, elalign);
@vtjnash Any idea how to get elalign at this point of the serialisation state? I don't have an eltype here yet, right? For that I need aty, and that depends on a.
compute it during serialization
OK, so just don't keep it around ;) Gotcha.
src/julia_internal.h (outdated)
if (allocsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t)) {
int pool_id = jl_gc_szclass(allocsz);
if (klass != -1){
assert(alignsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t));
Wrong indentation here (should be two four-space indents rather than a tab)
src/llvm-alloc-opt.cpp (outdated)
continue;
it.first->eraseFromParent();
std::get<0>(it)->eraseFromParent();
Wrong indentation here too
test/threads.jl (outdated)
@@ -280,6 +280,14 @@ let atomic_types = [Int8, Int16, Int32, Int64, Int128,
filter!(T -> sizeof(T)<=8, atomic_types)
end
for T in atomic_types
# Check that alignment is natural alignment
These lines too
OK, TODOs after the recent rebase:
After my discussion with Jameson yesterday, I am worried about the difference between stack alignment and heap alignment. Currently they both agree at 16, but part of the reason for this whole PR is that I would like to bump the heap alignment to 64 so that we can support alignments of 32 and 64 (256-bit AVX, AVX-512). Do we also need to bump the stack alignment?
src/llvm-alloc-opt.cpp (outdated)
if (!ignore_tag) {
align = sz <= 8 ? 8 : JL_SMALL_BYTE_ALIGNMENT;
sz += align;
This should change alignment to pointer size and increase size by that much.
The memory alignment used by Julia should be 64. No good comes from bifurcating the manner of [co]locating information. Stack and heap should be transparently egalitarian in their conferred alignment. This provides palpable future-proofing and levers performance (which follows less mess).
* TODO: select bucket that fits multiple of alignment
* TODO: allow alignment up to 64
* TODO: make arrays follow alignment as well
* TODO: clean up the mess that is jl_alignment, julia_alignment, datatype_alignment, and jl_gc_alignment
* TODO: teach jl_gc_alloc_ to do something with the alignment request
Very stale
I thought we had almost finished this. Just needed a small rebase.
Just doing a lot of heavy lifting here. :)
Fixes #21918.
The issue was that our GC pools only guaranteed 16-byte alignment, and the pool pages were only aligned to 16 bytes as well.
This PR changes the alignment of the pool pages to 64 bytes (enough for AVX-512) and takes the alignment information into account when selecting the szclass/pool.
It correctly handles unaligned allocations and allocations that are near the maximum pool size.
This should be squashed a bit before we merge.
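A toy model of why both halves of the change are needed, under stated assumptions that do not reproduce Julia's real page layout: a hypothetical 16 KiB page whose base is 64-byte aligned (via C11 aligned_alloc), an 8-byte tag in front of every object, and the first tag placed so that the payload after it is aligned. Payloads then stay aligned only when the slot size is a multiple of the requested alignment. The function name toy_page_payloads_aligned is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SZ 16384u  /* assumed page size */
#define TOY_TAG_SZ  8u      /* assumed tag size preceding each object */

/* Carve a 64-byte-aligned toy page into slots of osize bytes, placing the
 * first tag so that the payload right after it lands on an align boundary
 * (align must divide 64). Returns true if every payload is align-aligned. */
static bool toy_page_payloads_aligned(size_t osize, size_t align)
{
    unsigned char *page = aligned_alloc(64, TOY_PAGE_SZ);
    if (!page)
        return false;
    /* First payload at the smallest multiple of align that is >= TOY_TAG_SZ. */
    size_t first_payload = (TOY_TAG_SZ + align - 1) / align * align;
    bool ok = true;
    for (size_t off = first_payload - TOY_TAG_SZ; off + osize <= TOY_PAGE_SZ; off += osize) {
        uintptr_t payload = (uintptr_t)(page + off + TOY_TAG_SZ);
        if (payload % align != 0)
            ok = false;
    }
    free(page);
    return ok;
}
```

In this model, 128-byte slots keep every payload 64-byte aligned, while 96-byte slots misalign every second object even though the page itself is aligned.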