You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue contains a description of how memory and pointers should work when aliased at compile time. As a task, this issue can be closed once stage 2 implements this behavior.
Definitions
Aggregate Types
Types in the following families are considered "Aggregate Types":
struct
union
optional
slice
array
vector
error union
All other types are not Aggregate Types.
If a type is not an Aggregate Type, then it is a Primitive Type.
Defined Representation
The following types have Defined Representation:
any integer (except comptime_int)
any float (except comptime_float)
bool
void (technically allowed but empty)
enum with specified integer type
any pointer or non-generic fn (but not slice)
extern struct/union
packed struct/union
array of any these types
vector of any of these types
For any given target, instances of these types have a well-defined layout in memory. They can be passed between compilation units safely, without fear of misinterpretation. Additionally, their layout is known and well-defined at compile time. This list may look familiar. It's also the list of values which are allowed in an extern struct.
The following types do not have Defined Representation:
comptime_int, comptime_float
noreturn, (null), (undefined)
slices, optionals, error unions
error sets
bare struct/union (no extern or packed)
enum without specified integer type (TODO maybe defined?)
type, generic fn
any frame, anyframe
enum literals
Top Level Variable
A "Top Level Variable" is a variable which is not part of something larger. In compiler parlance, it has its own provenance. The following are top level variables:
global const
global var
local const
local var
Everything else is not a top level variable:
member of an Aggregate Type
Memory Islands
As described in #7770, memory at compile time is split up into separate "islands". A pointer to one island can only ever point to data in that same island. There are two types of memory islands: Literal and Virtual. Virtual memory islands may have child islands, which can be Literal or Virtual. Literal memory islands have Defined Representation.
Memory Islands are arranged into trees. Each Top Level Variable creates one tree based on its type. The Memory Island Tree for a type is constructed as follows:
If the type has Defined Representation, use a Literal memory island matching the size of the type.
Otherwise, use a Virtual memory island. If the type is an Aggregate Type, add child islands for each member using this algorithm.
Let's look at some examples to make this clearer:
constPrimitive=u32;
// Primitive instances have one Literal island, in this case consisting of four bytes.constPtr=*u32;
// Pointers are also primitives, this type has one Literal island of 8 bytes (on 64 bit targets).constExternStruct=externstruct {
x: u32,
y: *u32,
};
// ExternStruct has one Literal island consisting of 16 bytes (on 64 bit targets).constBareStruct=struct {
x: u32,
y: *u32,
};
// BareStruct has a Virtual island containing two separate Literal islands for x and y.// Note that this means, for some instance b, &b is in a different island from &b.xconstComptimeStruct=struct {
x: u32,
y: comptime_int,
z: BareStruct,
};
// ComptimeStruct has a Virtual island containing a Literal island for x and Virtual islands for y and z.// z's Virtual island contains two separate Literal islands for z.x and z.y.constExternArray= [4]ExternStruct;
// ExternArray has one Literal island consisting of 16*4 bytes.constBareArray= [4]BareStruct// BareArray has a Virtual island containing four Virtual islands which match the layout in BareStruct.
Virtual Islands
Virtual islands are highly constrained. Pointers to virtual islands may be reinterpreted and added to at compile time, but attempting to read from a virtual island pointer that has been offset is a compile error.
comptime {
varbare: *BareArray=...; // BareArray has a Virtual island, so this is a Virtual pointer.varaddress=@ptrToInt(bare); // address is a late value, but is considered comptime knownvaroffset=address+4; // pointers can be offset at compile time, the linker can still resolve this.varas_int_ptr=@intToPtr([*]u32, offset); // again, the linker can still resolve this, not an error.varvalue=as_int_ptr.*; // Compile error! asIntPtr points to a Virtual island, so we don't know how to read this value!// If the offset is at zero and the type is correct, Virtual islands can still be read.varreborn_ptr=@ptrCast(*BareArray, as_int_ptr-1); // restore the original offset and cast back to the original typevarretrieved_value=reborn_ptr[0].x; // This works! We've gone in a circle but restored the original pointer value.
}
When offsetting a pointer to a Virtual island, only the following values are allowed:
comptime known integers
the size of a type with Defined Representation
if the island is an array, a multiple of the size of the element type
offsetting by this value changes the pointer to refer to one of the child elements.
if the island is a struct, the offset of a field of that struct
offsetting by this value changes the pointer to refer to that field.
otherwise, walk up the island tree. If the offset is a multiple of the size of the element type of any array, change the index in that array. If the offset is the negative of the offset of a field index, treat it like fieldParentPtr.
Attempting to offset a pointer to a Virtual island with any other value (e.g. offset of a field in an unrelated struct, or size of an unrelated frame) is a compile error.
@fieldParentPtr can be used to obtain the parent island. If @fieldParentPtr is used to obtain a pointer to a virtual island, and the actual parent is nonexistent or has a different type, it triggers an immediate compile error.
Literal Islands
In contrast, Literal islands behave like runtime memory, and can be fully type punned. You can reinterpret the memory as you see fit.
When offsetting a pointer to a Literal island, only the following values are allowed:
comptime known integers
the size of a type with Defined Representation
otherwise, walk up the island tree. If the offset is a multiple of the size of the element type of any array, change the index in that array. If the offset is the negative of the offset of a field index, treat it like fieldParentPtr.
Comptime memory may contain undefined bits. Undefined can be specified at bit granularity, using packed structs. However, this can only be done in memory itself. If you attempt to load a primitive value (like u32), the entire value is undefined if any bit is undefined. For example:
comptime {
constP=packedstruct { x: u1=undefined, y: u31=0 };
varp=P{}; // p has 1 bit undefined and 31 bits undefinedvarptr=&p; // force p to be in memoryvarx: u32=@ptrCast(*u32, &p).*; // read as u32// x is undefined, because one of the source bits was undefined.// bitcast has the same behavior:vary=@bitCast(u32, p);
// y is undefined because one of the source bits was undefined.
}
An additional consideration is that some values (like pointers and offsets) have Defined Representation, but do not have fully comptime known values. I'll refer to these as Deferred values. In many ways, Deferred values behave like undefined at compile time. You can't do math on them or otherwise inspect their value, but you can copy them from place to place.
Let's consider an example:
constExample=externstruct {
a: u32,
b: *u32,
};
For a 64 bit target, the literal island backing a comptime-known Example instance might look like this:
This operation has created one byte that is partially undefined, and split the lazy value into two pieces. This ability to partially overwrite a deferred value has some pretty cool benefits:
you can do a byte-for-byte copy of a struct containing a pointer at compile time
you can do pointer packing (stick data in zeroed bits) at compile time
Implementation
This section describes a simple but memory inefficient implementation of the above, to show that all the offset rules can be implemented without too much work. Contributors please note that the actual implementation probably will (and should) look very different.
/// A node in the island treeconstIsland=struct {
/// The larger virtual island which contains this islandparent: ?*Island,
/// The layout of this islandtrue_layout: *ZigType,
/// Whether this island stores actual memorykind: enum { literal, virtual },
value: union {
/// sub-islands. The count and virtual/literalness of these/// items is stored in the true_layout.array_or_field_members: ?[*]AnyIsland,
literal_data: struct {
bytes: []constu8,
undefined_regions: []BitSpan,
deferred_regions: []DeferredValue,
},
/// virtual primitives have a special value reference insteadvirtual_value: ...,
},
};
/// A region of bits which comes from another region that has not/// yet been determined, e.g. a pointer or field offset.constDeferredValue=struct {
dst_span: BitSpan,
src_span: BitSpan,
src_value: *DeferredValueInfo,
};
/// A contiguous region of bitsconstBitSpan=struct {
start_bit_offset: u3,
end_bit_offset: u3,
start_byte_offset: msize,
end_byte_offset: msize,
};
/// This special value backs comptime known pointers, and also/// integers derived from `@ptrToInt` on comptime known pointers.constValueSpecialPointer=struct {
/// The narrowest island to which this pointer pointsisland: *Island,
/// The integer offset of this pointer./// If offset is not zero, dereference is a compile error.offset: msize,
};
/// @sizeOf(T) returns this special value if T does not have Defined Representation./// It supports linear transformations.constValueSpecialSize=struct {
of_type: *ZigType,
multiply: isize=1,
add: msize=0,
};
The data structure described above can implement all parts of this proposal, but is not very efficient. An important optimization here comes from realizing that the expensive cases (bit spans, type punning) are actually exceedingly rare in comptime code. We can avoid a lot of work by using more optimized representations for more common cases. For example, we can keep all memory islands virtual up until a write through a type punned pointer is performed. At that point, the island needs to be converted to a literal region. But if the memory is never type punned, it can remain in a faster format for its whole lifetime.
Open Questions
Linking Very Late Values
Pointers are sometimes resolved when the program is loaded, and never known by any part of the compiler. In order to have lazy bit slices of deferred pointer values, we need a way for the program loader to set this up. We need to make sure all loaders can do this.
An alternative, if some loaders can't, is to generate a simplified start code stub. This stub would only copy data, and would not be user programmable. Since it's entirely understood by the compiler, the compiler can check for circular references and issue a compile error in those cases.
Offset Order Dependency
As specified above, there's a peculiar ordering needed for field offsets. If I have struct A contains B contains C, and I take an int pointer to a. I can offset that pointer by the offset of B, and then the offset of C, to get a pointer to the innermost struct. But if I offset by the offset of C first, that's a compile error because the compiler doesn't see how C could be related to A. This is true even if I intend to offset by B next. This might be fine, since adding offsets to each other at compile time is an error. But there might be a way to represent this properly without infinitely expanding memory. If so, a follow-up proposal for this is welcome.
The text was updated successfully, but these errors were encountered:
SpexGuy
added
proposal
This issue suggests modifications. If it also has the "accepted" label then it is planned.
accepted
This proposal is planned.
labels
Aug 29, 2021
When implementing this, note that Clang's way to handle something like (intptr_t)&x + 1 is to replace the cast with a char* cast and then do a ptrtoint cast at the end:
This issue contains a description of how memory and pointers should work when aliased at compile time. As a task, this issue can be closed once stage 2 implements this behavior.
Definitions
Aggregate Types
Types in the following families are considered "Aggregate Types":
All other types are not Aggregate Types.
If a type is not an Aggregate Type, then it is a Primitive Type.
Defined Representation
The following types have Defined Representation:
For any given target, instances of these types have a well-defined layout in memory. They can be passed between compilation units safely, without fear of misinterpretation. Additionally, their layout is known and well-defined at compile time. This list may look familiar. It's also the list of values which are allowed in an
extern struct
.The following types do not have Defined Representation:
Top Level Variable
A "Top Level Variable" is a variable which is not part of something larger. In compiler parlance, it has its own provenance. The following are top level variables:
Everything else is not a top level variable:
Memory Islands
As described in #7770, memory at compile time is split up into separate "islands". A pointer to one island can only ever point to data in that same island. There are two types of memory islands: Literal and Virtual. Virtual memory islands may have child islands, which can be Literal or Virtual. Literal memory islands have Defined Representation.
Memory Islands are arranged into trees. Each Top Level Variable creates one tree based on its type. The Memory Island Tree for a type is constructed as follows:
If the type has Defined Representation, use a Literal memory island matching the size of the type.
Otherwise, use a Virtual memory island. If the type is an Aggregate Type, add child islands for each member using this algorithm.
Let's look at some examples to make this clearer:
Virtual Islands
Virtual islands are highly constrained. Pointers to virtual islands may be reinterpreted and added to at compile time, but attempting to read from a virtual island pointer that has been offset is a compile error.
When offsetting a pointer to a Virtual island, only the following values are allowed:
Attempting to offset a pointer to a Virtual island with any other value (e.g. offset of a field in an unrelated struct, or size of an unrelated frame) is a compile error.
@fieldParentPtr
can be used to obtain the parent island. If@fieldParentPtr
is used to obtain a pointer to a virtual island, and the actual parent is nonexistent or has a different type, it triggers an immediate compile error.Literal Islands
In contrast, Literal islands behave like runtime memory, and can be fully type punned. You can reinterpret the memory as you see fit.
When offsetting a pointer to a Literal island, only the following values are allowed:
Comptime memory may contain undefined bits. Undefined can be specified at bit granularity, using packed structs. However, this can only be done in memory itself. If you attempt to load a primitive value (like u32), the entire value is undefined if any bit is undefined. For example:
An additional consideration is that some values (like pointers and offsets) have Defined Representation, but do not have fully comptime known values. I'll refer to these as Deferred values. In many ways, Deferred values behave like undefined at compile time. You can't do math on them or otherwise inspect their value, but you can copy them from place to place.
Let's consider an example:
For a 64 bit target, the literal island backing a comptime-known
Example
instance might look like this:By using type punning with a packed struct, I could overwrite some bits in the middle of this region:
This operation has created one byte that is partially undefined, and split the lazy value into two pieces. This ability to partially overwrite a deferred value has some pretty cool benefits:
Implementation
This section describes a simple but memory inefficient implementation of the above, to show that all the offset rules can be implemented without too much work. Contributors please note that the actual implementation probably will (and should) look very different.
The data structure described above can implement all parts of this proposal, but is not very efficient. An important optimization here comes from realizing that the expensive cases (bit spans, type punning) are actually exceedingly rare in comptime code. We can avoid a lot of work by using more optimized representations for more common cases. For example, we can keep all memory islands virtual up until a write through a type punned pointer is performed. At that point, the island needs to be converted to a literal region. But if the memory is never type punned, it can remain in a faster format for its whole lifetime.
Open Questions
Linking Very Late Values
Pointers are sometimes resolved when the program is loaded, and never known by any part of the compiler. In order to have lazy bit slices of deferred pointer values, we need a way for the program loader to set this up. We need to make sure all loaders can do this.
An alternative, if some loaders can't, is to generate a simplified start code stub. This stub would only copy data, and would not be user programmable. Since it's entirely understood by the compiler, the compiler can check for circular references and issue a compile error in those cases.
Offset Order Dependency
As specified above, there's a peculiar ordering needed for field offsets. If I have struct A contains B contains C, and I take an int pointer to a. I can offset that pointer by the offset of B, and then the offset of C, to get a pointer to the innermost struct. But if I offset by the offset of C first, that's a compile error because the compiler doesn't see how C could be related to A. This is true even if I intend to offset by B next. This might be fine, since adding offsets to each other at compile time is an error. But there might be a way to represent this properly without infinitely expanding memory. If so, a follow-up proposal for this is welcome.
The text was updated successfully, but these errors were encountered: