Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round Up Size of All Structures to Structure Alignment #29

Closed
cwfitzgerald opened this issue Jun 28, 2021 · 9 comments · Fixed by #35
Closed

Round Up Size of All Structures to Structure Alignment #29

cwfitzgerald opened this issue Jun 28, 2021 · 9 comments · Fixed by #35

Comments

@cwfitzgerald
Copy link

cwfitzgerald commented Jun 28, 2021

Be ready to split some hairs :).

Take the following struct

struct Bleh {
    mat4 a;
    vec3 b;
}

Crevice interprets this struct as being of size 76, via calling std140_size_static.

This goes against what both the wgsl and GL spec specify, asking for a size of 80 (roundUp(16, 64 + 12)). It also interprets this struct as size 80 instead of the proper size 96.

struct Other {
    Bleh a;
    f32 b;
}

wgsl:

The size of a structure is equal to the offset plus the size of its last member, rounded to the next multiple of the structure’s alignment:

SizeOf(S) = roundUp(AlignOf(S), OffsetOf(S, L) + SizeOf(S, L))
Where L is the last member of the structure 

GL:

The structure may have padding at the end;the base offset of the member following the sub-structure is rounded up to the next multiple of the base alignment of the structure.

This GL spec rules only applies to the context of it being a member. Wgsl specifies this for both being a member and a top level struct. The GL spec does not say the size of the struct when it is a top level struct. Because of this omission, it should be valid to always interpret the size of this struct as 80.

This should make everyone agree, then we can live a happy life away from all this hair splitting and spec reading.

@superdump
Copy link
Contributor

I'm not sure if this works in all cases but this seems to work in at least some cases:

fn padded_size<T: AsStd140>() -> usize {
    (T::std140_size_static() + <T as AsStd140>::Std140Type::ALIGNMENT - 1)
        & !(<T as AsStd140>::Std140Type::ALIGNMENT - 1)
}

@superdump
Copy link
Contributor

I made a PR for the above as it seems to work, though I'm not certain it is fully correct as I just don't know enough about all of this stuff yet. :) #30

@superdump
Copy link
Contributor

superdump commented Jul 1, 2021

Copying from cart/bevy#18

Looking at the wgsl spec: https://www.w3.org/TR/WGSL/#alignment-and-size

For wgsl: it looks like a struct's alignment is the maximum alignment of its members. And a struct's size is the offset of the last member of the struct, plus the size of the last member of the struct (which is what std140_size_static() returns, unless I am mistaken) rounded up to the alignment of the struct, which is what the Std140Type::ALIGNMENT is. So I think this should be correct for structs.

I think implicit-stride arrays work the same way as glsl arrays, except that glsl seems to say things about aligning to vec4 unconditionally in some situations:

  1. If the member is a scalar consuming N basic machine units, the base alignment is N.
  2. If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
  3. If the member is a three-component vector with components consuming N basic machine units, the base alignment is 4N.
  4. If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.

(...more rules follow these. Read more here: https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf#page=159 )

It's not exactly clear to me what the rounded up to the base alignment of a vec4 means. Is it talking about the size?

@superdump
Copy link
Contributor

superdump commented Jul 1, 2021

Well... it seems it doesn't work in all cases. We're using #27 and with:

#[repr(C)]
#[derive(Copy, Clone, AsStd140, Default, Debug)]
pub struct GpuOmniLight {
    view_proj: Mat4,
    color: Vec4,
    position: Vec3,
    range: f32,
    radius: f32,
    shadow_bias_min: f32,
    shadow_bias_max: f32,
}

#[repr(C)]
#[derive(Copy, Clone, AsStd140, Default, Debug)]
pub struct GpuDirectionalLight {
    view_projection: Mat4,
    color: Vec4,
    dir_to_light: Vec3,
    shadow_bias_min: f32,
    shadow_bias_max: f32,
}

#[repr(C)]
#[derive(Copy, Clone, Debug, AsStd140)]
pub struct GpuLights {
    omni_lights: [GpuOmniLight; MAX_OMNI_LIGHTS],
    directional_lights: [GpuDirectionalLight; MAX_DIRECTIONAL_LIGHTS],
    ambient_color: Vec4,
    n_omni_lights: u32,
    n_directional_lights: u32,
}

pub const MAX_OMNI_LIGHTS: usize = 10;
pub const MAX_DIRECTIONAL_LIGHTS: usize = 1;

I get GpuLights::std140_padded_size_static() = 1264 where wgpu / wgsl wants 1280:

wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
    error matching FRAGMENT shader requirements against the pipeline
    shader global ResourceBinding { group: 0, binding: 1 } is not available in the layout pipeline layout
    buffer structure size 1280, added to one element of an unbound array, if it's the last field, ended up greater than the given min_binding_size

I wonder what the difference is...

@superdump
Copy link
Contributor

For reference:

[pipelined/bevy_pbr2/src/render/mod.rs:46] GpuOmniLight::std140_size_static() = 108
[pipelined/bevy_pbr2/src/render/mod.rs:47] GpuOmniLight::std140_padded_size_static() = 112
[pipelined/bevy_pbr2/src/render/mod.rs:48] GpuDirectionalLight::std140_size_static() = 100
[pipelined/bevy_pbr2/src/render/mod.rs:49] GpuDirectionalLight::std140_padded_size_static() = 112
[pipelined/bevy_pbr2/src/render/mod.rs:50] GpuLights::std140_size_static() = 1256
[pipelined/bevy_pbr2/src/render/mod.rs:51] GpuLights::std140_padded_size_static() = 1264

@superdump
Copy link
Contributor

I have tried to do the calculations manually:

#[repr(C)]
#[derive(Copy, Clone, AsStd140, Default, Debug)]
pub struct GpuOmniLight {
    view_proj: Mat4,      // 4x4 x4 = 16 x4 = 64; 64
    color: Vec4,          //   4 x4         = 16; 80
    position: Vec3,       //   3 x4         = 12; 92
    range: f32,           //   1 x4         =  4; 96 // note that this f32 packs straight after the Vec3
    radius: f32,          //   1 x4         =  4; 100
    shadow_bias_min: f32, //   1 x4         =  4; 104
    shadow_bias_max: f32, //   1 x4         =  4; 108
}
// alignment is max(16, alignof(each member)), which is 16 here.
// 108 / 16 = 6.75
// padding is 0.25 * 16 = 4 bytes up to 112
// padded size is 112

#[repr(C)]
#[derive(Copy, Clone, AsStd140, Default, Debug)]
pub struct GpuDirectionalLight {
    view_projection: Mat4, // 4x4 x4 = 16 x4 = 64; 64
    color: Vec4,           //   4 x4         = 16; 80
    dir_to_light: Vec3,    //   3 x4         = 12; 92
    shadow_bias_min: f32,  //   1 x4         =  4; 96
    shadow_bias_max: f32,  //   1 x4         =  4; 100
}
// alignment is max(16, alignof(each member)), which is 16 here.
// 100 / 16 = 6.25
// padding is 0.75 * 16 = 12 bytes up to 112
// padded size is 112

#[repr(C)]
#[derive(Copy, Clone, Debug, AsStd140)]
pub struct GpuLights {
    omni_lights: [GpuOmniLight; MAX_OMNI_LIGHTS],                      // each array member's padded size is 112, 10 * 112 = 1120; 1120
    directional_lights: [GpuDirectionalLight; MAX_DIRECTIONAL_LIGHTS], // each array member's padded size is 112,  1 * 112 =  112; 1232
    ambient_color: Vec4,                                               //                                             4 x4 =   16; 1248
    n_omni_lights: u32,                                                //                                             1 x4 =    4; 1252
    n_directional_lights: u32,                                         //                                             1 x4 =    4; 1256
}
// alignment is max(16, alignof(each member)), which is 16 here.
// 1256 / 16 = 78.5
// padding is 0.5 * 16 = 8 bytes up to 1264
// padded size is 1264

Where am I incorrect such that wgpu/naga says that wgsl expects 1280 and I get 1264?

@superdump
Copy link
Contributor

Ok, the 1280 is a false alarm. It's an old bug in naga pre-0.5.0. bevy depends on wgpu 0.8.1 at the moment, which depends on naga 0.4.2, which reports a size of 1280 for this struct, where naga 0.5.0 and later (I checked master too) report 1264 as expected.

@superdump
Copy link
Contributor

I have updated #30 with what should be recursive padding. The Bleh and Other example given above, using some additional bevy derivations for glam types gives the expected results. The following code:

use bevy_math::{Mat4, Vec3};
use crevice::{
    internal::bytemuck::try_cast_slice,
    std140::{AsStd140, Std140, Std140Convertible},
};

#[derive(AsStd140, Clone, Debug)]
struct Bleh {
    a: Mat4,
    b: Vec3,
}

#[derive(AsStd140, Debug)]
struct Other {
    a: Bleh,
    b: f32,
}

fn main() {
    let bleh = Bleh {
        a: Mat4::from_cols_array(&[1.0f32; 16]),
        b: Vec3::ONE,
    };
    let other = Other {
        a: bleh.clone(),
        b: 1.0,
    };

    println!(
        "Bleh size {} padded size {}",
        Bleh::std140_size_static(),
        Bleh::std140_padded_size_static()
    );
    println!(
        "Other size {} padded size {}",
        Other::std140_size_static(),
        Other::std140_padded_size_static()
    );

    println!("\nbleh: {:?}", bleh);
    println!("other: {:?}", other);

    let bleh_value = bleh.as_padded_std140();
    let bleh_bytes = bleh_value.as_bytes();
    let bleh_f32 = try_cast_slice::<u8, f32>(bleh_bytes).unwrap();
    println!("\nbleh_f32: {}", bleh_f32.len() * 4);
    println!("mat4: {:?}", &bleh_f32[0..(16)]);
    println!("vec3: {:?}", &bleh_f32[(16)..(16 + 3)]);
    println!("padding: {:?}", &bleh_f32[(16 + 3)..(16 + 4)]);

    let other_value = other.as_padded_std140();
    let other_bytes = other_value.as_bytes();
    let other_f32 = try_cast_slice::<u8, f32>(other_bytes).unwrap();
    println!("\nother_f32: {}", other_f32.len() * 4);
    println!("mat4: {:?}", &other_f32[0..(16)]);
    println!("vec3: {:?}", &other_f32[(16)..(16 + 3)]);
    println!("padding: {:?}", &other_f32[(16 + 3)..(16 + 4)]);
    println!("f32: {:?}", &other_f32[(16 + 4)..(16 + 4 + 1)]);
    println!("padding: {:?}", &other_f32[(16 + 4 + 1)..(16 + 4 + 1 + 3)]);
}

gives:

Bleh size 76 padded size 80
Other size 84 padded size 96

bleh: Bleh { a: Mat4 { x_axis: Vec4(1.0, 1.0, 1.0, 1.0), y_axis: Vec4(1.0, 1.0, 1.0, 1.0), z_axis: Vec4(1.0, 1.0, 1.0, 1.0), w_axis: Vec4(1.0, 1.0, 1.0, 1.0) }, b: Vec3(1.0, 1.0, 1.0) }
other: Other { a: Bleh { a: Mat4 { x_axis: Vec4(1.0, 1.0, 1.0, 1.0), y_axis: Vec4(1.0, 1.0, 1.0, 1.0), z_axis: Vec4(1.0, 1.0, 1.0, 1.0), w_axis: Vec4(1.0, 1.0, 1.0, 1.0) }, b: Vec3(1.0, 1.0, 1.0) }, b: 1.0 }

bleh_f32: 80
mat4: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
vec3: [1.0, 1.0, 1.0]
padding: [0.0]

other_f32: 96
mat4: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
vec3: [1.0, 1.0, 1.0]
padding: [0.0]
f32: [1.0]
padding: [0.0, 0.0, 0.0]

I haven't implemented more tests, nor the changes to Writers to use the padded variants.

@benmkw
Copy link

benmkw commented Aug 30, 2021

Some discussion around this happened here #20 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants