Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another array access performance issue. #13938

Open
IntegratedQuantum opened this issue Dec 14, 2022 · 4 comments · Fixed by #13972
Open

Another array access performance issue. #13938

IntegratedQuantum opened this issue Dec 14, 2022 · 4 comments · Fixed by #13972
Milestone

Comments

@IntegratedQuantum
Copy link
Contributor

Zig Version

0.11.0-dev.753+331861161

Steps to Reproduce and Observed Behavior

I thought this was covered by #12215, but now that it is fixed I still get terrible performance.
Here is a simple benchmark:

const std = @import("std");

const len = 32*32*32;

fn getIndex(i: u16) u16 {
	return i;
}

pub const Chunk = struct {
	blocks: [len]u16 = undefined,
};

pub noinline fn regenerateMainMesh(chunk: *Chunk) u32 {
	var sum: u32 = 0;
	var i: u16 = 0;
	while(i < len) : (i += 1) {
		sum += chunk.blocks[getIndex(i)]; // ← workaround: (&chunk.blocks)[...]
	}
	return sum;
}

pub fn main() void {
	var chunk: Chunk = Chunk{};
	for(chunk.blocks) |*block, i| {
		block.* = @intCast(u16, i);
	}
	const start = std.time.nanoTimestamp();
	const sum = regenerateMainMesh(&chunk);
	const end = std.time.nanoTimestamp();
	std.log.err("Time: {} Sum: {}", .{end - start, sum});
}

Even in release-fast the performance is terrible:

$ zig run test.zig -OReleaseFast
error: Time: 104980842 Sum: 536854528

godbolt reveals that, like in #12215, there is a memcpy when accessing the array.

Expected Behavior

When applying the workaround

- sum += chunk.blocks[getIndex(i)]; // ← workaround: (&chunk.blocks)[...]
+ sum += (&chunk.blocks)[getIndex(i)]; // ← workaround: (&chunk.blocks)[...]

the performance is significantly better:

$ zig run test.zig -OReleaseFast
error: Time: 4188 Sum: 536854528
@IntegratedQuantum IntegratedQuantum added the bug Observed behavior contradicts documented or intended behavior label Dec 14, 2022
@Vexu Vexu added this to the 0.11.0 milestone Dec 14, 2022
@shwqf
Copy link
Contributor

shwqf commented Dec 15, 2022

I guess .field_ptr should be generated in fieldAccess in this case.

working on it.

@andrewrk
Copy link
Member

Fix reverted in f93a36c

@andrewrk andrewrk reopened this Jan 29, 2024
@andrewrk andrewrk added optimization and removed bug Observed behavior contradicts documented or intended behavior labels Jan 29, 2024
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0, 0.13.0 Jan 29, 2024
@zeroZshadow
Copy link

zeroZshadow commented May 1, 2024

I've been running into this quite often on platforms where llvm isn't great at optimizing (powerpc).

The following snipped worked great as a test example to show this bug happening and not happening even though all examples index into the array.

const std = @import("std");
const E = enum(u8){
  a,
  b,
  c,
  d,
};

var array: std.enums.EnumArray(E, u32) = std.enums.EnumArray(E, u32).initFill(0);

noinline fn handle(id: u32) u32 {
    // Causes odd stack pushing and popping with 4 or less enum members, but optimized out the memcpy
    // LLVM is unable to optimize out the memcopy when the enum has more than 4 members.
    return array.get(@enumFromInt(id));

    // These both cause correct code generation, and do not generate a call to memcpy in llvm ir
    //return array.values[id];
    //return array.getPtrConst(@enumFromInt(id)).*;
}

export fn square(id: u32) u32 {
    array.set(.a, 5);
    array.set(.c, 8);
    return handle(id);
}

I'm compiling this snipped with -O ReleaseSmall -target powerpc-freestanding-eabi to see the issue easily.
I hope it is of some use.

@cdemirer
Copy link

cdemirer commented Jun 30, 2024

A good explanation can be found here: #17580 (comment)

Summary: When an array is indexed, the expression on the left (the array, which is a value) is evaluated before (by design) the expression on the right (the function that returns the index). If there's any possibility that the function call modifies the array (which is the default assumption), we have no choice but to make a copy of the array before calling that function.


    return array.get(@enumFromInt(id));

I guess special cases (as in #13233) can be made for builtin functions like this. (Seems that's not the case as the index expression is the one at here and not the builtin one here)

For the general case, AFAIK possible solutions are:

  1. Redefine the evaluation order of the index and the indexed.
  2. Somehow optimize-out the copy...

IntegratedQuantum added a commit to PixelGuys/Cubyz that referenced this issue Sep 14, 2024
AlexDavies8 pushed a commit to AlexDavies8/Cubyz that referenced this issue Sep 19, 2024
catmeow72 pushed a commit to catmeow72/Cubyz that referenced this issue Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants