faster duplicate_overlapping #69

Merged on Jan 31, 2023 (1 commit)
PSeitz (Owner) commented on Jan 31, 2023:

Improve the unsafe version of duplicate_overlapping. The compiler generates unfavourable assembly for the simple byte-by-byte version: it unrolls and auto-vectorizes the copy, which adds a lot of branches.
That is not what we want, because large overlapping copies are not that common.
The loop now copies 4 bytes per iteration instead of one.
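
To make the trade-off concrete, here is a minimal sketch in safe Rust. It is not the code from this PR, which is unsafe and pointer-based. It assumes the match is addressed by an offset back from the current end of the output, as is typical for LZ77-style decompressors; the function names and the Vec-based signatures are hypothetical. The first function is the simple one-byte-per-iteration loop, the second performs four byte copies per iteration and finishes with a one-byte tail loop so that it writes exactly len bytes.

```rust
/// Simple version: appends `len` bytes to `out`, reading from `offset` bytes
/// before the current end. Source and destination may overlap (`offset < len`),
/// which is why a plain memcpy-style copy cannot be used.
/// (Hypothetical name and signature, for illustration only.)
fn duplicate_overlapping_simple(out: &mut Vec<u8>, offset: usize, len: usize) {
    assert!(offset >= 1 && offset <= out.len());
    let mut src = out.len() - offset;
    out.reserve(len);
    for _ in 0..len {
        // Each byte may have been written by an earlier iteration of this loop.
        let byte = out[src];
        out.push(byte);
        src += 1;
    }
}

/// Manually unrolled version: four byte copies per iteration, then a tail
/// loop for the remaining 0..=3 bytes. Bytes are still copied one at a time
/// and in order, so the result is identical for every valid offset.
fn duplicate_overlapping_by4(out: &mut Vec<u8>, offset: usize, len: usize) {
    assert!(offset >= 1 && offset <= out.len());
    let start = out.len();
    let end = start + len;
    out.resize(end, 0);
    let mut dst = start;
    // The loop condition guarantees that all four writes stay below `end`.
    while dst + 4 <= end {
        out[dst] = out[dst - offset];
        out[dst + 1] = out[dst + 1 - offset];
        out[dst + 2] = out[dst + 2 - offset];
        out[dst + 3] = out[dst + 3 - offset];
        dst += 4;
    }
    // Tail: copy what is left without touching anything past `end`.
    while dst < end {
        out[dst] = out[dst - offset];
        dst += 1;
    }
}
```

With an output of "ab", an offset of 2 and a length of 5, both functions extend the output to "abababa". In safe Rust the bounds checks dominate the cost, so this sketch only illustrates the loop shape, not the performance effect described above.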

PSeitz merged commit febf558 into main on Jan 31, 2023
PSeitz added a commit that referenced this pull request on Apr 30, 2023:
Fixes the checks in checked decode.
Reverts #69, as it leads to out-of-bounds writes.
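
The commit message does not say how the out-of-bounds writes arose. One common way for a multi-byte-per-iteration copy to overrun, shown here purely as an assumption, is that the copy always writes whole groups, so a length that is not a multiple of the group size touches more bytes than requested. The sketch below is safe Rust with hypothetical names; it refuses the copy instead of overrunning, which is the kind of check a checked decode path would need.

```rust
/// Number of bytes written when a copy of `len` bytes always writes whole
/// groups of `group` bytes (hypothetical helper, for illustration only).
fn bytes_touched(len: usize, group: usize) -> usize {
    (len + group - 1) / group * group
}

/// Overlapping copy inside a fixed-size buffer that writes in groups of 4.
/// `dst` is the write position and `offset` the distance back to the source.
/// Returns the new write position, or an error if the group-wise copy would
/// write past the end of `buf`, even when the requested `len` bytes would fit.
fn copy_in_groups_checked(
    buf: &mut [u8],
    dst: usize,
    offset: usize,
    len: usize,
) -> Result<usize, &'static str> {
    if offset == 0 || offset > dst {
        return Err("invalid offset");
    }
    let touched = bytes_touched(len, 4);
    if dst + touched > buf.len() {
        return Err("group copy would write past the end of the buffer");
    }
    // Byte-wise and in order, so overlapping source and destination is fine.
    for i in 0..touched {
        buf[dst + i] = buf[dst + i - offset];
    }
    // Only `len` bytes count as output; the rest is scratch that a later
    // copy will overwrite.
    Ok(dst + len)
}
```

For an 8-byte buffer with the write position at 2, a 5-byte copy fits (2 + 5 = 7) but touches 8 bytes, so the group-wise copy is rejected, while a 4-byte copy succeeds. An unchecked pointer-based version without this test would write past the buffer in the first case.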
PSeitz mentioned this pull request on Apr 30, 2023
PSeitz added a commit that referenced this pull request on May 27, 2023:
This is another attempt to replace the compiler's aggressive unrolling after the
failed attempt #69 (which wrote out of bounds in some cases).

The compiler's unrolling is avoided by manually unrolling less aggressively.
Decompression performance improves slightly, by about 4%, except in the
smallest test case.
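
A less aggressive manual unroll could look like the following sketch. The unroll factor of 2, the function name and the signature are assumptions; the commit message only says that the loop is unrolled by hand. The remaining length is tracked as a counter so that no pointer is ever advanced beyond one past the last byte written, and an odd length is finished with a single one-byte copy.

```rust
/// Copies `len` bytes from `src` to `dst`, two bytes per iteration, and
/// returns the advanced destination pointer. The regions may overlap with
/// `src` before `dst`; bytes are copied one at a time and in order, so bytes
/// written earlier in the loop can be read again later.
/// (Hypothetical name and unroll factor, for illustration only.)
///
/// # Safety
/// `src..src + len` must be readable and `dst..dst + len` writable, and both
/// ranges must lie within the same allocation.
unsafe fn copy_unrolled2(mut dst: *mut u8, mut src: *const u8, len: usize) -> *mut u8 {
    unsafe {
        let mut remaining = len;
        while remaining >= 2 {
            dst.write(src.read());
            dst.add(1).write(src.add(1).read());
            dst = dst.add(2);
            src = src.add(2);
            remaining -= 2;
        }
        // Odd length: one final byte, never more than `len` bytes in total.
        if remaining == 1 {
            dst.write(src.read());
            dst = dst.add(1);
        }
        dst
    }
}
```

For a buffer that starts with "abc", copying 7 bytes from index 1 to index 3 yields "abcbcbcbcb" and leaves every byte from index 10 onward untouched.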