Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further performance improvements for FP math #905

Merged
merged 6 commits into from
Sep 19, 2024
Merged

Further performance improvements for FP math #905

merged 6 commits into from
Sep 19, 2024

Conversation

heshpdx
Copy link
Contributor

@heshpdx heshpdx commented Sep 9, 2024

I was requested to dig further in the spirit of !852 so I spent time inspecting other parts of the code. This PR unlocks more FDIV->FMUL opportunities. I inspected the assembly and confirmed that each of the changes did result in a transformation of FDIV to FMUL.

Here are the perf outliers, with the majority of other tests in the noise (since likely they do not exercise the changed code).

Grid path cells shows 4.3% upside, as measured on a Ampere Altra in the OCI Cloud, using gcc-10 out of the box. Regular h3 cmake build.
mainline:

$ ./build/bin/benchmarkGridPathCells 
        -- gridPathCellsNear: 39.136827 microseconds per iteration (10000 iterations)
        -- gridPathCellsFar: 1799.592192 microseconds per iteration (1000 iterations)

branch:

$ ./build_fast/bin/benchmarkGridPathCells 
        -- gridPathCellsNear: 37.556445 microseconds per iteration (10000 iterations)
        -- gridPathCellsFar: 1724.603344 microseconds per iteration (1000 iterations)

The experimental polygon to cells overlapping is also an outlier, 1-2%:

mainline:

        -- polygonToCellsSF_Overlapping: 5817.474534 microseconds per iteration (500 iterations)
        -- polygonToCellsAlameda_Overlapping: 13169.533630 microseconds per iteration (500 iterations)
        -- polygonToCellsSouthernExpansion_Overlapping: 381815.775600 microseconds per iteration (10 iterations)

branch:

        -- polygonToCellsSF_Overlapping: 5773.028512 microseconds per iteration (500 iterations)
        -- polygonToCellsAlameda_Overlapping: 12886.969474 microseconds per iteration (500 iterations)
        -- polygonToCellsSouthernExpansion_Overlapping: 377545.843300 microseconds per iteration (10 iterations)

More FDIV->FMUL opportunities unlocked, following in the
spirit of #852
@coveralls
Copy link

coveralls commented Sep 9, 2024

Coverage Status

coverage: 98.824% (-0.002%) from 98.826%
when pulling b7542c3 on heshpdx:master
into a1037eb on uber:master.

Copy link
Collaborator

@nrabinowitz nrabinowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these optimizations!

src/h3lib/lib/localij.c Outdated Show resolved Hide resolved
Co-authored-by: Nick Rabinowitz <public@nickrabinowitz.com>
@heshpdx
Copy link
Contributor Author

heshpdx commented Sep 18, 2024

Looks like the CIFuzz is broken. Do we need to update that runner?

@isaacbrodsky
Copy link
Collaborator

Looks like the CIFuzz is broken. Do we need to update that runner?

I'm addressing it in #907. You may need to rebase or merge that change in too

@isaacbrodsky isaacbrodsky merged commit 4899d29 into uber:master Sep 19, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants