-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: arm64 performance optimizations #4288
Changes from 2 commits
8ec5ffa
ec46d50
8150684
ee6a431
98d3d44
941c896
01c560c
ba88bf3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -32,6 +32,10 @@ VW_WARNING_STATE_POP | |||
#include "vw/core/vw_versions.h" | ||||
#include "vw/io/logger.h" | ||||
|
||||
#if defined(__ARM_NEON) | ||||
#include <sse2neon/sse2neon.h> | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This has not been made available to the The include should be added here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just created a package in Would it be okay if I add the header directly in the source code? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is still preferable to use the submodule. I can push a commit to the branch with this change, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That would be nice of you There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jackgerrits FYI, I added There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for adding that, that's super helpful! There was a tiny issue with the installed path, I went ahead and opened a fix here microsoft/vcpkg#28268. When that gets merged I will update this PR to consume. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll go ahead and merge this one and we can update to consume vcpkg later. |
||||
#endif | ||||
|
||||
#include <algorithm> | ||||
#include <cassert> | ||||
#include <cmath> | ||||
|
@@ -164,7 +168,7 @@ inline float fastdigamma(float x) | |||
|
||||
#if !defined(VW_NO_INLINE_SIMD) | ||||
|
||||
# if defined(__SSE2__) || defined(__SSE3__) || defined(__SSE4_1__) | ||||
# if defined(__SSE2__) || defined(__SSE3__) || defined(__SSE4_1__) || defined(__ARM_NEON) | ||||
|
||||
namespace | ||||
{ | ||||
|
@@ -186,6 +190,13 @@ inline bool is_aligned16(void* ptr) | |||
# include <smmintrin.h> | ||||
# endif | ||||
|
||||
// Transport SSE intrinsics through sse2neon on ARM: | ||||
#if defined(__ARM_NEON) | ||||
#define __SSE2__ 1 | ||||
#define __SSE3__ 1 | ||||
#define __SSE4_1__ 1 | ||||
#endif | ||||
|
||||
# define HAVE_SIMD_MATHMODE | ||||
|
||||
typedef __m128 v4sf; | ||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to this CPU, so we should not be adding it whenever we encounter an arm CPU. Generally for specific optimization flags like this it is better to add them when configuring your own build.
So, in this case we should remove this but in your build you would add the following to your command line to achieve the same outcome:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to go ahead and push a change removing this piece so we can move forward here.