enable SIMD on power9 (requires gcc>=8) #89
Should we check the compiler version? We are (supposedly) shipping manylinux2014 wheels for power9...
If I remember correctly, it is not easy to get the compiler version from Python ...
I did it for Blosc, but that build is driven by CMake.
The same directive exists for clang, but it appeared in version 9 (IIRC).
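As an illustration of the compiler check discussed above, here is a minimal sketch that asks the compiler for its version through a subprocess. It assumes the compiler is gcc and that it accepts `-dumpversion`; the function names and the `minimum_major=8` default (taken from the "gcc>=8" in the PR title) are hypothetical and are not part of the project's `setup.py`. clang reports its version differently, so it would need separate handling.

```python
import subprocess


def parse_major_version(dumpversion_output):
    """Return the major version from text such as '8', '8.3.0' or '9.3.0\\n'."""
    return int(dumpversion_output.strip().split(".")[0])


def compiler_major_version(compiler="gcc"):
    """Ask the compiler for its version; return None if it cannot be queried."""
    try:
        output = subprocess.check_output(
            [compiler, "-dumpversion"], universal_newlines=True
        )
    except (OSError, subprocess.CalledProcessError):
        # Compiler missing or it rejected the option: report "unknown".
        return None
    try:
        return parse_major_version(output)
    except ValueError:
        # Output did not start with an integer version component.
        return None


def supports_power9_simd(compiler="gcc", minimum_major=8):
    """Hypothetical helper: True only when the version is known and recent enough."""
    major = compiler_major_version(compiler)
    return major is not None and major >= minimum_major
```

Returning `None` for an unknown version lets the build fall back to the non-SIMD path rather than fail when the compiler cannot be queried.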
There are distutils hacks, but the simplest seems to me to recover the output of` |
On Power9, we are now building manylinux2014 wheels and Ubuntu20.04 packages. It would be best not to modify files under |
On Wed, 30 Sep 2020 01:03:04 -0700 Thomas VINCENT ***@***.***> wrote:
> It would be best not to modify files under `src/` since they are subtrees of other projects.
> There is no need to change `src/bitshuffle/setup.py` since it is not used, and the change in `src/bitshuffle/src/bitshuffle_core.c` can be replaced by `"-DUSESSE2"` in the project's `setup.py`, as is already done for `"-DNO_WARN_X86_INTRINSICS"`.
I am not sure about that ...
I see no problem in submitting a PR to bitshuffle
Yes, a PR to bitshuffle would be best.
Changing line 361 in your `setup.py` to `extra_compile_args += ["-DNO_WARN_X86_INTRINSICS", "-DUSESSE2"]` and not touching any file under `src` should be enough to get the desired behavior without waiting for upstream changes.
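The suggestion above could look roughly like the following sketch. The two macros come from this thread; the helper name `power9_compile_args` and the `ppc64le` machine check are assumptions made for illustration and may not match how the project's `setup.py` selects the Power9 build.

```python
import platform


def power9_compile_args(machine):
    """Hypothetical helper: extra compiler flags for a given machine name."""
    args = []
    # Assumption: little-endian POWER Linux reports "ppc64le".
    if machine == "ppc64le":
        # Silence the x86-intrinsics warning and enable the SSE2 code path,
        # without touching any file under src/.
        args += ["-DNO_WARN_X86_INTRINSICS", "-DUSESSE2"]
    return args


extra_compile_args = []
extra_compile_args += power9_compile_args(platform.machine())
```

Keeping the flags in the project's own build script leaves the vendored subtrees under `src/` unchanged, so they can still be updated from upstream without conflicts.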
You can have a look at this branch: https://github.com/t20100/hdf5plugin/tree/pr89
I tried and gcc emits a warning that USESSE2 is already defined ...
My PR is available upstream to: |
I get no warning with: https://github.com/t20100/hdf5plugin/tree/pr89
Closing as superseded by #90
This speeds up the reading of an Eiger2 dataset from 36 ms to 19 ms (on one core) on our power9 computers.
x86 computers are still much faster for this task (~12 ms).