Release 5.0.4 failed to build on GitHub Actions macOS 14.5 arm64 #12693

Closed
dalcinl opened this issue Jul 18, 2024 · 18 comments

@dalcinl
Contributor

dalcinl commented Jul 18, 2024

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.4

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

https://github.com/mpi4py/mpi-publish/blob/master/cibw-build-mpi.sh

Please describe the system on which you are running

  • Operating system/version: macOS 14.5
  • Computer hardware: Apple Silicon (arm64)
  • Network type: local, shm

Details of the problem

Full build logs: https://github.com/mpi4py/mpi-publish/actions/runs/9996695568/job/27631626010

...
2024-07-18T18:29:14.8093900Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:165:5: error: pasting formed 'int8##', an invalid preprocessing token
2024-07-18T18:29:14.8094720Z     OP_AARCH64_FUNC(max, s,  8, 16,   int, max)
2024-07-18T18:29:14.8094980Z     ^
2024-07-18T18:29:14.8095850Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:122:56: note: expanded from macro 'OP_AARCH64_FUNC'
2024-07-18T18:29:14.8096730Z         OP_CONCAT(OMPI_OP_TYPE_PREPEND, type##type_size####x##type_cnt##_t) vsrc, vdst;       \
2024-07-18T18:29:14.8097160Z                                                        ^
2024-07-18T18:29:14.8097920Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:165:5: error: use of undeclared identifier 'int8'
2024-07-18T18:29:14.8099220Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:122:41: note: expanded from macro 'OP_AARCH64_FUNC'
2024-07-18T18:29:14.8100050Z         OP_CONCAT(OMPI_OP_TYPE_PREPEND, type##type_size####x##type_cnt##_t) vsrc, vdst;       \
2024-07-18T18:29:14.8100450Z                                         ^
2024-07-18T18:29:14.8100720Z <scratch space>:133:1: note: expanded from here
2024-07-18T18:29:14.8100980Z int8
2024-07-18T18:29:14.8101110Z ^
2024-07-18T18:29:14.8101750Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:165:5: error: use of undeclared identifier 'vsrc'
2024-07-18T18:29:14.8102920Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:124:13: note: expanded from macro 'OP_AARCH64_FUNC'
2024-07-18T18:29:14.8103710Z             vsrc = vld1q##_##type_name##type_size(in);                                        \
2024-07-18T18:29:14.8104060Z             ^
2024-07-18T18:29:14.8104710Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:165:5: error: use of undeclared identifier 'vdst'
2024-07-18T18:29:14.8105880Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:125:13: note: expanded from macro 'OP_AARCH64_FUNC'
2024-07-18T18:29:14.8106670Z             vdst = vld1q##_##type_name##type_size(out);                                       \
2024-07-18T18:29:14.8107020Z             ^
2024-07-18T18:29:14.8107670Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:165:5: error: use of undeclared identifier 'vdst'
2024-07-18T18:29:14.8108800Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:127:13: note: expanded from macro 'OP_AARCH64_FUNC'
2024-07-18T18:29:14.8109600Z             vdst = OP_CONCAT(OMPI_OP_OP_PREPEND, op##q##_##type_name##type_size)(vdst, vsrc); \
2024-07-18T18:29:14.8109980Z             ^
2024-07-18T18:29:14.8111300Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:165:5: error: use of undeclared identifier 'vdst'
2024-07-18T18:29:14.8112500Z /Users/runner/work/mpi-publish/mpi-publish/package/source/ompi/mca/op/aarch64/op_aarch64_functions.c:127:82: note: expanded from macro 'OP_AARCH64_FUNC'
2024-07-18T18:29:14.8113330Z             vdst = OP_CONCAT(OMPI_OP_OP_PREPEND, op##q##_##type_name##type_size)(vdst, vsrc); 
...
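
Reading only the diagnostic above: inside a macro body, `####` is lexed as two consecutive `##` operators, so the first one tries to paste `int8` onto a `##` token, and `int8##` is not a valid preprocessing token. Below is a minimal sketch of the well-formed pasting, assuming the intended result is the NEON type name `int8x16_t`. `VEC_TYPE`, `STR` and `vec_type_name_s8` are illustrative names, not Open MPI's, and the malformed form is left in a comment so the file still compiles:

```c
#include <string.h>

/* Two-level stringification: the argument is macro-expanded first,
 * then turned into a string literal. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Well-formed: every ## joins exactly two neighbouring tokens.
 * VEC_TYPE(int, 8, 16) pastes int, 8, x, 16 and _t into int8x16_t. */
#define VEC_TYPE(type, type_size, type_cnt) type##type_size##x##type_cnt##_t

/* Malformed variant mirroring the log (hypothetical name, commented out):
 *
 *   #define VEC_TYPE_BAD(type, type_size, type_cnt) \
 *       type##type_size####x##type_cnt##_t
 *
 * Here the first ## would have to paste "int8" with the next "##" token,
 * which is what clang rejects as pasting formed 'int8##'. */

/* Returns the spelling produced by the well-formed macro. */
static const char *vec_type_name_s8(void) {
    return STR(VEC_TYPE(int, 8, 16));
}
```

Calling `vec_type_name_s8()` yields the string `int8x16_t`, which can be checked on any platform because the type name is only stringified, never declared.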
@bosilca
Member

bosilca commented Jul 18, 2024

The 5.x is missing 7c5ef48

wenduwan added the bug label Jul 18, 2024
@wenduwan
Contributor

Backport #12694

@wenduwan
Contributor

Fix merged. Will be included for the next release.

But this bug is annoying: we have recently hit a few snags on macOS. I wonder if/how we should increase our CI coverage for macOS (x86 + arm).

@jsquyres
Member

@wenduwan Does this mean a 5.0.5 in the immediate future?

@wenduwan
Contributor

@jsquyres I'm not sure. We are currently targeting 10/18. There are other critical bugfixes as well, and we can do another release once we get them in.

@jsquyres
Member

The 4.1 is missing 7c5ef48

@bosilca I don't think this commit is relevant to the v4.1.x branch -- am I missing something?

@bosilca
Member

bosilca commented Jul 18, 2024

My comment was incorrect, I was talking about the 5.x. Let me go and fix it.

@jsquyres
Member

@jsquyres I'm not sure. We are currently targeting 10/18. There are other critical bugfixes as well, and we can do another release once we get them in.

@wenduwan 5.0.4 fails to build on MacOS out of the box. Doesn't that qualify as an "oh crap!" and mandate an immediate 5.0.5?

@wenduwan
Contributor

@jsquyres I need guidance for this. It's a build failure, meaning packagers (MacOS + ARM) are affected; on the other hand, end users won't be affected, i.e. no runtime error. Do we consider this to warrant an immediate release?

@wenduwan
Contributor

I also want to hear from @dalcinl what is the impact to you and your user base?

wenduwan added this to the v5.0.5 milestone Jul 19, 2024
@rhc54
Contributor

rhc54 commented Jul 19, 2024

Well, building on my Mac/M2, I see the following warnings that I don't grok (may be some missing lines as I only keep the stderr output):

configure: WARNING: -g has been added to CFLAGS (--enable-debug)
configure: WARNING: Could not find pmixcc
configure: WARNING: Your PMIx version is either does not
configure: WARNING: the capabilities feature or does not
configure: WARNING: include the PMIX_CAP_BASE capability flag
configure: WARNING: Ignoring this for now
...
configure: WARNING: UCX version is too old, please upgrade to 1.9 or higher.

There is also a flood of warnings out of OMPI itself, but I'll ignore those for now. However, it [edit: shouldn't have said "built just fine" given all the warnings] "successfully built", so I suspect the failure involves some specific set of aux libraries that activate components, or something else specific to the environment.

@bosilca
Member

bosilca commented Jul 19, 2024

As far as I know UCX does not build on OSX. How does it find one installed on your M2 ?

wenduwan added a commit to wenduwan/ompi that referenced this issue Jul 19, 2024
Increase CI coverage to prevent open-mpi#12693

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
@rhc54
Contributor

rhc54 commented Jul 19, 2024

As far as I know UCX does not build on OSX. How does it find one installed on your M2 ?

I have no idea! Certainly nothing I would have installed.

@jsquyres
Member

Similar to @dalcinl's report, the Open MPI v5.0.4 tarball fails to compile for me out of the box on my MacOS Sonoma 14.5 M2 Pro:

  CC       liblocal_ops_neon_la-op_aarch64_functions.lo
op_aarch64_functions.c:165:5: error: pasting formed 'int8##', an invalid preprocessing token
    OP_AARCH64_FUNC(max, s,  8, 16,   int, max)
    ^
...lots of other similar errors...
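
For readers unfamiliar with the failing file, here is a sketch of what `OP_AARCH64_FUNC(max, s, 8, 16, int, max)` appears intended to generate, judging only from the macro lines quoted in the log (`vld1q_s8` loads followed by a `vmaxq_s8`). The function name `max_int8`, the loop structure, the store, and the scalar tail are assumptions for illustration and are not taken from the Open MPI source:

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise maximum: out[i] = max(out[i], in[i]) for count elements. */
static void max_int8(const int8_t *in, int8_t *out, int count) {
    int i = 0;
#if defined(__ARM_NEON)
    /* A 128-bit NEON register holds 16 lanes of int8. */
    for (; i + 16 <= count; i += 16) {
        int8x16_t vsrc = vld1q_s8(in + i);   /* load 16 inputs */
        int8x16_t vdst = vld1q_s8(out + i);  /* load 16 accumulators */
        vdst = vmaxq_s8(vdst, vsrc);         /* lane-wise maximum */
        vst1q_s8(out + i, vdst);             /* store the result */
    }
#endif
    /* Scalar tail; on targets without NEON this handles every element. */
    for (; i < count; ++i) {
        if (in[i] > out[i]) {
            out[i] = in[i];
        }
    }
}
```

The `int8x16_t` declaration is the line that the malformed token paste fails to produce, which is why every later use of `vsrc` and `vdst` is reported as undeclared.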

This is a deal-breaker; v5.0.4 is a busted release.

If you have a software package that fails to compile on a major platform, that's a non-starter. It should never have been released.

Granted, real HPC jobs are not typically run on laptops, but macOS is a popular development and debugging platform -- so this is important. I do not buy the argument that Open MPI is only installed via packagers; I think we have a lot of users who download and build Open MPI from source.

@rhc54
Contributor

rhc54 commented Jul 19, 2024

Hmmm...that's really weird. I have the exact same machine, same OS version - and it builds for me. I wonder why you are building code elements that don't get built on my machine (or maybe they just successfully build on mine)?

No opinion on your conclusion - just curious as to the difference in behavior and what that might portend, especially combined with the strange warnings I saw.
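
One way to narrow down such a difference is to check what the compiler in use advertises for the target. The sketch below relies only on the predefined macros `__aarch64__` and `__ARM_NEON`; it does not reproduce Open MPI's configure logic, and `simd_target` is a hypothetical helper name:

```c
/* Reports which SIMD-relevant target the compiler is building for,
 * based purely on compiler-predefined macros. */
static const char *simd_target(void) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    return "aarch64+neon";
#elif defined(__aarch64__)
    return "aarch64";
#else
    return "other";
#endif
}
```

If two machines report the same target here, the difference more likely lies in the source tree or the configure options than in the toolchain.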

@jsquyres
Member

FWIW: I downloaded the 5.0.4 tarball (not a git clone) and configured with:

./configure --with-prrte=internal --with-pmix=internal --with-libevent=/opt/homebrew --with-hwloc=/opt/homebrew

@dalcinl
Contributor Author

dalcinl commented Jul 19, 2024

I also want to hear from @dalcinl what is the impact to you and your user base?

@wenduwan I managed to update my build scripts to include support for patches. Afterwards, I managed to build Python wheels successfully: https://anaconda.org/mpi4py/openmpi/files?version=5.0.4.

This issue will eventually hit conda-forge, https://github.com/conda-forge/openmpi-feedstock, but the fix is trivial, just patching sources.

Unless users want to build from source, they are not affected; only maintainers and distributors have to deal with the compile issue.

@rhc54
Contributor

rhc54 commented Jul 19, 2024

Historically, the vast majority of OMPI installations have been done from source - and not installed via package. Not saying it couldn't have changed, but that's what we've seen.

FWIW: I downloaded the 5.0.4 tarball (not a git clone) and configured with:

That may be the difference - I just checked out v5.0.4 in my git clone. Somewhat odd that this made a difference, though, as the two should be the same - unless the tag is wrong?

Personally, I treat such instances as a busted release, but I also temper it a bit. I rarely do an immediate re-release, but do move up the next release date to be a little sooner. Reason: I don't see any reports of mass suicides or illnesses as a result of having to wait another month or two for a software release on a particular platform. So if it works for the majority, I tend to let it ride for a little while in the expectation that I'm going to see multiple bug reports anyway - as we know, nobody tests these packages until they are released, so we always see a bunch of quick bug reports after release.

Just my $0.00002 🤷‍♂️

wenduwan added a commit to wenduwan/ompi that referenced this issue Jul 22, 2024
Increase CI coverage to prevent open-mpi#12693

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
wenduwan added a commit to wenduwan/ompi that referenced this issue Jul 22, 2024
Increase CI coverage to prevent open-mpi#12693

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
(cherry picked from commit fcf7e16)
hppritcha added a commit to hppritcha/spack that referenced this issue Jul 24, 2024
needed quick turnaround owing to

open-mpi/ompi#12693

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
tldahlgren pushed a commit to spack/spack that referenced this issue Jul 25, 2024
* Open MPI: add release 5.0.4
* OpenMPI: add release 5.0.5
   needed quick turnaround owing to
   open-mpi/ompi#12693

---------

Signed-off-by: Howard Pritchard <howardp@lanl.gov>