Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Portals4: Compiler Error due to missing PtlHandleIsEqual #12705

Closed
nbartelheimer opened this issue Jul 23, 2024 · 9 comments
Closed

Portals4: Compiler Error due to missing PtlHandleIsEqual #12705

nbartelheimer opened this issue Jul 23, 2024 · 9 comments
Assignees
Milestone

Comments

@nbartelheimer
Copy link

Background information

I use OpenMPI and its Portals4 implementation to evaluate the performance of different library im working on.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

OpenMPI v.5.0.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

OpenMPI was installed from a source / distribution tarball

Please describe the system on which you are running

OS: Rocky Linux 8.9 (Green Obsidian)
CPU: Intel(R) Xeon(R) Gold 5122 CPU @ 3.60GHz
Network type: BXIv2

Details of the problem

When I try to compile OpenMPI with the --with-portals4 option, I receive a compiler error regarding the comparison of two Portals4 handles. For example in this code snippet (opal/mca/btl/portals4/btl_portals4.c +490):

 if (frag->me_h != PTL_INVALID_HANDLE) {
            frag->me_h = PTL_INVALID_HANDLE;
        }

Portals4 specification proposes the use of PtlHandleIsEqual() in order to compare two handles.

Since I'm working with the BXIv2 interconnect, which is more or less a hardware implementation of Portals4, I'm not sure if this error also arises when using the Portals4 InfiniBand reference implementation from Sandia.

There are several positions in the Portals4 code base where I ran into the same issue. I have modified all the necessary parts so that they meet the Portals4 specification.

The fix for the snippet from above would look like this:

if (!PtlHandleIsEqual(frag->me_h, PTL_INVALID_HANDLE)) {
            frag->me_h = PTL_INVALID_HANDLE;
        }

After my modification I was able to compile OpenMPI and observe decent performance numbers on the OMB.

Can someone maybe comment on my observation and the suggested fix?

@tkordenbrock
Copy link
Member

Hi @nbartelheimer. What compiler and version are you using? I typically use clang 16.0.6. When using the Sandia reference implementation, the v5.0.x portals4 components build error free. There are some warnings that I will fix ASAP.

Note that there is currently a problem with the portals4 components on the main branch. I'm working on fixing that also.

The portals4 components currently use a mix of PtlHandleIsEqual and pointer comparison (older code that was never updated). I'll update those uses to the latest recommended by the standard.

@nbartelheimer
Copy link
Author

Hi @tkordenbrock,
I used the gcc 12.3.0.
Thank you very much for the reply and your work.

@tkordenbrock
Copy link
Member

I will test with gcc 12.3.0, but I expect that the problem is with the underlying types. In the reference implementation, ptl_handle_any_t is simple a uint32_t. Perhaps the BXIv2 implementation of ptl_handle_any_t can't be compared with !=.

I am going to reopen this issue until I can push the PtlHandleIsEqual fixes and you can test them.

@tkordenbrock tkordenbrock reopened this Jul 24, 2024
@nbartelheimer
Copy link
Author

I agree, it is a problem of the underlying types.
The definition in the BXIv2 Portals implementation looks like this:

typedef struct {
        void *handle;
} ptl_handle_any_t;

I'm sorry this is my first issue, sure I will test the fixes once they are ready.

@tkordenbrock
Copy link
Member

@nbartelheimer I opened a PR (#12713) against the main branch. After it is reviewed and merged, I will open a PR against the v5.0.x branch. If you would like to try the fixes (probably identical to what you already have), a patch file is attached.
PtlHandleIsEqual.patch

@nbartelheimer
Copy link
Author

@tkordenbrock I applied your patch to the main branch. It looks like some spots are missing. I created
PtlHandleIsEqual_BXI.patch. Maybe you can check the additional changes.

Hope this is helpful.

@tkordenbrock
Copy link
Member

Gak! I missed the constant first style. And those 0s. Thanks for the patch. I'll get these reviewed and merged ASAP.

@nbartelheimer
Copy link
Author

No worries, maybe I should have sent a patch in the first place. Might have been easier.

Thanks for your help!

@tkordenbrock
Copy link
Member

@nbartelheimer This has been merged into the v5.0.x branch, so it will be in the upcoming v5.0.6 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants