Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/RMA/FLUSH: Dynamic Selection of Strong vs. Weak Fence #10474

Open
wants to merge 21 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
7d123db
UCP/RMA/FLUSH: Add unflushed_lanes to ucp_ep
michal-shalev Jan 22, 2025
74f5e73
UCP/RMA/FLUSH: Add sequence numbers to EP and worker structs
michal-shalev Feb 9, 2025
8d7bf47
UCP/RMA/FLUSH: Delete auto fence mode and change default to strong
michal-shalev Feb 9, 2025
4b5fbc7
UCP/RMA/FLUSH: Add EP-based fence mode
michal-shalev Feb 9, 2025
c8d8e9f
UCP/RMA/FLUSH: Implement per-EP weak and strong fences
michal-shalev Feb 9, 2025
57e8752
UCP/RMA/FLUSH: Handle fence during one-sided operations
michal-shalev Feb 9, 2025
a33f74d
UCP/RMA/FLUSH: Move unflushed_lanes from the union
michal-shalev Feb 19, 2025
2d459bd
UCP/RMA/FLUSH: Move fence_seq from the union
michal-shalev Feb 19, 2025
6160121
UCP/RMA/FLUSH: Add correctness test
michal-shalev Feb 23, 2025
c306d1f
UCP/RMA/FLUSH: PR fixes
michal-shalev Mar 2, 2025
d139b75
UCP/RMA/FLUSH: PR fixes 2.0
michal-shalev Mar 3, 2025
22aef72
UCP/RMA/FLUSH: PR fixes 3.0
michal-shalev Mar 4, 2025
471cfbe
UCP/RMA/FLUSH: PR fixes 4.0
michal-shalev Mar 4, 2025
0101b84
UCP/RMA/FLUSH: PR fixes 5.0
michal-shalev Mar 5, 2025
df2b83f
UCP/RMA/FLUSH: PR fixes 6.0
michal-shalev Mar 5, 2025
74d5cdb
UCP/RMA/FLUSH: PR fixes 7.0
michal-shalev Mar 5, 2025
09eee92
UCP/RMA/FLUSH: PR fixes 8.0
michal-shalev Mar 6, 2025
9126795
UCP/RMA/FLUSH: PR fixes 9.0
michal-shalev Mar 9, 2025
c28dd5a
UCP/RMA/FLUSH: PR fixes 10.0
michal-shalev Mar 9, 2025
1cfc7d8
UCP/RMA/FLUSH: PR fixes 11.0
michal-shalev Mar 10, 2025
1b09153
UCP/RMA/FLUSH: PR fixes 12.0
michal-shalev Mar 10, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions src/ucp/rma/put_offload.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,12 @@ static ucs_status_t ucp_proto_put_offload_short_progress(uct_pending_req_t *self
ucs_status_t status;
uct_rkey_t tl_rkey;

if (!(req->flags & UCP_REQUEST_FLAG_PROTO_INITIALIZED)) {
ucp_proto_single_rma_init_func(req);
req->flags |= UCP_REQUEST_FLAG_PROTO_INITIALIZED;
}
/* Use a signed boolean mask (-!!) to update unflushed lanes branchlessly.
* Evaluates to 0xFFFFFFFFFFFFFFFF if initialized, 0 otherwise. */
req->send.ep->ext->unflushed_lanes |=
UCS_BIT(spriv->super.lane) &
-!!(req->flags & UCP_REQUEST_FLAG_PROTO_INITIALIZED);
req->flags |= UCP_REQUEST_FLAG_PROTO_INITIALIZED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does the assembly compare? seems more complicated IMO

Copy link
Contributor Author

@michal-shalev michal-shalev Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Godbolt to check:

#define FLAG 0x8
#define LANE 0x6

void update_with_branch(uint64_t *value, uint64_t flags) {
    if (flags & FLAG) {
        *value |= LANE;
    }
}

void update_branchless(uint64_t *value, uint64_t flags) {
    *value |= LANE & -!!(flags & FLAG);
}
update_with_branch:
        and     esi, 8
        je      .L1
        or      QWORD PTR [rdi], 6
.L1:
        rep ret
update_branchless:
        sal     rsi, 60
        sar     rsi, 63
        and     esi, 6
        or      QWORD PTR [rdi], rsi
        ret

This shows that update_with_branch includes a conditional branch (je .L1),
while update_branchless avoids branching entirely by using bitwise operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but actually we have 2 flags, can you check in godbolt something like

#define FLAG1 0x8
#define FLAG2 0x20

void update_with_branch(uint64_t *value, uint64_t flags) {
    if (flags & FLAG1) {
        *value |= FLAG2;
    }
}

void update_branchless(uint64_t *value, uint64_t flags) {
    *value |= FLAG1 & -!!(flags & FLAG2);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated my original comment @yosefe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, maybe we could update unflushed_lanes unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the best solution so far, pushed another commit


tl_rkey = ucp_rkey_get_tl_rkey(req->send.rma.rkey, spriv->super.rkey_index);
status = uct_ep_put_short(ucp_ep_get_fast_lane(ep, spriv->super.lane),
Expand Down
Loading