Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement IPC cache for umfOpenIPCHandle function #736

Merged
merged 8 commits into from
Nov 13, 2024

Conversation

vinser52
Copy link
Contributor

@vinser52 vinser52 commented Sep 17, 2024

Description

This PR adds initial set of changes to implement cache for opened IPC handles.
The new third-party data structure (uthash - has map implementation) is added. The same data structure is used by MPI to implement IPC cache. Here is the uthash documentation. Here is the original source repo

Fixes: #403

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly
  • All newly added source files have a license
  • All newly added source files are referenced in CMake files

src/uthash/uthash.h Outdated Show resolved Hide resolved
@vinser52 vinser52 force-pushed the svinogra_ipc_cache branch 3 times, most recently from ab7b9ee to b969903 Compare October 1, 2024 14:18
@vinser52 vinser52 force-pushed the svinogra_ipc_cache branch 6 times, most recently from 4fd30e3 to 8a96eb0 Compare October 13, 2024 00:00
@vinser52 vinser52 mentioned this pull request Oct 13, 2024
3 tasks
@vinser52 vinser52 force-pushed the svinogra_ipc_cache branch 4 times, most recently from 50b1d4e to 3d012df Compare October 14, 2024 21:25
@vinser52 vinser52 marked this pull request as ready for review October 14, 2024 21:26
@vinser52 vinser52 requested a review from a team as a code owner October 14, 2024 21:26
@bratpiorka bratpiorka requested a review from lplewa October 15, 2024 08:43
@vinser52 vinser52 force-pushed the svinogra_ipc_cache branch 2 times, most recently from 0823eaf to cb6248f Compare October 20, 2024 21:09
src/uthash/uthash.h Outdated Show resolved Hide resolved
@vinser52 vinser52 force-pushed the svinogra_ipc_cache branch 2 times, most recently from fc2f15b to 8b3626e Compare October 21, 2024 14:36
@vinser52 vinser52 requested a review from bratpiorka October 22, 2024 21:15
@vinser52 vinser52 requested a review from ldorau October 25, 2024 13:24
src/ipc.c Show resolved Hide resolved
src/ipc_cache.c Outdated Show resolved Hide resolved
src/ipc_cache.c Show resolved Hide resolved
src/ipc_cache.c Outdated Show resolved Hide resolved
src/ipc_cache.c Show resolved Hide resolved
src/ipc_cache.h Outdated Show resolved Hide resolved
src/ipc_cache.c Outdated Show resolved Hide resolved
src/libumf.c Outdated Show resolved Hide resolved
src/provider/provider_tracking.c Outdated Show resolved Hide resolved
src/provider/provider_tracking.c Outdated Show resolved Hide resolved
src/ipc_cache.c Show resolved Hide resolved
src/ipc_cache.c Outdated Show resolved Hide resolved
src/ipc_cache.h Outdated Show resolved Hide resolved
src/ipc_cache.h Outdated Show resolved Hide resolved
src/ipc_cache.c Show resolved Hide resolved
src/provider/provider_tracking.c Show resolved Hide resolved
src/ipc_cache.c Show resolved Hide resolved
src/ipc_cache.c Show resolved Hide resolved
@vinser52 vinser52 force-pushed the svinogra_ipc_cache branch 4 times, most recently from d1e2e55 to d10b8a8 Compare November 4, 2024 22:56
Copy link
Contributor

@ldorau ldorau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please enable lines no: 356 and 359:

// TODO fix it - it does not work in case of IPC cache hit
// EXPECT_EQ(stat.allocCount, stat.getCount);
EXPECT_EQ(stat.getCount, stat.putCount);
// TODO fix it - it does not work in case of IPC cache hit
// EXPECT_EQ(stat.openCount, stat.getCount);

It seems that the check in the line no 356:

EXPECT_EQ(stat.allocCount, stat.getCount);

still fails in the umf-provider_devdax_memory test:

[ RUN      ] DevDaxProviderDifferentPoolsTest/umfIpcTest.AllocFreeAllocTest/2
/home/ldorau/work/unified-memory-framework/test/ipcFixtures.hpp:354: Failure
Expected equality of these values:
  stat.allocCount
    Which is: 5
  stat.getCount
    Which is: 1

[  FAILED  ] DevDaxProviderDifferentPoolsTest/umfIpcTest.AllocFreeAllocTest/2, where GetParam() = (0x7f3e254f5660, NULL, 0x7f3e254f5200, 0x55cdc299f760, 0x55cdc299f048, false) (0 ms)
...
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DevDaxProviderDifferentPoolsTest/umfIpcTest.AllocFreeAllocTest/2, where GetParam() = (0x7f3e254f5660, NULL, 0x7f3e254f5200, 0x55cdc299f760, 0x55cdc299f048, false)

 1 FAILED TEST

Copy link
Contributor

@lplewa lplewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please next time do not include half done features. Untested/unused features deceases code coverage, but also they can be detected by static analyze tools as an bugs - in short they might cause troubles after the merge. Also nothing is more permanent that temporary solution, but i trust you that it isn't a case here ;)

@vinser52
Copy link
Contributor Author

vinser52 commented Nov 5, 2024

Please enable lines no: 356 and 359:

// TODO fix it - it does not work in case of IPC cache hit
// EXPECT_EQ(stat.allocCount, stat.getCount);
EXPECT_EQ(stat.getCount, stat.putCount);
// TODO fix it - it does not work in case of IPC cache hit
// EXPECT_EQ(stat.openCount, stat.getCount);

It seems that the check in the line no 356:

EXPECT_EQ(stat.allocCount, stat.getCount);

still fails in the umf-provider_devdax_memory test:

[ RUN      ] DevDaxProviderDifferentPoolsTest/umfIpcTest.AllocFreeAllocTest/2
/home/ldorau/work/unified-memory-framework/test/ipcFixtures.hpp:354: Failure
Expected equality of these values:
  stat.allocCount
    Which is: 5
  stat.getCount
    Which is: 1

[  FAILED  ] DevDaxProviderDifferentPoolsTest/umfIpcTest.AllocFreeAllocTest/2, where GetParam() = (0x7f3e254f5660, NULL, 0x7f3e254f5200, 0x55cdc299f760, 0x55cdc299f048, false) (0 ms)
...
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DevDaxProviderDifferentPoolsTest/umfIpcTest.AllocFreeAllocTest/2, where GetParam() = (0x7f3e254f5660, NULL, 0x7f3e254f5200, 0x55cdc299f760, 0x55cdc299f048, false)

 1 FAILED TEST

Done. The issue was caused by the implementation of a scalable pool. When we do the first umfPoolMalloc the scalable pool internally does five 2MB allocations from the memory provider. So checking EXPECT_EQ(stat.allocCount, stat.getCount) does not make sense. I removed it, but EXPECT_EQ(stat.openCount, stat.getCount); should pass. I tested it with the OS provider and it works. I have no system with devdax, so let's see if CI passes.

@ldorau
Copy link
Contributor

ldorau commented Nov 6, 2024

@vinser52 CI builds fail

@vinser52
Copy link
Contributor Author

vinser52 commented Nov 6, 2024

@vinser52 CI builds fail

Yeah, I saw that. I cannot reproduce it locally. My guess is that in case of CI there are some limits exceed with scalable pool because it uses 2MB slabs. So I can try to decrease memory pressure by decreasing number of allocations in corresponding test or disable test with scalable pool for now. I will investigate today.

@vinser52
Copy link
Contributor Author

vinser52 commented Nov 6, 2024

@ldorau I have moved the test with the scalable pool into a separate PR #860. Now this PR should pass CI validation.
I am going to introduce a config API for a scalable pool. It allows us to set the slab size to something less than 2MB and use such a config for the test.

@vinser52
Copy link
Contributor Author

vinser52 commented Nov 7, 2024

@bratpiorka I just rebased the branch because there was a conflict with main. After CI is passed we can merge.

@ldorau
Copy link
Contributor

ldorau commented Nov 12, 2024

@vinser52 rebase please

@vinser52
Copy link
Contributor Author

@vinser52 rebase please

Done

@bratpiorka bratpiorka merged commit d28cb6b into oneapi-src:main Nov 13, 2024
76 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement cache for IPC handles
4 participants