@houjun houjun commented Feb 5, 2025

Related Issues / Pull Requests

#223

Description

Add mutex protection around the pointer operation to fix a race condition that may cause a memory error when data larger than the cache's maximum size is transferred to the server.
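
The diff itself is not shown in this conversation, so the following is only a minimal sketch of the general technique described above, not the actual PDC change. All names (`cache_t`, `cache_write`, `cache_flush`, `CACHE_MAX_SIZE`) are hypothetical. It assumes a server-side cache whose buffer pointer is freed and reset on flush, and it shows every read or write of that pointer being done while a single mutex is held, including the path taken when a write exceeds the cache's maximum size.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache limit, kept small for illustration only. */
#define CACHE_MAX_SIZE 64

/* Hypothetical cache state; not PDC's real data structure. */
typedef struct {
    pthread_mutex_t lock;          /* guards buf, used, and flushed_bytes */
    char           *buf;           /* shared pointer that a flush frees and resets */
    size_t          used;          /* bytes currently held in buf */
    size_t          flushed_bytes; /* bytes handed to "storage" so far */
} cache_t;

void cache_init(cache_t *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->buf           = NULL;
    c->used          = 0;
    c->flushed_bytes = 0;
}

/* Flush the cached bytes. The caller must already hold c->lock. */
static void cache_flush_locked(cache_t *c)
{
    c->flushed_bytes += c->used;
    free(c->buf); /* free(NULL) is a no-op */
    c->buf  = NULL;
    c->used = 0;
}

/* Public flush entry point, e.g. for a background flush thread. */
void cache_flush(cache_t *c)
{
    pthread_mutex_lock(&c->lock);
    cache_flush_locked(c);
    pthread_mutex_unlock(&c->lock);
}

/* Write len bytes. Returns 0 on success, -1 on allocation failure. */
int cache_write(cache_t *c, const char *data, size_t len)
{
    int ret = 0;

    assert(c != NULL && data != NULL);
    pthread_mutex_lock(&c->lock);
    if (len > CACHE_MAX_SIZE) {
        /* Oversized transfer: flush what is cached and bypass the cache.
         * Without the lock, a concurrent flush could free buf here too. */
        cache_flush_locked(c);
        c->flushed_bytes += len;
    }
    else {
        if (c->used + len > CACHE_MAX_SIZE)
            cache_flush_locked(c);
        if (c->buf == NULL) {
            c->buf = malloc(CACHE_MAX_SIZE);
            if (c->buf == NULL)
                ret = -1;
        }
        if (ret == 0) {
            memcpy(c->buf + c->used, data, len);
            c->used += len;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}

void cache_destroy(cache_t *c)
{
    cache_flush(c);
    pthread_mutex_destroy(&c->lock);
}
```

In this sketch the internal helper `cache_flush_locked` never takes the mutex itself, so the write path can call it without deadlocking on a non-recursive mutex, while external callers go through `cache_flush`.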

What changes are proposed in this pull request?

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected; for instance, examples in this repository must be updated too)
  • This change requires a documentation update

Checklist:

  • My code modifies existing public API, or introduces new public API, and I updated or wrote docstrings
  • I have commented my code
  • My code requires documentation updates, and I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@houjun houjun requested a review from a team as a code owner February 5, 2025 03:45
@jeanbez jeanbez added the type: bug Something isn't working label Feb 5, 2025
@houjun houjun merged commit 4b335f3 into develop Feb 5, 2025
14 checks passed
jeanbez added a commit that referenced this pull request Apr 1, 2025
* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* Update nersc.yml (#238)

* Since PDCinit returns a uint64_t, 0 should indicate failure (#233)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Check the return value of `PDC_Client_init` in `PDC_init` (#230)

* Check that return value of PDC_Client_init in PDC_init

* Change return to 0

This will make it simpler when merging #233 (comment)

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Change `printf` to PDC logger (#232)

* Changed all printf to use pdc logger

Also removed large blocks of comments and changed the pdc logger
to print the file name, function, and line number.

* Change typo of LOG_INFO to LOG_ERROR

* Correct grammar from fail -> failed

* update grammar succesfully close -> successfully closed

* switch type of LOG_INFO to LOG_ERROR

* Add logging docs and fix some LOG_INFO->LOG_JUST_PRINT

* update clang formatting

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Malloc correct size for pdc_obj_metadata_pkg (#237)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* PDCregion_transfer_create validate client buf, local region, and remote regions (#236)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: Noah Lewis <47840925+TheAssembler1@users.noreply.github.com>
jeanbez added a commit that referenced this pull request Apr 15, 2025
* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* Fix restart issue

* Update nersc.yml (#238)

* Since PDCinit returns a uint64_t, 0 should indicate failure (#233)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Check the return value of `PDC_Client_init` in `PDC_init` (#230)

* Check that return value of PDC_Client_init in PDC_init

* Change return to 0

This will make it simpler when merging #233 (comment)

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Change `printf` to PDC logger (#232)

* Changed all printf to use pdc logger

Also removed large blocks of comments and changed the pdc logger
to print the file name, function, and line number.

* Change typo of LOG_INFO to LOG_ERROR

* Correct grammar from fail -> failed

* update grammar succesfully close -> successfully closed

* switch type of LOG_INFO to LOG_ERROR

* Add logging docs and fix some LOG_INFO->LOG_JUST_PRINT

* update clang formatting

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Malloc correct size for pdc_obj_metadata_pkg (#237)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* PDCregion_transfer_create validate client buf, local region, and remote regions (#236)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: Noah Lewis <47840925+TheAssembler1@users.noreply.github.com>
jeanbez added a commit that referenced this pull request Apr 21, 2025
* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* checkpoint

* Switch variables such as count_0, start_0, and size0... to arrays

This will reduce code duplication, reduce bugs, and make it easier
to switch to support n-dimensional data.

* clang format

* checkpoint

* created better function names and documentation

* remove

* Committing clang-format changes

* clang format

* remove file

* change to use helper function

* fix bug with incorrect helper function call

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
jeanbez added a commit that referenced this pull request May 12, 2025
* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* Grouped commons tests into folders

This commit also changes the src/tests/CmakeLists.txt to build tests
within their new folders

* add deprecated folder remove buf_map folder

* Update run_multiple_mpi_test.sh

* Update dependencies-macos.sh

* Update dependencies-macos.sh

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: Jean Luca Bez <jeanlucabez@gmail.com>
jeanbez added a commit that referenced this pull request Jul 21, 2025
* Add pdc_logger.h to installation (#245)

* sync with gitlab (#248)

* Fix restart issue (#228)

* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* Fix restart issue

* Update nersc.yml (#238)

* Since PDCinit returns a uint64_t, 0 should indicate failure (#233)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Check the return value of `PDC_Client_init` in `PDC_init` (#230)

* Check that return value of PDC_Client_init in PDC_init

* Change return to 0

This will make it simpler when merging #233 (comment)

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Change `printf` to PDC logger (#232)

* Changed all printf to use pdc logger

Also removed large blocks of comments and changed the pdc logger
to print the file name, function, and line number.

* Change typo of LOG_INFO to LOG_ERROR

* Correct grammar from fail -> failed

* update grammar succesfully close -> successfully closed

* switch type of LOG_INFO to LOG_ERROR

* Add logging docs and fix some LOG_INFO->LOG_JUST_PRINT

* update clang formatting

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Malloc correct size for pdc_obj_metadata_pkg (#237)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* PDCregion_transfer_create validate client buf, local region, and remote regions (#236)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: Noah Lewis <47840925+TheAssembler1@users.noreply.github.com>

* Fix return metadata dtype (#246)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Region info transfer struct type and helper functions (#247)

* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* checkpoint

* Switch variables such as count_0, start_0, and size0... to arrays

This will reduce code duplication, reduce bugs, and make it easier
to switch to support n-dimensional data.

* clang format

* checkpoint

* created better function names and documentation

* remove

* Committing clang-format changes

* clang format

* remove file

* change to use helper function

* fix bug with incorrect helper function call

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix issues with PDC tools (#249)

* Fix issues with PDC tools

* Correct LOG_ERROR to LOG_INFO

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix printing in `PGOTO_ERROR` and `PGOTO_ERROR_VOID` (#250)

Print new line by default in `PGOTO_ERROR` and `PGOTO_ERROR_VOID`

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Group Tests Into Folders (#252)

* Fix cache flush (#226)

* Fix a thread race that may cause a memory error when data larger than the cache max size is transferred

* Add a test that writes more data than server cache size

* Fix CI run command

* Grouped commons tests into folders

This commit also changes the src/tests/CmakeLists.txt to build tests
within their new folders

* add deprecated folder remove buf_map folder

* Update run_multiple_mpi_test.sh

* Update dependencies-macos.sh

* Update dependencies-macos.sh

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: Jean Luca Bez <jeanlucabez@gmail.com>

* Return the same obj_id if the obj is just created or already opened (#254)

* Return the same obj_id if the obj is just created or already opened

* Committing clang-format changes

* Update doc

* Update dependencies-macos.sh

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* add option to choose interface (#255)

* add option to connect to a given network interface
* Committing clang-format changes
* fix conflict
* include header
* enable output on failure

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Fix multithreading compilation (#259)

* fix multithreading compilation

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix segmentation fault of calling `PDCobj_create_mpi` twice with duplicate object name (#262)

* Validate success of PDC_obj_create and PDC_find_id in PDCobj_create_mpi

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Use `PDC_malloc`, `PDC_free`, `PDC_calloc`, and `PDC_realloc` (#260)

* checkpoint

* replace free with PDC_free and calloc with PDC_calloc

* Committing clang-format changes

* fix more mallocs to PDC_malloc

* more PDC_free fixes

* Committing clang-format changes

* Update ubuntu-cache.yml

* remove eno1

* fix realloc

* Committing clang-format changes

* Update ubuntu-no-cache.yaml

* Fix several bugs with error checking with object dim allocation

* Committing clang-format changes

* fix bug

* Committing clang-format changes

* Update ubuntu-no-cache.yaml

* Update ubuntu-cache.yml

* Set default value of ndim to 1 in PDCprop_create when using PDC_OBJ_CREATE

* Committing clang-format changes

* Malloc when defaulting to ndim size 1.
Only free hostname when we PDC_malloc the memory
because pointers returned by getenv are not malloced
and could point to static memory.

* Committing clang-format changes

* Update README.md

minor change to trigger the pipeline

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: Jean Luca Bez <jeanlucabez@gmail.com>

* Fix Sphinx documentation errors and warnings (#265)

* Fix all sphinx warnings and errors. Removed repeat declarations of functions.

* Committing clang-format changes

* remove def of EXTENSION_MAPPING

* gitignore for docs and fix c structs

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Replace `docs/README.md` -> steps to build docs (#268)

* Replace docs/README.md -> steps to build docs

* Update README.md

---------

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Use `FUNC_ENTER` and `FUNC_LEAVE` (#270)

* use func enter and func leave in all functions

* Committing clang-format changes

* fix infinite recursion between memory management, hash table, and per function timing

* Committing clang-format changes

* add profiling to CI

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* New test macros and code cleanup (#261)

* checkpoint

* Committing clang-format changes

* some tests

* Committing clang-format changes

* checkpoint

* open_obj uses new test macros

* Committing clang-format changes

* read_obj uses TASSERT

* read_obj uses TASSERT

* Committing clang-format changes

* cont_del and cont_getid use test macros

* convert more tests to use macros

* convert more tests to macros

* Committing clang-format changes

* Committing clang-format changes

* clang format

* use test helper in cont_info and cont_add_del

* more tests use macros

* Committing clang-format changes

* use tests macros in more tests

* use PGOTO* macros instead of goto

* clang format

* more log fixes

* logging cleanup and more usage of test macros

* Committing clang-format changes

* clang format and fix CMakeLists for tests

* use tests macros in transfer overlap 2D/3D

* use TASSERT in more tests

* Committing clang-format changes

* use test asserts

* all tests on the CI use TASSERT

* fix printing and newlines in tests

* print time, file name, function name, and line number in debug print

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Tests logging typo fix (#273)

* Fixed logging typos

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Rename pdc_server.exe to pdc_server for consistency (#275)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Update vpicio_mts.c (#276)

Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Client Propagate `HG_Finalize` error on `PDCclose` (#263)

* all but 4 close errors are fixed

* Committing clang-format changes

* client side HG_Finalize now passes on serial tests

* Committing clang-format changes

* cleanup

* Committing clang-format changes

* Update pdc_region_transfer.c

* free bulk handles during region transfer close

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Standardize ID Lookup Null Checks and Error Handling (#281)

* cleanup finding id's

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Obj open fix (#279)

* Fix seg fault for PDCobj_open on non-existent object

* Committing clang-format changes

* Remove log from NULL check

* Log message when object metadata isn't found.

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix multithread (#274)

* move hash table mutex to hashtable source files

* Committing clang-format changes

* add multithread compile test

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix seg fault when mercury initialization fails (#283)

* check for NULL parameters in hash table

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

---------

Co-authored-by: Noah Lewis <47840925+TheAssembler1@users.noreply.github.com>
Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>