Performance improvement for vpicio #211

Merged
merged 23 commits into from
Aug 30, 2024

houjun
Member

@houjun houjun commented Aug 23, 2024

Related Issues / Pull Requests

#210

Description

  • Added a client-side thread that drives Mercury trigger and progress for "transfer start all", so that the server can pull the data promptly and asynchronously.
  • Implemented a better idle-status detection method to avoid an early background cache flush, which could interleave with ongoing client requests and effectively make them synchronous.
  • Added a new vpicio benchmark.
  • Disabled “PDC_REGION_DYNAMIC” in the tests.
  • Fixed a couple of minor compile issues.
  • Added an MPI_Barrier to region "transfer start all" to avoid interleaving metadata and data operations, which results in long processing times.
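
To illustrate the idle-status item above, here is a minimal sketch of one way a server could decide that it is safe to start a background cache flush. It is not taken from the PDC source: the `idle_tracker_t` type, its fields, and all `idle_tracker_*` functions are hypothetical names invented for this example. The assumption is that the server counts in-flight client requests and remembers when the last one arrived or completed, and treats itself as idle only when nothing is in flight and a quiet period has elapsed. Timestamps are passed in as arguments to keep the example deterministic; a real server would likely read a monotonic clock instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for idle detection; NOT the PDC implementation.
 * Not thread-safe: a real server would protect this with a mutex or atomics. */
typedef struct {
    double last_activity; /* seconds: last time a request arrived or completed */
    int    in_flight;     /* requests received but not yet completed */
    double quiet_period;  /* seconds of silence required before a flush may start */
} idle_tracker_t;

static void
idle_tracker_init(idle_tracker_t *t, double now, double quiet_period)
{
    assert(quiet_period >= 0.0);
    t->last_activity = now;
    t->in_flight     = 0;
    t->quiet_period  = quiet_period;
}

/* Call when a client request arrives. */
static void
idle_tracker_request_begin(idle_tracker_t *t, double now)
{
    t->in_flight++;
    t->last_activity = now;
}

/* Call when a client request has been fully processed. */
static void
idle_tracker_request_end(idle_tracker_t *t, double now)
{
    assert(t->in_flight > 0);
    t->in_flight--;
    t->last_activity = now;
}

/* The server is idle only if nothing is in flight AND the quiet period has
 * passed since the last activity. Checking both conditions keeps a short gap
 * between two client requests from triggering an early flush. */
static bool
idle_tracker_is_idle(const idle_tracker_t *t, double now)
{
    if (t->in_flight > 0)
        return false;
    return (now - t->last_activity) >= t->quiet_period;
}
```

In this sketch, a background flush loop would call `idle_tracker_is_idle` before writing cached regions out and skip the flush whenever it returns false. The length of the quiet period is a tuning choice and is not specified in this pull request.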

What changes are proposed in this pull request?

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected; for instance, examples in this repository must be updated too)
  • This change requires a documentation update

Checklist:

  • My code modifies existing public API, or introduces new public API, and I updated or wrote docstrings
  • I have commented my code
  • My code requires documentation updates, and I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@houjun houjun marked this pull request as ready for review August 27, 2024 02:08
@houjun houjun marked this pull request as draft August 27, 2024 15:48
@houjun houjun marked this pull request as ready for review August 27, 2024 21:22
@jeanbez jeanbez self-assigned this Aug 29, 2024
@jeanbez jeanbez added the type: bug (Something isn't working), type: enhancement (New feature or request), and priority: high (High priority) labels Aug 29, 2024
@jeanbez jeanbez added this to the v.0.6 milestone Aug 29, 2024
@jeanbez jeanbez merged commit 14480fa into develop Aug 30, 2024
8 checks passed
@houjun houjun mentioned this pull request Aug 30, 2024
9 tasks
@houjun houjun deleted the vpic_opt branch September 10, 2024 18:52
jeanbez added a commit that referenced this pull request Dec 3, 2024
* Performance improvement for vpicio (#211)

* Partial fix for the region transfer/wait performance issue

* Committing clang-format changes

* Improve the async processing for vpicio_mts_all, also fix a few compile issues

* Committing clang-format changes

* Minor change

* Continue to optimize start_all performance for vpicio, add a few time-related convenience functions

* Committing clang-format changes

* Fix hanging issue in CI testing

* Committing clang-format changes

* Disable debug prints

* Revert back for non-all ops

* Better pthread management

* Better pthread management

* Fix timeout issue with CI testing and clang-formatting

* Committing clang-format changes

* Test

* Trigger test

* Committing clang-format changes

* Trigger CI

* Committing clang-format changes

* Trigger CI

* Switch to static partition for vpicio

* Replace vpicio_mts with new implementation

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Multi-thread fix and request merging (#205)

* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix an issue with region transfer request
* Committing clang-format changes
* Merge small requests when they are contiguous and 1D, change srun commands in run scripts to detect Perlmutter compute nodes
* Merge only for REGION_LOCAL partition
* Committing clang-format changes
* Fix a bug that causes some tests to fail
* Fix a couple of issues with start/wait all
* Committing clang-format changes
* Add aggregation support for contiguous read operations
* Committing clang-format changes
* Fix compile issue when multithread is enabled
* Committing clang-format changes
* minor change with test code
* Committing clang-format changes
* Remove metadata mutex for multi threading
* Committing clang-format changes
* Fix mutex
* Committing clang-format changes
* Fix an issue when closing an obj
* Sync develop to stable (v.0.5) (#201)
* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* enable cache by default (#187)
* Removing PDC macro (#189)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core
* Remove PDC macro
* Committing clang-format changes

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* BDCATS fix (#193)
* Fix issues with bdcats_batch
* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Update mpi_test.sh (#197)
* Update .gitlab-ci.yml (#195)
* Updates for latest integration with Jacamar and Gitlab tokens in CI
* VPICIO bugfix (#196)
* Fix VPICIO bug
* Add more checks and error out when no server is selected
* Committing clang-format changes
* Add VPICIO and BDCATS to MPI test

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* Fix vpicio_mts (#199)

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: Hyunju Oh <oh.693@osu.edu>
Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Committing clang-format changes
* Fix rebase issue
* Add timers
* Committing clang-format changes
* Add explicit transfer start (all) with MPI communicator
* Committing clang-format changes
* MPI fix
* remove debug msg
* Committing clang-format changes
* Add function comment for doc
* Revert script changes
* Committing clang-format changes
* Revert script changes
* Committing clang-format changes
* Revert script setting

---------

Co-authored-by: Hyunju Oh <oh.693@osu.edu>
Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt to bump version number (#202)

* Update CMakeLists.txt to bump version number

* Update clang-format-check.yml

* IDIOMS Update & BULKI v0.1 (#203)

* fix cmake mercury_util not found issue

* update for Julia support

* fix hdf5.h not found for src/tools

* update container config

* add libhdf5-dev for Github Actions

* update CMake for HDF5 in tools

* update logic for finding HDF5

* update

* remove use system hdf5

* delete useless find library

* update findHDF5

* Feature/dart (#11)

Update to avoid fixing compilation issue on src/tools (due to : HDF5 cannot be found)

* Use cc on Perlmutter (#161)

Dr. Tang fixed a compilation issue in NERSC CI where HDF5 could not be detected even when the cray-parallel-hdf5 module is loaded on Perlmutter.

* update with fixes on tools and llsm example

* add gitignore for llsm

* update gitignore

* Feature/dart (#12)

* fix formatting

* update clangformat10

* update base dockerfile

* Add clang-format10 to docker container. Also fixed clang-format.

* Fix pdc ls (#154)

* pdc import, export, ls compiled successfully

* removed requested files

* formatting issues

* changed install tools

* gets checkpoint files

* grabbing checkpoint files from within sub-directories, minor comments

* Committing clang-format changes

* Committing clang-format changes

* Fix a few issues with pdc_ls

* Committing clang-format changes

---------

Co-authored-by: nickaruwang <nickwan0318@gmail.com>
Co-authored-by: Nick Wang <66816536+nickaruwang@users.noreply.github.com>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* update documentation

* update document 

* sync branch 

* no UUID module is required

* update document and make UUID an optional package

* update docker repo name

* updating docker repo name and make UUID optional

* Complete support for Docker and Github Codespace  (#157)

Include support for Docker and Github Codespace so we can run our dev environment with the support of Docker.

* SQLite and RocksDB support for KVtags (#165)

SQLite and RocksDB support for KVtags

* fix round for tag delete

* update test

* bulki update

* BULKI base type worked

* BULKI all tests done

* new index code

* update

* update new test

* update csv bench

* update

* update script

* adding python scripts for generating large metadata set for LLSM application

* update json schema

* better json validator

* update importer

* update code for non-MPI compatibility

* update llsm converter

* update LLSM data converter

* split files

* update .gitignore

* update

* add timing info

* update

* update tag size

* detect object creation failure

* update

* update object name with date

* update for robustness

* update

* update JMD_DEBUG option

* update output for overall output

* update inttypes.h

* update

* update extractor

* update inttypes.h

* update converter

* update importer information

* Update getting_started.rst (#184)

* Removing gres option for ctest (#182)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* fix issue

* fixed search issues

* update for infix

* update

* index persistence still needs improvement

* update

* enable cache by default (#187)

* Removing PDC macro (#189)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core
* Remove PDC macro
* Committing clang-format changes

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* update

* range query done

* range query local test passed

* multi-condition in progress

* clean up code

* add comments

* new benchmark

* update

* update range query test

* update cmake:

* update

* update

* update

* update

* someta range query

* someta range query

* someta range query

* fix value serialization

* update

* update double free

* update

* update

* update

* fixed pointer issue

* rb_tree delete fixed, now need to check index persistence

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* BDCATS fix (#193)

* Fix issues with bdcats_batch

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* update

* clean up code

* update test sh

* IDIOMS persistence DONE

* update

* remove old kvtag benchmarks

* update

* update

* update changes

* dart info

* update

* multi data type for the same key, supported now

* Monitoring changes from feature/dart to develop (#18)

Major changes: 
* IDIOMS -> affix-based query benchmark
* IDIOMS -> Simulation Test
* IDIOMS -> Multi data type supported for the same key
* IDIOMS -> Range Query and Exact Query for Numeric Values
* IDIOMS -> benchmark for numeric values (exact search and range query)
* IDIOMS -> Index Persistence
* BULKI -> A data serialization and deserialization mechanism.

* fix CMakeLists.txt

* update

* update format

* update BULKI interface order

* BULKI API sorted

* add idioms ci test

* Feature/dart (#20)

1. add documentation about BULKI and IDIOMS query conditions
2. add ci test for IDIOMS
3. optimized BULKI to save space on its metadata fields.

* Feature/dart (#22)

update version

* update

* update

* update

* remove unnecessary .bin file

* update

* update

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: nickaruwang <nickwan0318@gmail.com>
Co-authored-by: Nick Wang <66816536+nickaruwang@users.noreply.github.com>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: Hyunju Oh <oh.693@osu.edu>
Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>

* Fix region transfer with object static partitioning (#214)

* Update pdc_region_transfer.c

* Committing clang-format changes

* Update .gitlab-ci.yml

Fix issue with Perlmutter CI libfabric module

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* EQSIM benchmark code and fixes (#213)

* Update getting_started.rst (#184)

* Removing gres option for ctest (#182)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* enable cache by default (#187)

* Benchmark code for EQSIM data

* Committing clang-format changes

* Minor adjustments

* Committing clang-format changes

* Updates

* Committing clang-format changes

* Change vpicio to use local server partitioning, add some debug prints

* Committing clang-format changes

* Add metadata query to benchmark code

* Committing clang-format changes

* Add ZFP compression for read and write

* Committing clang-format changes

* Add an option to use more ranks to read data so that the total data of each rank is less than the 4GB chunk limit

* Committing clang-format changes

* Add a data query code for EQSIM data

* Committing clang-format changes

* Minor adjustments for the HDF5 read code

* Committing clang-format changes

* Fix an issue with periodic data flush, minor changes to benchmark code

* Committing clang-format changes

* fix an issue with 3d read segfault

* Committing clang-format changes

* Fix compile issue

* Update .gitlab-ci.yml

* Update sleep time

* Replace function

* Replace function

* Minor updates and doc changes

* Committing clang-format changes

* Update

---------

Co-authored-by: Hyunju Oh <oh.693@osu.edu>
Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: Houjun Tang <htang4@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hyunju Oh <oh.693@osu.edu>
Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Zhang Wei <zhangwei217245@lbl.gov>
Co-authored-by: nickaruwang <nickwan0318@gmail.com>
Co-authored-by: Nick Wang <66816536+nickaruwang@users.noreply.github.com>
jeanbez pushed a commit that referenced this pull request Dec 3, 2024
* Partial fix for the region transfer/wait performance issue

* Committing clang-format changes

* Improve the async processing for vpicio_mts_all, also fix a few compile issues

* Committing clang-format changes

* Minor change

* Continue to optimize start_all performance for vpicio, add a few time-related convenience functions

* Committing clang-format changes

* Fix hanging issue in CI testing

* Committing clang-format changes

* Disable debug prints

* Revert back for non-all ops

* Better pthread management

* Better pthread management

* Fix timeout issue with CI testing and clang-formatting

* Committing clang-format changes

* Test

* Trigger test

* Committing clang-format changes

* Trigger CI

* Committing clang-format changes

* Trigger CI

* Switch to static partition for vpicio

* Replace vpicio_mts with new implementation

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Labels
priority: high (High priority), type: bug (Something isn't working), type: enhancement (New feature or request)

3 participants