EQSIM benchmark code and fixes (#213)
* Update getting_started.rst (#184)

* Removing gres option for ctest (#182)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

* enable cache by default (#187)

* Benchmark code for EQSIM data

* Committing clang-format changes

* Minor adjustments

* Committing clang-format changes

* Updates

* Committing clang-format changes

* Change vpicio to use local server partitioning, add some debug prints

* Committing clang-format changes

* Add metadata query to benchmark code

* Committing clang-format changes

* Add ZFP compression for read and write

* Committing clang-format changes

* Add a option to use more ranks to read data so total data of each rank is less than the 4GB chunk limit

* Committing clang-format changes

* Add a data query code for EQSIM data

* Committing clang-format changes

* Minor adjustments for the HDF5 read code

* Committing clang-format changes

* Fix an issue with periodic data flush, minor changes to benchmark code

* Committing clang-format changes

* fix an issue with 3d read segfault

* Committing clang-format changes

* Fix compile issue

* Update .gitlab-ci.yml

* Update sleep time

* Replace function

* Replace function

* Minor updates and doc changes

* Committing clang-format changes

* Update

---------

Co-authored-by: Hyunju Oh <oh.693@osu.edu>
Co-authored-by: Hyunju Oh <hjoh16@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
5 people authored Dec 2, 2024
1 parent 25db7c1 commit 13fb9af
Showing 16 changed files with 1,442 additions and 66 deletions.
14 changes: 11 additions & 3 deletions CMakeLists.txt
@@ -388,10 +388,10 @@ endif()
option(PDC_SERVER_CACHE "Enable Server Caching." ON)
if(PDC_SERVER_CACHE)
set(PDC_SERVER_CACHE 1)
set(PDC_SERVER_CACHE_MAX_GB "3" CACHE STRING "Max GB for server cache")
set(PDC_SERVER_CACHE_FLUSH_TIME "30" CACHE STRING "Flush time for server cache")
set(PDC_SERVER_CACHE_MAX_GB "32" CACHE STRING "Max GB for server cache")
set(PDC_SERVER_IDLE_CACHE_FLUSH_TIME "3" CACHE STRING "Idle time to initiate flush from server cache")

add_compile_definitions(PDC_SERVER_CACHE_MAX_GB=${PDC_SERVER_CACHE_MAX_GB} PDC_SERVER_CACHE_FLUSH_TIME=${PDC_SERVER_CACHE_FLUSH_TIME})
add_compile_definitions(PDC_SERVER_CACHE_MAX_GB=${PDC_SERVER_CACHE_MAX_GB} PDC_SERVER_IDLE_CACHE_FLUSH_TIME=${PDC_SERVER_IDLE_CACHE_FLUSH_TIME})
endif()


@@ -487,6 +487,14 @@ if(PDC_ENABLE_SQLITE3)
set(ENABLE_SQLITE3 1)
endif()

#-----------------------------------------------------------------------------
# ZFP option
#-----------------------------------------------------------------------------
option(PDC_ENABLE_ZFP "Enable ZFP." OFF)
if(PDC_ENABLE_ZFP)
set(ENABLE_ZFP 1)
endif()

# Check availability of symbols
#-----------------------------------------------------------------------------
check_symbol_exists(malloc_usable_size "malloc.h" HAVE_MALLOC_USABLE_SIZE)
17 changes: 16 additions & 1 deletion docs/source/developer-notes.rst
@@ -333,6 +333,11 @@ Server Nonblocking Control

By design, the region transfer request start does not guarantee the finish of data transfer or server I/O. In fact, this function should return to the application as soon as possible. Data transfer and server I/O can occur in the background so that client applications can take advantage of overlapping timings between application computations and PDC data management.

Server Data Cache
---------------------------------------------

PDC supports a server-side write data cache, which is enabled by default through the CMake option ``PDC_SERVER_CACHE``. Each time the server receives a region write request, it caches the data in the server's memory without writing it to the file system. The server monitors both the total amount of cached data and how long it has gone without receiving any I/O request to determine when to flush the data from the cache to the file system. Two additional CMake options, ``PDC_SERVER_CACHE_MAX_GB`` and ``PDC_SERVER_IDLE_CACHE_FLUSH_TIME``, control the flush behavior: the flush operation is triggered when the cached data size reaches the limit or when the server has been idle longer than the idle time. For a flush started by the idle-time trigger, if a new I/O request arrives during the flush, PDC stops before flushing the next region and resets the timer to avoid interfering with the client's I/O. Setting ``export PDC_SERVER_CACHE_NO_FLUSH=0`` can disable the flush operation and keep the data in cache.
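
The two flush triggers described above can be sketched as a single decision function. This is an illustration only, not PDC's implementation: ``should_flush``, ``flush_reason_t`` and the two macros are hypothetical names, the macro values simply mirror the CMake defaults shown in this commit (32 GB and 3), and the idle time is assumed to be in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants mirroring the CMake defaults in this commit. */
#define CACHE_MAX_BYTES (32ULL * 1024 * 1024 * 1024) /* PDC_SERVER_CACHE_MAX_GB = 32 */
#define IDLE_FLUSH_SEC  3.0 /* PDC_SERVER_IDLE_CACHE_FLUSH_TIME = 3, assumed seconds */

typedef enum { FLUSH_NONE, FLUSH_SIZE_LIMIT, FLUSH_IDLE } flush_reason_t;

/* Decide whether the cache should be flushed.
 * cached_bytes: total bytes currently held in the cache
 * now_sec:      current time in seconds
 * last_io_sec:  time of the most recent I/O request in seconds */
static flush_reason_t
should_flush(uint64_t cached_bytes, double now_sec, double last_io_sec)
{
    assert(now_sec >= last_io_sec);

    /* Trigger 1: the cached data size reached the configured limit. */
    if (cached_bytes >= CACHE_MAX_BYTES)
        return FLUSH_SIZE_LIMIT;

    /* Trigger 2: there is something to flush and the server has been idle
     * for longer than the configured idle time. */
    if (cached_bytes > 0 && now_sec - last_io_sec > IDLE_FLUSH_SEC)
        return FLUSH_IDLE;

    return FLUSH_NONE;
}
```

In this sketch, resetting the idle timer when a new request arrives amounts to updating ``last_io_sec``, so an idle-triggered flush that re-checks ``should_flush`` between regions would stop on its own.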

Server Region Transfer Request Start
---------------------------------------------

@@ -343,6 +348,11 @@ Then, ``PDC_commit_request`` is called for request registration. This operation

Finally, the server RPC returns a finished code to the client so that the client can return to the application immediately.

Server Region Transfer Data Sieving
---------------------------------------------
When reading a 2D/3D region, the PDC server uses data sieving if only a subset of a storage region is requested, which can improve read performance. The entire storage region is read as one contiguous chunk, and the requested subset is extracted before the data is sent to the client. Setting ``export PDC_DATA_SIEVING=0`` before running the server disables this feature.
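
The extraction step of data sieving can be sketched for the 2D case as follows. This is a minimal illustration under stated assumptions, not the server's code: ``sieve_extract_2d`` and its parameters are hypothetical names, and the storage region is assumed to be row-major and already read into memory as one contiguous buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a requested sub-block out of a full 2D storage region.
 * region:      the whole region, row-major, read as one contiguous chunk
 * region_dims: {rows, cols} of the storage region
 * sub_offset:  {row, col} where the requested subset starts
 * sub_size:    {rows, cols} of the requested subset
 * unit:        size of one element in bytes
 * out:         destination buffer of sub_size[0] * sub_size[1] * unit bytes */
static void
sieve_extract_2d(const char *region, const uint64_t region_dims[2], const uint64_t sub_offset[2],
                 const uint64_t sub_size[2], size_t unit, char *out)
{
    /* The requested subset must lie inside the storage region. */
    assert(sub_offset[0] + sub_size[0] <= region_dims[0]);
    assert(sub_offset[1] + sub_size[1] <= region_dims[1]);

    /* Each row of the subset is contiguous in the source, so copy row by row
     * and pack the rows back to back in the output buffer. */
    for (uint64_t i = 0; i < sub_size[0]; ++i) {
        const char *src = region + ((sub_offset[0] + i) * region_dims[1] + sub_offset[1]) * unit;
        memcpy(out + i * sub_size[1] * unit, src, sub_size[1] * unit);
    }
}
```

The trade-off is one large sequential read plus in-memory copies instead of many small strided reads, which is why sieving tends to help when the requested subset covers a substantial part of the region.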


Server Region Transfer Request Wait
---------------------------------------------

@@ -373,6 +383,11 @@ However, when a new region is written to an object, it is necessary to scan all

I/O by region will store repeated bytes when write requests contain overlapping parts. In addition, the region update mechanism generates extra I/O operations. This is one of its disadvantages. Optimization for region search (as R trees) in the future can relieve this problem.

Storage Compression (Prototype)
---------------------------------------------

PDC has partial support for storing compressed data for each storage region with the ZFP compression library. Currently, the compression is hard-coded to the ZFP accuracy mode.

+++++++++++++++++++++++++++++++++++++++++++++
Contributing to PDC project
+++++++++++++++++++++++++++++++++++++++++++++
@@ -560,4 +575,4 @@ But if you need to debug the server, you can prepend ``srun`` with ``ddt --conne
rm -rf ./pdc_tmp # optional if you need to clean up the PDC tmp directory
ddt --connect srun -N 1 -n 4 -c 2 --mem=25600 --cpu_bind=cores ./bin/pdc_server.exe &
We recommend to use 1 node when debugging PDC, but if memory is not sufficient, you can use more nodes.
We recommend to use 1 node when debugging PDC, but if memory is not sufficient, you can use more nodes.
16 changes: 2 additions & 14 deletions src/api/pdc_region/pdc_region_transfer.c
@@ -1789,25 +1789,13 @@ release_region_buffer(char *buf, uint64_t *obj_dims, int local_ndim, uint64_t *l
if (local_ndim == 2) {
if (access_type == PDC_READ) {
ptr = new_buf;
for (i = 0; i < local_size[0]; ++i) {
memcpy(buf + ((local_offset[0] + i) * obj_dims[1] + local_offset[1]) * unit, ptr,
local_size[1] * unit);
ptr += local_size[1] * unit;
}
memcpy(buf, ptr, local_size[0] * local_size[1] * unit);
}
}
else if (local_ndim == 3) {
if (access_type == PDC_READ) {
ptr = new_buf;
for (i = 0; i < local_size[0]; ++i) {
for (j = 0; j < local_size[1]; ++j) {
memcpy(buf + ((local_offset[0] + i) * obj_dims[1] * obj_dims[2] +
(local_offset[1] + j) * obj_dims[2] + local_offset[2]) *
unit,
ptr, local_size[2] * unit);
ptr += local_size[2] * unit;
}
}
memcpy(buf, ptr, local_size[0] * local_size[1] * local_size[2] * unit);
}
}
if (bulk_buf_ref) {
1 change: 1 addition & 0 deletions src/commons/utils/pdc_timing.c
@@ -1,5 +1,6 @@
#include "pdc_timing.h"
#include "assert.h"
#include "mpi.h"

#ifdef PDC_TIMING
static double pdc_base_time;
10 changes: 10 additions & 0 deletions src/server/CMakeLists.txt
@@ -17,6 +17,13 @@ if(PDC_ENABLE_SQLITE3)
find_package(SQLite3 3.31.0 REQUIRED)
endif()

if(PDC_ENABLE_ZFP)
add_definitions(-DENABLE_ZFP=1)
find_package(ZFP REQUIRED)
# find_path(ZFP_INCLUDE_DIR include/zfp.h)
endif()


include_directories(
${PDC_COMMON_INCLUDE_DIRS}
${PDC_INCLUDES_BUILD_TIME}
@@ -40,6 +47,7 @@ include_directories(
${MERCURY_INCLUDE_DIR}
${FASTBIT_INCLUDE_DIR}
${ROCKSDB_INCLUDE_DIR}
${ZFP_INCLUDE_DIRS}
)

add_definitions( -DIS_PDC_SERVER=1 )
@@ -70,6 +78,8 @@ add_library(pdc_server_lib
if(PDC_ENABLE_FASTBIT)
message(STATUS "Enabled fastbit")
target_link_libraries(pdc_server_lib ${MERCURY_LIBRARY} ${PDC_COMMONS_LIBRARIES} -lm -ldl ${PDC_EXT_LIB_DEPENDENCIES} ${FASTBIT_LIBRARY}/libfastbit.so)
elseif(PDC_ENABLE_ZFP)
target_link_libraries(pdc_server_lib ${MERCURY_LIBRARY} ${PDC_COMMONS_LIBRARIES} -lm -ldl ${PDC_EXT_LIB_DEPENDENCIES} zfp::zfp)
elseif(PDC_ENABLE_ROCKSDB)
if(PDC_ENABLE_SQLITE3)
target_link_libraries(pdc_server_lib ${MERCURY_LIBRARY} ${PDC_COMMONS_LIBRARIES} -lm -ldl ${PDC_EXT_LIB_DEPENDENCIES} ${ROCKSDB_LIBRARY} SQLite::SQLite3)
27 changes: 15 additions & 12 deletions src/server/pdc_server.c
@@ -96,6 +96,7 @@ char ** all_addr_strings_g = NULL;
int is_hash_table_init_g = 0;
int lustre_stripe_size_mb_g = 16;
int lustre_total_ost_g = 0;
int pdc_disable_checkpoint_g = 0;

hg_id_t get_remote_metadata_register_id_g;
hg_id_t buf_map_server_register_id_g;
@@ -719,15 +720,9 @@ PDC_Server_set_close(void)
#ifdef PDC_TIMING
start = MPI_Wtime();
#endif
char *tmp_env_char = getenv("PDC_DISABLE_CHECKPOINT");
if (tmp_env_char != NULL && strcmp(tmp_env_char, "TRUE") == 0) {
if (pdc_server_rank_g == 0) {
printf("==PDC_SERVER[0]: checkpoint disabled!\n");
}
}
else {
if (pdc_disable_checkpoint_g == 0)
PDC_Server_checkpoint();
}

#ifdef PDC_TIMING
pdc_server_timings->PDCserver_checkpoint += MPI_Wtime() - start;
#endif
@@ -1204,7 +1199,8 @@ PDC_Server_recv_shm_cb(const struct hg_cb_info *callback_info)
hg_return_t
PDC_Server_checkpoint_cb()
{
PDC_Server_checkpoint();
if (pdc_disable_checkpoint_g == 0)
PDC_Server_checkpoint();

return HG_SUCCESS;
}
@@ -1862,7 +1858,7 @@ PDC_Server_loop(hg_context_t *hg_context)
#ifdef PDC_ENABLE_CHECKPOINT
checkpoint_interval++;
// Avoid calling clock() every operation
if (checkpoint_interval % PDC_CHECKPOINT_CHK_OP_INTERVAL == 0) {
if (pdc_disable_checkpoint_g == 0 && checkpoint_interval % PDC_CHECKPOINT_CHK_OP_INTERVAL == 0) {
cur_time = clock();
double elapsed_time = ((double)(cur_time - last_checkpoint_time)) / CLOCKS_PER_SEC;
/* fprintf(stderr, "PDC_SERVER: loop elapsed time %.2f\n", elapsed_time); */
@@ -2117,7 +2113,7 @@ PDC_Server_get_env()
data_sieving_g = atoi(tmp_env_char);
}
else {
data_sieving_g = 0;
data_sieving_g = 1;
}

// Get number of OST per file
@@ -2158,7 +2154,7 @@ PDC_Server_get_env()

tmp_env_char = getenv("PDC_GEN_HIST");
if (tmp_env_char != NULL)
gen_hist_g = 1;
gen_hist_g = atoi(tmp_env_char);

tmp_env_char = getenv("PDC_GEN_FASTBIT_IDX");
if (tmp_env_char != NULL)
@@ -2184,6 +2180,13 @@ PDC_Server_get_env()
printf("==PDC_SERVER[%d]: using SQLite3 for kvtag\n", pdc_server_rank_g);
}

tmp_env_char = getenv("PDC_DISABLE_CHECKPOINT");
if (tmp_env_char != NULL && strcmp(tmp_env_char, "TRUE") == 0) {
pdc_disable_checkpoint_g = 1;
if (pdc_server_rank_g == 0)
printf("==PDC_SERVER[0]: checkpoint disabled!\n");
}

if (pdc_server_rank_g == 0) {
printf("==PDC_SERVER[%d]: using [%s] as tmp dir, %d OSTs, %d OSTs per data file, %d%% to BB\n",
pdc_server_rank_g, pdc_server_tmp_dir_g, lustre_total_ost_g, pdc_nost_per_file_g,