Closed
Labels
type: bug (Something isn't working)
Description
Bug Report
Running the VPIC I/O test ($PDC_DIR/share/test/bin/vpicio) aborts with "corrupted size vs. prev_size" during the server cache flush.
To Reproduce
How are you building/running PDC?
- version of PDC: develop branch
- installed PDC using: Installed from source
- machine: Perlmutter
- version of Mercury: v2.2.0
What did you use to build PDC (cmake command)?
cmake -DBUILD_MPI_TESTING=ON -DBUILD_SHARED_LIBS=ON -DBUILD_TESTING=ON -DCMAKE_INSTALL_PREFIX=$PDC_DIR -DPDC_ENABLE_MPI=ON -DMERCURY_DIR=$MERCURY_DIR -DCMAKE_C_COMPILER=cc -DMPI_RUN_CMD=srun ../
make -j && make install
What is the running setup you use?
salloc --nodes 1 --qos interactive --time 01:00:00 --constraint cpu --account=mXXXX
srun -N 1 -n 63 --ntasks-per-node=63 -c 2 --cpu_bind=cores --overlap $PDC_DIR/share/test/bin/vpicio 8388608
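For scale, here is a rough estimate of the data volume this command produces. It is only a sketch: it assumes the argument 8388608 is the particle count per client, that vpicio writes eight variables of 4 bytes each per particle, and that "MB" in the server log means MiB. None of these are stated in this report.

```python
# Rough data-volume estimate for the run above.
MIB = 1024 * 1024

particles_per_client = 8_388_608  # argument passed to vpicio
clients = 63                      # from "srun -n 63"
bytes_per_value = 4               # ASSUMPTION: 4-byte float/int values
variables = 8                     # ASSUMPTION: eight variables per particle
cache_limit_mib = 3072            # server cache limit reported in the log

per_variable_mib = particles_per_client * bytes_per_value // MIB
per_client_mib = per_variable_mib * variables
total_mib = per_client_mib * clients

print(f"per variable, per client: {per_variable_mib} MiB")
print(f"per client:               {per_client_mib} MiB")
print(f"all clients:              {total_mib} MiB")
print(f"total / cache limit:      {total_mib / cache_limit_mib:.2f}")
```

Under these assumptions a single variable from one client is 32 MiB, which matches the smallest flush size in the log, and the full run is several times larger than the server cache, so repeated flush cycles are expected.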
Logs related to the error
Writing 8388608 number of particles with 63 clients.
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 63 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 63 clients per server
Obj create time: 1.33706e-03
Transfer create time: 5.87615e-04
==PDC_SERVER[0]: server cache full 3104.0 / 3072.0 MB, will flush to storage
2025-01-23 10:09:38.271679 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 2016.0 / 3104.0 MB to storage
2025-01-23 10:09:39.031939 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 448.0 / 1088.0 MB to storage
2025-01-23 10:09:39.077472 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 640.0 MB to storage
2025-01-23 10:09:39.121829 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 608.0 MB to storage
2025-01-23 10:09:39.166208 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 576.0 MB to storage
2025-01-23 10:09:39.210859 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 544.0 MB to storage
2025-01-23 10:09:39.255421 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 512.0 MB to storage
2025-01-23 10:09:39.300544 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 480.0 MB to storage
2025-01-23 10:09:39.344318 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 448.0 MB to storage
2025-01-23 10:09:39.432205 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 64.0 / 416.0 MB to storage
2025-01-23 10:09:39.478388 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 352.0 MB to storage
2025-01-23 10:09:39.522361 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 320.0 MB to storage
2025-01-23 10:09:39.567070 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 288.0 MB to storage
2025-01-23 10:09:39.611862 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 256.0 MB to storage
2025-01-23 10:09:39.656561 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 224.0 MB to storage
2025-01-23 10:09:39.700323 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 192.0 MB to storage
2025-01-23 10:09:39.744851 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 160.0 MB to storage
2025-01-23 10:09:39.831672 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 64.0 / 128.0 MB to storage
2025-01-23 10:09:39.926937 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 64.0 / 64.0 MB to storage
==PDC_SERVER[0]: server cache full 3104.0 / 3072.0 MB, will flush to storage
2025-01-23 10:09:41.439012 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 64.0 / 3104.0 MB to storage
2025-01-23 10:09:41.483139 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 3040.0 MB to storage
2025-01-23 10:09:41.528184 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 3008.0 MB to storage
2025-01-23 10:09:41.571941 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2976.0 MB to storage
2025-01-23 10:09:41.617131 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2944.0 MB to storage
2025-01-23 10:09:41.660783 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2912.0 MB to storage
2025-01-23 10:09:41.705616 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2880.0 MB to storage
2025-01-23 10:09:41.749397 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2848.0 MB to storage
2025-01-23 10:09:41.794762 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2816.0 MB to storage
2025-01-23 10:09:42.105849 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 224.0 / 2784.0 MB to storage
2025-01-23 10:09:42.150768 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2560.0 MB to storage
2025-01-23 10:09:42.372964 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 160.0 / 2528.0 MB to storage
2025-01-23 10:09:42.417639 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2368.0 MB to storage
2025-01-23 10:09:42.462343 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2336.0 MB to storage
2025-01-23 10:09:42.507523 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2304.0 MB to storage
2025-01-23 10:09:42.551799 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2272.0 MB to storage
2025-01-23 10:09:42.596712 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2240.0 MB to storage
2025-01-23 10:09:42.640784 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2208.0 MB to storage
2025-01-23 10:09:45.756183 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 2016.0 / 2176.0 MB to storage
2025-01-23 10:09:45.826167 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 160.0 MB to storage
2025-01-23 10:09:45.870537 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 128.0 MB to storage
2025-01-23 10:09:45.915142 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 96.0 MB to storage
2025-01-23 10:09:45.959604 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 64.0 MB to storage
2025-01-23 10:09:46.003990 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 32.0 MB to storage
==PDC_SERVER[0]: server cache full 3104.0 / 3072.0 MB, will flush to storage
2025-01-23 10:09:48.077479 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 64.0 / 3104.0 MB to storage
2025-01-23 10:09:48.296924 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 160.0 / 3040.0 MB to storage
2025-01-23 10:09:48.339411 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 2880.0 MB to storage
2025-01-23 10:09:48.468643 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 96.0 / 2848.0 MB to storage
2025-01-23 10:09:49.544138 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 800.0 / 2752.0 MB to storage
2025-01-23 10:09:50.485252 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 704.0 / 1952.0 MB to storage
2025-01-23 10:09:50.708890 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1248.0 MB to storage
2025-01-23 10:09:50.750378 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1216.0 MB to storage
2025-01-23 10:09:50.795697 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1184.0 MB to storage
2025-01-23 10:09:50.838518 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1152.0 MB to storage
2025-01-23 10:09:50.880934 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1120.0 MB to storage
2025-01-23 10:09:50.923457 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1088.0 MB to storage
2025-01-23 10:09:50.965871 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1056.0 MB to storage
2025-01-23 10:09:51.008278 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 1024.0 MB to storage
2025-01-23 10:09:51.092305 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 64.0 / 992.0 MB to storage
2025-01-23 10:09:51.221872 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 96.0 / 928.0 MB to storage
2025-01-23 10:09:51.350290 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 96.0 / 832.0 MB to storage
2025-01-23 10:09:51.478893 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 96.0 / 736.0 MB to storage
2025-01-23 10:09:51.953068 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 352.0 / 640.0 MB to storage
2025-01-23 10:09:51.995254 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 288.0 MB to storage
2025-01-23 10:09:52.038401 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 256.0 MB to storage
2025-01-23 10:09:52.080786 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 224.0 MB to storage
2025-01-23 10:09:52.123392 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 192.0 MB to storage
2025-01-23 10:09:52.165785 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 160.0 MB to storage
2025-01-23 10:09:52.207888 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 128.0 MB to storage
2025-01-23 10:09:52.250200 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 96.0 MB to storage
2025-01-23 10:09:52.310602 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 64.0 MB to storage
2025-01-23 10:09:52.352788 ==PDC_SERVER[0.0]: PDC_region_cache_flush_by_pointer server flushed 32.0 / 32.0 MB to storage
corrupted size vs. prev_size
srun: error: nid200063: task 0: Aborted