check the impact of different cudaMallocHost flags #16

fwyzard · 2018-02-19T07:55:49Z

According to the documentation, cudaMallocHost flags can affect the way the memory is allocated, pinned, and shared with the GPU.

We should check the impact these flags have on the time spent in memory copies, on different architectures.


fwyzard · 2018-03-15T22:29:09Z

Using cudaHostAllocWriteCombined for most of the CPU --> GPU buffers gave a negligible or negative impact; for the moment we do not plan to use it.
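For reference, a minimal sketch of what allocating a CPU --> GPU staging buffer with a given flag might look like. The helper name `stageAndCopy` and the staging pattern are hypothetical and not taken from the actual code; only the CUDA runtime calls (`cudaHostAlloc`, `cudaMemcpyAsync`, `cudaStreamSynchronize`, `cudaFreeHost`) and the `cudaHostAllocWriteCombined` flag are real API names.

```cuda
#include <cuda_runtime.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "%s failed: %s\n", #call,                       \
                   cudaGetErrorString(err_));                              \
      std::exit(EXIT_FAILURE);                                             \
    }                                                                      \
  } while (0)

// Hypothetical helper: stage `bytes` bytes from `src` in a pinned host buffer
// allocated with `flags`, then copy them to the device buffer `d_dst`.
void stageAndCopy(void* d_dst, const void* src, size_t bytes,
                  unsigned int flags, cudaStream_t stream) {
  void* h_buf = nullptr;
  // e.g. flags = cudaHostAllocDefault or cudaHostAllocWriteCombined
  CUDA_CHECK(cudaHostAlloc(&h_buf, bytes, flags));

  // The CPU only writes to the buffer, sequentially, and never reads it back:
  // this is the access pattern write-combined memory is meant for.
  std::memcpy(h_buf, src, bytes);

  CUDA_CHECK(cudaMemcpyAsync(d_dst, h_buf, bytes, cudaMemcpyHostToDevice, stream));
  // Wait for the copy to finish before releasing the host buffer.
  CUDA_CHECK(cudaStreamSynchronize(stream));
  CUDA_CHECK(cudaFreeHost(h_buf));
}
```

Passing `cudaHostAllocDefault` instead of `cudaHostAllocWriteCombined` gives ordinary pinned memory, so the two variants can be swapped with a one-argument change.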

cmsbot · 2018-03-15T22:29:25Z

A new Issue was created by @fwyzard Andrea Bocci.

can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here


fwyzard · 2020-08-06T17:24:52Z

As a follow-up, here are some quasi-benchmarks of the host-to-device memory transfer throughput for:

pageable memory;
pinned memory;
pinned, write-combined memory.

The measurements are taken with the CUDA bandwidthTest sample and show the average and standard deviation from 4 repeated measurements.

In all cases:

pinned memory is much faster than pageable memory;
write-combined memory does not show any gains, and is sometimes worse for small transfers.
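A rough sketch of how a comparison like this could be reproduced without bandwidthTest. This is not the code used for the measurements above: the names `HostMemory`, `measureThroughput`, `summarize` and `benchmark` are hypothetical, the timing uses a host clock around a synchronous `cudaMemcpy`, and the standard deviation is assumed to be the sample standard deviation (dividing by n - 1).

```cuda
#include <cuda_runtime.h>

#include <chrono>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "%s failed: %s\n", #call,                       \
                   cudaGetErrorString(err_));                              \
      std::exit(EXIT_FAILURE);                                             \
    }                                                                      \
  } while (0)

enum class HostMemory { Pageable, Pinned, WriteCombined };

void* allocateHost(HostMemory kind, size_t bytes) {
  void* ptr = nullptr;
  if (kind == HostMemory::Pageable) {
    ptr = std::malloc(bytes);
    if (ptr == nullptr) {
      std::fprintf(stderr, "malloc failed\n");
      std::exit(EXIT_FAILURE);
    }
  } else {
    unsigned int flags = (kind == HostMemory::WriteCombined)
                             ? cudaHostAllocWriteCombined
                             : cudaHostAllocDefault;
    CUDA_CHECK(cudaHostAlloc(&ptr, bytes, flags));
  }
  return ptr;
}

void freeHost(HostMemory kind, void* ptr) {
  if (kind == HostMemory::Pageable) {
    std::free(ptr);
  } else {
    CUDA_CHECK(cudaFreeHost(ptr));
  }
}

// Time a single host-to-device copy and return the throughput in GB/s
// (1 GB = 1e9 bytes).
double measureThroughput(HostMemory kind, size_t bytes) {
  void* h_src = allocateHost(kind, bytes);
  void* d_dst = nullptr;
  CUDA_CHECK(cudaMalloc(&d_dst, bytes));
  std::memset(h_src, 0, bytes);  // touch every page before timing

  // Warm-up copy, not timed.
  CUDA_CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaDeviceSynchronize());

  auto start = std::chrono::steady_clock::now();
  CUDA_CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaDeviceSynchronize());  // make sure the data reached the device
  auto stop = std::chrono::steady_clock::now();

  CUDA_CHECK(cudaFree(d_dst));
  freeHost(kind, h_src);

  double seconds = std::chrono::duration<double>(stop - start).count();
  return (static_cast<double>(bytes) / 1e9) / seconds;
}

struct Stats {
  double mean;
  double stddev;
};

// Mean and sample standard deviation (n - 1 in the denominator).
Stats summarize(const std::vector<double>& samples) {
  if (samples.empty()) return {0.0, 0.0};
  double sum = 0.0;
  for (double x : samples) sum += x;
  double mean = sum / samples.size();
  double sq = 0.0;
  for (double x : samples) sq += (x - mean) * (x - mean);
  double stddev = samples.size() > 1 ? std::sqrt(sq / (samples.size() - 1)) : 0.0;
  return {mean, stddev};
}

// Repeat the measurement and summarize it; 4 repetitions as in the text above.
Stats benchmark(HostMemory kind, size_t bytes, int repetitions = 4) {
  std::vector<double> samples;
  for (int i = 0; i < repetitions; ++i) {
    samples.push_back(measureThroughput(kind, bytes));
  }
  return summarize(samples);
}
```

For example, `benchmark(HostMemory::WriteCombined, 32 << 20)` would time four 32 MiB copies from write-combined memory. Repeating the call for each `HostMemory` value and for several transfer sizes gives a comparison of the same shape as the one described above, although absolute numbers will differ from what bandwidthTest reports.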

fwyzard changed the title ~~check the impact of different cudaMalloHost flahs~~ check the impact of different cudaMalloHost flags Feb 27, 2018

fwyzard changed the title ~~check the impact of different cudaMalloHost flags~~ check the impact of different cudaMallocHost flags Mar 1, 2018

fwyzard closed this as completed Mar 15, 2018

cmsbot added the pending-assignment label Mar 15, 2018

fwyzard removed the pending-assignment label Mar 27, 2018

fwyzard self-assigned this May 8, 2018

fwyzard added the fixed label May 8, 2018

fwyzard pushed a commit that referenced this issue Jul 12, 2020

Merge pull request #16 from PFCal-dev/hgcal_eol_pulse_update_112X_bis

97f017e

Addressing some comments by Kevin
