check the impact of different cudaMallocHost flags #16

Closed
fwyzard opened this issue Feb 19, 2018 · 3 comments

fwyzard commented Feb 19, 2018

According to the documentation, cudaMallocHost flags can affect the way the memory is allocated, pinned, and shared with the GPU.

We should check the impact these flags have on the time spent in memory copies, on different architectures.

@fwyzard fwyzard changed the title check the impact of different cudaMalloHost flahs check the impact of different cudaMalloHost flags Feb 27, 2018
@fwyzard fwyzard changed the title check the impact of different cudaMalloHost flags check the impact of different cudaMallocHost flags Mar 1, 2018

fwyzard commented Mar 15, 2018

Using cudaHostAllocWriteCombined for most of the CPU --> GPU buffers gave a negligible or negative impact; for the moment we do not plan to use it.

cmsbot commented Mar 15, 2018

A new Issue was created by @fwyzard Andrea Bocci.

Can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fwyzard fwyzard self-assigned this May 8, 2018
@fwyzard fwyzard added the fixed label May 8, 2018

fwyzard commented Aug 6, 2020

As a follow-up, here are some quasi-benchmarks of the host-to-device memory transfer throughput for:

  • pageable memory;
  • pinned memory;
  • pinned, write-combined memory.

The measurements are taken with the CUDA bandwidthTest sample, and show the average and standard deviation from 4 repeated measurements.
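
For reference, the average and standard deviation of a set of repeated runs can be computed as in the minimal sketch below. The `Summary` struct and the `summarize` function are hypothetical names introduced here for illustration. The use of the sample standard deviation (dividing by n - 1) is an assumption, as the thread does not say which estimator was used.

```cpp
#include <cmath>
#include <vector>

// Summary of a set of repeated measurements (hypothetical helper type).
struct Summary {
  double mean;
  double stddev;  // sample standard deviation, n - 1 in the denominator (assumption)
};

// Compute the mean and the sample standard deviation of `values`.
// Assumes that `values` holds at least two measurements.
Summary summarize(const std::vector<double>& values) {
  const double n = static_cast<double>(values.size());

  // First pass: arithmetic mean.
  double sum = 0.;
  for (double v : values) {
    sum += v;
  }
  const double mean = sum / n;

  // Second pass: sum of the squared deviations from the mean.
  double squares = 0.;
  for (double v : values) {
    squares += (v - mean) * (v - mean);
  }

  return {mean, std::sqrt(squares / (n - 1.))};
}

// Example use, with made-up throughput values in GB/s (NOT taken from the plots):
//   Summary s = summarize({11.8, 12.0, 12.2, 12.0});
//   // s.mean and s.stddev can then be drawn as a point with its error bar
```

With only 4 repetitions the standard deviation is itself a rough estimate, so small differences between two kinds of memory should be read with care.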

[Six images with the benchmark plots; not reproduced here.]

In all cases:

  • pinned memory is much faster than pageable memory;
  • write-combined memory does not show any gains, and is sometimes worse for small transfers.
