Nice approach on DL dev scenario #12
Comments
@pokerfaceSad Hi, thanks for the feedback! For the overhead of UVM in and of itself (i.e., when an app runs alone on the system), you can take a look at chapter 11.3 of my diploma thesis [1].

The overhead of UVM swapping when the GPU lock changes hands, which happens every TQ seconds assuming more than one app wants to run GPU work, depends on the PCIe bandwidth and the working set size of the application.

Simple example: Let's assume a GPU has 32 GB/s of PCIe bandwidth and the application that just got the GPU lock uses 32 GB of data. The UVM swapping overhead is then around 32 GB / (32 GB/s) = 1 second. You can measure the actual PCIe bandwidth of a GPU with a tool such as the `bandwidthTest` utility from the CUDA samples.

[1] https://dspace.lib.ntua.gr/xmlui/handle/123456789/54290
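If you want to gauge this cost on a concrete machine, here is a minimal sketch (my own illustration, not nvshare code) that times UVM migration directly: it populates a managed buffer on the host, then times the first GPU-side touch, which pulls the pages across PCIe. The 1 GiB buffer and the trivial kernel are arbitrary choices; the measured rate usually lands below peak PCIe bandwidth because it includes page-fault handling.

```cuda
// Rough UVM migration-cost probe (illustrative sketch, not part of nvshare).
// On Pascal-or-newer GPUs, managed pages migrate on demand at first access,
// so timing the first GPU-side touch approximates one swap-in over PCIe.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

__global__ void touch(char *buf, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = 1; /* first GPU access faults the page onto the device */
}

int main(void)
{
    const size_t n = 1ULL << 30; /* 1 GiB working set (arbitrary choice) */
    char *buf = NULL;

    cudaMallocManaged(&buf, n);
    memset(buf, 1, n); /* make the pages resident on the host first */

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    touch<<<(unsigned)((n + 255) / 256), 256>>>(buf, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    /* Effective host-to-device migration rate; expect it to sit below peak
       PCIe bandwidth, since it includes page-fault handling overhead. */
    printf("migrated %zu MiB in %.1f ms (%.2f GB/s)\n",
           n >> 20, ms, (n / 1e9) / (ms / 1e3));

    cudaFree(buf);
    return 0;
}
```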
Thanks for your detailed reply! Any ideas about GPU migration? I see it in your Future Improvements. It seems possible to achieve it with UVM, according to https://dl.acm.org/doi/10.1145/3357223.3362714.
I haven't looked at migration thoroughly yet. (Though a prerequisite for that is ...) Are you perhaps interested in taking a look? If you want to talk about something in private, you can send me an e-mail :)
Sorry for the late reply. I have sent you an email :)
I think nvshare is a nice approach for the DL development scenario!
Has there been any testing of the overhead introduced by UVM swapping in training scenarios?
BTW, I have posted a solution that addresses long GPU idle times in dev scenarios by dynamically mounting the GPU:
https://github.com/pokerfaceSad/GPUMounter