Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add overload for runtime::init/distr_queue ctor that accepts a device selector #113

Merged
merged 3 commits into from
May 17, 2022

Conversation

almightyvats
Copy link
Contributor

This feature adds the option to pass a device selector to the distr_queue, according to which devices get selected. Previously, a sycl::device could be passed (now deprecated). The device or the selector is then passed to the device_queue.h/pick_device function, which accepts a variant for either automatic device selection or the device selector or the sycl::device itself.

The pick_device function has in total 4 branches in which a device can be selected. The first one is choosing the provided sycl::device, the second one is, choosing the device with the CELERITY_DEVICES environment variable. If in case none of those two options are taken, we then have another branch that either chooses the device automatically or uses the selector to choose the device.

The logic for automatic device selection:

  1. try_find_device_per_node function takes in the type of device which should be selected and checks if any platform has enough devices of the same type to assign a unique device to each local node.
    The automatic selection first checks for all the unique devices to be GPUs, if not, it checks for any type of enough devices.
  2. If there's no platform that can provide enough unique devices, the automatic selection tries to find at least one device, which is preferably a GPU, if not then any type of device with try_find_one_device.

The logic for device selection with the selector:
The same two functions as above are called, however, the type of device doesn't matter here.

  1. For try_find_device_per_node all the devices are collected and are sorted using the device selector. If enough devices of the same type belong to one platform, they are selected.
  2. Similarly, for try_find_one_device, all the devices are collected and sorted and the device with the highest score is selected.
  3. A device with a score of -1 is excluded from the selection.

In case, no device gets selected according to the selector, the selection does not fall back to any random device.

Copy link

@BlackMark29A BlackMark29A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite promising, however I have some remarks and questions.

include/celerity.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
include/device_queue.h Show resolved Hide resolved
src/runtime.cc Outdated Show resolved Hide resolved
test/sycl_tests.cc Show resolved Hide resolved
Copy link

@BlackMark29A BlackMark29A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I just found one minor detail which should be changed to be in-line with previous changes:

include/distr_queue.h Outdated Show resolved Hide resolved
@almightyvats almightyvats force-pushed the test-device-selection branch from 782deb4 to a8f0237 Compare April 13, 2022 10:03
Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've added some notes!

include/config.h Outdated Show resolved Hide resolved
include/celerity.h Outdated Show resolved Hide resolved
test/device_selection_tests.cc Outdated Show resolved Hide resolved
test/device_selection_tests.cc Outdated Show resolved Hide resolved
test/device_selection_tests.cc Show resolved Hide resolved
test/device_selection_tests.cc Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
@almightyvats almightyvats force-pushed the test-device-selection branch from a8f0237 to 5d116fd Compare April 20, 2022 10:27
include/device_queue.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
@psalz
Copy link
Member

psalz commented May 11, 2022

As already mentioned offline, while doing another review of this PR I noticed several opportunities for streamlining the tests, as well as some brittle constructs and redundant test cases. As it would've taken me about the same time to write these things down as to actually implement them, I chose the latter approach. In total I was able to remove about 180 lines of code.

Semi-complete changelog:

  • Replaced all manual creations of mock_platform and in particular mock_device with factory functions. This removes the need to manually set IDs, and removes all but the device of interest from the local scope.
  • Fixed all uses of platform/device IDs as parameters to the host_config (cf. set_mock_host_cfg), since those are not the same. Instead of providing the actual hardware ID, the CELERITY_DEVICES environment variable simply expects enumeration-ids.
  • Speaking of which, I removed the apparently unused local_num_cpus field in host_config.
  • Moved a bunch of sections into their own test cases as there was basically no overlap between their setup logic.
  • Reworded several test case and section names to be more precise, please check if they still reflect their original intent.
  • Made some test cases more robust (e.g. testing that all local ranks receive the same GPU device if an insufficient number of other devices is available, instead of just checking for one local rank)
  • I also noticed that there were two tests that said something like "runtime::init/distr_queue provides an overloaded constructor with device selector", but the test cases themselves did not call those functions at all (instead again just calling device_queue::pick_device). One of those tests was the one I mentioned above where I moved all sections into separate test cases, the other is the one testing with real SYCL devices (disabled for ComputeCpp) - I simply renamed that one. What remains however is the questions whether we actually need to test runtime::init and distr_queue::distr_queue.
  • Lastly I removed 2 or so test cases that I thought were redundant.

@psalz psalz force-pushed the test-device-selection branch from 55dc365 to e9514d4 Compare May 17, 2022 11:48
@almightyvats almightyvats force-pushed the test-device-selection branch from e9514d4 to b081340 Compare May 17, 2022 14:01
@psalz psalz force-pushed the test-device-selection branch from b081340 to e9514d4 Compare May 17, 2022 14:46
@psalz psalz merged commit 35c3d0c into master May 17, 2022
@psalz psalz deleted the test-device-selection branch May 17, 2022 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants