-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using SIDs as neighbour tables in functional API #1730
Conversation
Hi there, this is jenkins continuous integration... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Mostly style comments.
launch jenkins |
launch jenkins |
launch perftest |
launch jenkins |
launch perftest |
launch jenkins |
launch jenkins |
launch jenkins |
__device__ auto neighbor_table_neighbors_device(Table const &table, int index) | ||
-> array<std::int32_t, 2> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__device__ auto neighbor_table_neighbors_device(Table const &table, int index) | |
-> array<std::int32_t, 2> { | |
__device__ auto neighbor_table_neighbors_device(Table const &table, int index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but:
[build] <PATH>/tests/unit_tests/fn/test_fn_sid_neighbor_table.cu(54): error: A non-constexpr __device__ function ("neighbor_table_neighbors_device") with "auto" deduced return type cannot be directly referenced outside the bodies of device functions, except if the reference is absent when __CUDA_ARCH__ is undefined
What's weird is that it first builds it, and complains on rebuild.
const std::int32_t data[num_elements][num_neighbors] = {{0, 1}, {10, 11}, {20, 21}}; | ||
const auto device_data = cuda_util::cuda_malloc<std::int32_t>(num_elements * num_neighbors); | ||
GT_CUDA_CHECK(cudaMemcpy(device_data.get(), &data, sizeof data, cudaMemcpyHostToDevice)); | ||
using dim_hymap_t = hymap::keys<edge_dim_t, edge_to_cell_dim_t>; | ||
auto contents = sid::synthetic() | ||
.set<sid::property::origin>(sid::host_device::simple_ptr_holder(device_data.get())) | ||
.set<sid::property::strides>(dim_hymap_t::make_values(num_neighbors, 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably good, unless we want to introduce a dependency on storage
which would simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to use a global statically allocated array on the device? I.e. __constant__
? Then we might not need this malloc, memcpy and synthetic stuff.
launch tests |
launch jenkins |
1 similar comment
launch jenkins |
class Strides> | ||
struct sid_neighbor_table { | ||
PtrHolder origin; | ||
Strides strides; | ||
}; | ||
|
||
template <class IndexDimension, | ||
class NeighborDimension, | ||
std::size_t MaxNumNeighbors, | ||
class PtrHolder, | ||
class Strides> | ||
GT_FUNCTION auto neighbor_table_neighbors( | ||
sid_neighbor_table<IndexDimension, NeighborDimension, MaxNumNeighbors, PtrHolder, Strides> const &table, | ||
int index) { | ||
|
||
using namespace gridtools::literals; | ||
|
||
auto ptr = table.origin(); | ||
using element_type = std::remove_reference_t<decltype(*ptr)>; | ||
|
||
gridtools::array<element_type, MaxNumNeighbors> neighbors; | ||
|
||
sid::shift(ptr, sid::get_stride<IndexDimension>(table.strides), index); | ||
for (std::size_t element_idx = 0; element_idx < MaxNumNeighbors; ++element_idx) { | ||
neighbors[element_idx] = *ptr; | ||
sid::shift(ptr, sid::get_stride<NeighborDimension>(table.strides), 1_c); | ||
} | ||
return neighbors; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With explicit return type, friend
works for the case that failed in CI, but we can keep the current version.
class Strides> | |
struct sid_neighbor_table { | |
PtrHolder origin; | |
Strides strides; | |
}; | |
template <class IndexDimension, | |
class NeighborDimension, | |
std::size_t MaxNumNeighbors, | |
class PtrHolder, | |
class Strides> | |
GT_FUNCTION auto neighbor_table_neighbors( | |
sid_neighbor_table<IndexDimension, NeighborDimension, MaxNumNeighbors, PtrHolder, Strides> const &table, | |
int index) { | |
using namespace gridtools::literals; | |
auto ptr = table.origin(); | |
using element_type = std::remove_reference_t<decltype(*ptr)>; | |
gridtools::array<element_type, MaxNumNeighbors> neighbors; | |
sid::shift(ptr, sid::get_stride<IndexDimension>(table.strides), index); | |
for (std::size_t element_idx = 0; element_idx < MaxNumNeighbors; ++element_idx) { | |
neighbors[element_idx] = *ptr; | |
sid::shift(ptr, sid::get_stride<NeighborDimension>(table.strides), 1_c); | |
} | |
return neighbors; | |
} | |
class Strides, | |
class ElementType = std::remove_reference_t<decltype(*std::declval<PtrHolder>()())>> | |
struct sid_neighbor_table { | |
PtrHolder origin; | |
Strides strides; | |
friend GT_FUNCTION gridtools::array<ElementType, MaxNumNeighbors> neighbor_table_neighbors(sid_neighbor_table const &table, int index) { | |
auto ptr = table.origin(); | |
gridtools::array<ElementType, MaxNumNeighbors> neighbors; | |
sid::shift(ptr, sid::get_stride<IndexDimension>(table.strides), index); | |
for (std::size_t element_idx = 0; element_idx < MaxNumNeighbors; ++element_idx) { | |
neighbors[element_idx] = *ptr; | |
sid::shift(ptr, sid::get_stride<NeighborDimension>(table.strides), 1); | |
} | |
return neighbors; | |
} | |
}; | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda got used to the friend method so it would be nice, but I think I'll just leave it and we can move it once we drop support for older GCC.
__device__ auto neighbor_table_neighbors_device(Table const &table, int index) | ||
-> array<int, 2> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__device__ auto neighbor_table_neighbors_device(Table const &table, int index) | |
-> array<int, 2> { | |
constexpr __device__ auto neighbor_table_neighbors_device(Table const &table, int index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, surprisingly, yes, but I don't think it should though, because it's calling non-constexpr functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some special handling of constexpr functions in nvcc, even if not called in constexpr context it affects how nvcc handles the functions in good and in bad ways...
launch jenkins |
launch perftest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
launch perftest |
launch jenkins |
Adds a simple class that wraps an SID and implements the neighbour table concept. This makes it possible to use Python buffers as neighbour tables by first wrapping them into an SID, but any SID is suitable as a neighbour table.
Adds a simple class that wraps an SID and implements the neighbour table concept. This makes it possible to use Python buffers as neighbour tables by first wrapping them into an SID, but any SID is suitable as a neighbour table.