CUDA's reduce_by_key fails on sm_50 devices #628
Comments
Narrowed the issue down to
https://github.com/thrust/thrust/blob/master/thrust/system/cuda/detail/bulk/algorithm/reduce_by_key.hpp#L148
Adding an assert before the malloc fixes the issue. keys_first and keys_last do not seem to be initialized properly so the for loop on line

#if __CUDA_ARCH__ >= 200
    assert(keys_first < keys_last);
    size_type *s_flags = reinterpret_cast<size_type*>(bulk::malloc(g, interval_size * sizeof(int)));
    value_type *s_values = reinterpret_cast<value_type*>(bulk::malloc(g, interval_size * sizeof(value_type)));
#else

#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/tuple.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <algorithm>
#include <iostream>
#include <iterator>

int main(int argc, char ** argv)
{
    thrust::device_vector<int> indices(10);
    thrust::sequence(indices.begin(), indices.end());
    thrust::device_vector<int> temp(10, -1);
    thrust::counting_iterator<int> iter(0);

    // Case 1: keys are zipped counting iterators.
    thrust::reduce_by_key(thrust::make_zip_iterator(thrust::make_tuple(iter, iter)),
                          thrust::make_zip_iterator(thrust::make_tuple(iter, iter)) + temp.size(),
                          thrust::constant_iterator<int>(1),
                          thrust::make_discard_iterator(),
                          temp.begin(),
                          thrust::equal_to<thrust::tuple<int,int> >(),
                          thrust::plus<int>());
    std::copy(temp.begin(), temp.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    // Reset the output before the second run.
    thrust::fill(temp.begin(), temp.end(), -1);

    // Case 2: keys are zipped device_vector iterators holding the same values.
    thrust::reduce_by_key(thrust::make_zip_iterator(thrust::make_tuple(indices.begin(), indices.begin())),
                          thrust::make_zip_iterator(thrust::make_tuple(indices.end(), indices.end())),
                          thrust::constant_iterator<int>(1),
                          thrust::make_discard_iterator(),
                          temp.begin(),
                          thrust::equal_to<thrust::tuple<int,int> >(),
                          thrust::plus<int>());
    std::copy(temp.begin(), temp.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
    return 0;
}
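For comparison, here is a host-only sketch of what both calls in the reproducer should produce. Every key (i, i) is distinct, so each run of equal keys has length one and each sum is 1; a correct run should therefore fill temp with ten 1s in both cases. `reduce_by_key_reference` and `expected_reproducer_output` are hypothetical helpers written for this illustration, not part of Thrust, and the sketch assumes the equality-and-sum behaviour the reproducer requests via equal_to and plus.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side reference for reduce_by_key semantics: each run of consecutive
// equal keys is collapsed into one output value by summation.
// NOTE: hypothetical helper for illustration only, not a Thrust function.
template <typename Key, typename Value>
std::vector<Value> reduce_by_key_reference(const std::vector<Key>& keys,
                                           const std::vector<Value>& values)
{
    std::vector<Value> sums;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i > 0 && keys[i] == keys[i - 1]) {
            sums.back() += values[i];   // same run: accumulate
        } else {
            sums.push_back(values[i]);  // new run: start a new sum
        }
    }
    return sums;
}

// Mirrors the reproducer's input: keys (i, i) for i in [0, 10), every value 1.
// NOTE: hypothetical helper for illustration only.
std::vector<int> expected_reproducer_output()
{
    std::vector<std::pair<int, int> > keys;
    for (int i = 0; i < 10; ++i) {
        keys.push_back(std::make_pair(i, i));
    }
    std::vector<int> ones(keys.size(), 1);
    return reduce_by_key_reference(keys, ones);
}
```

Calling `expected_reproducer_output()` gives the values to compare against each line the reproducer prints; any remaining -1 or a sum other than 1 in the device output would point at the failure described above.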
Nice work, Steve, thanks! Are you saying the introduction of an I'm trying to figure out whether Thrust is failing to initialize those iterators correctly, or whether a compiler problem is at the root.
Just tried your reproducer program with a nightly
It seems like the problem isn't present in nvcc 8.0, which leads me to believe it was a code generation problem. If you agree, then we can probably close this bug.
assert() seems to make the issue go away. I think you are right about it being a code generation bug. Simply testing the condition and adding an empty printf fixes the error as well, even though that code should never be executed. Closing this sounds good.

```
#if __CUDA_ARCH__ >= 200
#if __CUDA_ARCH__ >= 500
    if(keys_first > keys_last) printf("failed");
#endif
    size_type *s_flags = reinterpret_cast<size_type*>(bulk::malloc(g, interval_size * sizeof(int)));
    value_type *s_values = reinterpret_cast<value_type*>(bulk::malloc(g, interval_size * sizeof(value_type)));
#else
```
Thanks Steve, nice work identifying the root cause. I'll close.
I believe it was a compiler bug that was fixed recently.