Skip to content
This repository was archived by the owner on Mar 21, 2024. It is now read-only.

CUDA's reduce_by_key fails on sm_50 devices #628

Closed
jaredhoberock opened this issue Feb 5, 2015 · 6 comments
Closed

CUDA's reduce_by_key fails on sm_50 devices #628

jaredhoberock opened this issue Feb 5, 2015 · 6 comments
Labels
type: bug: functional Does not work as intended.

Comments

@jaredhoberock
Copy link
Contributor

No description provided.

@jaredhoberock jaredhoberock added the type: bug: functional Does not work as intended. label Feb 5, 2015
@jaredhoberock jaredhoberock added this to the Release1.9 milestone Feb 5, 2015
@jaredhoberock jaredhoberock changed the title reduce_by_key fails on sm_50 devices CUDA's reduce_by_key fails on sm_50 devices Feb 5, 2015
@sdalton1
Copy link
Contributor

Narrowed issue down to https://github.com/thrust/thrust/blob/master/thrust/system/cuda/detail/bulk/algorithm/reduce_by_key.hpp#L148

adding assert before malloc fixes issue. keys_first and keys_last do not seem to be initialized properly so the for loop on line
https://github.com/thrust/thrust/blob/master/thrust/system/cuda/detail/bulk/algorithm/reduce_by_key.hpp#L159
is never executed.

#if __CUDA_ARCH__ >= 200
  assert(keys_first < keys_last);
  size_type *s_flags = reinterpret_cast<size_type*>(bulk::malloc(g, interval_size * sizeof(int)));
  value_type *s_values = reinterpret_cast<value_type*>(bulk::malloc(g, interval_size * sizeof(value_type)));
#else
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <iostream>

int main(int argc, char ** argv)
{
    thrust::device_vector<int> indices(10);
    thrust::sequence(indices.begin(), indices.end());

    thrust::device_vector<int> temp(10, -1);

    thrust::counting_iterator<int> iter(0);
    thrust::reduce_by_key(thrust::make_zip_iterator(thrust::make_tuple(iter, iter)),
                          thrust::make_zip_iterator(thrust::make_tuple(iter, iter)) + temp.size(),
                          thrust::constant_iterator<int>(1),
                          thrust::make_discard_iterator(),
                          temp.begin(),
                          thrust::equal_to<thrust::tuple<int,int> >(),
                          thrust::plus<int>());

    std::copy(temp.begin(), temp.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    thrust::fill(temp.begin(), temp.end(), -1);

    thrust::reduce_by_key(thrust::make_zip_iterator(thrust::make_tuple(indices.begin(), indices.begin())),
                          thrust::make_zip_iterator(thrust::make_tuple(indices.end(), indices.end())),
                          thrust::constant_iterator<int>(1),
                          thrust::make_discard_iterator(),
                          temp.begin(),
                          thrust::equal_to<thrust::tuple<int,int> >(),
                          thrust::plus<int>());

    std::copy(temp.begin(), temp.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    return 0;
}

@jaredhoberock
Copy link
Contributor Author

Nice work, Steve, thanks!

Are you saying the introduction of an assert() causes the issue to go away, or that your assert() seems to reveal the root cause of the issue?

I'm trying to figure out if Thrust is not initializing those iterators correctly, or if there is a compiler problem at root.

@jaredhoberock
Copy link
Contributor Author

Just tried your reproducer program with a nightly nvcc-8.0 and nvcc-7.5 and ran it on a GTX Titan X:

$ cat repro.cu 
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <iostream>

int main(int argc, char ** argv)
{
    thrust::device_vector<int> indices(10);
    thrust::sequence(indices.begin(), indices.end());

    thrust::device_vector<int> temp(10, -1);

    thrust::counting_iterator<int> iter(0);
    thrust::reduce_by_key(thrust::make_zip_iterator(thrust::make_tuple(iter, iter)),
                          thrust::make_zip_iterator(thrust::make_tuple(iter, iter)) + temp.size(),
                          thrust::constant_iterator<int>(1),
                          thrust::make_discard_iterator(),
                          temp.begin(),
                          thrust::equal_to<thrust::tuple<int,int> >(),
                          thrust::plus<int>());

    std::copy(temp.begin(), temp.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    thrust::fill(temp.begin(), temp.end(), -1);

    thrust::reduce_by_key(thrust::make_zip_iterator(thrust::make_tuple(indices.begin(), indices.begin())),
                          thrust::make_zip_iterator(thrust::make_tuple(indices.end(), indices.end())),
                          thrust::constant_iterator<int>(1),
                          thrust::make_discard_iterator(),
                          temp.begin(),
                          thrust::equal_to<thrust::tuple<int,int> >(),
                          thrust::plus<int>());

    std::copy(temp.begin(), temp.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    return 0;
}
$ /usr/local/cuda-7.5/bin/nvcc -arch=sm_52 -I. -run repro.cu 
1 1 1 1 1 1 1 1 1 1 
1 -1 -1 -1 -1 -1 -1 -1 -1 -1 
$ /usr/local/cuda-8.0/bin/nvcc -arch=sm_52 -I. -run repro.cu 
1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1

It seems like the problem isn't present in nvcc 8.0, which leads me to believe it was a code generation problem. If you agree, then we can probably close this bug.

@sdalton1
Copy link
Contributor

assert() seems to make the issue go away. I think you are right about it being a code generation bug.

simply testing the condition and adding a empty printf fixes the error as well even though the code should never be executed.

closing this sounds good.

  #if __CUDA_ARCH__ >= 200
 #if __CUDA_ARCH__ >= 500
   if(keys_first > keys_last) printf("failed");
 #endif
    size_type *s_flags = reinterpret_cast<size_type*>(bulk::malloc(g, interval_size * sizeof(int)));
    value_type *s_values = reinterpret_cast<value_type*>(bulk::malloc(g, interval_size * sizeof(value_type)));
  #else
```:

@jaredhoberock
Copy link
Contributor Author

Thanks Steve, nice work identifying the root cause. I'll close.

@3gx
Copy link
Contributor

3gx commented Feb 26, 2016

I believe it was a compiler bug, that was fixed recently.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: bug: functional Does not work as intended.
Projects
None yet
Development

No branches or pull requests

3 participants