faster cluster, now does not require knowing the number of modules #68
Conversation
Thanks!
auto niter = (msize-first)/blockDim.x;
assert(niter<10);
for (int i=0; i<niter+1; ++i) jmax[i]=msize;
// for (int k=0; k<niter+1; ++k) jmin[k]=first+k*blockDim.x+1;
for (int k=0; k<niter+1; ++k) jmax[k]=msize;
while (go) {
I know it was like this before, but I'm wondering: is it actually safe to use __syncthreads() inside a while loop? Isn't it a problem if some threads run a different number of loop iterations, or take different branches (i.e. continue)?
cuda-memcheck did not detect any race
All threads go through the while loop together. The continue is in the for loop, and there is no __syncthreads() inside the for loop, so there is no divergence besides the for loop itself.
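To illustrate the argument above, here is a minimal hypothetical kernel (simplified names, not the PR's actual code) showing why this pattern is well-defined: __syncthreads() inside a while loop is safe as long as every thread of the block evaluates the loop condition identically, so all threads reach each barrier the same number of times; divergence from continue is confined to an inner for loop that contains no barrier.

```cuda
__global__ void clusterLoop(int const * data, int n) {
  __shared__ bool go;                 // one flag shared by the whole block
  if (threadIdx.x == 0) go = true;
  __syncthreads();                    // everyone sees the initialized flag

  while (go) {                        // uniform condition: no divergence at the barrier
    if (threadIdx.x == 0) go = false; // decided by one thread, visible to all after the barrier
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
      if (data[i] < 0) continue;      // divergence confined to the for loop,
                                      // which contains no __syncthreads()
      // ... per-element work ...
    }
    __syncthreads();                  // all threads arrive; the next while test is uniform
  }
}
```

Note that cuda-memcheck (and its synccheck tool) can confirm the absence of barrier misuse for a given input, but the real guarantee comes from the uniform while condition.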
// if (h_clus[i]==i) seeds.push_back(i); // only if no renumbering
}

std::cout << "found " << std::accumulate(nclus,nclus+MaxNumModules,0) << ' ' << clids.size() << " clusters" << std::endl;
remove?
d_clus.get(),
d_debug.get(),
n
);

cuda::memory::copy(&nModules,d_moduleStart.get(),sizeof(uint32_t));
std::cout << "found " << nModules << " Modules active" << std::endl;
remove?
I don't mind keeping the cout since this is just a test file, not one used in production.
it's a test!
d_clus.get(),
d_debug.get(),
n
);

cuda::memory::copy(&nModules,d_moduleStart.get(),sizeof(uint32_t));
is this async?
threadsPerBlock = 256;
blocksPerGrid = MaxNumModules; //nModules;

std::cout
remove?
// Launch CUDA Kernels

int threadsPerBlock = 256;
int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
std::cout
remove?
this is a test; it is normal to print something!
uint32_t nModules=0;
cuda::memory::copy(d_moduleStart.get(),&nModules,sizeof(uint32_t));

cuda::memory::copy(d_id.get(), h_id.get(), size16);
cuda::memory::copy(d_x.get(), h_x.get(), size16);
cuda::memory::copy(d_y.get(), h_y.get(), size16);
cuda::memory::copy(d_adc.get(), h_adc.get(), size8);
are these copies async?
cuda::memory::copy(d_id.get(), h_id.get(), size16);
cuda::memory::copy(d_x.get(), h_x.get(), size16);
cuda::memory::copy(d_y.get(), h_y.get(), size16);
cuda::memory::copy(d_adc.get(), h_adc.get(), size8);
cuda::memory::device::zero(d_debug.get(),size32);
cuda::memory::device::zero(d_debug.get(),size32);
is this a cuda memset to zero? If yes, is there an async version that takes a stream?
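For reference, the raw CUDA runtime does provide stream-ordered versions of both the memset and the copies. A minimal sketch (hypothetical, using the plain runtime API rather than the wrapper library in this PR) of what the async variants would look like:

```cuda
cudaStream_t stream;
cudaStreamCreate(&stream);

// cudaMemsetAsync is the stream-ordered memset to zero;
// cudaMemcpyAsync is only truly asynchronous when the host
// buffer is pinned (cudaMallocHost / cudaHostRegister).
cudaMemsetAsync(d_debug.get(), 0, size32, stream);
cudaMemcpyAsync(d_id.get(), h_id.get(), size16,
                cudaMemcpyHostToDevice, stream);

// ... launch kernels on the same stream ...

cudaStreamSynchronize(stream);   // wait before touching results on the host
cudaStreamDestroy(stream);
```

Whether the cuda::memory wrapper used here exposes an equivalent stream-taking overload would need checking in that library's API.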
I think we can leave the synchronous memcpy/memset and the cout in a .../test file.
Co-authored-by: Anders <aryd@cern.ch>
addresses #66 as well