faster cluster, now does not requires to know number of modules #68

Merged
21 changes: 15 additions & 6 deletions RecoLocalTracker/SiPixelClusterizer/plugins/gpuClustering.h
@@ -1,9 +1,9 @@
#ifndef RecoLocalTracker_SiPixelClusterizer_plugins_gpuClustering_h
#define RecoLocalTracker_SiPixelClusterizer_plugins_gpuClustering_h

#include <cassert>
#include <cstdint>
#include <cstdio>
#include<cassert>

namespace gpuClustering {

@@ -46,6 +46,8 @@ namespace gpuClustering {

__shared__ int msize;

if (blockIdx.x >=moduleStart[0]) return;

auto first = moduleStart[1 + blockIdx.x];

auto me = id[first];
@@ -79,9 +81,11 @@ namespace gpuClustering {
if (first>=msize) return;

int jmax[10];
// int jmin[10];
auto niter = (msize-first)/blockDim.x;
assert(niter<10);
for (int i=0; i<niter+1; ++i) jmax[i]=msize;
// for (int k=0; k<niter+1; ++k) jmin[k]=first+k*blockDim.x+1;
for (int k=0; k<niter+1; ++k) jmax[k]=msize;

while (go) {

I know it was like this before, but I'm wondering: is it actually safe to use __syncthreads() inside a while loop?
Isn't it a problem if some threads run a different number of loops, or take different branches (e.g. continue)?

Author

cuda-memcheck did not detect any race

Author

All threads go through the while loop together.
The continue is inside the for loop, and there is no __syncthreads() inside the for loop.
So there is no divergence apart from the for loop.
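
The argument in the reply above can be illustrated with a minimal host-side C++ sketch. It is not the kernel: it emulates one thread block sequentially, treating each pass of the outer loop as the code between two barriers. It assumes that `go` is a flag shared by the whole block (the diff excerpt does not show its declaration), and it leaves out the `InvId` check and the `jmax` bookkeeping. The names `BlockTrace` and `emulateBlock` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// What one emulated block produced, plus loop counts per emulated thread.
struct BlockTrace {
  std::vector<int> clus;        // cluster label per pixel (smallest pixel index in the cluster)
  std::vector<int> whileTrips;  // while-loop passes seen by each thread
  std::vector<int> forTrips;    // for-loop iterations executed by each thread
};

// Sequential emulation of one block clustering the pixels of one module.
BlockTrace emulateBlock(const std::vector<int>& x, const std::vector<int>& y, int blockDim) {
  assert(x.size() == y.size() && blockDim > 0);
  const int msize = static_cast<int>(x.size());

  BlockTrace t;
  t.clus.resize(msize);
  for (int i = 0; i < msize; ++i) t.clus[i] = i;  // every pixel starts as its own cluster
  t.whileTrips.assign(blockDim, 0);
  t.forTrips.assign(blockDim, 0);

  bool go = true;  // ASSUMPTION: stands in for a flag shared by all threads of the block
  while (go) {
    go = false;  // reset once per pass, before any thread enters its for loop
    for (int tid = 0; tid < blockDim; ++tid) {
      ++t.whileTrips[tid];
      // Strided loop: threads may run different numbers of iterations here.
      for (int i = tid; i < msize; i += blockDim) {
        ++t.forTrips[tid];
        for (int j = i + 1; j < msize; ++j) {
          if (std::abs(x[j] - x[i]) > 1 || std::abs(y[j] - y[i]) > 1) continue;  // not adjacent
          // Sequential equivalent of the two atomicMin calls in the kernel.
          const int m = std::min(t.clus[i], t.clus[j]);
          if (t.clus[i] != m || t.clus[j] != m) go = true;
          t.clus[i] = m;
          t.clus[j] = m;
        }
      }
    }
  }
  return t;
}
```

With five pixels forming two clusters and `blockDim = 2`, both emulated threads see the same number of while passes even though they execute 6 and 4 for iterations respectively. This only illustrates the reasoning; it does not by itself show that the real kernel is free of barrier divergence, which depends on how `go` is declared and updated there.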

__syncthreads();
@@ -94,11 +98,16 @@ namespace gpuClustering {
if (id[i]==InvId) continue; // not valid
assert(id[i]==me); // break; // end of module
++debug[i];
auto js = i+1;
// auto js = jmin[k];
auto jm = jmax[k];
for (int j=i+1; j<jm; ++j) {
jmax[k]=i+1;
// bool first = true;
for (int j=js; j<jm; ++j) {
if (id[j]==InvId) continue; // not valid
if (std::abs(int(x[j])-int(x[i]))>1) continue;
if (std::abs(int(y[j])-int(y[i]))>1) continue;
if (std::abs(int(x[j])-int(x[i]))>1 |
std::abs(int(y[j])-int(y[i]))>1) continue;
// if (first) {jmin[k] = j; first=false;}
auto old = atomicMin(&clus[j],clus[i]);
if (old!=clus[i]) go=true;
atomicMin(&clus[i],old);
@@ -167,4 +176,4 @@ namespace gpuClustering {

} //namespace gpuClustering

#endif // RecoLocalTracker_SiPixelClusterizer_plugins_gpuClustering_h
#endif
11 changes: 5 additions & 6 deletions RecoLocalTracker/SiPixelClusterizer/test/gpuClustering.cu
@@ -198,13 +198,8 @@ int main(void)
d_id.get(), d_moduleStart.get() ,d_clus.get(),n
);

cuda::memory::copy(&nModules,d_moduleStart.get(),sizeof(uint32_t));

std::cout << "found " << nModules << " Modules active" << std::endl;


threadsPerBlock = 256;
blocksPerGrid = nModules;
blocksPerGrid = MaxNumModules; //nModules;



@@ -226,6 +221,10 @@ int main(void)
);


cuda::memory::copy(&nModules,d_moduleStart.get(),sizeof(uint32_t));


is this async?

std::cout << "found " << nModules << " Modules active" << std::endl;


remove?


I don't mind keeping the cout since this is just a test file, not one used in production.

Author

it's a test!



uint32_t nclus[MaxNumModules], moduleId[nModules];
cuda::memory::copy(h_clus.get(), d_clus.get(), size32);
cuda::memory::copy(&nclus,d_clusInModule.get(),MaxNumModules*sizeof(uint32_t));