faster cluster, now does not require knowing the number of modules #68
Changes from 1 commit
```diff
@@ -198,13 +198,8 @@ int main(void)
         d_id.get(), d_moduleStart.get(), d_clus.get(), n
       );

-  cuda::memory::copy(&nModules, d_moduleStart.get(), sizeof(uint32_t));
-
-  std::cout << "found " << nModules << " Modules active" << std::endl;
-
   threadsPerBlock = 256;
-  blocksPerGrid = nModules;
+  blocksPerGrid = MaxNumModules; // nModules;
```
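The change launches the per-module kernel with a fixed `MaxNumModules` grid instead of `nModules`, so the host no longer needs to copy the module count back from the device before the launch. A minimal sketch of how a kernel can stay safe with the oversized grid, assuming (as the host copy below suggests) that the first word of `d_moduleStart` holds the active module count; `perModuleKernel` is a hypothetical stand-in for the PR's kernel:

```cuda
#include <cstdint>

// Sketch of the early-exit guard that makes a fixed-size launch safe.
// Hypothetical kernel body; the PR's kernel is assumed to do something
// equivalent. moduleStart[0] holding the number of active modules is
// inferred from the device-to-host copy shown in the diff below.
__global__ void perModuleKernel(uint32_t const* moduleStart /*, ... */) {
  uint32_t nModules = moduleStart[0];
  if (blockIdx.x >= nModules) return;  // extra blocks do nothing
  // ... one block processes module blockIdx.x ...
}

// Host side: the grid size no longer depends on a value that is only
// known on the device, so no blocking copy is needed before the launch.
// perModuleKernel<<<MaxNumModules, 256>>>(d_moduleStart.get());
```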
```diff
@@ -226,6 +221,10 @@ int main(void)
       );

+  cuda::memory::copy(&nModules, d_moduleStart.get(), sizeof(uint32_t));
+
+  std::cout << "found " << nModules << " Modules active" << std::endl;
+
   uint32_t nclus[MaxNumModules], moduleId[nModules];
   cuda::memory::copy(h_clus.get(), d_clus.get(), size32);
   cuda::memory::copy(&nclus, d_clusInModule.get(), MaxNumModules*sizeof(uint32_t));
```

Review comment on the `cuda::memory::copy(&nModules, …)` line: is this async?

Review thread on the `std::cout` line:

- remove?
- I don't mind keeping the printout.
- it's a test!
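Regarding "is this async?": in the underlying CUDA runtime, a plain `cudaMemcpy` blocks until the transfer completes, while `cudaMemcpyAsync` returns immediately and requires explicit synchronization before the host reads the result. A minimal sketch of the distinction, with the assumption (not verified here) that the `cuda::memory::copy` wrapper maps to the blocking `cudaMemcpy`:

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  uint32_t nModules = 0;
  uint32_t* d_moduleStart = nullptr;
  cudaMalloc(&d_moduleStart, sizeof(uint32_t));
  cudaMemset(d_moduleStart, 0, sizeof(uint32_t));

  // Blocking copy: returns only after the transfer has completed,
  // so reading nModules on the host immediately afterwards is safe.
  cudaMemcpy(&nModules, d_moduleStart, sizeof(uint32_t),
             cudaMemcpyDeviceToHost);

  // Asynchronous copy: returns immediately; nModules must not be read
  // until the stream has been synchronized.
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cudaMemcpyAsync(&nModules, d_moduleStart, sizeof(uint32_t),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // now nModules is valid on the host

  std::printf("found %u Modules active\n", nModules);
  cudaStreamDestroy(stream);
  cudaFree(d_moduleStart);
  return 0;
}
```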
Review thread on the kernel's synchronization:

- I know it was like this before, but I'm wondering: is it actually safe to use `__syncthreads()` inside a `while` loop? Isn't it a problem if some threads run a different number of iterations, or take different branches (i.e. `continue`)?
- cuda-memcheck did not detect any race.
- All threads go through the `while` loop together. The `continue` is in the inner `for` loop, and there is no `__syncthreads()` in the for-loop, so there is no divergence besides the for loop.
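A minimal sketch (a hypothetical kernel, not the PR's code) of the pattern that reply describes: the `while` condition depends only on block-shared state, so every thread enters and exits the loop together and the barriers are reached uniformly, while the `continue` stays inside a barrier-free `for` loop:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel illustrating the pattern discussed above.
// The outer loop's condition is a block-shared flag, so all threads
// iterate the same number of times and every __syncthreads() is
// reached uniformly. The `continue` lives only in the inner for loop,
// which contains no barrier, so per-thread divergence there is safe.
__global__ void iterateUntilDone(int* data, int n) {
  __shared__ int changed;            // loop condition shared by the block
  do {
    if (threadIdx.x == 0) changed = 0;
    __syncthreads();                 // uniform: every thread reaches it

    // Inner loop: divergence via `continue` is fine, no barrier inside.
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
      if (data[i] <= 0) continue;    // skip entries already settled
      data[i] -= 1;
      atomicExch(&changed, 1);       // record that more work remains
    }
    __syncthreads();                 // uniform again before re-testing
  } while (changed);                 // same value seen by all threads
}
```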