-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graph: Deterministic coloring #249
Conversation
@@ -181,6 +184,8 @@ void run_experiment( | |||
kh.set_verbose(true); | |||
} | |||
|
|||
std::cout << "algorithm: " << algorithm << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume this is temporary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind previous comment, this is in testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added guards and now you need to request these outputs explicitly, it might actually make sense to have some special debugging flag to get that level of output details?
|
||
bool _ticToc; //if true print info in each step | ||
int _chunkSize; //the size of the minimum work unit assigned to threads. Changes the convergence on GPUs | ||
char _use_color_set; //the VBD algorithm type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this sort of this would be in the handle ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason it is being kept separate in the other GraphColor classes, so I am using the same pattern. I could look into why/if this design is appropriate?
|
||
/** \brief Function to color the vertices of the graphs. Performs a vertex-based coloring. | ||
* \param colors is the output array corresponding the color of each vertex. Size is this->nv. | ||
* Attn: Color array must be nonnegative numbers. If there is no initial colors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"are no"
* Attn: Color array must be nonnegative numbers. If there is no initial colors, | ||
* it should be all initialized with zeros. Any positive value in the given array, will make the | ||
* algorithm to assume that the color is fixed for the corresponding vertex. | ||
* \param num_phases: The number of iterations (phases) that algorithm takes to converge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_loops
KOKKOS_LAMBDA(const size_type frontierIdx) { | ||
|
||
size_type frontierNode = frontier(frontierIdx); | ||
int* bannedColors = new int[maxColors]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new seems to be trouble.
update = ( (valueType) score_(i) < update ? update : (valueType) score_(i) ); | ||
} | ||
}; // functorScoreCalcution() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears like a good baseline to start with and optimize (except the new for bannedcolors above).
@srajama1 let me know what you think of this new cut, I found a bug and fixed it, I also added a bit array variant for the bannedColors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a unit test and run on bowman and white ?
|
||
template <class score_type, class max_type, class execution_space> | ||
struct functorScoreCalcution { | ||
typedef typename Kokkos::Experimental::Max<max_type, execution_space>::value_type valueType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crtrott : Why is this Max still experimental ? I assume we can use it.
@@ -157,7 +159,7 @@ class GraphColor { | |||
|
|||
|
|||
//create a ban color array to keep track of | |||
//which colors have been taking by the neighbor vertices. | |||
//which colors have been taken by the neighbor vertices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This banned_color need to be deleted ?
src/batched/KokkosBatched_Util.hpp
Outdated
@@ -3,7 +3,7 @@ | |||
|
|||
/// \author Kyungjoo Kim (kyukim@sandia.gov) | |||
|
|||
#define __KOKKOSBATCHED_PROMOTION__ 1 | |||
//#define __KOKKOSBATCHED_PROMOTION__ 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are already in master, right ? How will this merge work ? I am just asking as I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that some commits were put in my branch by mistake when I rebased my code to a more recent version of develop.
I will investigate how to clean this.
I also have a problem with running the unit-test with Cuda and Serial instantiated at the same time. The Cuda unit-tests run fine on there own and OpenMP+Serial runs fine too but combinations of Cuda+OpenMP or Cuda+Serial are crashing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The crash do not happen in the perf-test though for some reason?
1ae61c1
to
3b9575e
Compare
@srajama1, I removed the rogue commit from history, it should look better now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments are related to performance. There is one related to allocating arrays. I am ok as long as these can be addressed before next master promotion so approving this PR so it can be merged.
@@ -76,7 +76,7 @@ struct TeamNrm2<TeamType, XV, false> { | |||
typedef Kokkos::Details::ArithTraits<typename IPT::mag_type> AT; | |||
|
|||
static KOKKOS_INLINE_FUNCTION mag_type team_nrm2 (const TeamType& team, const XV& X) { | |||
mag_type result; | |||
mag_type result = 0.0; //Kokkos::Details::ArithTraits<mag_type>zero(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TBD later for using ArithTraits ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had issues compiling the code when result
is not initialized, I attempted using the ArithTraits
as it seemed to be the cleanest option but the compiler complained about it so I left it as a comment in order to look at it again later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Thanks ! Just a TBD that needs clean up in the next round.
} | ||
}); | ||
|
||
Kokkos::deep_copy(host_newFrontierSize, newFrontierSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something we need to explore. It would be better to avoid this copy every time and launch another kernel later. May be just a TBD for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that ultimately the whole while loop probably needs to be pushed on the device but I am not a big fan of writing "fake" parallel_for
functions to push code on device although at the moment that is probably the only option?
Short of putting the while loop on the device we need to transfer some data back and forth at each iteration to know if the graph has been explored completely or if there is still a new frontier to explore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish there is a way to return a value or a device initiated copy back to host. I am fine with this for now.
frontierSize() = newFrontierSize(); | ||
newFrontierSize() = 0; | ||
}); | ||
Kokkos::deep_copy(host_frontierSize, frontierSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another host to device copy, TBD for later
KOKKOS_LAMBDA(const size_type frontierIdx) { | ||
|
||
size_type frontierNode = frontier(frontierIdx); | ||
int* bannedColors = new int[maxColors]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like this array. This is a crash waiting to happen and something that cannot perform well due to global memory access. Can we check the node_type early in the code and issue a warning. Doesn't have to be in this PR, but better to do it before next master promotion.
If spot check on white and bowman is clean I can click the merge button. |
colors(frontierNode) = myColor; | ||
}); // Loop over current frontier | ||
} | ||
Kokkos::deep_copy(host_newFrontierSize, newFrontierSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a fence missing here before ? How are we sure newFrontierSize is updated ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a fence missing here before ? How are we sure newFrontierSize is updated ?
See my latest comment. Other than that I am willing to merge if spot checks are passing. |
@srajama1, here is what I am getting from the spot check on white:
The ibm compiler crashes for the two
The Cuda build actually never starts, probably because of the |
@lucbv don't worry about the xlC failure, there is an internal compiler issue with this particular compiler getting tested. |
@ndellingwood, thanks, I just got results from spot-check for cuda on white:
still waiting for bowman to complete and that should be it... |
@srajama1 @ndellingwood here are my results from bowman:
Here are the tests that failed during the spot-check sorted by build:
|
This is weird. I don't know why we don't see in our jenkins jobs and why we don't see them with Intel 18. Any way this is not due to your PR. If you are ok I can merge it. Can you file one issues for batched trsm (@kyungjoo-kim) and another one for replaceSumInto (@crtrott) ? |
@lucbv You did not update your personal repo. Merge your repo with the develop branch then test it again. |
@kyungjoo-kim sure I can fetch from kokkos-kernels and rebase on develop, did you already fix the errors I have been seeing in my spot-check? |
8e9e5df
to
4aaf45e
Compare
4aaf45e
to
eda1653
Compare
It seems to work OK, will check more tomorrow.
over score array in order to compute the maximum number of colors in the graph.
This is a first draft and it is buggy, it seems that some conflicts lead to nodes missing their color. Will need to look at the atomics carefully and also at the bit array logic...
I am adding a new version of the algorithm called VBDBIT which implements a 64 bits, bit array to store the banned colors. I also found a bug related to decrementing the dependency list, it is now done atomically and that atomic decrement is more rigorously guarded to happen only when necessary.
@srajama1, the code has been reworked to use functors instead of lambdas and is passing the spot check on white and bowman, so you can merge it in at your convenience : ) |
@srajama1
Adding deterministic graph coloring based on dependency list.
This is a simple version that parallelizes using range policies and uses only one atomic to update
newFrontierSize
.In a next step I am planning on switching
bannedColors
to use a bit array that will be more compact in memory.