
HashmapAccumulator has several unused members, misnamed parameters #508

Closed
brian-kelley opened this issue Dec 4, 2019 · 11 comments · Fixed by #731
Labels: Cleanup (code maintenance that isn't a bugfix or new feature), InDevelop

Comments

@brian-kelley
Contributor

brian-kelley commented Dec 4, 2019

The HashmapAccumulator has three members (hash_key_size, max_value_size, and used_size) that are set by the constructors but never used internally. Instead, used_size_ and max_value_size_ (with trailing underscores) are passed to the insert() routines.

hash_key_size_ is not passed to any insert routine. It is the number of possible hash outputs, i.e., one plus the bit mask that forms the hash function: hash(value) = value & (hash_key_size - 1).
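For example, a minimal standalone sketch of that masking hash, assuming hash_key_size is a power of two (hash_index below is a hypothetical helper, not part of HashmapAccumulator):

```c++
// Minimal sketch of the masking hash, assuming hash_key_size is a power of two.
// hash_index is a hypothetical helper, not part of HashmapAccumulator.
#include <cassert>
#include <cstdint>

inline uint32_t hash_index(uint32_t value, uint32_t hash_key_size) {
  assert((hash_key_size & (hash_key_size - 1)) == 0);  // only valid for power-of-two sizes
  return value & (hash_key_size - 1);  // same result as value % hash_key_size, but cheaper
}
```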

I think these changes would make sense and make things a lot more readable and debuggable (a rough interface sketch follows the list):

  • max_value_size is the length of the keys and nexts arrays, so it should be an immutable member, and max_value_size_ should not be passed into the insert routines. In fact, what SPGEMM symbolic does now is read the member out of the hashmap only to pass it back in.
  • max_value_size should really be called capacity or something similar: its meaning is the length of the two arrays (keys, nexts).
  • used_size is an atomically incremented counter that measures how full the hashmap is. It is updated in parallel, so it usually lives in shared/global memory. It should either not be a member and just be passed by reference to insert, or be kept as a member but as a pointer.
  • 3 ways to deal with hash_key_size:
    • Remove the member, since it is unused
    • Keep it as an immutable member, and have KOKKOS_FORCEINLINE_FUNCTION key_type hash() be a member of Hashmap. The SPGEMM code wouldn't need to keep the hash mask as a local variable anymore.
    • Keep it as an immutable member, and instead of passing both the key and the hash to insert(), just pass the key and compute the hash internally.
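
To make the list above concrete, here is a rough, hypothetical sketch of what the resulting interface could look like; the names (capacity, hash_key_mask, used_size as a pointer) and the simplified insert() signature are illustrative assumptions, not the actual KokkosKernels class:

```c++
// Rough sketch only: renamed members plus an insert() that computes the hash
// internally. Not the actual KokkosKernels HashmapAccumulator.
#include <Kokkos_Core.hpp>

template <typename size_type, typename key_type, typename value_type>
struct HashmapAccumulatorSketch {
  size_type capacity;        // was max_value_size: length of keys/values/hash_nexts
  size_type hash_key_mask;   // hash_key_size - 1, assuming a power-of-two table
  size_type *used_size;      // shared, atomically incremented fill counter
  size_type *hash_begins;    // head index per bucket, -1 when the bucket is empty
  size_type *hash_nexts;     // per-entry next index forming each bucket's linked list
  key_type *keys;
  value_type *values;

  KOKKOS_FORCEINLINE_FUNCTION
  size_type hash(key_type key) const { return key & hash_key_mask; }

  // Callers pass only the key; the hash and the capacity check happen internally.
  KOKKOS_INLINE_FUNCTION
  int insert(key_type key, value_type value) const;
};
```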
@ndellingwood
Contributor

I'll mark this cleanup as an enhancement since this isn't a bug

@brian-kelley brian-kelley added the Cleanup label and removed the enhancement label on Apr 7, 2020
@e10harvey e10harvey self-assigned this May 14, 2020
@e10harvey
Contributor

e10harvey commented May 20, 2020

Keep it as an immutable member, and instead of passing both the key and the hash to insert(), just pass the key and compute the hash internally.

@brian-kelley: Although removing hash_key_size is the simplest solution, I will attempt to refactor all the code to compute the hash internally (within HashmapAccumulator). Computing the hash internally is a step in the right direction. There is one special use case of

// Used in kkmem's numeric phase to insert into first-level hashmaps.
// Function to be called from device.
// Accumulation is an Add operation. It is not atomicAdd, as this is for the
// cases where we know that none of the simultaneous insertions will have the
// same key. Insertion is simultaneous for the vector lanes of a thread.
// used_size should be a pointer shared among the thread's vector lanes.
template <typename team_member_t>
KOKKOS_INLINE_FUNCTION
int vector_atomic_insert_into_hash_mergeAdd(
    const team_member_t & /* teamMember */,
    const int /* vector_size */,
    size_type &hash,
    const key_type key,
    const value_type value,
    volatile size_type *used_size_,
    const size_type max_value_size_)
{
  if (hash != -1) {
    // Search the bucket's linked list; if the key already exists, accumulate into it.
    size_type i = hash_begins[hash];
    for (; i != -1; i = hash_nexts[i]) {
      if (keys[i] == key) {
        values[i] = values[i] + value;
        return INSERT_SUCCESS;
      }
    }
  } else {
    return INSERT_SUCCESS;
  }
  if (used_size_[0] >= max_value_size_) {
    return INSERT_FULL;
  }
  size_type my_write_index = Kokkos::atomic_fetch_add(used_size_, size_type(1));
  if (my_write_index >= max_value_size_) {
    return INSERT_FULL;
  } else {
    keys[my_write_index] = key;
    values[my_write_index] = value;
#if defined(KOKKOS_ARCH_VOLTA) || defined(KOKKOS_ARCH_VOLTA70) || defined(KOKKOS_ARCH_VOLTA72)
    // This is an issue on VOLTA because warps no longer execute in SIMD fashion.
    // While one thread inserts my_write_index into the linked list, another thread
    // in the warp might be reading keys in the loop above. Before inserting the new
    // value into the linked list (done with the atomic exchange below), we make sure
    // the linked list is complete by assigning hash_nexts to the current head. The
    // head might be different when we do the atomic exchange; that would temporarily
    // skip a key in the linked list until hash_nexts is updated a second time below.
    // This is okay for SpGEMM, because no two identical keys are inserted into the
    // hashmap at the same time, as rows have unique columns.
    hash_nexts[my_write_index] = hash_begins[hash];
#endif
    size_type hashbeginning = Kokkos::atomic_exchange(hash_begins + hash, my_write_index);
    hash_nexts[my_write_index] = hashbeginning;
    return INSERT_SUCCESS;
  }
}
in
if (overall_num_unsuccess) {
  nnz_lno_t hash_ = -1;
  if (num_unsuccess) {
    hash_ = b_col_ind % global_memory_hash_size;
  }
  //int insertion =
  hm2.vector_atomic_insert_into_hash_mergeAdd(
      teamMember, vector_size,
      hash_, b_col_ind, b_val,
      used_hash_sizes + 1, hm2.max_value_size);
}
which calculates the hash using modulo instead of bitwise AND.

This same use case makes it challenging to remove max_value_size as a parameter entirely.

@srajama1
Contributor

Keep in mind this is used at the very low level. Performance of this data structure is important for the performance of the kernels.

@e10harvey
Contributor

Keep in mind this is used at the very low level. Performance of this data structure is important for the performance of the kernels.

Thanks, Siva. What are the reasons for calculating the hash outside of the inlined HashmapAccumulator insertion routines?

@srajama1
Contributor

I don't see a particular use case for that.

@e10harvey
Contributor

I mean, we are currently calculating the hash outside of the inlined HashmapAccumulator insertion routines. Why are we doing this?

@brian-kelley
Contributor Author

@e10harvey I guess you could make bool powerOfTwoHashSize a template parameter, and that way no flags need to be stored and there's no runtime overhead. However you do it, we definitely need to use bitwise-AND masking where possible because integer div/mod are expensive on CUDA.
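
A minimal sketch of that idea, assuming a bool template parameter chooses the hash at compile time (the names PowerOfTwoHashSize and HashSketch are illustrative, not the actual refactor):

```c++
// Minimal sketch: the hash strategy is fixed at compile time by a bool template
// parameter, so no runtime flag or function pointer is needed.
#include <Kokkos_Core.hpp>

template <typename size_type, typename key_type, bool PowerOfTwoHashSize>
struct HashSketch {
  size_type hash_key_size;  // table size; a power of two when PowerOfTwoHashSize is true

  KOKKOS_FORCEINLINE_FUNCTION
  size_type hash(key_type key) const {
    // The condition is a compile-time constant, so the compiler keeps only one branch.
    if (PowerOfTwoHashSize)
      return key & (hash_key_size - 1);  // cheap bitwise-AND mask
    else
      return key % hash_key_size;        // integer mod: expensive on CUDA
  }
};
```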

@e10harvey
Contributor

@e10harvey I guess you could make bool powerOfTwoHashSize a template parameter, and that way no flags need to be stored and there's no runtime overhead. However you do it, we definitely need to use bitwise-AND masking where possible because integer div/mod are expensive on CUDA.

Thanks @brian-kelley. This is exactly what I've started.

@e10harvey
Contributor

@brian-kelley: Actually, thinking about this more, is there any reason against adding the following to the constructor bodies:

compute_hash_func = compute_hash_func_table[__hash_op];

Then the insertion routines will invoke:

hash = compute_hash_func(arg1, arg2);
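
A minimal sketch of that proposal (compute_hash_func_table, the hash-op index, and the two hash functions below are illustrative assumptions, not existing KokkosKernels code):

```c++
// Hypothetical sketch of a constructor-selected hash function table.
// None of these names are the actual KokkosKernels API.
#include <cstdint>

using size_type   = uint32_t;
using key_type    = uint32_t;
using hash_func_t = size_type (*)(key_type, size_type);

inline size_type hash_bitwise_and(key_type key, size_type mask) { return key & mask; }
inline size_type hash_modulo(key_type key, size_type size) { return key % size; }

// Indexed by a hash-op value passed to the constructor (the __hash_op above).
static const hash_func_t compute_hash_func_table[] = {hash_bitwise_and, hash_modulo};

struct AccumulatorHashSketch {
  hash_func_t compute_hash_func;
  size_type size_or_mask;  // bit mask for hash_bitwise_and, table size for hash_modulo

  AccumulatorHashSketch(int hash_op, size_type size_or_mask_)
      : compute_hash_func(compute_hash_func_table[hash_op]), size_or_mask(size_or_mask_) {}

  size_type hash(key_type key) const { return compute_hash_func(key, size_or_mask); }
};
```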

@brian-kelley
Contributor Author

@e10harvey Function pointers/first-class functions incur overhead on CPUs and I don't think they can be used at all in CUDA. I would just go with the template.

@brian-kelley
Contributor Author

Actually function pointers are supported in all CUDA versions supported by Kokkos, but they still have a performance cost.
