Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only report runtime isa support if we also have compiler support #200

Merged
merged 9 commits into from
Dec 9, 2019

Conversation

lgarrison
Copy link
Collaborator

If the CPU supports AVX-512 but Corrfunc was only compiled with AVX support (for example), we end up using the fallback kernel by default. The correct behavior is to use the highest instruction set that has both compile-time and runtime support.

In this PR, I implemented that behavior by modifying instrset_detect(). That fixes the problem globally. But if we really want instrset_detect() to remain a pure runtime detector, we could fix the dispatch functions in each of the kernels, or wrap instrset_detect() in another function.

Should we warn the user about having runtime support but not compile-time support? That's a strange situation that's usually caused by using an old compiler. But with the binutils bug, we have a legitimate situation in which we do not compile against the maximum runtime instruction set.

@lgarrison lgarrison requested a review from manodeep October 7, 2019 15:26
@lgarrison lgarrison added this to the v2.3.2 milestone Oct 7, 2019
@manodeep
Copy link
Owner

manodeep commented Oct 9, 2019

@lgarrison We should leave instr_detect() as purely a runtime feature detector and perhaps add a separate function that finds the highest isa with both runtime and compile-time support. That should solve the problem of offloading to the fallback kernel instead of the next highest isa.

We should also warn the user if the compile time support is lower than runtime support if we are not working around the GAS bug.

@lgarrison
Copy link
Collaborator Author

Okay, how about I rename instr_detect() to instr_detect_runtime() and make a new instr_detect() that checks both runtime and compile-time support?

@manodeep
Copy link
Owner

@lgarrison That sounds good. Would it make sense to rename this new instr_detect to get_max_usable_isa()?

@manodeep
Copy link
Owner

@lgarrison Once this is merged in, are you fine with releasing v2.3.2?

@lgarrison
Copy link
Collaborator Author

lgarrison commented Oct 11, 2019 via email

@manodeep
Copy link
Owner

@lgarrison Do you want to add in fallback_offset = highest_isa; as well -- that way the code will default to the highest supported ISA rather than the fallback kernel? That would also mean replacing the last default option with num_functions - 1 (rather than fallback_option).

@lgarrison
Copy link
Collaborator Author

Yes, that's a good idea. But the fallback_offset index isn't just the isa number, unfortunately. We need to know the index in the function array. I'll have to think about whether there's a clean way to do that.

@manodeep
Copy link
Owner

manodeep commented Oct 13, 2019

The highest compiled kernel will always be at index 0. However, since the compiled kernels might not correspond to the runtime available isa, the required index will be different. Would something like this work:

int highest_available_isa = fallback_offset;
int num_available_isa = 0;
#ifdef __SSE4_2__
if (iset >= 6) {
    highest_available_isa = sse_offset;
    num_available_isa++;
}
#endif
#ifdef __AVX__
if (iset >=7) {
    highest_available_isa = avx_offset;
    num_available_isa++;
}
#endif 
#ifdef __AVX2__
if (iset >=8) {
    highest_available_isa = avx2_offset;
    num_available_isa++;
}
#endif 
#ifdef __AVX512F__
if (iset >=9) {
    highest_available_isa = avx512_offset;
    num_available_isa++;
}
#endif 

if(num_available_isa != num_functions) {
    fprintf(stderr,"Warning: Number of kernels with both runtime and compile time support = %d while the number of compiled kernels =%d\n", num_available_isa, num_functions);
}

Completely untested, uncompiled code but you get the idea.

@lgarrison
Copy link
Collaborator Author

Just pushed another version, does it look okay? I will port it to the other CFs if so. I tested it, and it seems to work.

Copy link
Owner

@manodeep manodeep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

The default isa is set in defs.h to the highest isa available at compile time. If the runtime cpu is different, will the code still use the next available isa?

@manodeep manodeep modified the milestones: v2.3.2, v2.4.0 Oct 22, 2019
@manodeep
Copy link
Owner

@lgarrison Pinging. We should be able to release v2.3.2 with this and the few small changes in the docs.

@lgarrison
Copy link
Collaborator Author

Agreed! The porting to the other CFs will take an hour or so, I'll try to finish it on Thursday or Friday.

* Use ISA enums instead of magic numbers
* Fix "instrinsics" typo
* Port new ISA dispatch to all CFs
@lgarrison
Copy link
Collaborator Author

@manodeep All done! Can you look over the diff one more time? If Travis passes, I think we're ready to merge & release.

@manodeep
Copy link
Owner

manodeep commented Dec 3, 2019

@lgarrison I left a couple of minor comments.

Also, will you please confirm that get_max_usable_isa returns the fastest available function (or prints appropriate warning) for the following 4 combinations between (compiler_unsupported_isa, compiler_supported_isa) and (runtime_unsupported_isa, runtime_supported_isa)?

@lgarrison
Copy link
Collaborator Author

// cpu_features_test.c
#include <stdio.h>
#include "utils/cpu_features.c"

int main(void){
	printf("get_max_usable_isa(): %d\n", get_max_usable_isa());
	return 0;
}

Compiler supported, runtime supported:

$ gcc -mavx2 -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test 
get_max_usable_isa(): 8

Compiler supported, runtime unsupported:

$ gcc -mavx512f -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test 
get_max_usable_isa(): 8

This is right because -mavx512f implies -mavx2 and lower.

Compiler unsupported, runtime supported:

$ gcc -mavx -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test 
[Warning] The CPU supports AVX2 but the compiler does not.  Can you try another compiler?
get_max_usable_isa(): 7

Look okay?

@manodeep
Copy link
Owner

manodeep commented Dec 8, 2019

@lgarrison This is on my list for today

@manodeep
Copy link
Owner

manodeep commented Dec 8, 2019

@lgarrison Everything looks good. I have one suggestion - we should move the instruction set enums into cpu_features.h and include cpu_features.h within defs.h. That way cpu_features.h does not need to include defs.h.

We should probably also rename defs.h to corrfunc.h :) (not as part of this PR, perhaps for v3.0)

@lgarrison
Copy link
Collaborator Author

Done!

And I agree with the renaming suggestion for v3.0.

@manodeep
Copy link
Owner

manodeep commented Dec 9, 2019

@lgarrison Should I go ahead and "squash-merge"? Was there anything else to update on this PR?

@lgarrison
Copy link
Collaborator Author

Yes, I think it is ready to squash & merge!

@manodeep manodeep merged commit ae2e4ba into master Dec 9, 2019
@manodeep manodeep deleted the isa_dispatch branch December 18, 2019 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants