Improve performance by changing default machine tuning options #25
Btw, the ugly hack that I am using:
If anyone has an idea for a less ugly hack, i.e. something that can be applied within the meta-sunxi scope, and preferably within machine.conf, please let me know!
Kristof
Hi Kristof,
A year ago I had hardfp enabled for the whole layer by default. But when I presented my layer on the Ångström mailing list, I was told that the choice between hardfp and softfp must be made by the distro, and that I had to remove this option. So what we do in the Calaos distro is: https://github.com/calaos/calaos-os/blob/master/conf/local.conf#L48
I would prefer to enable that option by default instead of redefining it for each machine, but the Ångström developers' argument also seems correct. I don't really know what to do here.
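Since whether a build ends up hardfp is decided by the distro configuration rather than by the layer, it can be useful to check what the compiler actually targeted. A minimal sketch, assuming GCC (or any compiler that defines the ACLE `__ARM_PCS_VFP` macro for the hard-float calling convention); `float_abi()` is a hypothetical helper name, meant to be compiled with the same flags as the target recipes:

```c
/* Report the ARM floating-point calling convention this translation unit
 * was compiled for, based on GCC/ACLE predefined macros.
 * Hypothetical helper for illustration only. */
const char *float_abi(void)
{
#if defined(__arm__) && defined(__ARM_PCS_VFP)
    return "hard";        /* -mfloat-abi=hard: FP arguments in VFP registers */
#elif defined(__arm__)
    return "soft/softfp"; /* -mfloat-abi=soft or softfp: FP arguments in core registers */
#else
    return "not-arm32";   /* built for another architecture, e.g. the build host */
#endif
}
```

Note that softfp and soft cannot be told apart this way, since both use the base calling convention; only the hard-float ABI changes how floating-point arguments are passed.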
@naguirre It would be nice to document this somewhere (e.g. in the README), as people might not be aware of these options (I was not until very recently). Btw, is there any reason why you don't have "t" (Thumb) enabled?
I just read in
Thought I had read somewhere that that was also a speed improvement, but apparently not. EDIT:
[1] https://github.com/openembedded/oe-core/blob/master/meta/conf/machine/include/arm/feature-arm-thumb.inc
Btw, here is a good thread discussing Thumb performance: http://stackoverflow.com/questions/1198176/arm-vs-thumb-performance-on-iphone-3gs-non-floating-point-code
I guess it indeed boils down to "what works best in my specific use case"; I will need to do some tests :)
I ran the linpack benchmark referenced at [1]
I do not get the rather dramatic improvements listed at [1], though; those results were obtained with more aggressive compiler options. Still, that is a nice 20% improvement in this case, and it is likely to be relevant in more general use cases.
Kristof
In fact, even with the exact same compiler options as listed at [1], I still get only
Slightly better, but nowhere near the performance reported at [1]. Have others tried replicating those results?
In fact, it seems that a more significant (and easier) change is simply to adjust the CPU governor settings, as explained at [1]. With the recommended settings there, and neon-vfpv4 (but without the aggressive compiler options):
Nice! :)
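Since the governor turned out to matter more than the tuning flags, it is worth checking which governor a board is actually running before benchmarking it. A minimal sketch, assuming the standard Linux cpufreq sysfs layout; `read_governor()` is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Read the cpufreq governor of the given CPU from sysfs into buf.
 * Returns 0 on success, -1 if the file cannot be read
 * (e.g. the kernel has no cpufreq support for this CPU). */
int read_governor(int cpu, char *buf, size_t len)
{
    char path[96];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    return 0;
}
```

Switching governors is done by writing the governor name (e.g. `performance`) to the same file as root; which governor and frequency limits to use is what the settings recommended above are about.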
With:
With:
I'm happy to announce that a patch adding new tuning options that support 'neon-vfpv4' has been merged upstream in oe-core; see [1]. This allows you to specify
Kristof
[1] http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=e65422f0f79d6069a3312cb4a3d110ec809017ad
I just noticed that I never actually used Thumb instructions. So yes, this reinforces the argument that, in practice, probably almost nobody uses Thumb instructions (even when they might think they do). The trick to enforce Thumb is to also set:
ARM_INSTRUCTION_SET = "thumb"
I might experiment with this later.
[1] http://article.gmane.org/gmane.comp.handhelds.openembedded.core/47005
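A quick way to confirm whether a build really produced Thumb code (rather than only selecting a "t" tune) is to compile a tiny file with the same flags as the target recipes and look at the predefined macros. A sketch, assuming GCC's `__thumb__`/`__thumb2__` macros; `instruction_set()` is a hypothetical helper:

```c
/* Report the instruction set GCC generated for this translation unit.
 * __thumb2__ is defined for Thumb-2 code, __thumb__ for any Thumb code,
 * and __arm__ for any 32-bit ARM target (ARM or Thumb state). */
const char *instruction_set(void)
{
#if defined(__thumb2__)
    return "thumb2";
#elif defined(__thumb__)
    return "thumb1";
#elif defined(__arm__)
    return "arm";
#else
    return "not-arm32"; /* e.g. compiled for the build host */
#endif
}
```

With a "t" tune but the default instruction set, this would be expected to report `arm`; only when Thumb is actually enforced should it report `thumb2` on a Cortex-A7.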
(1) DEFAULTTUNE = cortexa7hf-neon-vfpv4 & performance governor at 1080 MHz:
Image size: 147 MB
(2) DEFAULTTUNE = cortexa7thf-neon-vfpv4 & ARM_INSTRUCTION_SET = thumb & performance governor at 1080 MHz (i.e. real Thumb):
Image size: 149 MB
Conclusion: in this simple test, Thumb performance is 0.3% slower, and the size is 1.3% smaller. So the expected tendencies described earlier (minimal performance loss, denser code) are there, but they are not significant (at least not in this simple linpackc test).
Note: I ran these linpackc benchmarks multiple times and posted one "representative" run; typically I saw about 0.1% variation (200 KFlops) between consecutive runs.
EDIT: corrected percentages
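The note above also lets us put the 0.3% slowdown in context. Assuming the quoted 200 KFlops is the 0.1% run-to-run variation, the implied baseline and the size of the Thumb slowdown work out roughly as follows (back-of-the-envelope arithmetic, not additional measurements):

$$
\text{baseline} \approx \frac{200\ \text{KFlops}}{0.001} = 200{,}000\ \text{KFlops} \approx 200\ \text{MFlops}
$$

$$
\Delta_{\text{thumb}} \approx 0.003 \times 200{,}000\ \text{KFlops} = 600\ \text{KFlops} \approx 3 \times 200\ \text{KFlops}
$$

So the measured Thumb penalty is of the same order of magnitude as the run-to-run spread.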
README.md: added performance options (resolves #25)
Does anybody know how to solve this "bug": https://bugzilla.yoctoproject.org/show_bug.cgi?id=7275
It seems to be a problem; could you please open an issue?
Note: I am focusing on Cubieboard2 in this post, as that is the board I own and can test on, but this should be relevant to other boards as well.
Currently our cubieboard2 machine.conf file [1] falls back on the default tuning option specified in arch-armv7a.inc [2]:
This boils down to the following compiler options:
That is really the lowest-performance option, and the Cubieboard2 is capable of more than that.
Specifically, it supports NEONv2, VFPv4 and Thumb-2 (see [3] and [4]), but the default tuning file does not take advantage of that.
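The feature list from [3] and [4] can also be confirmed on the board itself: the Linux kernel reports CPU features through the auxiliary vector (the same information as the `Features` line of `/proc/cpuinfo`). A minimal sketch, assuming 32-bit ARM Linux with glibc's `getauxval()` (glibc 2.16 or later); the bit values are copied from the kernel's ARM `hwcap.h`, and the function names are hypothetical:

```c
#include <stdio.h>

#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Bit values from the 32-bit ARM kernel's uapi <asm/hwcap.h>, defined
 * locally so this file also compiles on non-ARM build hosts. */
#define ARM_HWCAP_THUMB (1UL << 2)  /* Thumb; Thumb-2 is not reported separately */
#define ARM_HWCAP_NEON  (1UL << 12)
#define ARM_HWCAP_VFPv4 (1UL << 16)

/* Return the AT_HWCAP bits on 32-bit ARM Linux, 0 everywhere else. */
unsigned long arm_hwcaps(void)
{
#if defined(__linux__) && defined(__arm__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}

/* Print which of the features discussed in this issue the running CPU reports. */
void print_arm_features(void)
{
    unsigned long caps = arm_hwcaps();
    printf("thumb: %s\n", (caps & ARM_HWCAP_THUMB) ? "yes" : "no");
    printf("neon:  %s\n", (caps & ARM_HWCAP_NEON) ? "yes" : "no");
    printf("vfpv4: %s\n", (caps & ARM_HWCAP_VFPv4) ? "yes" : "no");
}
```

On an A20 running a kernel that exposes these bits, all three lines would be expected to print `yes`.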
I'd propose to set the default tune to something that supports all the capabilities of the Allwinner A20 chip, i.e.:
resulting in
Note that this still does not take advantage of the NEONv2/VFPv4 capabilities of the Allwinner A20; for that we'd need -mfpu=neon-vfpv4 [5]. I am currently using an ugly hack to force this compile option in my builds, and I opened a request upstream to add it ([6]). I'll try to run some benchmarks comparing the default with the proposed tuning options above, to put some data behind this and get an idea of how big the difference really is.
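To see where VFPv4 could help a benchmark like linpack, it helps to look at its hot loop. The sketch below is a plain C DAXPY kernel written for illustration (an assumption, not the linpackc source); compiling it for the A20 with and without `-mfpu=neon-vfpv4` and comparing the disassembly (e.g. with `objdump -d`) should show whether fused multiply-adds are used:

```c
#include <stddef.h>

/* DAXPY: y[i] += a * x[i], the kind of inner loop that dominates LINPACK.
 * With GCC's default -ffp-contract=fast (GNU C modes), the multiply-add may
 * be contracted into a fused multiply-add when the target FPU supports it,
 * e.g. vfma.f64 on ARMv7 with -mfpu=vfpv4 or -mfpu=neon-vfpv4.
 * Without VFPv4 it remains a separate (non-fused) multiply and add. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Since ARMv7 NEON has no double-precision SIMD, any VFPv4 gain for a double-precision benchmark would come from fused multiply-add rather than from vectorization.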
In the meantime, all comments welcome.
Thanks!
Kristof
[1] https://github.com/linux-sunxi/meta-sunxi/blob/master/conf/machine/cubieboard2.conf
[2] https://github.com/openembedded/oe-core/blob/master/meta/conf/machine/include/tune-cortexa7.inc
[3] http://linux-sunxi.org/Allwinner_SoC_Family
[4] http://wits-hep.blogspot.fr/2013/12/fftw-benchmarks-on-cortex-a7.html
[5] http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
[6] https://bugzilla.yoctoproject.org/show_bug.cgi?id=5710