-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More efficient and unlimited implementation of fluid_ct2hz() #569
Conversation
This reverts commit ef9e081. "Although ldexp, scalbn and scalbln are specified to perform the operation efficiently, on many implementations they are less efficient than multiplication or division by a power of two using arithmetic operators." Which was proven by using VTune.
Thanks for this implementation. Just some details: if(FLUID_UNLIKELY(res.quot I wonder if removing else branch will enhance performance ? |
Oh, you're right, thanks. And no, removing the else keyword does not affect performance, as it results in the same assembly being generated. |
Something now sound wrong. |
Sorry, please ignore my previous comment. All sound fine. (I had compiled a wrong gen_conv.c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I've tried to measure the performance gain with profiling on my target platform (ARM Coretex A7 - 1GHz) but could not notice any speed improvements with the built-in profiling feature. The range of "max estimated voices" between different profiling runs spans 146 - 150, both on master and this branch. Would be interesting to see what this does to less capable platforms.
I'm not so sure about the fallback to on-the-fly calculation though. From a sound perspective, I don't think we gain anything from calculating those extremely high frequencies. But they could still happen if a Soundfont designer adds a huge amount into the (mod|vib)-lfo-to-frequency generator. Now I know that doing so would not make any sense, but neither does calculating those frequencies to exact values using lots of CPU cycles. So I would vote to choose a high cutoff and return a fixed high frequency instead.
Sorry, need to revise this (I worked in the wrong directory and accidentally tested an older master checkout in both cases). Now with the proper branches, these are the real profiling outputs: estimated maxVoices
Below are some more details. I chose the slowest run on master and the fastest run on new-ct2hz. Not sure if there are any other performance improvements on current master compared to new-ct2hz, but at least on my platform, profiling suggests that it is a slight performance loss rather than an improvement. Maybe relevant: I'm compiling with master
new-ct2hz
|
The culprit seems to be the |
Thanks for testing on ARM.
That CPU has a proper FPU and dynamic branch prediction. Doesn't surprise me that your test don't show a real difference. @carlo-bramini for instance has a Cortex M4, maybe things look different for him.
Oh, you are right.
Quite disappointing that stdlib calls like
Thinking about it again, I agree. With this new implementation we have a maximum of 38100 absolute cents, which is 29,527,900,160 Hz . Nothing will ever reach this. Inaudible anyway, so yeah, use an upper limit... Now the question is where to check that upper limit?
|
turned out to be expensive
Yes, why not? It's not any worse than the current 1.0Hz fallback, and due to the higher cents cutoff actually a little better. |
Excuse me, since
Since the divisor is a constant value known at compile time, the compiler simplifies the calculation of the divisor and the reminder by just multiplying for its reciprocal. |
Should be fine now. Unless smb. objects, I would merge this. |
👍 |
Please, as |
If fac becomes 32 or more, the result of the shift would be zero. The |
Oh yes, ofc. |
Maybe it would be good to extend the comment above that line to explain this? |
Done. |
I forgot that |
While working on an unlimited implementation of
fluid_ct2hz()
as discussed in #568, I'm now providing a less branchy and therefore more instruction-cache-friendly version,which has no upper limit (like the old one had). Also, it significantly reduces the number of floating-point comparisons, which is esp. beneficial for non-FPU systems.Using Intel's VTune Amplifier gives the following results.
ldexp()
ldexp()
scalbn()
andldexp()
turn out to be less efficient than arithmetic operatorsJust added a
test_ct2hz
unit test to master to make sure this change doesn't cause trouble.Are lookup tables even contemporary? Yes, they are!