[deltaE] Speedup of 7th powers in DeltaE2000 #340
That’s very interesting. What's the cause of the slowdown? E.g. would |
The slowdown is due to the general-purpose nature of Additionally, if you specialize the operation to reduce how many times you call multiply, you could potentially speed up performance a bit more. Some people don't bother as they don't require the absolute fastest, most optimized solution, but if it is something you care about, I imagine it would be useful. Usually, if I do care about performance, I ask people to provide data backing their performance claims, such as how they tested and the results. Generally, I am inclined to believe this claim is faster just because I am familiar with this type of optimization; the question is whether you value the performance or the simplicity of just using |
I'm not suggesting saving nanoseconds. |
Funny, a couple of weeks ago I happened to run some benchmarks out of curiosity, regarding the simplified Phi calc, and ran across this difference, and was rather surprised at the difference between Math.pow() and the ** operator. Safari and Chrome are similar, Safari shown: Clearly, the However, the thing I found most surprising, and I mean shocking, is that not only is the difference between methods not apparent in Firefox, Firefox is overall two orders of magnitude faster in terms of ops per second: However, in terms of actual performance on web apps, I don't find Firefox to "feel" that much faster. It makes me wonder a bit about the way the operations per second are being reported out of Firefox, which I am guessing is what the benchmark site is using. In other words, is Firefox reporting each individual CPU instruction per second (i.e. a fetch is one, a shift is one, etc.), while Safari/Chrome are reporting completed JS operations per second (i.e. the start to finish I am curious, as that's how it appears... |
Hi @dom1n1k
But the |
Hi Issac @facelessuser
I would argue that |
I see, so it's about keeping multiplications down. Yeah, I couldn't get it below 4 with any other combination 😁
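To make the multiplication count concrete, here is a minimal sketch that counts the multiplies used by the grouping from the PR's pow7() and by a plain left-to-right chain. The names mul, countMuls, pow7Squares and pow7Naive are hypothetical helpers written for this illustration; they are not part of the library.

```javascript
// Hypothetical helper: multiplies two numbers and counts the call.
let mulCount = 0;
function mul (a, b) {
  mulCount++;
  return a * b;
}

// Same grouping as the PR's pow7(): x^2 first, then x^2 * x^2 * x^2 * x.
function pow7Squares (x) {
  const x2 = mul(x, x);
  return mul(mul(mul(x2, x2), x2), x);
}

// Plain chain: x * x * x * x * x * x * x.
function pow7Naive (x) {
  let result = x;
  for (let i = 0; i < 6; i++) {
    result = mul(result, x);
  }
  return result;
}

// Runs fn(x) and reports the value together with the multiplies it used.
function countMuls (fn, x) {
  mulCount = 0;
  const value = fn(x);
  return { value, muls: mulCount };
}

console.log(countMuls(pow7Squares, 2)); // value 128, 4 multiplies
console.log(countMuls(pow7Naive, 2));   // value 128, 6 multiplies
```

Fewer multiplies does not automatically mean faster code, since the engine may rearrange or fuse operations, so the count is only a rough guide.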
Yeah, and DE2000 is used iteratively a lot, e.g. in gamut mapping, so I'm totally on board with optimizing it. Presumably this is a temporary fix until browsers optimize this properly though. I wonder if it would be feasible to do this as a build process plugin? |
** is fastest
I just did a benchmark of the function pow7() as written in the PR, and other methods.
//The pow7() function
function pow7 (x) {
const x2 = x * x;
const x7 = x2 * x2 * x2 * x;
return x7;
}
The benchmark: https://www.measurethat.net/Benchmarks/Show/28008/0/power-to-the-7
The results: |
I prefer |
It's probably not just about keeping multiplication combinations down. I imagine things like |
Further Benchmarks
Just to evaluate completely, I optimized some of the methods, such as eliminating unneeded assignments. While it helped improve the function speed slightly,
function pow7 (x) {
const x2 = x * x;
const x7 = x2 * x2 * x2 * x;
return x7;
}
// Slightly more optimized versions:
function pow27 (x) {
const x2 = x * x;
return x2 * x2 * x2 * x;
}
function pow37 (x) {
const x3 = x * x * x;
return x3 * x3 * x;
}
function powMult7 (x) {
return x * x * x * x * x * x * x;
}
Edit to add: the URL for this bench is https://www.measurethat.net/Benchmarks/Show/28008/1/to-the-7th-power-v2 |
So, this is weird, because on my machine, the But I am also wondering re the methodology used that returned the entire ∆E2000 function 20% faster, when this is a small part of the total function, and I'm not seeing anything that would indicate such a savings? |
I don't recommend trusting microbenchmarks. They often lie because the compiler removes "dead" code. My code is below.
import Color from "./src/index.js";
import { deltaE2000 } from "./src/deltaE/index.js";
const k = 11;
const n = k ** 3;
const colors = [];
let accum = 0;
// generate array of 11 ** 3 = 1331 colors
for (let r = 0; r < k; r++) {
for (let g = 0; g < k; g++) {
for (let b = 0; b < k; b++) {
const rgbColor = new Color("prophoto", [
r / (k - 1),
g / (k - 1),
b / (k - 1)
]);
const labColor = rgbColor.to("lab");
colors.push(labColor);
}
}
}
// start
const t0 = performance.now();
// each to each color, 1331 ** 2 = 1771561 pairs
for (let i = 0; i < n; i++) {
const color1 = colors[i];
for (let j = 0; j < n; j++) {
const color2 = colors[j];
// we must use the result
// otherwise the compiler will remove the "dead" code
accum += deltaE2000(color1, color2);
}
}
// finish
const t1 = performance.now();
// we must use the result
// then the compiler knows it's working code
console.log(`result: ${accum}`);
console.log(`time: ${t1 - t0} ms`);
Hi @dom1n1k
I had assignments so the code would be exercised... But it does make me wonder if somehow that is what's going on with Firefox. I don't use FF, so I haven't really looked into those differences. I'm going to try running in Node instead; Node handles things differently than browsers and could be the reason for the discrepancy. Also, I did not test with all the machines I have, so there are a number of influences to look into. |
Hi @Myndex
Meanwhile, if you want to see an isolated benchmark, we can do that. But we need to consider a few conditions:
I got a difference of about 15-25 times in FF and 40-50 times in Chrome/Node.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>pow7() benchmark</title>
<script type="text/javascript" defer src="pow7-bench.js"></script>
</head>
<body>
<h1>Math.pow(), ** and pow7() benchmark</h1>
<p>Please see console</p>
</body>
</html>
const n = 1e7; // number of operations
const x1 = 0;
const x2 = 10;
const dx = (x2 - x1) / n;
function pow7 (x) {
const x2 = x * x;
const x7 = x2 * x2 * x2 * x;
return x7;
}
function testStarStar () {
let x = x1;
let accum = 0;
while (x < x2) {
accum += x ** 7;
x += dx;
}
return accum;
}
function testMathPow () {
let x = x1;
let accum = 0;
while (x < x2) {
accum += Math.pow(x, 7);
x += dx;
}
return accum;
}
function testCustomPow7 () {
let x = x1;
let accum = 0;
while (x < x2) {
accum += pow7(x);
x += dx;
}
return accum;
}
function testMuls () {
let x = x1;
let accum = 0;
while (x < x2) {
accum += x * x * x * x * x * x * x;
x += dx;
}
return accum;
}
const funcs = [
testStarStar,
testMathPow,
testCustomPow7,
testMuls
];
const F = funcs.length; // number of functions
const R = 5; // number of runs
const accums = Array(F);
const times = Array(F);
// benchmarks
for (let f = 0; f < F; f++) {
const func = funcs[f];
accums[f] = 0;
times[f] = [];
for (let r = 0; r < R; r++) {
const t0 = performance.now();
const result = func();
const t1 = performance.now();
accums[f] += result;
times[f].push(t1 - t0);
}
}
// print median times
for (let f = 0; f < F; f++) {
times[f].sort((a, b) => a - b);
const medianTime = times[f][Math.floor(R / 2)];
console.log(f);
console.log(`result: ${accums[f]}`); // we must use results
console.log(`times: ${ Math.round(100 * medianTime) / 100 } ms`);
}
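Before comparing timings, it may be worth checking that the hand-written power agrees with Math.pow() on the same range the benchmark uses. The sketch below is an assumption-laden sanity check, not part of the PR: closeTo and maxRelDiff are hypothetical helpers, and the 1e-12 relative tolerance is an arbitrary choice. The two methods round differently, so the last few bits may not match exactly.

```javascript
// Same grouping as the pow7() used in the benchmark above.
function pow7 (x) {
  const x2 = x * x;
  return x2 * x2 * x2 * x;
}

// Hypothetical helper: true if a and b agree within a relative tolerance.
function closeTo (a, b, relTol = 1e-12) {
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

// Largest relative difference between pow7(x) and Math.pow(x, 7)
// over an evenly spaced sample of [0, 10].
function maxRelDiff (steps = 1000) {
  let worst = 0;
  for (let i = 0; i <= steps; i++) {
    const x = 10 * i / steps;
    const ref = Math.pow(x, 7);
    const got = pow7(x);
    if (ref !== 0) {
      worst = Math.max(worst, Math.abs(got - ref) / Math.abs(ref));
    }
  }
  return worst;
}

console.log(`max relative difference: ${maxRelDiff()}`);
console.log(`pow7(3) matches 3^7: ${closeTo(pow7(3), 2187)}`);
```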
WoWwwww.... @dom1n1k Thank you for your patience and the extra info... I am surprised at the number of variations here... All done on macOS, on an Intel MacBook Pro. Here is Safari: And here indeed your function is fastest. But the part that is jaw dropping is Chrome: Srsly wut? Here I can see how you had better overall results with the total DE.... Firefox is the slowest: That's not good... And then Opera, predictably like Chrome: Surprising variation... and what's the deal with the console? Just not optimized? Because running that script in the console returns very different speeds. This is also true for BBEdit even when running as the webpage... Thank you again for your extra time, appreciated. |
Highly likely yes. Code in console often goes to different pipeline. |
interesting that |
Normally |
Interesting, I seem to remember reading something some years ago indicating otherwise, or some ambiguity. |
The current version uses the Math.pow function and the ** operator, which translate into the same function.
We could hope that the JIT optimizes integer powers. But, as practical experiments show, it doesn't work at the moment. Math.pow with a float argument is quite expensive, much more expensive than sqrt or sin, for example.
I wrote a small helper function pow7 and applied it in only 2 places.
On my computer it gives a speedup from ~10% (FF 118) to ~20% (Chromium 118, Node 21), meaning the total time of the deltaE2000 function, not the isolated power.
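For context, the published CIEDE2000 formula contains two seventh-power terms of the form sqrt(C^7 / (C^7 + 25^7)): one in the chroma compensation factor G and one in the rotation term R_C. The sketch below shows how a pow7 helper could slot into those two terms. It follows the standard formula rather than the library's actual source, and the names chromaTerm, chromaCompensation, rotationChroma and POW_25_7 are hypothetical.

```javascript
function pow7 (x) {
  const x2 = x * x;
  return x2 * x2 * x2 * x;
}

// 25^7 is a constant, so it only needs to be evaluated once.
const POW_25_7 = 25 ** 7;

// Hypothetical helper for the shared term sqrt(C^7 / (C^7 + 25^7)).
function chromaTerm (C) {
  const C7 = pow7(C);
  return Math.sqrt(C7 / (C7 + POW_25_7));
}

// G = 0.5 * (1 - sqrt(Cbar^7 / (Cbar^7 + 25^7)))
function chromaCompensation (Cbar) {
  return 0.5 * (1 - chromaTerm(Cbar));
}

// R_C = 2 * sqrt(Cbar'^7 / (Cbar'^7 + 25^7))
function rotationChroma (CbarPrime) {
  return 2 * chromaTerm(CbarPrime);
}

console.log(chromaCompensation(25)); // 0.5 * (1 - sqrt(0.5))
console.log(rotationChroma(25));     // 2 * sqrt(0.5)
```

Because each deltaE2000 call evaluates both terms, a cheaper seventh power could plausibly affect the whole function's runtime, which is in line with the measurement described above.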