I just read the original paper introducing Delta Debugging. It is a very good read, and it is surprising that it is not better known. It can be coerced into doing exactly what FLiT bisect sets out to accomplish: it looks for a minimal set of items that still causes the test function to return false. It can potentially be used in one of the following ways (and perhaps more; a sketch of the corresponding test predicates follows the list):
1. Have a test that returns false only when you get the exact bad answer (all bad files).
2. Have a test that returns false only when you get the exact good answer (all good files).
3. Have a test that returns false if there is any difference from the good answer; find the minimal set, remove it, rinse, and repeat (similar to the bisection algorithm used now).
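For concreteness, here is a minimal C++ sketch of what those three test predicates could look like. Everything here is an illustrative assumption, not FLiT's actual API: `run_program`, `good_answer`, and `bad_answer` are hypothetical names, and `run_program` stands in for "rebuild with the given subset of files optimized, run, and return the result". Remember that returning false means the test fails:

```c++
#include <string>
#include <vector>

using Files = std::vector<std::string>;

// Hypothetical stand-in: rebuild with only 'optimized' compiled under the
// optimization in question, run the program, and return its numeric result.
double run_program(const Files& optimized) {
  return optimized.empty() ? 1.0 : 1.5;  // placeholder behavior
}

int main() {
  const double good_answer = 1.0;  // result with no files optimized
  const double bad_answer  = 1.5;  // result with all files optimized

  // 1. Fails only on the exact bad answer (finds all bad files).
  auto test1 = [&](const Files& s) { return run_program(s) != bad_answer; };

  // 2. Fails only on the exact good answer (finds all good files).
  auto test2 = [&](const Files& s) { return run_program(s) != good_answer; };

  // 3. Fails on any difference from the good answer.
  auto test3 = [&](const Files& s) { return run_program(s) == good_answer; };

  (void)test1; (void)test2; (void)test3;  // these would be handed to ddmin
}
```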
Numbers 1 and 2 will not necessarily produce the same files. In fact, if a change requires the optimization to be applied to two files before it becomes noticeable, then the second approach will place one of the two bad files in the list of good files, which would be misleading.
I'm thinking number 1 will be the most performant way to find all of the necessary optimization effects, but we can compare numbers 1 and 3 together and against the existing bisect implementation. The advantage of doing number 3 is that it identifies which files are independent (i.e., cause a problem by themselves) and which ones need to be optimized together (in groups of two, three, or more).
The Delta Debugging algorithm can be defined as follows:
Let C be the full input set that causes a function test() to return failure, where test(<empty>) returns success. The Delta Debugging algorithm finds an approximation to a smallest subset c of C that still causes test() to fail, such that if any single element of c were removed, the test would no longer fail (i.e., c is 1-minimal).
```c++
set ddmin(set C, function test) {
  return ddmin_recursive(C, test, 2);
}

set ddmin_recursive(set C, function test, int n) {
  if (C.size() <= 1) return C;  // a singleton cannot be reduced further

  // Split into n almost equal pieces
  list<set> deltas = split_set(C, n);

  // Try to reduce to a subset
  for (auto delta : deltas) {
    if (test(delta) == false) {
      return ddmin_recursive(delta, test, 2);
    }
  }

  // Try to reduce to a complement
  for (auto delta : deltas) {
    set complement = C.set_minus(delta);
    if (test(complement) == false) {  // test the complement, not delta
      return ddmin_recursive(complement, test, max(n - 1, 2));
    }
  }

  // Try to increase granularity
  if (n < C.size()) {
    return ddmin_recursive(C, test, min(C.size(), 2 * n));
  }

  // Otherwise, done: C is 1-minimal
  return C;
}
```
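To check my understanding, here is a self-contained, runnable C++ version of the same algorithm. The container choice (a sorted std::vector<int> standing in for a set), the `split_set` and `set_minus` helpers, and the toy test function (which fails exactly when both 3 and 7 are present, mimicking two files that must be optimized together to see the problem) are all my own assumptions for illustration:

```c++
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <vector>

using Set  = std::vector<int>;                 // kept sorted, standing in for a set
using Test = std::function<bool(const Set&)>;  // true = pass, false = fail

// Split s into n nearly equal contiguous pieces (assumes n <= s.size()).
std::vector<Set> split_set(const Set& s, int n) {
  std::vector<Set> pieces;
  std::size_t begin = 0;
  for (int i = 0; i < n; ++i) {
    std::size_t end = begin + (s.size() - begin) / (n - i);
    pieces.emplace_back(s.begin() + begin, s.begin() + end);
    begin = end;
  }
  return pieces;
}

// Elements of a that are not in b (both sorted).
Set set_minus(const Set& a, const Set& b) {
  Set out;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(out));
  return out;
}

Set ddmin_recursive(const Set& C, const Test& test, int n) {
  if (C.size() <= 1) return C;  // a singleton cannot be reduced further

  auto deltas = split_set(C, n);

  // Try to reduce to a subset
  for (const auto& delta : deltas) {
    if (!test(delta)) return ddmin_recursive(delta, test, 2);
  }

  // Try to reduce to a complement
  for (const auto& delta : deltas) {
    Set complement = set_minus(C, delta);
    if (!test(complement)) {
      return ddmin_recursive(complement, test, std::max(n - 1, 2));
    }
  }

  // Try to increase granularity
  if (n < static_cast<int>(C.size())) {
    return ddmin_recursive(C, test,
                           std::min(static_cast<int>(C.size()), 2 * n));
  }

  return C;  // 1-minimal: removing any single element makes the test pass
}

Set ddmin(const Set& C, const Test& test) {
  return ddmin_recursive(C, test, 2);
}

int main() {
  // Toy interaction: the test fails exactly when both 3 and 7 are present.
  Test test = [](const Set& s) {
    bool has3 = std::find(s.begin(), s.end(), 3) != s.end();
    bool has7 = std::find(s.begin(), s.end(), 7) != s.end();
    return !(has3 && has7);  // success unless both are present
  };
  Set C = {1, 2, 3, 4, 5, 6, 7, 8};
  for (int x : ddmin(C, test)) std::cout << x << ' ';
  std::cout << '\n';  // prints: 3 7
}
```

For the rinse-and-repeat variant (number 3), the outer loop could then look like the following sketch (`find_all_culprit_sets` is my own name for it):

```c++
// Number 3, iterated: find a minimal failing set, record it, remove it
// from further consideration, and repeat until the remaining set passes.
std::vector<Set> find_all_culprit_sets(Set C, const Test& test) {
  std::vector<Set> culprits;
  while (!test(C)) {
    Set culprit = ddmin(C, test);
    culprits.push_back(culprit);
    C = set_minus(C, culprit);
  }
  return culprits;
}
```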
An optimization mentioned in the paper is that the test() function should memoize its return values for specific inputs, since it may end up testing the same thing more than once. For those who do not know, memoization is caching previous input/output pairs and returning the cached output for an already-seen input.
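A sketch of that optimization, reusing the Set and Test aliases from the example above (the `memoize` wrapper and its name are my own):

```c++
#include <map>
#include <memory>

// Wrap a test function in a cache so repeated queries with the same input
// set return the stored result instead of rerunning the expensive test.
Test memoize(Test inner) {
  auto cache = std::make_shared<std::map<Set, bool>>();
  return [inner, cache](const Set& s) {
    auto it = cache->find(s);
    if (it != cache->end()) return it->second;  // already-seen input
    bool result = inner(s);
    cache->emplace(s, result);
    return result;
  };
}

// Usage: Set minimal = ddmin(C, memoize(test));
```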