Sooner or later, you are going to ask "is GA/DE/PSO better than GA/DE/PSO?" etc. The answer will be model-dependent so, for each new model you look at, you are going to have to do some kind of analysis.
How?
Well, first you need:
1. Models to act as case studies;
2. Optimizers, to be compared;
3. A performance measure collected by running an optimizer on a model;
4. Some way to compare measures (note that, to be defensible, this comparison method has to be approved by the international community);
5. A rig that runs the models and optimizers multiple times (why?) and collects the performance measure;
6. Tools to run all the above.
This talk is about 1,2,3.
See stats.md for notes on 4,5.
We have many optimizers: GA, SA, PSO, DE, etc.
In the following 2d optimization space, which optimizer do you like?
- Hint: we want to minimize dollars and maximize Q(W)
The volume inside the Pareto frontier is called the hypervolume, and the optimizer we like has the greatest hypervolume (in this case, the green). Note that there are many hypervolume calculators, some of which have problems when the number of objectives gets very large.
Note that the more the hypervolume, the better, since this means this optimizer is most approaching "heaven".
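To make that concrete, here is a minimal sketch of a 2d hypervolume calculation, assuming both objectives are minimized, the frontier points are mutually non-dominated, and `ref` is a reference ("anti-heaven") point worse than every frontier point on both objectives. The names `hypervolume2d` and `ref` are illustrative, not from any particular library.

```python
def hypervolume2d(frontier, ref):
    "Area dominated by a 2d frontier (minimizing both), up to ref."
    pts = sorted(frontier)                 # ascending on the first objective
    hv, prev_x = 0.0, ref[0]
    for x, y in reversed(pts):             # sweep right to left, adding the
        hv += (prev_x - x) * (ref[1] - y)  # strip each point contributes
        prev_x = x
    return hv

print(hypervolume2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))  # 6.0
```

Exact sweeps like this only work in 2d, which is one reason hypervolume gets troublesome as the number of objectives grows.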
In the following 2d optimization space, which optimizer do you like?
- Hint: we want to minimize all objectives
- The left-hand-side plot seems to have lower hypervolumes. But is there anything we do not like about the middle and right-hand-side plots?
The middle and right-hand-side solutions are not well spread (there are huge gaps in the frontier).
Calculating spread:
- Ignore everything except the Pareto frontier.
- Find the distances from the two extreme points of the frontier to their nearest neighbors: df and dl.
- Find the distance di from every other point to its nearest neighbor, and the mean d̄ of all those distances.
- Then, for a frontier of N points:

  Δ = (df + dl + Σi |di − d̄|) / (df + dl + (N−1)·d̄)

- If all data is maximally spread, then all the distances di are near the mean d̄, which makes Δ = 0 (ish).
Note that the lower the spread measure Δ, the better, since this means the optimizer is offering options across more of the frontier.
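Here is a minimal sketch of that spread calculation, assuming the frontier is a list of objective tuples with at least three points, and taking df and dl to be the gaps at the two ends of the sorted frontier. The names `edist` and `spread` are illustrative.

```python
import math

def edist(a, b):
    "Euclidean distance between two objective vectors."
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def spread(frontier):
    pts = sorted(frontier)                           # walk along the frontier
    d = [edist(a, b) for a, b in zip(pts, pts[1:])]  # gaps between neighbors
    df, dl = d[0], d[-1]                             # gaps at the two extremes
    dbar = sum(d) / len(d)                           # mean gap
    return ((df + dl + sum(abs(di - dbar) for di in d)) /
            (df + dl + len(d) * dbar))

print(spread([(i, 10 - i) for i in range(11)]))    # evenly spread: ~0.17
print(spread([(0, 10), (1, 9), (9, 1), (10, 0)]))  # big gap: ~0.94
```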
Which is better? Spread or hypervolume? What if they conclude different things? What if they are insanely slow to calculate? What if there was a better measure?
IGD = inter-generational distance; i.e. how good are you compared to the best known?
- Find a reference set (the best possible solutions)
- For each optimizer:
  - For each item in its final Pareto frontier:
    - Find the nearest item in the reference set.
Note that the lower the mean IGD, the better the optimizer, since this means its solutions are closest to the best of the best.
Pragmatics:
- I like expressing IGD as a ratio of the raw IGD score of population0 from generation0 to the reference set. This gives a nice baseline effect for understanding the numbers.
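Here is a minimal sketch of both ideas, following the definition above (mean distance from each frontier item to its nearest reference item). It assumes frontiers and reference sets are lists of objective tuples; the names `igd` and `igd_ratio` are illustrative.

```python
import math

def edist(a, b):
    "Euclidean distance between two objective vectors."
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def igd(frontier, reference):
    "Mean distance from each frontier item to its nearest reference item."
    return sum(min(edist(x, r) for r in reference)
               for x in frontier) / len(frontier)

def igd_ratio(frontier, population0, reference):
    "Baseline trick: 1 means no better than generation0; near 0 is near the best."
    return igd(frontier, reference) / igd(population0, reference)
```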
Details:
- Problem1: The optimal reference set may be unobtainable (if the model is very nasty).
  - Solution1:
    - Let every optimizer work on populations of size "N".
    - Take the combined Pareto frontier from the "a" optimizers, removing duplicates.
    - Down-select those "aN" items to the best "N" ones.
    - Use the resulting set as the reference set.
- Problem2: How to remove duplicates?
  - Solution2a: Exact match on decisions (may not be very useful for real-valued decisions).
  - Solution2b: From the business users, find the minimum value ε by which they can control each decision. Declare two decisions the same if they are within ε.
- Problem3: How to down-select?
  - Solution3: Count how many times each item in "aN" dominates something else. Keep just the "N" items with the highest domination count.
- Problem3a: With binary domination, many things may have the highest domination count, especially when dealing with high-dimensional objectives.
  - Solution3a1: Delete at random from the most crowded parts of the Pareto frontier. Why? Because in crowded spaces, many decisions give rise to the same objective scores.
  - Solution3a2: Don't use binary domination. Instead, use continuous domination since, usually, cdom rejects one item in the comparison. In this approach, sort each item by the sum of how much it loses to everyone else, then pick the "N" that lose least.
- Problem3a1a: How to compute "crowded"? (Sketched in code after this list.)
  - Select all candidates that dominate the most other candidates.
  - For that set, sort the candidates separately on each objective.
  - On each objective Oi, compute the distance left and right to the nearest neighbors.
  - Let the cuboid around a candidate x be the product Vx = ∏i Oi.
  - Sort the candidates descending by Vx.
  - Return the left-most "N" items in that sort.
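Here is a minimal sketch of that cuboid calculation, assuming the candidates are tuples of objective scores. Extreme candidates on any objective get an infinite cuboid so they always survive; the names `cuboids` and `least_crowded` are illustrative.

```python
def cuboids(candidates):
    "Map each candidate to the volume of the cuboid around it."
    vx = {c: 1.0 for c in candidates}
    for i in range(len(candidates[0])):            # one objective at a time
        ordered = sorted(candidates, key=lambda c: c[i])
        vx[ordered[0]] = vx[ordered[-1]] = float("inf")  # keep the extremes
        for left, mid, right in zip(ordered, ordered[1:], ordered[2:]):
            if vx[mid] != float("inf"):
                vx[mid] *= right[i] - left[i]      # gap to nearest neighbors
    return vx

def least_crowded(candidates, n):
    "Return the n candidates with the biggest cuboids around them."
    vx = cuboids(candidates)
    return sorted(candidates, key=lambda c: vx[c], reverse=True)[:n]
```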
Summary:

```
optimal known?
yes: use it
no:  combine frontiers from all optimizers
     remove duplicates
       epsilon known?
       yes: use near match
       no:  use exact match
     downSelect to "N" items
       use binary domination?
       yes: count how often each one dominates another
            select candidates that dominate the most
            selection > "N"?
            no:  use selection
            yes: sort descending by the cuboid around them
                 use first 1.."N"
       no:  sort ascending by the sum of losses to all others
            use first "N"
```
Candidate one dominates candidate two:
- if at least one objective score is better;
- and none are worse.
Note that in the following, each objective knows if it wants to be minimized or maximized.
```python
def more(x, y): return x > y
def less(x, y): return x < y
```

e.g. `objective1.better = more` or `objective2.better = less`, then call the following:
```python
def bdom(x, y, abouts):
  "multi objective"
  x = abouts.objs(x)
  y = abouts.objs(y)
  betters = 0
  for obj in abouts._objs:
    x1, y1 = x[obj.pos], y[obj.pos]
    if obj.better(x1, y1):
      betters += 1
    elif x1 != y1:
      return False  # must be worse, so quit
  return betters > 0
```
Binary domination never reports that one candidate is waaaaaay more dominated than the other. It only says "true" or "false". Not the most informative!
So, as the number of objectives increases, bdom loses to cdom.
What cdom does is take the difference between each pair of objective values, then raise it to an exponential factor (so larger differences SHOUT louder). From this we compute the mean loss traveling from this to that versus that to this (and the one we prefer is the one that loses least).
Formally, this is a domination test across the Pareto frontier.
- First, we normalize x,y to 0..1
- Then we adjust the direction of the comparison depending on whether or not we are minimizing that objective.
- Third, we raise the differences x - y to some exponential (i.e. the larger the difference, the louder we shout!)
- Lastly, we return the mean loss over all objectives.
```python
import math

def minimizing(i):
  "True if objective i is to be minimized, false if maximized."

def norm(i, x):
  "Returns (x - lo) / (hi - lo) where lo, hi are the min, max values for objective i."

def better(this, that):
  x  = scores[id(this)]
  y  = scores[id(that)]
  l1 = loss(x, y)
  l2 = loss(y, x)
  return l1 < l2                      # this is better than that if this loses least

def loss(x, y):
  losses = 0
  n = min(len(x), len(y))
  for i, (x1, y1) in enumerate(zip(x, y)):
    x1 = norm(i, x1)                  # normalization
    y1 = norm(i, y1)                  # normalization
    losses += expLoss(i, x1, y1, n)
  return losses / n                   # return mean loss

def expLoss(i, x1, y1, n):
  "Exponentially shout out the difference."
  w = -1 if minimizing(i) else 1            # adjust for direction of comparison
  return -1 * math.e**(w * (x1 - y1) / n)   # raise the difference to an exponent
```
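Here is a self-contained version of the same idea, assuming objective values are already normalized to 0..1 and `mins` is a list of booleans saying which objectives are minimized. The names `loss`, `cdom` and `mins` are illustrative.

```python
import math

def loss(x, y, mins):
    "Mean exponential loss traveling from x to y."
    n, total = len(x), 0.0
    for x1, y1, minimize in zip(x, y, mins):
        w = -1 if minimize else 1                # direction of comparison
        total += -math.e ** (w * (x1 - y1) / n)  # the exponential "shout"
    return total / n

def cdom(x, y, mins):
    "x is better than y if x loses less going x->y than y->x."
    return loss(x, y, mins) < loss(y, x, mins)

# e.g. when minimizing both objectives, (0.1, 0.2) beats (0.3, 0.4):
print(cdom((0.1, 0.2), (0.3, 0.4), [True, True]))  # True
```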