
home | copyright ©2016, tim@menzies.us

overview | syllabus | src | submit | chat


What Optimizer is Best?

Sooner or later, you are going to ask "is GA better than DE? Is DE better than PSO?" etc. The answer will be model-dependent, so for each new model you look at, you are going to have to do some kind of analysis.

How?

Well, first you need:

  1. Models to act as case studies;
  2. Optimizers to be compared;
  3. A performance measure, collected by running an optimizer on a model;
  4. Some way to compare measures. Note that, to be defensible, this comparison method has to be approved by the international community;
  5. A rig that runs the models and optimizers multiple times (why?) and collects the performance measure;
  6. Tools to run all the above.

This talk is about items 1, 2 and 3.

See stats.md for notes on items 4 and 5.

Models

We have

Optimizers

GA, SA, PSO, DE, etc.

Performance Measures

HyperVolume

In the following 2d optimization space, which optimizer do you like?

  • Hint: we want to minimize dollars and maximize Q(W)

The volume inside the Pareto frontier is called the hypervolume, and the optimizer we like is the one with the greatest hypervolume (in this case, the green). Note that there are many hypervolume calculators, some of which have problems when the number of objectives gets very large.

Note that the more the hypervolume, the better, since this means this optimizer comes closest to "heaven".
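
To make this concrete, here is a minimal 2d sketch (not the course code). It assumes both objectives are being minimized and that ref is a reference point worse than every frontier point on both objectives; for the dollars-versus-Q(W) example above, you would first negate the maximized objective.

```python
def hypervolume2d(front, ref):
  "Sum the rectangles swept between successive frontier points and 'ref'."
  best, volume = ref[1], 0
  for x, y in sorted(front):  # sweep along the first objective
    if y < best:              # skip dominated points
      volume += (ref[0] - x) * (best - y)
      best = y
  return volume

print(hypervolume2d([(1, 5), (2, 3), (4, 1)], ref=(6, 6)))  # 17
```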

Spread

In the following 2d optimization space, which optimizer do you like?

  • Hint: we want to minimize all objectives
  • The left-hand-side plot seems to have a lower hypervolume. But is there anything we do not like about the middle and right-hand-side plots?

The middle and right-hand-side solutions are not well spread out (there are huge gaps in their frontiers).

Calculating spread:

  • Ignore everything except the Pareto frontier.
  • Find the distances from the two extreme points of the frontier to their nearest neighbors: df and dl.
  • Find the distance di from every other point to its nearest neighbor, and the mean d̄ of those distances.
  • Then:

    Δ = (df + dl + Σi |di − d̄|) / (df + dl + (N−1)·d̄)

  • If all the data is maximally spread, then all the distances di are near the mean d̄, which makes Δ ≈ 0.

Note that the lower the Δ, the better, since this means the optimizer is offering options across more of the frontier.
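
A minimal sketch of that calculation (not the course code), for a 2d frontier whose extreme boundary points are known:

```python
def dist(a, b):
  "Euclidean distance between two points."
  return sum((u - v)**2 for u, v in zip(a, b))**0.5

def spread(front, extremes):
  front = sorted(front)
  df = dist(extremes[0], front[0])   # gap at one end of the frontier
  dl = dist(extremes[1], front[-1])  # gap at the other end
  ds = [dist(a, b) for a, b in zip(front, front[1:])]  # consecutive gaps
  dbar = sum(ds) / len(ds)
  return (df + dl + sum(abs(d - dbar) for d in ds)) / (df + dl + len(ds)*dbar)
```

When the points reach the extremes and are evenly spaced, df = dl = 0 and every di equals d̄, so Δ = 0.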

IGD

Which is better? Spread or hypervolume? What if they conclude different things? What if they are insanely slow to calculate? What if there was a better measure?

IGD = inverted generational distance; i.e. how good are you compared to the best known?

  • Find a reference set (the best possible solutions)
  • For each optimizer:
    • For each item in its final Pareto frontier:
      • Find the nearest item in the reference set.

Note that the lower the mean IGD, the better the optimizer, since this means its solutions are closest to the best of the best.

Pragmatics:

  • I like expressing IGD as a ratio: the raw IGD score divided by the IGD of population0 from generation0 against the reference set. This gives a nice baseline effect for understanding the numbers (see the sketch below).
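
A minimal sketch (not the course code), following the recipe above; front is one optimizer's final frontier, refset is the reference set, and front0 is population0 from generation0, all as lists of objective-score tuples:

```python
from math import dist  # Euclidean distance, Python 3.8+

def igd(front, refset):
  "Mean distance from each frontier item to its nearest reference item."
  return sum(min(dist(f, r) for r in refset) for f in front) / len(front)

def igdRatio(front0, front, refset):
  "IGD of the final frontier as a fraction of generation0's IGD."
  return igd(front, refset) / igd(front0, refset)
```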

Details:

  • Problem1: The optimal reference set may be unobtainable (if the model is very nasty).
    • Solution1: Let every optimizer work on populations of size "N".
      • Take the combined Pareto frontier from the "a" optimizers, removing duplicates.
      • Down-select those "aN" items to the best "N" ones.
      • Use the resulting set as the reference set.

  • Problem2: How to remove duplicates?
    • Solution2a: Exact match on decisions (may not be very useful for real-valued decisions).
    • Solution2b: From the business users, find the minimum value ε by which they can control each decision. Declare two decisions the same if they are within ε (see the first sketch after this list).

  • Problem3: How to down-select?
    • Solution3: Count how many times each item in the "aN" dominates something else. Keep just the "N" items with the highest domination count (see the down-select sketch after this list).

  • Problem3a: With binary domination, many things may have the same highest domination count, especially when dealing with high-dimensional objectives.
    • Solution 3a1: Delete at random from the most crowded parts of the Pareto frontier. Why? Because in crowded regions, many decisions give rise to the same objective scores.
    • Solution 3a2: Don't use binary domination. Instead, use continuous domination since, usually, cdom rejects one item in the comparison. In this approach, sort the items by the sum of how much each loses to everyone else, then pick the "N" that lose least.

  • Problem 3a1a: How to compute "crowded"?
    • Select all candidates that dominate the most other candidates.
    • For that set, sort the candidates separately on each objective.
    • On each objective Oi, compute the distance left and right to each candidate's nearest neighbors.
    • Let the cuboid around a candidate x be the product of those gaps: Vx = ∏i Oi.
    • Sort the candidates descending by Vx.
    • Return the left-most "N" items in that sort (see the cuboid sketch below).
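
A sketch of Solution2b's ε-based duplicate removal (hypothetical helper names, not the course code), where candidates are tuples of decisions:

```python
def same(a, b, eps):
  "Two candidates match if every decision is within eps."
  return all(abs(u - v) <= eps for u, v in zip(a, b))

def dedup(candidates, eps=0.01):
  kept = []
  for c in candidates:
    if not any(same(c, k, eps) for k in kept):
      kept.append(c)
  return kept
```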
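A sketch of Solution3's down-select (not the course code), assuming a dom(x, y) predicate such as the bdom function defined later in these notes:

```python
def downSelect(candidates, n, dom):
  "Keep the n candidates with the highest domination counts."
  counts = [(sum(dom(x, y) for y in candidates if y is not x), x)
            for x in candidates]
  counts.sort(key=lambda cx: cx[0], reverse=True)
  return [x for _, x in counts[:n]]
```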
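And a sketch of the cuboid calculation from Problem 3a1a (not the course code); objs(c) is assumed to return candidate c's objective scores:

```python
def cuboids(candidates, objs):
  "Map each candidate to the product of the gaps to its neighbors."
  vols = {id(c): 1.0 for c in candidates}  # boundary candidates keep 1.0 here
  for i in range(len(objs(candidates[0]))):
    ordered = sorted(candidates, key=lambda c: objs(c)[i])
    for left, mid, right in zip(ordered, ordered[1:], ordered[2:]):
      vols[id(mid)] *= objs(right)[i] - objs(left)[i]
  return vols  # down-select keeps the candidates with the largest volumes
```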

Summary:

optimal known?
  yes: use it
  no:
    combine frontiers from all optimizers
    remove duplicates
      epsilon known?
        yes: use near match
        no: use exact match
    downSelect to "N" items
      using binary domination?
        yes:
          count how often each one dominates another
          select candidates that dominate the most
          selection > "N"?
            no: use selection
            yes:
              sort descending by cuboid distance around them
              use first 1.."N"
        no:
          sort each, ascending, by the sum of its losses to all others
          use first "N"

Binary Domination

Candidate one dominates candidate two:

  • if at least one objective score is better;
  • and none are worse.

Note that in the following, each objective knows if it wants to be minimized or maximized.

```python
def more(x, y): return x > y
def less(x, y): return x < y
```

E.g. if objective1 wants to be maximized, set objective1.better = more; if objective2 wants to be minimized, set objective2.better = less. Then call the following:

```python
def bdom(x, y, abouts):
  "Multi-objective check: does x binary-dominate y?"
  x = abouts.objs(x)               # objective scores of candidate x
  y = abouts.objs(y)               # objective scores of candidate y
  betters = 0
  for obj in abouts._objs:
    x1, y1 = x[obj.pos], y[obj.pos]
    if obj.better(x1, y1): betters += 1
    elif x1 != y1: return False    # worse on some objective, so quit
  return betters > 0               # better on at least one, worse on none
```
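
For example, with a minimal, hypothetical stand-in for the abouts structure (the real course code defines richer model classes):

```python
class Obj:
  def __init__(self, pos, better): self.pos, self.better = pos, better

class Abouts:
  def __init__(self, objs): self._objs = objs
  def objs(self, candidate): return candidate  # candidates are plain tuples here

abouts = Abouts([Obj(0, less),   # objective 0: minimize (e.g. dollars)
                 Obj(1, more)])  # objective 1: maximize (e.g. Q(W))
print(bdom((10, 5), (20, 5), abouts))  # True: cheaper and no worse
print(bdom((10, 5), (10, 9), abouts))  # False: same cost, worse quality
```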

Continuous Domination

Binary domination never reports that one candidate is waaaaaay more dominated than the other. It only ever says "true" or "false". Not the most informative!

Hence, as the number of objectives increases, bdom loses to cdom.

What cdom does is take the difference between each pair of objective values, then push that difference through an exponential function (so larger differences SHOUT louder). From this we compute the mean loss of traveling from this to that versus from that to this (and the one we prefer is the one that loses least).

Formally, this is a domination test across the Pareto frontier.

  • First, we normalize x,y to 0..1
  • Then we adjust the direction of the comparison depending on whether or not we are minimizing that objective.
  • Third, we feed the differences x − y through an exponential (i.e. the larger the difference, the louder we shout!)
  • Lastly, we return the mean loss over all objectives.
```python
import math  # for math.e in expLoss

# Helper stubs: sketched here; the full code defines them elsewhere.
def minimizing(i): ...  # true if objective i is to be minimized
def norm(i, x):    ...  # returns (x - lo) / (hi - lo) where lo and hi
                        # are the min,max values for objective i
scores = {}             # cache: id(candidate) -> objective scores

def better(this, that):
  x  = scores[ id(this) ]
  y  = scores[ id(that) ]
  l1 = loss(x, y)
  l2 = loss(y, x)
  return l1 < l2 # 'this' is better than 'that' if 'this' loses least

def loss(x, y):
  losses = 0
  n = min(len(x), len(y))
  for i, (x1, y1) in enumerate(zip(x, y)):
    x1 = norm(i, x1) # normalization
    y1 = norm(i, y1) # normalization
    losses += expLoss(i, x1, y1, n)
  return losses / n  # mean loss

def expLoss(i, x1, y1, n):
  "Exponentially shout out the difference."
  w = -1 if minimizing(i) else 1          # adjust direction of comparison
  return -1 * math.e**(w * (x1 - y1) / n) # exponentiate the difference
```
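
As a self-contained toy check (hypothetical data, not the course code), suppose both objectives are minimized and the scores are already normalized to 0..1:

```python
import math

def toyLoss(x, y, n=2):
  "Mean exponential loss of 'x' measured against 'y' (both minimized)."
  return sum(-math.e**(-1 * (x1 - y1) / n) for x1, y1 in zip(x, y)) / n

a, b = (0.2, 0.3), (0.8, 0.9)
print(toyLoss(a, b) < toyLoss(b, a))  # True: 'a' loses less, so 'a' is better
```

Here a dominates b outright and cdom agrees; cdom's advantage is that it also returns a verdict (and a magnitude) when neither candidate binary-dominates the other.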