
symbolic_regression_part2

Manlio Morini edited this page May 30, 2024 · 13 revisions

Symbolic regression - Custom evaluator

...that is great BUT my problem needs a particular evaluator / requires a unique data access technique / has a peculiar way of doing things.

No problem at all, you can customize the evaluator!

Toy problem

Given a, b and c, find a function f such that $a = b \cdot f(c)$.

This probably isn't of immediate interest, but it illustrates a trait that other, more complicated problems may share (the evolved program's output is not itself the target, only one factor of the model $b \cdot f(c)$), and it's a good way to explain a more general problem-solving technique.

Setting up code

const double a = ultra::random::between(-10.0, 10.0);
const double b = ultra::random::between(-10.0, 10.0);

a and b get two fixed, random values.
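Since a and b are fixed, and the terminal c defined below also returns a fixed value, f(c) boils down to a single number. Assuming $b \neq 0$, a short derivation shows what the search is really looking for and how it relates to the absolute error used by the evaluator later on:

$$
a = b \cdot f(c) \iff f(c) = \frac{a}{b}, \qquad
\lvert a - b \cdot f(c) \rvert = \lvert b \rvert \, \left\lvert \frac{a}{b} - f(c) \right\rvert
$$

So minimizing the absolute error amounts to evolving a program, built from c and the arithmetic functions, whose output approximates the constant a/b; the factor |b| only scales the error.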

c is somewhat different: it's a terminal. The terminal and function sets form the alphabet of the program to be evolved (f). The terminal set consists of the variables and the constants.

For our problem, c is the only terminal required (in general, some numeric constants are also added):

class c : public ultra::terminal
{
public:
  c() : ultra::terminal("c") {}

  [[nodiscard]] value_t instance() const noexcept final
  {
    static const double val(ultra::random::between(-10.0, 10.0));
    return val;
  }
};

The constructor (c() : ultra::terminal("c") {}) sets the name of the terminal (used for display purposes).

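The instance function returns a fixed random value: the function-local static val is initialized only on the first call, so every subsequent call returns the same number.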


int main()
{
  using namespace ultra;

  problem prob;

  // SETTING UP SYMBOLS
  prob.insert<c>();          // terminal
  prob.insert<real::add>();  // functions
  prob.insert<real::sub>();
  prob.insert<real::mul>();

  // ...
}

Note that the base problem class is used instead of the derived src::problem. src::problem offers a lot of ready-to-use functionality (dataframes for training and validation, evaluator functions for scoring a candidate solution...), but problem is more general and adaptable to different tasks (not only symbolic regression / classification).

Besides the terminal c, we use the functions add, sub and mul as building blocks (the function set).


https://xkcd.com/534/

Now only the evaluator (aka fitness function) is missing:

using candidate_solution = ultra::gp::individual;

// Given an individual (i.e. a candidate solution of the problem), returns a
// score measuring how good it is.
[[nodiscard]] double my_evaluator(const candidate_solution &x)
{
  using namespace ultra;

  const auto ret(run(x));

  const double f(has_value(ret) ? std::get<D_DOUBLE>(ret) : 0.0);

  const double model_output(b * f);

  const double delta(std::fabs(a - model_output));

  return -delta;
}

candidate_solution is just an alias for gp::individual; gp::individual is a linear representation (Straight Line Program) used in genetic programming.

A line-by-line description of the evaluation process follows:

const auto ret(run(x));

Runs the candidate solution and stores its output in ret.

ret is a std::variant (see value_t for further details).

Variants allow efficient manipulation of different data types: here we're working with real numbers, but Ultra also supports integers and strings.

const double f(has_value(ret) ? std::get<D_DOUBLE>(ret) : 0.0);

std::get<D_DOUBLE>(ret) extracts the real number from the variant.

The variant must be checked for the empty state (has_value(ret)) before extracting a value: the evolution process generates many nefarious individuals that can blow up for specific input values. When the output is empty, f falls back to 0.0.

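const double model_output(b * f);

model_output is the right-hand side of $a = b \cdot f(c)$, with the candidate's output standing in for f(c).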

const double delta(std::fabs(a - model_output));

delta measures the error as the absolute difference between a and model_output. Other norms (e.g. the squared error) may give better results, depending on the problem.

return -delta;

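The last instruction can be confusing: -delta is returned because Ultra uses standardized fitness (greater is better), not raw fitness. Negating the error means that a smaller error yields a higher fitness, and a perfect solution scores 0. See the comments in fitness.h.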


All that remains is to put the pieces together:

int main()
{
  // ...

  // AD HOC EVALUATOR
  search s(prob, my_evaluator);

  // SEARCHING
  const auto result(s.run());

  std::cout << "\nCANDIDATE SOLUTION\n"
            << out::c_language << result.best_individual
            << "\n\nFITNESS\n" << *result.best_measurements.fitness << '\n';
}

The search object (s) is instructed to use our evaluator before being launched (s.run()).

(For convenience, all the code is in the examples/symbolic_regression/symbolic_regression03.cc file.)

PROCEED TO PART 3 →
