Eric Pugh edited this page Feb 13, 2020 · 5 revisions

The following picture illustrates the RRE Domain Model. It is organized as a composite (tree-like) structure in which every parent-child relationship is one-to-many.

Figure: RRE domain model (domain_model)
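To make the composite structure concrete, here is a minimal Python sketch of the tree. The class names (`Evaluation`, `Corpus`, `Topic`, `QueryGroup`, `Query`, `Version`, `Metric`), the `count_queries` helper, the metric name and its value are all hypothetical and do not reproduce RRE's own classes. The sketch only shows that each parent holds a list of children.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes mirroring the RRE domain model (not RRE's own API).
# Each parent owns a list of children: every relationship is one-to-many.

@dataclass
class Metric:
    name: str
    value: float

@dataclass
class Version:
    name: str
    metrics: List[Metric] = field(default_factory=list)

@dataclass
class Query:
    text: str
    versions: List[Version] = field(default_factory=list)

@dataclass
class QueryGroup:
    name: str
    queries: List[Query] = field(default_factory=list)

@dataclass
class Topic:
    name: str
    query_groups: List[QueryGroup] = field(default_factory=list)

@dataclass
class Corpus:
    name: str
    topics: List[Topic] = field(default_factory=list)

@dataclass
class Evaluation:
    # Top-level entity: acts only as a container of corpora.
    corpora: List[Corpus] = field(default_factory=list)

def count_queries(evaluation: Evaluation) -> int:
    """Walk the tree from the root down to the query level."""
    return sum(
        len(group.queries)
        for corpus in evaluation.corpora
        for topic in corpus.topics
        for group in topic.query_groups
    )

# Hypothetical tree: one corpus, one topic, one query group, two queries.
evaluation = Evaluation(corpora=[
    Corpus("news", topics=[
        Topic("weather", query_groups=[
            QueryGroup("weather report", queries=[
                Query("Weather Report",
                      versions=[Version("v1.1", [Metric("example_metric", 0.5)])]),
                Query("WEATHER REPORT"),
            ]),
        ]),
    ]),
])
print(count_queries(evaluation))  # 2
```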

Apart from the top-level entity, which represents an evaluation instance and acts only as a container, the entities are:

Corpus

An evaluation process can involve more than one dataset targeting a given search platform. Within the RRE context the terms corpus, dataset and test collection are synonyms.
Each corpus must be located under the corpora configuration folder and is referenced in one or more ratings files.
Its internal format depends on the target search platform (see the What We Need To Provide page for details about the format).
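The rule that every corpus lives in the corpora folder and is referenced from ratings files can be checked with a small sketch. Here the ratings are plain dicts, and the `corpus` key and the `missing_corpora` helper are hypothetical illustrations, not RRE's ratings file schema.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

def missing_corpora(corpora_dir: Path, ratings: Iterable[dict]) -> List[str]:
    """Return corpus names referenced by ratings but absent from corpora_dir."""
    available = {p.name for p in corpora_dir.iterdir() if p.is_file()}
    # The "corpus" key is an assumption made for this sketch only.
    referenced = {r["corpus"] for r in ratings}
    return sorted(referenced - available)

with tempfile.TemporaryDirectory() as tmp:
    corpora = Path(tmp) / "corpora"
    corpora.mkdir()
    (corpora / "products.json").write_text("[]")
    # Two ratings share the same corpus; a third points at a missing one.
    ratings = [
        {"corpus": "products.json"},
        {"corpus": "products.json"},
        {"corpus": "reviews.json"},
    ]
    result = missing_corpora(corpora, ratings)

print(result)  # ['reviews.json']
```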

Topic

Within a corpus we can have zero or more topics, each representing a user information need that the search system should satisfy. A topic is a logical, business-level entity that usually doesn't correspond to what we know as a "query".

This element is optional: if your domain model doesn't organise query groups into topics, you can omit the topic level, and RRE will create an "unnamed" topic node that acts as the logical parent of all query groups.

Query Group

Instead of modelling the query level as a direct child of topics, RRE provides a further abstraction layer called "Query Group": a set of queries that are expected to produce the same results. This lets you group a source query with several variants, for example to test lowercasing, diacritics normalisation or stemming.
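As an illustration of the kind of variants a query group might hold, the hypothetical helper below derives lowercase and diacritics-free forms from a source query using Python's `unicodedata`. Stemming variants would require a language-specific analyzer and are not shown; the helper names are assumptions, not part of RRE.

```python
import unicodedata
from typing import List

def strip_diacritics(text: str) -> str:
    # NFKD splits accented letters into a base letter plus combining marks;
    # dropping the combining marks leaves the unaccented form.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_query_group(source: str) -> List[str]:
    """Source query plus lowercase and diacritics-free variants, deduplicated."""
    candidates = [source, source.lower(), strip_diacritics(source)]
    # dict.fromkeys keeps the first occurrence of each variant, in order.
    return list(dict.fromkeys(candidates))

print(build_query_group("Café Crème"))  # ['Café Crème', 'café crème', 'Cafe Creme']
```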

This element is optional: if your domain model doesn't organise queries into query groups and topics, you can omit these levels, and RRE will create an "unnamed" topic node and an "unnamed" query group node that act as the logical parents of all queries.
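The defaulting behaviour for omitted topic and query group levels can be sketched as a small normalisation step. The dict keys (`topics`, `query_groups`, `queries`) and the `normalize_to_topics` function are illustrative assumptions, not RRE's ratings schema or internal code.

```python
from typing import Any, Dict, List

UNNAMED = "unnamed"

def normalize_to_topics(ratings: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a topic list, filling in unnamed levels that were omitted."""
    if "topics" in ratings:
        # Full hierarchy already present.
        return ratings["topics"]
    if "query_groups" in ratings:
        # Topic level omitted: one unnamed topic parents all query groups.
        return [{"name": UNNAMED, "query_groups": ratings["query_groups"]}]
    # Topic and query group levels omitted: wrap the bare queries twice.
    return [{
        "name": UNNAMED,
        "query_groups": [{"name": UNNAMED, "queries": ratings.get("queries", [])}],
    }]

flat = {"queries": ["weather report", "WEATHER REPORT"]}
topics = normalize_to_topics(flat)
print(topics)
```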

Query

At the query level we declare the query shape that will be executed on the target search platform. Several variants (forms) of the same query can be defined.

For example, at this level you can define queries such as "Weather Report", "WEATHER REPORT" and "Weather reports", because they are expected to produce the same results (that is, their results are evaluated against the same set of relevant documents).
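The sketch below shows what "evaluated against the same set of relevant documents" means in practice: every variant is scored against one shared judgment set. Precision at k is used only as an example metric, and the document ids and rankings are made-up illustrations, not real results.

```python
from typing import List, Set

def precision_at_k(results: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (k must be > 0)."""
    return sum(1 for doc in results[:k] if doc in relevant) / k

# One relevant-document set shared by every variant in the group.
relevant = {"d1", "d3", "d7"}

# Hypothetical ranked results returned for each query variant.
results_by_variant = {
    "Weather Report": ["d1", "d3", "d9", "d7"],
    "WEATHER REPORT": ["d1", "d3", "d9", "d7"],
    "Weather reports": ["d3", "d8", "d9", "d1"],
}

scores = {q: precision_at_k(r, relevant, k=4) for q, r in results_by_variant.items()}
print(scores)  # {'Weather Report': 0.75, 'WEATHER REPORT': 0.75, 'Weather reports': 0.5}
```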

Version

A query is executed n times, where n is the number of versions of our system. For example, if we have three versions (v1.1, v1.2 and v1.3), the same query is executed three times, once for each configuration version. Metrics are primarily computed at this level; as a consequence, each version has one or more metrics associated with it.
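A minimal sketch of the one-execution-per-version loop, assuming a hypothetical `fake_search` function in place of the real search platform and a toy precision-at-2 metric. Neither is RRE's API; they only show that one query yields one metric value per version.

```python
from typing import Callable, Dict, List

def evaluate_across_versions(
    query: str,
    versions: List[str],
    search: Callable[[str, str], List[str]],
    metric: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Run the same query once per version and compute the metric for each run."""
    return {version: metric(search(version, query)) for version in versions}

RELEVANT = {"d1", "d2"}  # hypothetical judgments for this query
calls: List[str] = []    # records which version each execution targeted

def fake_search(version: str, query: str) -> List[str]:
    # Stand-in for the search platform: each version ranks documents differently.
    calls.append(version)
    rankings = {
        "v1.1": ["d5", "d1", "d9"],
        "v1.2": ["d1", "d9", "d2"],
        "v1.3": ["d1", "d2", "d9"],
    }
    return rankings[version]

def precision_at_2(results: List[str]) -> float:
    return sum(1 for doc in results[:2] if doc in RELEVANT) / 2

per_version = evaluate_across_versions(
    "weather report", ["v1.1", "v1.2", "v1.3"], fake_search, precision_at_2
)
print(per_version)  # {'v1.1': 0.5, 'v1.2': 0.5, 'v1.3': 1.0}
```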

The RRE engine lets you declare which versions should be included in or excluded from the evaluation process. These constraints can be passed to the engine programmatically (if you're using RRE in your code) or through the runtime container (e.g. the Maven plugin).
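The include/exclude constraints can be pictured as a simple filter over the available versions. This is a sketch under an assumed semantics (an empty include list means "all versions", and exclusions are applied afterwards); `select_versions` is hypothetical and does not reflect how RRE or its Maven plugin name or apply these options.

```python
from typing import Iterable, List, Optional

def select_versions(
    available: List[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Keep versions listed in include (if any), then drop those in exclude."""
    include_set = set(include) if include else None
    exclude_set = set(exclude or ())
    return [
        v for v in available
        if (include_set is None or v in include_set) and v not in exclude_set
    ]

print(select_versions(["v1.1", "v1.2", "v1.3"], exclude=["v1.2"]))  # ['v1.1', 'v1.3']
```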

Metric

The leaf entity is a metric, computed after executing a given query against a given system version.

Metrics (see the vertical dashed lines) are primarily bound at the query/version level, but RRE also aggregates their values at the upper levels (query group, topic and corpus) using an aggregation function, which is currently the arithmetic mean. As a result, each entity ends up with a multivalued metric (one value per version). This is the benefit of the composite structure: the same metric can be inspected at different levels, e.g. for a single query, for all queries in a query group, for all queries in a topic, or for the whole corpus.

Figure: RRE domain model with metrics (domain_model_with_metrics)
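The upward aggregation by arithmetic mean can be sketched as follows. Each node's metric is a dict mapping version to value, so every level keeps one value per version. This sketch averages the direct children at each level; whether RRE averages child aggregates or all underlying queries is an assumption not settled by this page, and the values are illustrative only.

```python
from statistics import mean
from typing import Dict, List

def aggregate(children: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each version's value across child nodes (arithmetic mean)."""
    # Assumes every child carries a value for the same set of versions.
    versions = list(children[0])
    return {v: mean([child[v] for child in children]) for v in versions}

# Hypothetical per-version values for two queries in one query group.
query_a = {"v1.1": 0.5, "v1.2": 1.0}
query_b = {"v1.1": 1.0, "v1.2": 0.5}
group_1 = aggregate([query_a, query_b])  # {'v1.1': 0.75, 'v1.2': 0.75}

# A second hypothetical query group, then the topic-level value.
group_2 = {"v1.1": 0.25, "v1.2": 0.25}
topic = aggregate([group_1, group_2])    # {'v1.1': 0.5, 'v1.2': 0.5}
print(topic)
```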