Add ignore-case for text tolerance documentation (#415)
Cédric L. Charlier committed Oct 26, 2018
1 parent 8cf2452 commit bd7b7f0
Showing 1 changed file with 21 additions and 2 deletions.
23 changes: 21 additions & 2 deletions _documentation/compare-tolerances-roundings.md
Expand Up @@ -6,11 +6,13 @@ next_section: compare-intervals
permalink: /docs/compare-tolerances-roundings/
---
# Tolerances

A tolerance can only be assigned to a column defined as a *value* (meaning that a tolerance can't be applied to *key* columns) and, for the moment, is only supported for columns defined with the types *numeric*, *dateTime* and *text*.

## Tolerance for numeric values

### Absolute tolerance

If you apply a tolerance of 0.001 to a numeric column, two rows (with the same keys) will be considered equal if the absolute difference between their values is less than or equal to 0.001.
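
As a minimal sketch of this rule (Python is used purely for illustration; the function name `within_absolute_tolerance` and the sample values are hypothetical and not part of NBi):

```python
def within_absolute_tolerance(expected, actual, tolerance):
    """Return True when the two values differ by at most `tolerance`."""
    return abs(actual - expected) <= tolerance


# Hypothetical values: the first pair differs by 0.0005, the second by 0.002.
print(within_absolute_tolerance(10.0, 10.0005, 0.001))  # True
print(within_absolute_tolerance(10.0, 10.002, 0.001))   # False
```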

This is especially helpful in Business Intelligence when you need to compare two queries and don't care about small differences. Another Business Intelligence case is when your customers give you hints about the values expected in reports: these hints are usually a bit imprecise.
Expand All @@ -34,6 +36,7 @@ If you want you can combine both notations. The definition provided in an xml el
must be interpreted as: columns 0, 1 and 4 are keys, column 2 must be ignored, column 3 is a value with a tolerance of 0.001 and column 5 is also a value but with a tolerance of 10.

### Relative tolerance

You can express a tolerance relative to the expected value by means of the *%* symbol. This only applies to *numeric* values. When comparing the expected and actual values, the comparer applies a tolerance equal to the defined percentage of the expected value.

To illustrate this, if you have two rows with expected values of 40 and 100, an absolute tolerance of 10 will allow the actual values to be in the intervals [30;50] and [90;110] respectively. A relative tolerance of 10% will change the intervals to [36;44] and [90;110].
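
The intervals above can be reproduced with a short sketch. Python is used only for illustration, and the helper names `absolute_interval` and `relative_interval` are hypothetical, not NBi functions:

```python
def absolute_interval(expected, tolerance):
    """Accepted interval for a fixed tolerance around the expected value."""
    return (expected - tolerance, expected + tolerance)


def relative_interval(expected, percent):
    """Accepted interval when the tolerance is a percentage of the expected value."""
    margin = abs(expected) * percent / 100
    return (expected - margin, expected + margin)


for expected in (40, 100):
    print(expected, absolute_interval(expected, 10), relative_interval(expected, 10))
# 40 (30, 50) (36.0, 44.0)
# 100 (90, 110) (90.0, 110.0)
```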
Expand All @@ -44,6 +47,7 @@ The xml syntax requires a % in the attribute *tolerance*.
{% endhighlight %}

### Absolute and relative tolerance

Sometimes you want to use a relative tolerance but also bound it, expressing that the tolerance cannot be more (or less) than a specific value. You can achieve this by specifying the percentage and then the bound (min or max) between brackets in the xml attribute *tolerance*.

{% highlight xml %}
Expand All @@ -58,6 +62,7 @@ Sometimes, you want to specify that the tolerance should only be applied on the
{% endhighlight %}
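
For the bounded relative tolerance introduced at the start of this section, the sketch below shows one plausible reading: a *min* bound keeps the tolerance from dropping below a value, and a *max* bound keeps it from exceeding a value. This interpretation, the function name `effective_tolerance` and its parameters are assumptions made for illustration in Python; they are not taken from NBi itself.

```python
def effective_tolerance(expected, percent, min_bound=None, max_bound=None):
    """Relative tolerance, optionally clamped by a lower or an upper bound (assumed semantics)."""
    tolerance = abs(expected) * percent / 100
    if min_bound is not None:
        tolerance = max(tolerance, min_bound)  # never smaller than the min bound
    if max_bound is not None:
        tolerance = min(tolerance, max_bound)  # never larger than the max bound
    return tolerance


print(effective_tolerance(100, 10))                 # 10.0
print(effective_tolerance(1000, 10, max_bound=50))  # 50
print(effective_tolerance(2, 10, min_bound=1))      # 1
```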

## Tolerance for type "date and time"

If you want to specify the tolerance for *dateTime* columns, you must express the tolerance in days, hours, minutes, seconds and milliseconds. The correct syntax for two and a half days is
{% highlight xml %}
<column index="3" role="value" type="dateTime" tolerance="2.12:00:00" />
Expand All @@ -72,7 +77,18 @@ At the moment only *absolute tolerance* is supported and it's not possible to app
Reminder, tolerances are only applied to *value* columns and never to *key* columns!
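
The sketch below shows how a tolerance such as `2.12:00:00` can be read as a duration and applied to two timestamps. It assumes the layout is `days.hours:minutes:seconds` with an optional day part, which matches the example above; the millisecond part is left out for brevity. Python is used only for illustration, and `parse_tolerance` and `within_tolerance` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: optional "days." prefix, then hours:minutes:seconds.
_PATTERN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_tolerance(text):
    """Convert a 'd.hh:mm:ss' string into a timedelta."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported tolerance format: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def within_tolerance(expected, actual, tolerance):
    """Absolute comparison: the gap between both timestamps must not exceed the tolerance."""
    return abs(actual - expected) <= tolerance


tolerance = parse_tolerance("2.12:00:00")
print(tolerance == timedelta(days=2, hours=12))  # True
print(within_tolerance(datetime(2013, 10, 17), datetime(2013, 10, 19, 12), tolerance))  # True
```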

## Tolerance for type "text"
You can define a tolerance for a text column. The following options are supported:

### Ignore case

You can specify that two values in a column of type *text* should be compared without regard to case. To achieve this, define the attribute *tolerance* and set it to *ignore-case*.
{% highlight xml %}
<column index="1" role="value" type="text" tolerance="ignore-case"/>
{% endhighlight %}
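
Conceptually, a case-insensitive comparison behaves like the sketch below. It is written in Python for illustration only; the function name `equal_ignoring_case` is hypothetical, and the exact case-folding rules applied by NBi (for example, culture-specific behaviour) are not stated here, so `str.casefold` is an assumption.

```python
def equal_ignoring_case(expected, actual):
    """Compare two texts after folding both to a common case."""
    return expected.casefold() == actual.casefold()


print(equal_ignoring_case("Hello World", "HELLO world"))  # True
print(equal_ignoring_case("Hello", "Help"))               # False
```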

### Fuzzy Matching

You can specify that two values of type *text* only need to be similar rather than exactly the same. The following algorithms are supported:

* Hamming Distance
* Jaccard Distance
* Jaro Distance
Expand All @@ -99,6 +115,7 @@ If you wish, you can combine the different algorithms by separating them with a
{% endhighlight %}
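
To give an idea of what such a distance measures, here is a minimal sketch of the Hamming distance, the first algorithm in the list above: it counts the positions at which two texts of equal length differ. Python is used for illustration only, the function name `hamming_distance` is hypothetical, and the way NBi turns a distance into a pass/fail decision is not covered by this sketch.

```python
def hamming_distance(left, right):
    """Number of positions at which two equal-length texts differ."""
    if len(left) != len(right):
        raise ValueError("Hamming distance is only defined for texts of equal length")
    return sum(a != b for a, b in zip(left, right))


print(hamming_distance("karolin", "kathrin"))  # 3
```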

# Roundings

The roundings are another set of tools to express that two values are equal if they are close to each other. Unlike a *tolerance*, a *rounding* is applied to both the expected and actual values. If, after the rounding operation, the two values are strictly equal, the comparison is positive; otherwise it is negative.

The roundings are related to the .Net methods [Round](http://msdn.microsoft.com/en-us/library/wyk4d9cy.aspx), [Floor](http://msdn.microsoft.com/en-us/library/e0b5f0xb.aspx) and [Ceiling](http://msdn.microsoft.com/en-us/library/zx4t0t48.aspx). The rounding rules are the same as those of their .Net equivalents.
Expand All @@ -109,6 +126,7 @@ The method used must be specified in the column xml definition by the means of v
{% endhighlight %}

### Numeric columns

Nevertheless, the rounding methods differ in some points from their .Net equivalents. Each of them requires a _step_. If the step is less than 1, the rounding will be applied to the decimal part of the value.

**Example 1**: For a value of 10.52912 with a step of 0.1, the rounding will return
Expand Down Expand Up @@ -142,6 +160,7 @@ If the step is greater than 1, the rounding will be applied to the integer part
{% endhighlight %}
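
Rounding with a step can be sketched as "divide by the step, round to a whole number, multiply back". The Python code below is only an illustration under stated assumptions: the function name `round_to_step` is hypothetical, and for the _round_ style it sends ties to the nearest even multiple, which is the default behaviour of .Net's `Math.Round`; whether NBi treats ties the same way is an assumption. `Decimal` is used to avoid binary floating-point noise.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN

_MODES = {"floor": ROUND_FLOOR, "ceiling": ROUND_CEILING, "round": ROUND_HALF_EVEN}


def round_to_step(value, step, style):
    """Round `value` to a multiple of `step`; both are passed as strings."""
    value, step = Decimal(value), Decimal(step)
    multiples = (value / step).to_integral_value(rounding=_MODES[style])
    return multiples * step


# Value and step taken from Example 1.
for style in ("floor", "round", "ceiling"):
    print(style, round_to_step("10.52912", "0.1", style))
```

The same function handles a step greater than 1, because the value is always expressed as a number of steps before rounding.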

### Date and time columns

It's also possible to apply the same kind of roundings to dateTime columns. The rounding step must be less than or equal to one day.

**Example 5**: For a value of 2013-10-17 14:47:00 with a step of "1" (day), the rounding will return
Expand All @@ -154,4 +173,4 @@ It's also possible to apply the same kind of roundings for dateTime columns. The

* with a _floor_ style : 2013-10-17 14:45:00
* with a _round_ style : 2013-10-17 14:45:00
* with a _ceiling_ style : 2013-10-17 15:00:00
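
The date and time roundings can be sketched the same way, by rounding the time elapsed since midnight to a multiple of the step. The Python code below is illustrative only: `round_datetime` is a hypothetical name, a step of 15 minutes is assumed because it reproduces the three results listed above, and sending ties upwards in the _round_ style is also an assumption.

```python
from datetime import datetime, timedelta


def round_datetime(value, step, style):
    """Round `value` to a multiple of `step` (at most one day), counted from midnight."""
    midnight = value.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = value - midnight
    floor = midnight + (elapsed // step) * step
    remainder = elapsed % step
    ceiling = floor if remainder == timedelta(0) else floor + step
    if style == "floor":
        return floor
    if style == "ceiling":
        return ceiling
    # "round": pick the closest multiple; ties go up (assumption).
    return floor if remainder * 2 < step else ceiling


value = datetime(2013, 10, 17, 14, 47, 0)
for style in ("floor", "round", "ceiling"):
    print(style, round_datetime(value, timedelta(minutes=15), style))
# floor 2013-10-17 14:45:00
# round 2013-10-17 14:45:00
# ceiling 2013-10-17 15:00:00
```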
