Covers both missing-row gaps and invalid-value anomalies
The anomaly detection tool covers the following scenarios:
- Gaps Anomaly Detection: When a table should have one row every dt interval, and we want to discover missing rows
- Range Anomaly Detection: When a specific column should have values within a given range, and we are interested in the times at which invalid values are reported (e.g. negative CPU due to overflow)
The tool has been demonstrated successfully in multiple use cases, where it discovered edge cases and saved manual work, and it has drawn interest from outside the team. We are releasing it so you can benefit from it too.
Input is:
- tableName: The name of the table to search for gap anomalies
- dt: The interval at which a new row should arrive
- anomaly_type: A string placeholder used to display the anomaly type in the output table
- time_col: The time column
- unique_id_col: The unique id column (e.g. ID number)
- endtime: The end time; anomalies are searched for before it
- window: Used to aggregate the number of rows into common time slots
- window_records_pct: The threshold, as a percentage of the expected number of records, below which a window is considered an anomaly (e.g. with 100 expected records and a threshold of 60, fewer than 60 records is considered an anomaly)
The window parameter is a tuning parameter: gaps might occur because of a transient network issue between the input source and the table, so you may want to use a bigger window and tune the percent threshold accordingly.
let endtime = ago(4h);
let tableName = 'Table';
let timeColumnName = 'TimeColumn';
let uniqueColumnName = 'uniqueColumn';
let window = 4h;
let threshold = 90.0;
detect_anomalies_details(table = tableName, dt = 10m, anomaly_type = "Missing", time_col = timeColumnName, unique_id_col = uniqueColumnName, endtime = endtime, window = window, window_records_pct = threshold)
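To make the relationship between dt, window and window_records_pct concrete, here is a minimal sketch of the arithmetic on inline sample data. It is not the tool's implementation: the sample rows and the names expected, actual, window_start and records_pct are assumptions made for illustration only.

```kusto
// Illustrative sketch only, not the body of detect_anomalies_details.
let dt = 10m;                 // a new row is expected every 10 minutes
let window = 1h;              // rows are aggregated into 1-hour slots
let threshold = 60.0;         // percent of expected rows below which a slot is flagged
let expected = window / dt;   // timespan / timespan yields a real: 6 rows per id per slot
// Hypothetical sample data: id "a" reports 5 rows in the first hour and 2 in the second.
datatable(TimeColumn: datetime, uniqueColumn: string)
[
    datetime(2024-01-01 00:00:00), "a",
    datetime(2024-01-01 00:10:00), "a",
    datetime(2024-01-01 00:20:00), "a",
    datetime(2024-01-01 00:30:00), "a",
    datetime(2024-01-01 00:40:00), "a",
    datetime(2024-01-01 01:00:00), "a",
    datetime(2024-01-01 01:30:00), "a"
]
// Count the rows that actually arrived per id and per window slot.
| summarize actual = count() by uniqueColumn, window_start = bin(TimeColumn, window)
// Express the count as a percentage of the expected number of rows.
| extend records_pct = 100.0 * actual / expected
// Keep only the slots that fall below the threshold.
| where records_pct < threshold
```

With these sample rows, the 00:00 slot holds 5 of 6 expected rows (about 83%) and passes, while the 01:00 slot holds 2 of 6 (about 33%) and is flagged. Note that a slot with no rows at all produces no output row in this simplified version; handling that case would need something like make-series to fill empty slots.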
Use range validation to check whether a column's values fall below or above given bounds.
// Keep the range validation call as is if no validation is required (no redundant calculation will occur)
| extend RangeAnomalyDetails_pColumn1 = in_range_anomaly_reals('Column1', s_pColumn1, s_pTimestampColumn)
// Or pass the lower and upper parameters (or just one of them) if validation is required
| extend RangeAnomalyDetails_pColumn1 = in_range_anomaly_reals('Column1', s_pColumn1, s_pTimestampColumn, lower = 0, upper = 100)
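As a rough picture of what a range check with lower = 0 and upper = 100 looks for, the sketch below filters inline sample data with a plain where clause. It does not show how in_range_anomaly_reals works internally; the sample rows and the Timestamp column name are assumptions for illustration.

```kusto
// Illustrative sketch only, not the body of in_range_anomaly_reals.
let lower = 0.0;
let upper = 100.0;
// Hypothetical sample data: one valid reading, one below the range, one above it.
datatable(Timestamp: datetime, Column1: real)
[
    datetime(2024-01-01 00:00:00), 42.0,
    datetime(2024-01-01 00:10:00), -3.5,
    datetime(2024-01-01 00:20:00), 101.0
]
// Keep the rows whose value falls outside [lower, upper], with the time they were reported.
| where Column1 < lower or Column1 > upper
| project Timestamp, Column1
```

The rows at 00:10 and 00:20 are returned, since -3.5 is below the lower bound and 101.0 is above the upper bound.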
Aviv Yaniv