
Getting started (tutorial)

landauermax edited this page Apr 29, 2023 · 61 revisions

The aminer (logdata-anomaly-miner) lets you build log analysis pipelines that analyze log data streams and detect violations or anomalies in them. It can run from the console, as a daemon with e-mail alerting, or embedded as a library in your own programs. It was designed to run with limited resources and the lowest possible permissions, which makes it suitable for use on production servers.

Because log analysis depends on the parser, configuring the aminer can be a bit overwhelming at first. This tutorial introduces the aminer with a very simple configuration for Apache access logs.

Requirements

We will set up the aminer on a fresh installation of Ubuntu 20.04 (Focal):

alice@ubuntu2004:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.2 LTS
Release:	20.04
Codename:	focal
alice@ubuntu2004:~$

In this tutorial we want to find anomalies in Apache access logs, so let's install apache2:

alice@ubuntu2004:~$ sudo apt-get install apache2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  apache2-bin apache2-data apache2-utils libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libjansson4 liblua5.2-0 ssl-cert
Suggested packages:
  apache2-doc apache2-suexec-pristine | apache2-suexec-custom www-browser openssl-blacklist
The following NEW packages will be installed:
  apache2 apache2-bin apache2-data apache2-utils libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libjansson4 liblua5.2-0 ssl-cert
0 upgraded, 11 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,865 kB of archives.
After this operation, 8,080 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
...
...
Created symlink /etc/systemd/system/multi-user.target.wants/apache2.service → /lib/systemd/system/apache2.service.
Created symlink /etc/systemd/system/multi-user.target.wants/apache-htcacheclean.service → /lib/systemd/system/apache-htcacheclean.service.
Processing triggers for ufw (0.36-6) ...
Processing triggers for systemd (245.4-4ubuntu3.6) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
alice@ubuntu2004:~$ 

Let's send an HTTP request to Apache to check that it is running:

alice@ubuntu2004:~$ wget -qO- http://localhost

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <!--
    Modified from the Debian original for Ubuntu
    Last updated: 2016-11-16
    See: https://launchpad.net/bugs/1288690
  -->
...
...

Now we should have at least one line in /var/log/apache2/access.log:

alice@ubuntu2004:~$ sudo cat /var/log/apache2/access.log
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
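To get a feel for the fields in this line before bringing in the aminer, here is a minimal Python sketch that splits it with a regular expression. It assumes the Apache "combined" log format shown above; the regex and the function name parse_access_line are our own illustration, not part of the aminer, which uses the parser model discussed later. The field names were chosen to mirror the aminer paths, and the timestamp is converted to seconds since the epoch.

```python
import re
from datetime import datetime

# Regex for the Apache "combined" log format (our own illustration, not the aminer parser).
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # "%b" expects an English month abbreviation (C/English locale assumed).
    parsed_time = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["time"] = int(parsed_time.timestamp())  # seconds since the epoch
    return fields


line = '127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"'
print(parse_access_line(line))
```

For the line above, the time field comes out as 1621250714, the same value the aminer reports for /accesslog/time later in this tutorial.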

Installation

Although Debian packages exist for the aminer, this tutorial uses a simple installation script that installs the aminer from source with Ansible.

alice@ubuntu2004:~$ wget https://raw.githubusercontent.com/ait-aecid/logdata-anomaly-miner/main/scripts/aminer_install.sh
--2021-05-17 11:26:48--  https://raw.githubusercontent.com/ait-aecid/logdata-anomaly-miner/main/scripts/aminer_install.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1541 (1.5K) [text/plain]
Saving to: ‘aminer_install.sh’

aminer_install.sh                                               100%[=====================================================================================================================================================>]   1.50K  --.-KB/s    in 0.001s  

2021-05-17 11:26:48 (1.68 MB/s) - ‘aminer_install.sh’ saved [1541/1541]

alice@ubuntu2004:~$ chmod +x aminer_install.sh
alice@ubuntu2004:~$ ./aminer_install.sh
Hit:1 http://at.archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://at.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:3 http://at.archive.ubuntu.com/ubuntu focal-backports InRelease
Hit:4 http://at.archive.ubuntu.com/ubuntu focal-security InRelease
Reading package lists... Done
Reading package lists...
Building dependency tree...
Reading state information...
...
...
...
PLAY RECAP ***************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=27   changed=23   unreachable=0    failed=0    skipped=18   rescued=0    ignored=0   

alice@ubuntu2004:~$ 

First very simple configuration

Now let's enable an Apache parser model in the aminer configuration by creating a symlink:

alice@ubuntu2004:~$ sudo ln -s /etc/aminer/conf-available/generic/ApacheAccessModel.py /etc/aminer/conf-enabled/
alice@ubuntu2004:~$

In older versions of the aminer, configuration files had to be written in Python; current versions can use configurations written in YAML. Create and edit the file /etc/aminer/config.yml:

LearnMode: True

LogResourceList:
        - 'file:///var/log/apache2/access.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "NewMatchPathValueDetector"
          paths: ["/accesslog/status"]
          output_logline: True

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"

If we start the aminer now, it reads access.log and learns all parser paths. We use the "-C" parameter to clear the persistence before starting the aminer. (You can terminate the aminer with CTRL+c.)

alice@ubuntu2004:~$ sudo cat /var/log/apache2/access.log
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
alice@ubuntu2004:~$ sudo aminer -C --config /etc/aminer/config.yml
2021-05-17 12:12:36 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: 127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
  /accesslog/host: 127.0.0.1
  /accesslog/sp0:  
  /accesslog/ident: -
  /accesslog/sp1:  
  /accesslog/user: -
  /accesslog/sp2:  
  /accesslog/time: 1621250714
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:  
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:  
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: " 
  /accesslog/status: 200
  /accesslog/sp7:  
  /accesslog/size: 11229
  /accesslog/combined:  "-" "Wget/1.20.3 (linux-gnu)"
    /accesslog/combined/combined:  "-" "Wget/1.20.3 (linux-gnu)"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: Wget/1.20.3 (linux-gnu)
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:12:36 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 200}
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

Now that we have trained the aminer (on just a single log line), we can switch off "LearnMode":

LearnMode: False

LogResourceList:
        - 'file:///var/log/apache2/access.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "NewMatchPathValueDetector"
          paths: ["/accesslog/status"]
          output_logline: True

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"

Next we will deliberately generate an anomaly. To do that, we first need to understand the "Analysis" section of the config file:

Analysis:
        - type: "NewMatchPathValueDetector"
          paths: ["/accesslog/status"]
          output_logline: True

We use the "NewMatchPathValueDetector" on the path /accesslog/status. This detector reports an anomaly as soon as a log line holds a value at the given path (/accesslog/status) that it did not see during training.
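The core idea of this detector can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (the class ToyNewValueDetector and its receive method are hypothetical), not the aminer's actual NewMatchPathValueDetector: it remembers the values seen at one path while learning and reports any value it has not seen before.

```python
class ToyNewValueDetector:
    """Report values at one path that were not seen while learning (toy sketch)."""

    def __init__(self, path, learn_mode=True):
        self.path = path
        self.learn_mode = learn_mode
        self.known_values = set()

    def receive(self, parsed):
        """Take a dict mapping paths to values; return an alert string or None."""
        value = parsed.get(self.path)
        if value is None or value in self.known_values:
            return None
        if self.learn_mode:
            # In learn mode a new value is reported once and then remembered.
            self.known_values.add(value)
        return f"New value detected: {{'{self.path}': {value}}}"


detector = ToyNewValueDetector("/accesslog/status")
print(detector.receive({"/accesslog/status": 200}))  # reported and learned
detector.learn_mode = False
print(detector.receive({"/accesslog/status": 200}))  # None: already known
print(detector.receive({"/accesslog/status": 404}))  # reported as an anomaly
```

With learning switched off, the toy keeps reporting 404 every time it occurs, because unknown values are no longer added to the known set.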

So how do we know which path to use? For this we inspect the parser model. The configuration file tells us which parser model we use:

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

We configured the "ApacheAccessModel". This model is defined in /etc/aminer/conf-enabled/ApacheAccessModel.py, the symlink we created earlier:

from aminer.parsing.DateTimeModelElement import DateTimeModelElement
from aminer.parsing.DecimalIntegerValueModelElement import DecimalIntegerValueModelElement
from aminer.parsing.FixedDataModelElement import FixedDataModelElement
from aminer.parsing.SequenceModelElement import SequenceModelElement
from aminer.parsing.VariableByteDataModelElement import VariableByteDataModelElement
from aminer.parsing.FixedWordlistDataModelElement import FixedWordlistDataModelElement
from aminer.parsing.OptionalMatchModelElement import OptionalMatchModelElement
from aminer.parsing.DelimitedDataModelElement import DelimitedDataModelElement
from aminer.parsing.FirstMatchModelElement import FirstMatchModelElement


def get_model():
    """Return a parser for apache2 access.log."""
    alphabet = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-:"
    new_time_model = DateTimeModelElement("time", b"[%d/%b/%Y:%H:%M:%S%z")
    host_name_model = VariableByteDataModelElement("host", alphabet)
    identity_model = VariableByteDataModelElement("ident", alphabet)
    user_name_model = VariableByteDataModelElement("user", b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.-")
    request_method_model = FirstMatchModelElement("fm", [
        FixedDataModelElement("dash", b"-"),
        SequenceModelElement("request", [
            FixedWordlistDataModelElement("method", [
                b"GET", b"POST", b"PUT", b"HEAD", b"DELETE", b"CONNECT", b"OPTIONS", b"TRACE", b"PATCH"]),
            FixedDataModelElement("sp5", b" "),
            DelimitedDataModelElement("request", b" ", b"\\"),
            FixedDataModelElement("sp6", b" "),
            DelimitedDataModelElement("version", b'"'),
            ])
        ])
    status_code_model = DecimalIntegerValueModelElement("status")
    size_model = DecimalIntegerValueModelElement("size")

    whitespace_str = b" "
    model = SequenceModelElement("accesslog", [
        host_name_model,
        FixedDataModelElement("sp0", whitespace_str),
        identity_model,
        FixedDataModelElement("sp1", whitespace_str),
        user_name_model,
        FixedDataModelElement("sp2", whitespace_str),
        new_time_model,
        FixedDataModelElement("sp3", b'] "'),
        request_method_model,
        FixedDataModelElement("sp6", b'" '),
        status_code_model,
        FixedDataModelElement("sp7", whitespace_str),
        size_model,
        OptionalMatchModelElement(
            "combined", SequenceModelElement("combined", [
                FixedDataModelElement("sp9", b' "'),
                DelimitedDataModelElement("referer", b'"', b"\\"),
                FixedDataModelElement("sp10", b'" "'),
                DelimitedDataModelElement("user_agent", b'"', b"\\"),
                FixedDataModelElement("sp11", b'"')
            ]))
        ])
    return model
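To see how nested model elements turn into the paths the aminer prints (such as /accesslog/status or /accesslog/fm/request/method), here is a toy parser in plain Python. The classes Fixed, Digits and Sequence are hypothetical stand-ins that loosely mimic FixedDataModelElement, DecimalIntegerValueModelElement and SequenceModelElement; they are not the aminer classes and only illustrate the idea that each element appends its name to its parent's path.

```python
class Fixed:
    """Toy element: match an exact byte string (loosely like FixedDataModelElement)."""

    def __init__(self, name, data):
        self.name, self.data = name, data

    def match(self, line, pos, parent_path, out):
        if not line.startswith(self.data, pos):
            return None
        out[f"{parent_path}/{self.name}"] = self.data
        return pos + len(self.data)


class Digits:
    """Toy element: match a run of digits and store it as an int."""

    def __init__(self, name):
        self.name = name

    def match(self, line, pos, parent_path, out):
        end = pos
        while end < len(line) and line[end:end + 1].isdigit():
            end += 1
        if end == pos:
            return None
        out[f"{parent_path}/{self.name}"] = int(line[pos:end])
        return end


class Sequence:
    """Toy container: match children in order, nesting their paths under its own name."""

    def __init__(self, name, children):
        self.name, self.children = name, children

    def match(self, line, pos, parent_path, out):
        path = f"{parent_path}/{self.name}"
        out[path] = None  # placeholder so the parent path is listed first
        start = pos
        for child in self.children:
            pos = child.match(line, pos, path, out)
            if pos is None:
                return None
        out[path] = line[start:pos]
        return pos


def parse(model, line):
    """Return {path: value} if the model consumes the whole line, else None."""
    out = {}
    end = model.match(line, 0, "", out)
    return out if end == len(line) else None


model = Sequence("accesslog", [
    Sequence("request", [Fixed("method", b"GET"), Fixed("sp5", b" "), Fixed("request", b"/")]),
    Fixed("sp6", b" "),
    Digits("status"),
    Fixed("sp7", b" "),
    Digits("size"),
])

for path, value in parse(model, b"GET / 200 11229").items():
    print(path, value)
```

A line that does not fit the toy model, for example one using POST where the model only accepts GET, yields None instead of a partial result.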

Even though ApacheAccessModel.py is Python code, it is quite easy to understand. The outermost element of the model is a SequenceModelElement named 'accesslog':

model = SequenceModelElement('accesslog', [

A SequenceModelElement is a container that holds other model elements and matches them one after another. This sequence describes a complete line of access.log. Let's look at the access.log again:

127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

Now we compare this log line with the output of the aminer:

2021-05-17 12:12:36 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: 127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
  /accesslog/host: 127.0.0.1
  /accesslog/sp0:  
  /accesslog/ident: -
  /accesslog/sp1:  
  /accesslog/user: -
  /accesslog/sp2:  
  /accesslog/time: 1621250714
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:  
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:  
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: " 
  /accesslog/status: 200
  /accesslog/sp7:  
  /accesslog/size: 11229
  /accesslog/combined:  "-" "Wget/1.20.3 (linux-gnu)"
    /accesslog/combined/combined:  "-" "Wget/1.20.3 (linux-gnu)"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: Wget/1.20.3 (linux-gnu)
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:12:36 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 200}
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

As we can see, the aminer parses the log line according to the parser model. Nothing is left unparsed: every value of the log line fits into the parser model. We can also see that the path to the HTTP status with the value "200" is "/accesslog/status". The aminer learned that "/accesslog/status" has the value "200". Because we trained on only a single log line, any value other than 200 is an anomaly. We can create a log line with a different HTTP status by requesting a page that does not exist:

alice@ubuntu2004:~$ wget -qO- http://localhost/doesntexist
alice@ubuntu2004:~$ sudo cat /var/log/apache2/access.log
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

Make sure that "LearnMode" is set to "False" in /etc/aminer/config.yml and start the aminer again, but this time without clearing the persistence:

alice@ubuntu2004:~$ sudo aminer --config /etc/aminer/config.yml
2021-05-17 12:22:18 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 404}
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

The output shows that the "NewMatchPathValueDetector" detected a new value: /accesslog/status now holds "404". Because we restarted the aminer after turning off "LearnMode", it iterated through all log lines again and ignored everything it had already learned.

Great! We detected our first anomaly.
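Why does the second run report only 404? The learned values are persisted between runs, and "-C" throws them away. The following sketch models this behavior with a hypothetical JSON state file and a run function of our own; it is not the aminer's actual persistence format or location, only an illustration of the behavior described above.

```python
import json
import os
import tempfile


def run(statuses, state_file, learn_mode, clear=False):
    """Process status values like one aminer run; return the values reported as new."""
    if clear and os.path.exists(state_file):
        os.remove(state_file)  # comparable to starting the aminer with -C
    known = set()
    if os.path.exists(state_file):
        with open(state_file) as f:
            known = set(json.load(f))  # values learned in earlier runs
    alerts = []
    for status in statuses:
        if status not in known:
            alerts.append(status)
            if learn_mode:
                known.add(status)
    with open(state_file, "w") as f:
        json.dump(sorted(known), f)  # persist for the next run
    return alerts


state = os.path.join(tempfile.mkdtemp(), "status_values.json")
print(run([200], state, learn_mode=True, clear=True))  # [200]
print(run([200, 404], state, learn_mode=False))        # [404]
```

The second call re-reads both lines, as the aminer did above, but only 404 is reported because 200 was loaded from the saved state.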

Detecting anomalies in combinations of different log line fields

Before we start, we generate two more log lines with different user-agent strings:

alice@ubuntu2004:~$ wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0" -qO- http://localhost
alice@ubuntu2004:~$ wget --user-agent="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5" -qO- http://localhost
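If you prefer to script these requests instead of calling wget, a minimal sketch with Python's standard urllib could look like this. BASE_URL, build_request and send are our own names, and the sketch assumes the Apache instance from this tutorial is listening on localhost.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost"  # assumption: the Apache instance from this tutorial

USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0",
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5",
]


def build_request(path="/", user_agent=None, method="GET"):
    """Build a request; without a user agent, urllib sends its default one."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(BASE_URL + path, headers=headers, method=method)


def send(request):
    """Send the request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 404 raises HTTPError, but Apache has still logged the request.
        return err.code
```

For example, `for ua in USER_AGENTS: send(build_request("/", ua))` produces two log lines like the wget commands above. Requests without an explicit user agent carry urllib's default (Python-urllib/3.x) instead of Wget's, so the combinations the aminer learns would differ from the output shown in this tutorial.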

In the previous section we detected anomalies whenever the HTTP status of a log line changed, using the "NewMatchPathValueDetector". Another very powerful detector is the "NewMatchPathValueComboDetector", which detects anomalies in combinations of log line fields. To make this clearer, let's look at the following log lines:

127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"
127.0.0.1 - - [17/May/2021:12:23:48 +0000] "GET / HTTP/1.1" 200 11229 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
127.0.0.1 - - [17/May/2021:12:23:55 +0000] "GET / HTTP/1.1" 200 11229 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5"

For this example we inspect the HTTP method, the request path, and the user agent. All four lines use the "GET" method. Three lines request "/" and one requests "/doesntexist". Two lines share the identical Wget user agent, and two have Firefox user agents on different platforms (Linux and Mac). If we learn these lines, any combination of these three fields that did not occur during training will be an anomaly. Let's create a config first:

LearnMode: True

LogResourceList:
        - 'file:///var/log/apache2/access.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "NewMatchPathValueDetector"
          paths: ["/accesslog/status"]
          output_logline: True
        - type: "NewMatchPathValueComboDetector"
          paths: ["/accesslog/fm/request/method","/accesslog/fm/request/request","/accesslog/combined/combined/user_agent"]
          output_logline: True

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"

We kept the "NewMatchPathValueDetector" from the previous section to illustrate how several detectors can be used together. The "NewMatchPathValueComboDetector" has to be configured with all the paths to monitor; in the previous section we already saw how to find these paths in the aminer output and in the parser model. This time we have to delete the persistence files with the "-C" parameter, because they still contain what was learned in the previous example.

With the persistence cleared, the aminer starts learning:

alice@ubuntu2004:~$ sudo aminer -C --config /etc/aminer/config.yml
2021-05-17 12:27:50 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: 127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"
  /accesslog/host: 127.0.0.1
  /accesslog/sp0:  
  /accesslog/ident: -
  /accesslog/sp1:  
  /accesslog/user: -
  /accesslog/sp2:  
  /accesslog/time: 1621250714
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:  
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:  
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: " 
  /accesslog/status: 200
  /accesslog/sp7:  
  /accesslog/size: 11229
  /accesslog/combined:  "-" "Wget/1.20.3 (linux-gnu)"
    /accesslog/combined/combined:  "-" "Wget/1.20.3 (linux-gnu)"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: Wget/1.20.3 (linux-gnu)
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:27:50 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 200}
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:27:50 New value combination(s) detected
NewMatchPathValueComboDetector: "NewMatchPathValueComboDetector3" (1 lines)
  (0, b'/', b'Wget/1.20.3 (linux-gnu)')
127.0.0.1 - - [17/May/2021:11:25:14 +0000] "GET / HTTP/1.1" 200 11229 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:27:50 New value(s) detected
NewMatchPathValueDetector: "NewMatchPathValueDetector2" (1 lines)
  {'/accesslog/status': 404}
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:27:50 New value combination(s) detected
NewMatchPathValueComboDetector: "NewMatchPathValueComboDetector3" (1 lines)
  (0, b'/doesntexist', b'Wget/1.20.3 (linux-gnu)')
127.0.0.1 - - [17/May/2021:12:21:16 +0000] "GET /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

2021-05-17 12:27:50 New value combination(s) detected
NewMatchPathValueComboDetector: "NewMatchPathValueComboDetector3" (1 lines)
  (0, b'/', b'Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0')
127.0.0.1 - - [17/May/2021:12:23:48 +0000] "GET / HTTP/1.1" 200 11229 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

2021-05-17 12:27:50 New value combination(s) detected
NewMatchPathValueComboDetector: "NewMatchPathValueComboDetector3" (1 lines)
  (0, b'/', b'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5')
127.0.0.1 - - [17/May/2021:12:23:55 +0000] "GET / HTTP/1.1" 200 11229 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3) Gecko/20090305 Firefox/3.1b3 GTB5"

Note that this time the aminer also learned that "/doesntexist" returns status code "404", so it will no longer report this as an anomaly. Let's stop the training by pressing CTRL+c to terminate the aminer, and then edit /etc/aminer/config.yml to turn off "LearnMode":

LearnMode: False

Now let us fire up the aminer:

alice@ubuntu2004:~$ sudo aminer --config /etc/aminer/config.yml

To verify this, we request "/doesntexist" again from another terminal session, as we did in the previous section:

alice@ubuntu2004:~$ wget -qO- http://localhost/doesntexist

In the aminer session, no anomalies were detected. Perfect!

Now let's try to change the HTTP-method to POST:

alice@ubuntu2004:~$ wget -qO- --method=POST http://localhost/doesntexist

This time the aminer reported an anomaly:

alice@ubuntu2004:~$ sudo aminer --config /etc/aminer/config.yml
2021-05-17 12:30:04 New value combination(s) detected
NewMatchPathValueComboDetector: "NewMatchPathValueComboDetector3" (1 lines)
  (1, b'/doesntexist', b'Wget/1.20.3 (linux-gnu)')
127.0.0.1 - - [17/May/2021:12:30:03 +0000] "POST /doesntexist HTTP/1.1" 404 488 "-" "Wget/1.20.3 (linux-gnu)"

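The combination of method, path, and user agent is new this time because we used the POST method. Note that the method is reported as 1 rather than 0: the parser model stores the index of the matched word in the FixedWordlistDataModelElement's word list, where GET is 0 and POST is 1. This is a deliberately simple example, but it shows how the detector flags unusual combinations of otherwise familiar values.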
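The combination logic can be sketched in Python as well. In this simplified illustration (ToyValueComboDetector and to_parsed are hypothetical names, not the aminer's implementation) every monitored path contributes one value to a tuple, and a tuple not seen during learning is reported. Following the aminer output above, the method is stored as its index in the parser model's word list.

```python
# Same word list as in ApacheAccessModel.py; the method value is its index here.
METHODS = [b"GET", b"POST", b"PUT", b"HEAD", b"DELETE", b"CONNECT", b"OPTIONS", b"TRACE", b"PATCH"]

PATHS = ["/accesslog/fm/request/method", "/accesslog/fm/request/request",
         "/accesslog/combined/combined/user_agent"]
WGET = b"Wget/1.20.3 (linux-gnu)"


class ToyValueComboDetector:
    """Report value combinations (one value per path) not seen while learning."""

    def __init__(self, paths, learn_mode=True):
        self.paths = paths
        self.learn_mode = learn_mode
        self.known_combos = set()

    def receive(self, parsed):
        combo = tuple(parsed.get(path) for path in self.paths)
        if combo in self.known_combos:
            return None
        if self.learn_mode:
            self.known_combos.add(combo)
        return combo


def to_parsed(method, request, user_agent):
    """Build the path/value dict the detector sees for one request (toy helper)."""
    return dict(zip(PATHS, (METHODS.index(method), request, user_agent)))


detector = ToyValueComboDetector(PATHS)
detector.receive(to_parsed(b"GET", b"/", WGET))
detector.receive(to_parsed(b"GET", b"/doesntexist", WGET))
detector.learn_mode = False
print(detector.receive(to_parsed(b"GET", b"/doesntexist", WGET)))   # None: known combination
print(detector.receive(to_parsed(b"POST", b"/doesntexist", WGET)))  # (1, b'/doesntexist', b'Wget/1.20.3 (linux-gnu)')
```

Each individual value in the POST request was already seen (the path and the user agent during training, the method as part of the word list), yet the tuple as a whole is new, which is exactly what a per-path detector would miss.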

What's next?

This tutorial showed how to install and configure the aminer to detect anomalies in log files. Even though the aminer ships with several parser models, some use cases may require writing custom parser models or using different analysis modules. Have a look at the following resources to dig deeper: