Add MLP trainer based on DB data #1

ChristophWurst · 2018-12-12T17:14:26Z

This is my very first implementation of a neural net classifier based on the multilayer perceptron classifier from php-ml.

In order to get okay-ish results I had to

- hash the UIDs and convert them to a binary representation, keeping just a few bits so the feature vector stays small
- convert the IPs to a binary representation so that the net can learn patterns of IP addresses, like their hierarchy (rather than learning the decimal representation)
- generate negative samples ...
  - ... by mixing known UIDs with unrelated IPs
  - ... by generating random IPs
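
The preprocessing steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP code of this PR: the hash function (SHA-256), the 16 UID bits, the IPv4-only handling and all function names are assumptions made for the example. With 16 UID bits and 32 IP bits the vector happens to have 48 dimensions, but whether the trainer splits its vector this way is not stated here.

```python
import hashlib
import ipaddress
import random
from typing import List, Tuple

UID_BITS = 16  # assumption: how many hash bits to keep per UID
IP_BITS = 32   # assumption: IPv4 addresses only

Pair = Tuple[str, str]  # (uid, ip)


def to_bits(value: int, width: int) -> List[int]:
    """Return the `width` lowest bits of `value`, most significant bit first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def uid_bits(uid: str) -> List[int]:
    """Hash the UID and keep only the first UID_BITS bits of the digest."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    prefix = int.from_bytes(digest[:4], "big") >> (32 - UID_BITS)
    return to_bits(prefix, UID_BITS)


def ip_bits(ip: str) -> List[int]:
    """One feature per address bit, so shared network prefixes share features."""
    return to_bits(int(ipaddress.IPv4Address(ip)), IP_BITS)


def encode(uid: str, ip: str) -> List[int]:
    """Feature vector: UID hash bits followed by IP bits."""
    return uid_bits(uid) + ip_bits(ip)


def shuffled_negatives(positives: List[Pair], count: int,
                       rng: random.Random) -> List[Pair]:
    """Pair known UIDs with known IPs that were never seen together."""
    known = set(positives)
    uids = sorted({uid for uid, _ in positives})
    ips = sorted({ip for _, ip in positives})
    negatives: List[Pair] = []
    attempts = 0
    # Bounded retry loop: give up if there are too few unseen combinations.
    while len(negatives) < count and attempts < 100 * count:
        attempts += 1
        pair = (rng.choice(uids), rng.choice(ips))
        if pair not in known:
            negatives.append(pair)
    return negatives


def random_negatives(positives: List[Pair], count: int,
                     rng: random.Random) -> List[Pair]:
    """Pair known UIDs with uniformly random IPv4 addresses.

    A random address could collide with a known one; this sketch ignores
    that unlikely case.
    """
    uids = sorted({uid for uid, _ in positives})
    return [(rng.choice(uids), str(ipaddress.IPv4Address(rng.getrandbits(32))))
            for _ in range(count)]


if __name__ == "__main__":
    seen = [("alice", "192.168.1.10"), ("bob", "10.0.0.5"), ("alice", "10.0.0.7")]
    rng = random.Random(0)
    print(len(encode("alice", "192.168.1.10")))  # 48
    print(shuffled_negatives(seen, 2, rng))
    print(random_negatives(seen, 2, rng))
```

Encoding the address bit by bit means that two addresses from the same network agree on their leading features, which is the hierarchy the net is supposed to pick up.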

All that is now available as an occ command with various optional parameters:

occ suspiciouslogin:train:mlp --help
Usage:
  suspiciouslogin:train:mlp [options]

Options:
      --shuffled[=SHUFFLED]                ratio of shuffled negative samples [default: 1]
      --random[=RANDOM]                    ratio of random negative samples [default: 1]
  -e, --epochs[=EPOCHS]                    number of epochs to train [default: 5000]
  -l, --layers[=LAYERS]                    number of hidden layers [default: 6]
      --learn-rate[=LEARN-RATE]            learning rate [default: 0.050000000000000003]
      --validation-rate[=VALIDATION-RATE]  relative size of the validation data set [default: 0.14999999999999999]
  -h, --help                               Display this help message

Some of the most recent training runs yield results like this:

Got 1375 samples for training: 573 positive, 401 random negative and 401 shuffled negative
Got 102 positive and 102 negative samples for validation (rate: 0.15)
Number of epochs: 1000
Number of hidden layers: 6
Learning rate: 0.01
Vecor dimensions: 48
Start training
Training finished after 118s
Prescision(y): 0.8141592920354
Prescision(n): 0.89010989010989
Recall(y): 0.90196078431373
Recall(n): 0.79411764705882
Average(precision): 0.85213459107264
Average(recall): 0.84803921568627
Average(f1score): 0.84759609591517

Roughly translated to our problem, this means that 80 to 90% of the addresses classified as 'n' (negative) really are from suspicious logins (Precision(n) is 0.89 in the run above). Conversely, of all logins from unknown IPs we detect ~80% (Recall(n) is 0.79).
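
To make the relation between these numbers explicit, here is a small Python sketch that recomputes the report from a confusion matrix. The four counts are not printed by the command; they are inferred from the reported values and the 102/102 validation split (for example, Recall(y) = 92/102 and Precision(n) = 81/91), so treat them as a reconstruction.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall and F1 score for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts inferred from the reported metrics (assumption, see above).
Y_AS_Y = 92  # known (uid, ip) pairs classified as 'y'
Y_AS_N = 10  # known pairs wrongly classified as 'n'
N_AS_N = 81  # negative samples classified as 'n'
N_AS_Y = 21  # negative samples wrongly classified as 'y'

# For class 'y', the false positives are the negatives labelled 'y'.
p_y, r_y, f_y = precision_recall_f1(tp=Y_AS_Y, fp=N_AS_Y, fn=Y_AS_N)
# For class 'n', the roles of the two error types are swapped.
p_n, r_n, f_n = precision_recall_f1(tp=N_AS_N, fp=Y_AS_N, fn=N_AS_Y)

avg_precision = (p_y + p_n) / 2
avg_recall = (r_y + r_n) / 2
avg_f1 = (f_y + f_n) / 2

if __name__ == "__main__":
    print(f"Precision(y): {p_y:.4f}  Precision(n): {p_n:.4f}")
    print(f"Recall(y):    {r_y:.4f}  Recall(n):    {r_n:.4f}")
    print(f"Averages: precision {avg_precision:.4f}, "
          f"recall {avg_recall:.4f}, f1 {avg_f1:.4f}")
```

The averages are plain (macro) means over the two classes, which matches the reported values.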

Ref https://en.wikipedia.org/wiki/Precision_and_recall

Signed-off-by: Christoph Wurst <christoph@winzerhof-wurst.at>

ChristophWurst · 2018-12-12T17:15:34Z

Got 1375 samples for training: 573 positive

FYI: These 573 unique (uid, ip) combinations are the result of roughly two weeks of data collection from two dozen users.

Add MLP trainer based on DB data

43239ed

Signed-off-by: Christoph Wurst <christoph@winzerhof-wurst.at>

ChristophWurst added the enhancement New feature or request label Dec 12, 2018

ChristophWurst self-assigned this Dec 12, 2018

ChristophWurst merged commit 442aae7 into master Dec 13, 2018

ChristophWurst deleted the feature/mlp-trainer branch December 13, 2018 07:54

YoSiJo mentioned this pull request Sep 16, 2019

No models found #158

Closed

pzzszk mentioned this pull request Feb 1, 2022

Running ./occ Suspiciouslogin:optimize fails with call to undefined function pcntl_signal_dispath #602

Closed

psyciknz mentioned this pull request Mar 4, 2022

training error. Not quite sure what im doing. #616

Closed

AndyXheli mentioned this pull request Jun 9, 2023

ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max) #745

Closed

ChristophWurst added a commit that referenced this pull request Feb 14, 2024

Merge pull request #810 from furplag/fix-issue-#745

cc59010

fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)

ChristophWurst added a commit that referenced this pull request Feb 15, 2024

Merge pull request #849 from nextcloud/backport/810/stable26

38219cd

[stable26] fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)

ChristophWurst added a commit that referenced this pull request Feb 15, 2024

Merge pull request #847 from nextcloud/backport/810/stable28

7e8e975

[stable28] fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)

ChristophWurst added a commit that referenced this pull request Feb 15, 2024

Merge pull request #848 from nextcloud/backport/810/stable27

8ab3ac3

[stable27] fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)

wriver4 mentioned this pull request Sep 26, 2024

php occ suspiciouslogin:optimize fails #939

Open
