Add MLP trainer based on DB data #1

Merged Dec 13, 2018 (1 commit)


ChristophWurst
Member

This is my very first implementation of a neural net classifier based on the multilayer perceptron classifier from php-ml.

In order to get okay-ish results I had to

  • hash the UIDs and convert them to a binary representation, keeping just a few bits so the feature vector stays small
  • convert the IPs to a binary representation so the net can learn patterns in IP addresses, like their hierarchy (rather than learning the decimal representation)
  • generate negative samples ...
    • ... by mixing known UIDs with unrelated IPs
    • ... by generating random IPs
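
The preprocessing steps above can be illustrated with a minimal Python sketch. It is not the PHP code from this pull request. The hash function (SHA-256), the number of UID bits kept (16, chosen so that 16 + 32 IPv4 bits gives the 48 dimensions seen in the training log), the pairing of random IPs with known UIDs, and all function names are assumptions made for illustration only.

```python
import hashlib
import ipaddress
import random

# Assumption: 16 UID bits + 32 IPv4 bits = 48 vector dimensions.
UID_BITS = 16


def uid_to_bits(uid, n_bits=UID_BITS):
    """Hash the UID and keep only the leading n_bits bits of the digest."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (len(digest) * 8 - n_bits)
    return [(value >> i) & 1 for i in range(n_bits - 1, -1, -1)]


def ipv4_to_bits(ip):
    """Turn an IPv4 address into 32 bits, most significant bit first,
    so that shared network prefixes become shared leading features."""
    value = int(ipaddress.IPv4Address(ip))
    return [(value >> i) & 1 for i in range(31, -1, -1)]


def encode(uid, ip):
    """Feature vector for one (uid, ip) pair."""
    return uid_to_bits(uid) + ipv4_to_bits(ip)


def shuffled_negatives(positives, count, rng):
    """Pair known UIDs with known IPs that were never seen together."""
    known = set(positives)
    uids = sorted({uid for uid, _ in positives})
    ips = sorted({ip for _, ip in positives})
    result = []
    attempts = 0
    # Bounded loop: there may be fewer unseen pairs than requested.
    while len(result) < count and attempts < 100 * count:
        attempts += 1
        pair = (rng.choice(uids), rng.choice(ips))
        if pair not in known:
            result.append(pair)
    return result


def random_negatives(positives, count, rng):
    """Pair known UIDs with uniformly random IPv4 addresses.
    A chance collision with a real positive is possible but very unlikely."""
    uids = sorted({uid for uid, _ in positives})
    return [
        (rng.choice(uids), str(ipaddress.IPv4Address(rng.getrandbits(32))))
        for _ in range(count)
    ]
```

Encoding the IP most significant bit first means two addresses from the same network share their leading features, which is the kind of hierarchy the net is meant to pick up.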

All that is now available as an occ command with various optional parameters:

occ suspiciouslogin:train:mlp --help
Usage:
  suspiciouslogin:train:mlp [options]

Options:
      --shuffled[=SHUFFLED]                ratio of shuffled negative samples [default: 1]
      --random[=RANDOM]                    ratio of random negative samples [default: 1]
  -e, --epochs[=EPOCHS]                    number of epochs to train [default: 5000]
  -l, --layers[=LAYERS]                    number of hidden layers [default: 6]
      --learn-rate[=LEARN-RATE]            learning rate [default: 0.050000000000000003]
      --validation-rate[=VALIDATION-RATE]  relative size of the validation data set [default: 0.14999999999999999]
  -h, --help                               Display this help message

Some of the most recent training runs yield results like this:

Got 1375 samples for training: 573 positive, 401 random negative and 401 shuffled negative
Got 102 positive and 102 negative samples for validation (rate: 0.15)
Number of epochs: 1000
Number of hidden layers: 6
Learning rate: 0.01
Vecor dimensions: 48
Start training
Training finished after 118s
Prescision(y): 0.8141592920354
Prescision(n): 0.89010989010989
Recall(y): 0.90196078431373
Recall(n): 0.79411764705882
Average(precision): 0.85213459107264
Average(recall): 0.84803921568627
Average(f1score): 0.84759609591517

Roughly translated to our problem, this means that about 89% of the logins classified as 'n' (negative) really are suspicious (precision for 'n'). On the other hand, we detect about 79% of all logins from unknown IPs (recall for 'n').

Ref https://en.wikipedia.org/wiki/Precision_and_recall
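
For readers who want to check the reported metrics, here is a small Python sketch. The confusion counts are not printed in the log. They are inferred here from the 102 positive and 102 negative validation samples together with the reported recall values (92/102 and 81/102), so treat them as a reconstruction. The function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score for a single class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred confusion counts (assumption, see the lead-in):
# 92 of 102 'y' samples predicted 'y', 10 predicted 'n';
# 81 of 102 'n' samples predicted 'n', 21 predicted 'y'.
p_y, r_y, f_y = precision_recall_f1(tp=92, fp=21, fn=10)
p_n, r_n, f_n = precision_recall_f1(tp=81, fp=10, fn=21)

# The averages are unweighted means over the two classes.
avg_precision = (p_y + p_n) / 2
avg_recall = (r_y + r_n) / 2
avg_f1 = (f_y + f_n) / 2

print(f"Precision(y): {p_y:.4f}  Precision(n): {p_n:.4f}")
print(f"Recall(y): {r_y:.4f}  Recall(n): {r_n:.4f}")
print(f"Average precision/recall/f1: {avg_precision:.4f} {avg_recall:.4f} {avg_f1:.4f}")
```

With these counts the computed values agree with the figures in the training output, including the averaged F1 score, which is the mean of the per-class F1 scores rather than the F1 of the averaged precision and recall.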

Signed-off-by: Christoph Wurst <christoph@winzerhof-wurst.at>
ChristophWurst added the enhancement (New feature or request) label on Dec 12, 2018
ChristophWurst self-assigned this on Dec 12, 2018
@ChristophWurst
Member Author

Got 1375 samples for training: 573 positive

FYI: These 573 unique (uid, ip) combinations are the result of roughly two weeks of data collection from two dozen users.

ChristophWurst merged commit 442aae7 into master on Dec 13, 2018
ChristophWurst deleted the feature/mlp-trainer branch on December 13, 2018 07:54
YoSiJo mentioned this pull request on Sep 16, 2019
ChristophWurst added a commit that referenced this pull request Feb 14, 2024
fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)
ChristophWurst added a commit that referenced this pull request Feb 15, 2024
[stable26] fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)
ChristophWurst added a commit that referenced this pull request Feb 15, 2024
[stable28] fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)
ChristophWurst added a commit that referenced this pull request Feb 15, 2024
[stable27] fix #745 ValueError: random_int(): Argument #1 ($min) must be less than or equal to argument #2 ($max)