Add Password Strength Checker script with ML and NN (#349)
* Add password strength checker source code
* Update README.md
* Fix README.md
* Fix typo in model README.md
Showing 11 changed files with 262 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Password Strength Checker | ||
|
||
## Description | ||
A password strength checker that uses machine learning to classify the strength of passwords. This project provides a simple command-line interface: the user enters a password and receives a strength classification based on features such as length, character-class counts, entropy, and repeated or sequential characters. | ||
|
||
## Features | ||
- Classifies password strength into three categories: weak (0), moderate (1), and strong (2). | ||
|
||
## Installation | ||
1. Clone the repository: | ||
   ```bash | ||
   git clone https://github.com/DhanushNehru/Python-Scripts | ||
   cd "Python-Scripts/Password Strength Checker" | ||
   ``` | ||
|
||
2. Create and activate a virtual environment: | ||
   ```bash | ||
   python3 -m venv venv | ||
   source venv/bin/activate # On Windows use `venv\Scripts\activate` | ||
   ``` | ||
3. Install the required packages: | ||
   ```bash | ||
   pip install -r requirements.txt | ||
   ``` | ||
|
||
## Usage | ||
To run the password strength checker: | ||
```bash | ||
python main.py | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
from model.model import predict # import model | ||
|
||
def main(): | ||
    password_to_test = input("Enter a password to check its strength: ") # get password from terminal | ||
    predicted_class = int(predict(password_to_test)[0]) # predict returns a one-element array; take its only item | ||
    print(f"Password strength classification: {predicted_class} / 2") # output 0 - weak, 1 - moderate, or 2 - strong | ||
|
||
if __name__ == "__main__": main() |
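The script prints only the raw class index. A minimal sketch of turning that index into a readable label, using the 0/1/2 meanings stated in the comment above. `STRENGTH_LABELS` and `describe_strength` are hypothetical names introduced for illustration and are not part of the commit:

```python
# Hypothetical helper, not part of the commit: maps the class index
# printed by main() to the label named in its comment.
STRENGTH_LABELS = {0: "weak", 1: "moderate", 2: "strong"}


def describe_strength(predicted_class: int) -> str:
    """Return a human-readable label for a predicted class index."""
    try:
        return STRENGTH_LABELS[predicted_class]
    except KeyError:
        # Any index outside 0..2 means the model output was not what we expect.
        raise ValueError(f"unexpected class index: {predicted_class}") from None


if __name__ == "__main__":
    for index in range(3):
        print(f"{index} -> {describe_strength(index)}")
```

Raising on an unknown index, rather than returning a default, keeps a mismatch between the model's output layer and the label table visible.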
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Password Strength Classification Model | ||
|
||
## Overview | ||
This model is designed to evaluate the strength of passwords using machine learning techniques. It analyzes input passwords and classifies them based on their strength, providing feedback for users to create stronger passwords. | ||
|
||
## Model Architecture | ||
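- **Input Layer**: The model accepts eight numeric features derived from the password (length, lowercase, uppercase, digit, and special-character counts, entropy, and repeat and sequence counts). | ||
- **Dense Layers**: Two dense layers with 128 and 64 units and ReLU activations process the input features. | ||
- **Output Layer**: A 3-unit softmax layer outputs class probabilities, and the most probable class is reported as the password strength (weak - 0, medium - 1, strong - 2). | ||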
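The model code in this commit ends in a softmax layer and picks the class with `argmax`. The following is a rough pure-Python sketch of that last step only, with made-up scores; `softmax` and `predicted_class` are illustrative names, and the real model computes this inside Keras:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(probabilities):
    """Return the index of the most probable class (argmax)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


if __name__ == "__main__":
    raw_scores = [0.2, 1.5, 0.3]  # made-up scores, for illustration only
    probs = softmax(raw_scores)
    print(probs, predicted_class(probs))
```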
|
||
## Training | ||
- The model is trained on a labeled dataset of passwords classified by strength. | ||
|
||
## Future improvements | ||
- In feature engineering, add columns that count commonly used passwords (e.g. 'password') and commonly used words, and take them into account during model training. |
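One possible shape for such a feature, sketched under assumptions: the word list below is a tiny illustrative sample, and `COMMON_WORDS` and `count_common_words` are hypothetical names that do not exist in the commit.

```python
# Illustrative word list only (an assumption); a real feature would
# load a much larger list of leaked or commonly used passwords.
COMMON_WORDS = ("password", "qwerty", "letmein", "admin", "welcome")


def count_common_words(password: str, words=COMMON_WORDS) -> int:
    """Count how many listed words occur in the password, ignoring case."""
    lowered = password.lower()
    return sum(1 for word in words if word in lowered)


if __name__ == "__main__":
    for candidate in ("Password123", "xK9#mQ", "adminPASSWORD"):
        print(candidate, count_common_words(candidate))
```

A column like this would have to be added in both preprocessing and inference so the scaler and the model see the same feature set.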
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# disable debugging messages | ||
def warn(*args, **kwargs): | ||
    pass | ||
import warnings | ||
warnings.warn = warn | ||
warnings.filterwarnings("ignore", category=DeprecationWarning) | ||
import os | ||
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0' | ||
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1' | ||
from silence_tensorflow import silence_tensorflow | ||
silence_tensorflow("WARNING") | ||
|
||
import pandas as pd | ||
import pickle | ||
|
||
from tensorflow.keras.models import Sequential | ||
from tensorflow.keras.layers import Dense | ||
from model.utils.functions import calculate_entropy, count_repeats, count_sequential | ||
from model.utils.preprocessing import run_preprocessing | ||
from model.utils.training import run_training | ||
|
||
# run preprocessing and training | ||
# run_preprocessing() # uncomment to run preprocessing | ||
# run_training() # uncomment to train the model | ||
|
||
def prepare_input(password): # function to prepare input features from password | ||
    # create a dataframe for a single input | ||
    data = { | ||
        'length': [len(password)], # calculate password length | ||
        'lowercase_count': [sum(c.islower() for c in password)], # count lowercase characters | ||
        'uppercase_count': [sum(c.isupper() for c in password)], # count uppercase characters | ||
        'digit_count': [sum(c.isdigit() for c in password)], # count digits | ||
        'special_count': [sum(not c.isalnum() for c in password)], # count special characters | ||
        'entropy': [calculate_entropy(password)], # calculate entropy | ||
        'repetitive_count': [count_repeats(password)], # count repetitive characters | ||
        'sequential_count': [count_sequential(password)] # count sequential characters | ||
    } | ||
|
||
    with open('model/scaler.pkl', 'rb') as file: # load the fitted scaler from file | ||
        scaler = pickle.load(file) | ||
|
||
    # convert to dataframe | ||
    input_df = pd.DataFrame(data) | ||
|
||
    # normalize using the previously fitted scaler | ||
    normalized_input = scaler.transform(input_df) | ||
|
||
    return pd.DataFrame(normalized_input, columns=input_df.columns) # return normalized input as dataframe | ||
|
||
def predict(password): # function to predict password strength | ||
    # load the model | ||
    model = Sequential() # create a sequential model | ||
    model.add(Dense(128, activation='relu', input_shape=(8,))) # add input layer with 128 neurons | ||
    model.add(Dense(64, activation='relu')) # add hidden layer with 64 neurons | ||
    model.add(Dense(3, activation='softmax')) # add output layer with softmax activation | ||
|
||
    # load trained weights | ||
    model.load_weights('model/deep_learning_model.h5') # load weights from the trained model file | ||
|
||
    # prepare the input | ||
    password_to_test = password # assign password to test | ||
    input_features = prepare_input(password_to_test) # prepare input features | ||
|
||
    # make the prediction | ||
    prediction = model.predict(input_features, verbose=0) # predict using the model | ||
    predicted_class = prediction.argmax(axis=-1) # get the predicted class index | ||
|
||
    return predicted_class # return the predicted class |
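To make the count-based inputs concrete, here is a small sketch that computes the first five features of `prepare_input` for one password without pandas or the saved scaler. `raw_features` is a hypothetical name; the values are unscaled, so they are not what the model actually receives.

```python
def raw_features(password: str) -> dict:
    """Compute the count-based features used by prepare_input, unscaled."""
    return {
        "length": len(password),
        "lowercase_count": sum(c.islower() for c in password),
        "uppercase_count": sum(c.isupper() for c in password),
        "digit_count": sum(c.isdigit() for c in password),
        # anything that is neither a letter nor a digit counts as special
        "special_count": sum(not c.isalnum() for c in password),
    }


if __name__ == "__main__":
    print(raw_features("Passw0rd!"))
```

For `"Passw0rd!"` this gives a length of 9 with 6 lowercase letters, 1 uppercase letter, 1 digit, and 1 special character.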
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
import numpy as np | ||
|
||
def calculate_entropy(password): # function to calculate the entropy of a password | ||
    if len(password) == 0: # check if the password is empty | ||
        return 0 # return 0 for empty passwords | ||
    chars = np.array(list(password)) # convert password to a numpy array of characters | ||
    unique, counts = np.unique(chars, return_counts=True) # get unique characters and their counts | ||
    probabilities = counts / len(password) # calculate the probability of each character | ||
    entropy = -np.sum(probabilities * np.log2(probabilities)) # compute the Shannon entropy in bits per character | ||
    return entropy # return the calculated entropy | ||
|
||
def count_repeats(password): # function to count consecutive repeated characters in the password | ||
    return sum(password[i] == password[i + 1] for i in range(len(password) - 1)) # count adjacent pairs of equal characters | ||
|
||
def count_sequential(password): # function to count sequential characters in the password | ||
    sequences = [''.join(chr(i) for i in range(start, start + 3)) for start in range(ord('a'), ord('z') - 1)] # generate runs of 3 lowercase letters ('abc' to 'xyz') | ||
    sequences += [''.join(str(i) for i in range(start, start + 3)) for start in range(8)] # generate runs of 3 digits ('012' to '789'); range(10) would also yield '8910' and '91011' | ||
    return sum(1 for seq in sequences if seq in password) # count how many of the sequences are in the password |
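As a cross-check on what `calculate_entropy` measures, here is a standard-library sketch of the same Shannon entropy formula, $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the relative frequency of each distinct character. `shannon_entropy` is an illustrative name, not a function from the commit.

```python
import math
from collections import Counter


def shannon_entropy(password: str) -> float:
    """Shannon entropy, in bits per character, of the password's characters."""
    if not password:
        return 0.0
    total = len(password)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(password).values()
    )


if __name__ == "__main__":
    for sample in ("aaaa", "aabb", "abcd"):
        print(sample, shannon_entropy(sample))
```

A password made of one repeated character scores 0 bits, `"aabb"` scores 1 bit, and four distinct characters score 2 bits. The measure depends only on character frequencies within the password, so it says nothing about how guessable the password is from a dictionary.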
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
import pandas as pd | ||
import pickle | ||
|
||
from model.utils.functions import calculate_entropy, count_repeats, count_sequential | ||
from sklearn.preprocessing import StandardScaler | ||
|
||
def run_preprocessing(): | ||
    # import data | ||
    dataframe = pd.read_csv('model/passwords.csv', on_bad_lines='skip') # read csv data file | ||
    dataframe = dataframe.dropna() # remove rows with empty values | ||
    dataframe = dataframe.drop_duplicates(subset='password') # remove duplicates | ||
|
||
    # add new columns | ||
    dataframe['length'] = dataframe['password'].str.len() # column for password length | ||
    dataframe['lowercase_count'] = dataframe['password'].apply(lambda x: sum(c.islower() for c in x)) # column for amount of lowercase characters | ||
    dataframe['uppercase_count'] = dataframe['password'].apply(lambda x: sum(c.isupper() for c in x)) # column for amount of uppercase characters | ||
    dataframe['digit_count'] = dataframe['password'].apply(lambda x: sum(c.isdigit() for c in x)) # column for amount of digits | ||
    dataframe['special_count'] = dataframe['password'].apply(lambda x: sum(not c.isalnum() for c in x)) # column for amount of special characters | ||
    dataframe['entropy'] = dataframe['password'].apply(calculate_entropy) # column for entropy | ||
    dataframe['repetitive_count'] = dataframe['password'].apply(count_repeats) # column for amount of repetitive characters | ||
    dataframe['sequential_count'] = dataframe['password'].apply(count_sequential) # column for amount of sequential characters | ||
|
||
    scaler = StandardScaler() # use standard scaler because there is a gaussian distribution in passwords.csv | ||
    numerical_features = ['length', 'lowercase_count', 'uppercase_count', 'digit_count', 'special_count', 'entropy', 'repetitive_count', 'sequential_count'] | ||
    dataframe[numerical_features] = scaler.fit_transform(dataframe[numerical_features]) | ||
|
||
    # save scaler model for future use | ||
    with open('model/scaler.pkl', 'wb') as file: | ||
        pickle.dump(scaler, file) | ||
|
||
    # save preprocessed data | ||
    dataframe.to_csv('model/output.csv', index=False, header=True) |
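The scaler is pickled because inference has to reuse the mean and standard deviation learned from the training data. A simplified single-column sketch of that fit-then-transform idea, using only the standard library; `fit_scaler` and `transform` are hypothetical names, and scikit-learn's `StandardScaler` does this per column:

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Learn the mean and population standard deviation of one column."""
    mean = fmean(values)
    std = pstdev(values)
    # A constant column has zero spread; fall back to 1 to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)


def transform(values, mean, std):
    """Apply previously learned statistics to new values (z-score)."""
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    train_lengths = [2, 4, 6]  # made-up training column
    mean, std = fit_scaler(train_lengths)
    print(transform([4, 6], mean, std))  # new inputs reuse the stored statistics
```

Refitting on a single password at inference time would map every feature to zero, which is why the fitted statistics are stored and reloaded instead.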
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# disable debugging messages | ||
def warn(*args, **kwargs): | ||
    pass | ||
import warnings | ||
warnings.warn = warn | ||
warnings.filterwarnings("ignore", category=DeprecationWarning) | ||
import os | ||
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0' | ||
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1' | ||
from silence_tensorflow import silence_tensorflow | ||
silence_tensorflow("WARNING") | ||
|
||
import pandas as pd | ||
|
||
from sklearn.model_selection import train_test_split | ||
from tensorflow.keras.models import Sequential | ||
from tensorflow.keras.layers import Dense | ||
from tensorflow.keras.utils import to_categorical | ||
|
||
|
||
def run_training(): # function to run the training process | ||
    dataframe = pd.read_csv('model/output.csv') # load the processed data from output.csv | ||
|
||
    # split the data into features and target variable | ||
    X = dataframe[['length', 'lowercase_count', 'uppercase_count', 'digit_count', 'special_count', 'entropy', 'repetitive_count', 'sequential_count']] # feature columns | ||
    y = dataframe['strength'] # target variable | ||
|
||
    # convert target variable to categorical | ||
    y = to_categorical(y) # convert labels to categorical format for multi-class classification | ||
|
||
    # split into training and test sets | ||
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # 80-20 split | ||
|
||
    # initialize the model | ||
    model = Sequential() # create a sequential model | ||
    model.add(Dense(128, activation='relu', input_shape=(X_train.shape[1],))) # add input layer with 128 neurons | ||
    model.add(Dense(64, activation='relu')) # add hidden layer with 64 neurons | ||
    model.add(Dense(y.shape[1], activation='softmax')) # add output layer with softmax activation | ||
|
||
    # compile the model | ||
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy']) # compile the model with adam optimizer | ||
|
||
    # train the model | ||
    model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2) # fit the model on training data | ||
|
||
    # save the model to a file | ||
    model.save('model/deep_learning_model.h5') # save the trained model |
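The `to_categorical` step turns each integer label into a one-hot row so it can be compared against the softmax output under categorical cross-entropy. A plain-Python sketch of that encoding, where `one_hot` is an illustrative stand-in rather than the Keras function:

```python
def one_hot(labels, num_classes=None):
    """Encode integer class labels as one-hot rows."""
    if num_classes is None:
        num_classes = max(labels) + 1  # infer the class count from the largest label
    return [
        [1.0 if index == label else 0.0 for index in range(num_classes)]
        for label in labels
    ]


if __name__ == "__main__":
    print(one_hot([0, 2, 1]))
```

With labels 0, 1, and 2 each row has three entries, which matches the three-unit output layer built from `y.shape[1]`.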
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
absl-py==2.1.0 | ||
astunparse==1.6.3 | ||
certifi==2024.8.30 | ||
charset-normalizer==3.4.0 | ||
flatbuffers==24.3.25 | ||
gast==0.6.0 | ||
google-pasta==0.2.0 | ||
grpcio==1.67.0 | ||
h5py==3.12.1 | ||
idna==3.10 | ||
joblib==1.4.2 | ||
keras==3.6.0 | ||
libclang==18.1.1 | ||
Markdown==3.7 | ||
markdown-it-py==3.0.0 | ||
MarkupSafe==3.0.1 | ||
mdurl==0.1.2 | ||
ml-dtypes==0.4.1 | ||
namex==0.0.8 | ||
numpy==1.26.4 | ||
opt_einsum==3.4.0 | ||
optree==0.13.0 | ||
packaging==24.1 | ||
pandas==2.2.3 | ||
protobuf==4.25.5 | ||
Pygments==2.18.0 | ||
python-dateutil==2.9.0.post0 | ||
pytz==2024.2 | ||
requests==2.32.3 | ||
rich==13.9.2 | ||
scikit-learn==1.5.2 | ||
scipy==1.14.1 | ||
setuptools==75.2.0 | ||
silence_tensorflow==1.2.2 | ||
six==1.16.0 | ||
tensorboard==2.17.1 | ||
tensorboard-data-server==0.7.2 | ||
tensorflow-cpu==2.17.0 | ||
termcolor==2.5.0 | ||
threadpoolctl==3.5.0 | ||
typing_extensions==4.12.2 | ||
tzdata==2024.2 | ||
urllib3==2.2.3 | ||
Werkzeug==3.0.4 | ||
wheel==0.44.0 | ||
wrapt==1.16.0 |