Fixed real-time integration with PyCaret for Credit Card Fraud Detection (#370)
atoti · Oct 10, 2022 · 4a45332
1 parent fb977de
commit 4a45332
Showing 18 changed files with 4,799 additions and 140 deletions.
diff --git a/...ement/operational-risk/credit-card-fraud-detection/03-AutoML-PyCaret-classification.ipynb b/...ement/operational-risk/credit-card-fraud-detection/03-AutoML-PyCaret-classification.ipynb
@@ -84,11 +84,11 @@
     "#     print(\"Downloading: %d%% [%d / %d] bytes\" % (current / total * 100, current, total))\n",
     "\n",
     "\n",
-    "# url = \"https://data.atoti.io/notebooks/credit-card-fraud/output.zip\"\n",
+    "# url = \"https://data.atoti.io/notebooks/credit-card-fraud/data.zip\"\n",
     "# filename = wget.download(url, bar=bar_custom)\n",
     "\n",
     "# # unzipping the file\n",
-    "# with ZipFile(\"output.zip\", \"r\") as zipObj:\n",
+    "# with ZipFile(\"data.zip\", \"r\") as zipObj:\n",
     "#     # Extract all the contents of zip file in current directory\n",
     "#     zipObj.extractall(data_path)"
    ]
@@ -5109,7 +5109,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.9.13"
   }
  },
  "nbformat": 4,

diff --git a/.../finance/risk-management/operational-risk/credit-card-fraud-detection/README.md b/.../finance/risk-management/operational-risk/credit-card-fraud-detection/README.md
@@ -6,6 +6,15 @@ This use case comprises of 3 main sections:
 - Detect credit card fraud with autoML, choosing the best model with PyCaret.
 - Real-time credit card analysis to investigate suspicious transactions flagged by the ML model.
 
+The diagram below depicts the flow between the libraries and how they are used:  
+<img src="./img/app_flow.png" />
+
+This use case uses [PyCaret 2.3.4](https://pycaret.org/).  
+Because the latest version of atoti has dependencies that conflict with PyCaret, the two programs run in separate virtual environments and communicate through REST endpoints.
+
+<img src="./img/system_design.png" />  
+
+
 ## 1 Synthetic data generation
 
 [01-Synthetic-data-generation.ipynb](./01-Synthetic-data-generation.ipynb) is adapted from the GitHub repository [Sparkov_Data_Generation](https://github.com/namebrandon/Sparkov_Data_Generation). It makes use of Faker to generate customers and credit card transactions with varying profiles:
@@ -70,9 +79,28 @@ This notebook covers the following:
   - we used the LGBM model to perform fraud prediction
   - load transactions and their predictions into atoti
 - Evaluate incoming transactions from the atoti web application at http://localhost:10327.
-- Create **source simulation** in atoti with prediction (with and without cumulative features) from:
-  - LGBM
-  - DT
-  - anomaly detection
 
-This allows us to compare the performance of the models and also decide if the additional cumulative features are necessary.
+### Real-time fraud prediction
+
+To test real-time fraud detection, start the Flask application included in the `atoti-pycaret` package. The package's [README.md](./atoti-pycaret/README.md) explains how to start it.  
+
+Alternatively, you can integrate your own machine learning model by updating the REST URI in the `get_prediction` function in [main.ipynb](./main.ipynb):
+
+```
+def get_prediction(features_df):
+    url = "http://127.0.0.1:105/predict"
+    header = {"Content-Type": "application/json"}
+
+    payload = {
+        "features": features_df.to_json(orient="records"),
+    }
+
+    try:
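+        response = requests.post(url, json=payload, headers=header)
+        response.raise_for_status()  # raise HTTPError on 4xx/5xx so the except branch can fire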
+        prediction = pd.DataFrame.from_dict(response.json())
+        return prediction
+
+    except requests.exceptions.HTTPError as e:
+        print(e.response.text)
+```
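For anyone pointing `get_prediction` at their own service, it may help to see what actually goes over the wire. Because `features_df.to_json(orient="records")` returns a string, the `features` field is a JSON document nested inside the outer JSON body. The sketch below reproduces that shape with the standard library only; the column names `amt` and `merchant_id` are illustrative assumptions, not the notebook's actual feature set.

```python
import json

# Hypothetical feature rows; the real columns come from the notebook's feature engineering.
records = [
    {"amt": 25.0, "merchant_id": "m-001"},
    {"amt": 4200.0, "merchant_id": "m-002"},
]

# DataFrame.to_json(orient="records") yields a string shaped like this one.
features_json = json.dumps(records)

# requests.post(url, json=payload) then serialises the outer dict a second time.
payload = {"features": features_json}
body = json.dumps(payload)

# A replacement service therefore has to decode twice to get the rows back.
outer = json.loads(body)
decoded = json.loads(outer["features"])
print(decoded == records)
```

A replacement endpoint that expects `features` to be a plain JSON array rather than a string would need the client payload changed accordingly.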
diff --git a/...isk-management/operational-risk/credit-card-fraud-detection/atoti-pycaret/.python-version b/...isk-management/operational-risk/credit-card-fraud-detection/atoti-pycaret/.python-version
@@ -0,0 +1 @@
+3.8.7
diff --git a/...management/operational-risk/credit-card-fraud-detection/atoti-pycaret/README.md b/...management/operational-risk/credit-card-fraud-detection/atoti-pycaret/README.md
@@ -0,0 +1,53 @@
+# Endpoint for Credit Card Fraud prediction
+
+The package automl consists of machine learning models that we have trained using [PyCaret](https://pycaret.org/).
+
+By creating a small [Flask application](https://flask.palletsprojects.com/en/2.2.x/), we are able to create an endpoint that takes in the features for the model to perform fraud prediction.
+
+# Installation
+
+Set up the virtual environment for the project using the command below:
+```
+poetry install
+```
+
+Refer to the [poetry documentation](https://python-poetry.org/docs/master/#installing-with-the-official-installer) for more information on the package manager.
+
+
+# Runtime
+To launch the Flask application, run the following command:
+```
+poetry run python ./automl/prediction.py
+```
+
+You should be able to see the following:
+
+<img src="../img/flask_endpoint.png">
+
+We can post requests to the endpoint at http://127.0.0.1:105/predict, e.g.  
+
+```
+def get_prediction(features_df):
+    url = "http://127.0.0.1:105/predict"
+    header = {"Content-Type": "application/json"}
+
+    payload = {
+        "features": features_df.to_json(orient="records"),
+    }
+
+    try:
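+        response = requests.post(url, json=payload, headers=header)
+        response.raise_for_status()  # raise HTTPError on 4xx/5xx so the except branch can fire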
+        prediction = pd.DataFrame.from_dict(response.json())
+        return prediction
+
+    except requests.exceptions.HTTPError as e:
+        print(e.response.text)
+```
+
+You can verify that the requests are received by the endpoint through the shell running this program:
+
+<img src="../img/request_received.png"/>  
+
+
+The endpoint returns the features and their corresponding predictions as a JSON array of records, which `get_prediction` loads into a pandas DataFrame.
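Since the handler serialises its result with `to_json(orient="records")`, each element of the response carries the input features alongside the model's output. The following sketch, using only the standard library, shows one way a caller might separate the two. The prediction column names `Label` and `Score` and the sample values are assumptions made for illustration; check the real response before relying on them.

```python
import json

# Assumed response body; "Label" and "Score" are assumed names for the prediction columns.
response_text = json.dumps([
    {"amt": 25.0, "Label": 0, "Score": 0.98},
    {"amt": 4200.0, "Label": 1, "Score": 0.91},
])

PREDICTION_COLUMNS = {"Label", "Score"}  # assumption: adjust to the model's real output


def split_predictions(text):
    """Separate each returned record into its input features and its prediction fields."""
    features, predictions = [], []
    for row in json.loads(text):
        features.append({k: v for k, v in row.items() if k not in PREDICTION_COLUMNS})
        predictions.append({k: v for k, v in row.items() if k in PREDICTION_COLUMNS})
    return features, predictions


features, predictions = split_predictions(response_text)

# Keep only the transactions the (assumed) label marks as fraudulent.
flagged = [f for f, p in zip(features, predictions) if p["Label"] == 1]
print(flagged)
```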
diff --git a/...-risk/credit-card-fraud-detection/atoti-pycaret/automl/models/Final_DT_Model_20211130.pkl b/...-risk/credit-card-fraud-detection/atoti-pycaret/automl/models/Final_DT_Model_20211130.pkl
diff --git a/...-risk/credit-card-fraud-detection/atoti-pycaret/automl/models/Final_ET_Model_20211130.pkl b/...-risk/credit-card-fraud-detection/atoti-pycaret/automl/models/Final_ET_Model_20211130.pkl
diff --git a/...it-card-fraud-detection/atoti-pycaret/automl/models/Final_IForest_Full_Model_20211201.pkl b/...it-card-fraud-detection/atoti-pycaret/automl/models/Final_IForest_Full_Model_20211201.pkl
diff --git a/.../credit-card-fraud-detection/atoti-pycaret/automl/models/Final_IForest_Model_20211201.pkl b/.../credit-card-fraud-detection/atoti-pycaret/automl/models/Final_IForest_Model_20211201.pkl
diff --git a/...isk/credit-card-fraud-detection/atoti-pycaret/automl/models/Final_LGBM_Model_20211130.pkl b/...isk/credit-card-fraud-detection/atoti-pycaret/automl/models/Final_LGBM_Model_20211130.pkl
diff --git a/...anagement/operational-risk/credit-card-fraud-detection/atoti-pycaret/automl/prediction.py b/...anagement/operational-risk/credit-card-fraud-detection/atoti-pycaret/automl/prediction.py
@@ -0,0 +1,33 @@
+from flask import Flask, jsonify, request
+import pandas as pd
+import pycaret.classification as pyc
+import pickle
+import os
+
+app = Flask(__name__)
+
+dir_path = os.path.dirname(os.path.realpath(__file__))
+model_path = os.path.join(dir_path, "models", "Final_LGBM_Model_20211130")  # relative to this file, not the cwd
+
+
+def predict(df):
+    model = pyc.load_model(model_path)  # load_model appends ".pkl" itself
+    return pyc.predict_model(model, data=df)
+
+
+@app.route("/predict", methods=["POST"])
+def predict_model():
+    test = request.json
+
+    features_json = test["features"]
+    features_df = pd.read_json(features_json)
+    print(f"Features received: {len(features_df)}")
+
+    model_prediction = predict(features_df)
+    print(f"Prediction completed for {len(model_prediction)}")
+
+    return model_prediction.to_json(orient="records")
+
+
+if __name__ == "__main__":
+    app.run(host="0.0.0.0", port=105)
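The route above indexes `test["features"]` directly, so a malformed request surfaces as an unhandled exception rather than a clear client error. One possible hardening, sketched here with the standard library only, is to validate the decoded body before handing it to the model. `parse_features` is a hypothetical helper and is not part of this commit.

```python
import json


def parse_features(body):
    """Return the feature rows from a decoded request body, or raise ValueError."""
    if not isinstance(body, dict) or "features" not in body:
        raise ValueError('request body must be a JSON object with a "features" key')
    try:
        # "features" is expected to be a JSON string, as sent by get_prediction.
        rows = json.loads(body["features"])
    except (TypeError, json.JSONDecodeError) as exc:
        raise ValueError('"features" must be a JSON-encoded string') from exc
    if not isinstance(rows, list) or not rows:
        raise ValueError('"features" must decode to a non-empty list of records')
    return rows


rows = parse_features({"features": '[{"amt": 25.0}]'})
print(len(rows))
```

Inside the route, a `ValueError` could then be mapped to a 400 response, for example with Flask's `abort(400)`, before any model code runs.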