Building a model for identifying potential Enron fraudsters based on financial and e-mail data with the use of Python. From data exploration to building and validating an algorithm.

illi4/Enron_fraud

Using Machine Learning to identify Enron fraudsters

In this project, I build a model for identifying potential fraudsters based on financial and e-mail data. The project covers the following steps:

  • data exploration (learning about the data, cleaning and preparing the data)
  • feature selection and engineering (selecting the most significant features and creating new ones)
  • reducing the dimensionality of the data using principal component analysis
  • selecting and tuning a supervised machine learning algorithm
  • validating the algorithm to ensure acceptable performance of the model
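
The steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the code from poi_id.py: the data is synthetic (generated with make_classification to mimic a small, imbalanced dataset), and the choice of MinMaxScaler, SelectKBest, GaussianNB, the parameter grid, and the F1 score are placeholders chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (assumption): few samples,
# with the positive class making up roughly 13% of the labels.
X, y = make_classification(
    n_samples=140, n_features=15, n_informative=5,
    weights=[0.87, 0.13], random_state=42,
)

# Scaling -> feature selection -> PCA -> classifier, chained so that
# every step is refit on the training part of each split only.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(f_classif)),
    ("pca", PCA(random_state=42)),
    ("clf", GaussianNB()),
])

# Illustrative grid: how many features to keep and how many
# principal components to build from them.
param_grid = {
    "select__k": [5, 10],
    "pca__n_components": [2, 4],
}

# Repeated stratified splits keep the class ratio in every test set,
# which matters when positives are rare.
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.3, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated F1: %.3f" % search.best_score_)
```

Putting feature selection and PCA inside the pipeline, rather than applying them once to the full dataset, keeps information from the test folds out of the tuning step.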

Results

The results are saved in the Jupyter notebook file in the repository.

Files

The following additional files can be found in the repository:

  • Enron_final.html: the results in HTML format.
  • final_project_dataset.pkl: the dataset in pickle (.pkl) format.
  • final_project_dataset_modified.pkl, my_classifier.pkl, my_dataset.pkl, my_feature_list.pkl: files created while running the project.
  • poi_id.py: script with the Python code referred to in the results file, as well as the final classifier.
  • tester.py: script used to test the classifier.
  • tools folder: scripts used for data processing.
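
The .pkl files listed above are Python pickle files and can be read with the standard pickle module. The sketch below shows the round trip on a temporary file. The record layout (a dictionary keyed by person name, with a feature dictionary per person) and the names EXAMPLE PERSON A/B are assumptions made for illustration, since the layout of final_project_dataset.pkl is not described here.

```python
import os
import pickle
import tempfile

# Hypothetical records (assumption): one feature dictionary per person.
sample = {
    "EXAMPLE PERSON A": {"salary": 100000, "poi": False},
    "EXAMPLE PERSON B": {"salary": 250000, "poi": True},
}

# Write the sample to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example_dataset.pkl")
with open(path, "wb") as handle:
    pickle.dump(sample, handle)

# Reading works the same way for any pickle file: open it in binary
# mode and pass the file object to pickle.load.
with open(path, "rb") as handle:
    data = pickle.load(handle)

flagged = [name for name, features in data.items() if features["poi"]]
print(len(data), "records,", len(flagged), "flagged")
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.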
