Predicting Hard Drive Failures Using ML

Using the Backblaze dataset on Kaggle.

About this project:

This was a data science case study. The dataset used for this project is private, but a similar dataset and project can be found on Kaggle.com.

Disclaimer: This case study is based on a sample subset of a larger dataset, so it does not fully solve the problem. Its purpose is to demonstrate how to use different ML tools and libraries, how to present reports, and how to use Python for ML.

Sample Dataset:

A sample of the SMART hard drive dataset can be downloaded from: https://www.kaggle.com/backblaze/hard-drive-test-data

What are SMART systems?

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) is a monitoring system for hard drives. SMART generates a collection of metrics that help evaluate the overall health of a hard drive.

A single metric may not always predict a failure on its own, but SMART metrics are commonly accepted as a way to identify imminent failures, so that backup and restore can be handled in time.
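To make the point about single metrics concrete, here is a minimal sketch that checks several SMART attributes together instead of relying on one. The column names follow the `smart_<id>_raw` pattern used in Backblaze-style data, and the watch list of attribute IDs and the "above zero" rule are assumptions chosen for illustration, not something taken from this case study.

```python
# Hypothetical watch list (an assumption, not from this study): reallocated
# sectors (5), reported uncorrectable errors (187), command timeouts (188),
# pending sectors (197) and offline uncorrectable sectors (198).
WATCHED_ATTRIBUTES = (
    "smart_5_raw",
    "smart_187_raw",
    "smart_188_raw",
    "smart_197_raw",
    "smart_198_raw",
)


def warning_attributes(reading):
    """Return the watched attributes whose raw value is above zero.

    Missing or empty values are treated as zero.
    """
    return [name for name in WATCHED_ATTRIBUTES if (reading.get(name) or 0) > 0]


# Two made-up daily readings, keyed by column name.
healthy = {"serial_number": "A1", "smart_5_raw": 0, "smart_197_raw": 0}
suspect = {"serial_number": "B2", "smart_5_raw": 16, "smart_187_raw": 3, "smart_197_raw": None}

print(warning_attributes(healthy))  # []
print(warning_attributes(suspect))  # ['smart_5_raw', 'smart_187_raw']
```

A drive that trips several watched attributes at once is a stronger candidate for closer inspection than one that trips a single attribute.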

About this case study:

This case study relies on a data stream provided for this purpose. The goal is to analyze the given data and extract meaningful information about drive failure trends and the factors that may indicate whether a drive will fail, and then to propose a more data-driven way to anticipate future failures based on SMART metrics.

The study concludes by discussing opportunities and challenges with the existing model and features that could help design a better predictive model in the future.

Solution:

Full Analysis in Jupyter Notebook

To access the entire analysis code in the Jupyter notebook, go to: Predicting Hard Drive Failure

Overview of the approach

Here's a quick overview of how this problem has been approached:

Extraction and Load

Connect to the PostgreSQL server.
Download the dataset for offline use.
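The two steps above could look roughly like the sketch below, which dumps one table to a CSV file through a standard Python DB-API connection. To keep it runnable without a server, it uses an in-memory `sqlite3` database as a stand-in. With PostgreSQL you would instead pass in a connection from a driver such as `psycopg2`. The table name `hard_drive_stats`, its columns, and the output file name are assumptions for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def export_table_to_csv(connection, table_name, csv_path):
    """Dump every row of `table_name` to a CSV file and return the row count."""
    cursor = connection.cursor()
    # Table names cannot be bound as query parameters, so pass only trusted names.
    cursor.execute(f"SELECT * FROM {table_name}")
    header = [column[0] for column in cursor.description]
    row_count = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            row_count += 1
    return row_count


# Stand-in database with a hypothetical table, so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hard_drive_stats "
    "(date TEXT, serial_number TEXT, model TEXT, failure INTEGER)"
)
conn.executemany(
    "INSERT INTO hard_drive_stats VALUES (?, ?, ?, ?)",
    [("2016-01-01", "A1", "ModelX", 0), ("2016-01-02", "B2", "ModelY", 1)],
)

csv_path = os.path.join(tempfile.gettempdir(), "hard_drive_stats_sample.csv")
exported = export_table_to_csv(conn, "hard_drive_stats", csv_path)
print(exported)  # 2
conn.close()
```

Working from a local CSV keeps the later steps independent of the database connection.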

Transform

Wrangle and explore
Change dimensions, clean, and slice and dice
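As one possible reading of the transform step, the sketch below uses pandas to parse dates, drop columns that were never reported, and remove duplicate drive-day rows. The sample rows and column names are made up for illustration and only mimic the general shape of daily drive statistics.

```python
import pandas as pd

# Made-up sample; the last row duplicates the third one.
raw = pd.DataFrame({
    "date": ["2016-01-02", "2016-01-01", "2016-01-01", "2016-01-01"],
    "serial_number": ["A1", "A1", "B2", "B2"],
    "model": ["ModelX", "ModelX", "ModelY", "ModelY"],
    "failure": [0, 0, 1, 1],
    "smart_5_raw": [0.0, 0.0, 8.0, 8.0],
    "smart_255_raw": [None, None, None, None],  # attribute never reported
})


def clean_drive_stats(frame):
    """Parse dates, drop empty columns and duplicate rows, then sort by drive and day."""
    cleaned = frame.copy()
    cleaned["date"] = pd.to_datetime(cleaned["date"])
    # Columns that are entirely empty carry no information for any drive.
    cleaned = cleaned.dropna(axis="columns", how="all")
    # Keep one row per drive per day.
    cleaned = cleaned.drop_duplicates(subset=["date", "serial_number"])
    return cleaned.sort_values(["serial_number", "date"]).reset_index(drop=True)


tidy = clean_drive_stats(raw)
print(tidy.shape)  # (3, 5)
```

Slicing then becomes a plain pandas selection, for example `tidy[tidy["model"] == "ModelX"]`.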

Analyze

Analyze dataset, plot most significant trends

Predict:

Feature Selection
Model and predict
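A minimal sketch of feature selection followed by a classifier, using scikit-learn on synthetic data. The choice of `SelectKBest` with an ANOVA F-test and a random forest with balanced class weights is an assumption made for illustration, not necessarily the method used in the notebook, and the six features are random stand-ins for SMART attributes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data: 6 made-up features, of which only the first two
# influence the label. Real failure data is far more imbalanced than this.
rng = np.random.default_rng(0)
features = rng.normal(size=(600, 6))
labels = (features[:, 0] + features[:, 1] > 1.5).astype(int)

# Stratify so both splits keep the same share of the rare "failure" class.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

model = make_pipeline(
    # Keep the 2 features with the highest F-score against the label.
    SelectKBest(score_func=f_classif, k=2),
    # Balanced class weights give the rare class more influence during training.
    RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0),
)
model.fit(X_train, y_train)

selected = model.named_steps["selectkbest"].get_support(indices=True)
print("selected feature indices:", selected)
print("recall on the failure class:", recall_score(y_test, model.predict(X_test)))
```

Because failures are rare, recall on the failure class is usually a more informative measure than plain accuracy. Putting the selector inside the pipeline means it is fitted on the training split only.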

Sample report overview:

(This is optional)

1. Number of hard drives per model

2. Number of positive failures by model

3. Failure Trend over time

4. Daily Failure Trend to determine missing failure data pattern

and more...
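The four report items listed above can be sketched with a few pandas aggregations. The data below is made up, and the column names (`date`, `serial_number`, `model`, `failure`) are assumed to match the general layout of daily drive statistics.

```python
import pandas as pd

# Made-up daily records; note that 2016-01-03 has no rows at all.
stats = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-04", "2016-01-04"]
    ),
    "serial_number": ["A1", "B2", "A1", "A1", "C3"],
    "model": ["ModelX", "ModelY", "ModelX", "ModelX", "ModelY"],
    "failure": [0, 1, 0, 1, 0],
})

# 1. Number of distinct drives per model.
drives_per_model = stats.groupby("model")["serial_number"].nunique()

# 2. Number of recorded failures per model.
failures_per_model = stats.groupby("model")["failure"].sum()

# 3. Failures per day over the full calendar range.
full_range = pd.date_range(stats["date"].min(), stats["date"].max(), freq="D")
daily_failures = stats.groupby("date")["failure"].sum().reindex(full_range)

# 4. Days with no data show up as NaN after reindexing, not as zero failures.
missing_days = daily_failures[daily_failures.isna()].index

print(drives_per_model)
print(failures_per_model)
print(list(missing_days.strftime("%Y-%m-%d")))  # ['2016-01-03']
```

Reindexing over the full date range is what separates "no failures that day" from "no data that day", which is the distinction the fourth report item looks for.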

Conclusion and Improvement Ideas:

Conclusion
Challenges with the current dataset and ways to improve it

Tech stack:

Python, SQL, pandas, scikit-learn and other machine learning libraries, PostgreSQL

@ author:

@geekidharsh: I am a Data Engineer with 4+ years of experience in e-commerce and digital acquisition, analyzing swiftly changing user behaviors to make data-driven decisions at scale. Currently, I work at Merck KGaA.
