
anamabo/Predicting_failure_hard_disks


Predicting hard disk failure at the Delta center in the Netherlands

For centuries, Dutch people have pumped water out of lakes and the sea in order to build large cities on the newly dry land. That is why around sixty percent of the surface area of the Netherlands is below sea level, with a high risk of flooding. To prevent a flood that could destroy the western part of the country, artificial beaches, sand dunes and dikes were built to absorb the force of a rising sea. However, the Dutch hydraulic system was not built and maintained properly until the 1950s. Proof of that was the most devastating flood in the Netherlands' history, in which 1800 people and 200000 animals died when the dikes collapsed.

The Delta project started in 1953, twenty days after the flood. Its aim was to build a complex system of automatic dikes, barriers and dams that control the sea level and drain off the excess water coming from the large rivers. Currently, the Netherlands has 700 km of dikes, divided into 53 dike areas. The dikes and dams are controlled by supercomputers, which monitor the status of these structures 24 hours a day. Damage to a supercomputer, for instance a failure in some of its hard disks, could have devastating effects and result in another flood.

The aim of this project is to predict the number of hard disks that fail during the first week of 2016 at the Delta center in the Netherlands. For this task, I analyze measurements of different hard disk features recorded during 2015.
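As a minimal sketch of how the prediction target could be defined, the snippet below counts the distinct disks flagged as failed inside a date window. The record layout (`date`, `serial`, `failure`), the helper `count_failures` and the sample values are assumptions made for illustration; they are not the schema of the dataset used in the notebooks.

```python
from datetime import date

# Hypothetical daily records: one row per disk per day,
# with failure=1 on the day a disk fails (assumed layout).
records = [
    {"date": date(2016, 1, 1), "serial": "A1", "failure": 0},
    {"date": date(2016, 1, 2), "serial": "A1", "failure": 1},
    {"date": date(2016, 1, 3), "serial": "B2", "failure": 0},
    {"date": date(2016, 1, 9), "serial": "C3", "failure": 1},  # outside the first week
]


def count_failures(rows, start, end):
    """Count distinct disks flagged as failed between start and end (inclusive)."""
    failed = {
        r["serial"]
        for r in rows
        if r["failure"] == 1 and start <= r["date"] <= end
    }
    return len(failed)


# Number of failed disks in the first week of 2016 for the toy records above.
first_week_failures = count_failures(records, date(2016, 1, 1), date(2016, 1, 7))
```

Counting distinct serial numbers rather than rows avoids double counting if a disk were flagged on more than one day.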

This repository contains the following files:

  • Capstone_report.pdf: report that explains the full data analysis carried out to make the predictions.

  • Capstone project_proposal_CAMartinez.pdf: introduction to the problem and to the database used for the analysis and predictions.

  • data_reading_and_wrangling.ipynb: notebook that reads and cleans the data.

  • exploratory_data_analysis.ipynb: notebook that performs an exploratory analysis of the data.

  • statistical_analysis.ipynb: notebook that performs a statistical analysis of the data. In particular, it looks for the features whose distribution for failed disks differs from the distribution for working disks.

  • machine_learning.ipynb: notebook that uses different machine learning techniques to predict the failed hard drives at the Delta center in the Netherlands.

  • final_presentation: slide deck of the project.
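The comparison described for statistical_analysis.ipynb, failed disks versus working disks for a given feature, can be illustrated with a two-sample Kolmogorov-Smirnov statistic. This is only a sketch of one possible approach: the notebook may use a different test, and the function `ks_statistic` and the sample values below are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs
    of the two samples (0 = identical, 1 = fully separated).
    """
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values,
    # so it is enough to compare them at every sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Made-up values of a single feature, for illustration only.
working = [0, 0, 1, 0, 2, 0, 1, 0]
failed = [5, 8, 0, 12, 7, 9]

gap = ks_statistic(working, failed)
```

A statistic close to 1 suggests that the feature separates the two groups well, while a value close to 0 suggests it carries little information about failure. A significance test on top of this statistic would be needed before drawing conclusions from real data.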

About

First Data Science project
