
What's Stroking Your Brain?

Every 40 seconds, someone in the United States has a stroke.


In fact, if you've already had a stroke, you are 25% more likely to have another one.


While numerous studies have tried to identify what triggers strokes in different individuals, it is still difficult to pinpoint exactly which risk factors affect whom, and by how much.


But that doesn't stop us from trying to find out!


Using the brain stroke prediction data set from Kaggle, we wanted to see whether supervised machine learning could show which categories/variables play a role in predicting the probability of someone having a stroke.


Tools/Programs Used:

  • Jupyter Notebook
  • Tableau
  • Visual Studio Code

Libraries Used:

  • Bootstrap
  • D3.js
  • Pandas
  • Matplotlib
  • Scikit-learn (sklearn)

Languages Used:

  • Python
  • HTML
  • JavaScript
  • CSS


    Cleaning Up The Data Set:

    Using Jupyter Notebook and Pandas, we cleaned up our data: we dropped any null values, dropped the children (anyone under the age of 18), binned the ages of the participants into either "Below 40" or "40+", and binned the glucose levels and BMIs according to their official medical groupings/classifications. With that, our E.T.L. was complete. A sketch of these steps is shown below.
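
A minimal sketch of the cleaning steps, assuming the Kaggle data set's column names (age, avg_glucose_level, bmi); the filename and the exact cut points for the glucose and BMI groupings are illustrative, not the precise values we used:

```python
import pandas as pd

# Load the Kaggle brain stroke data set (filename is illustrative)
df = pd.read_csv("brain_stroke.csv")

# Drop rows with null values and remove children (anyone under 18)
df = df.dropna()
df = df[df["age"] >= 18]

# Bin ages into "Below 40" and "40+"
df["age_group"] = pd.cut(
    df["age"],
    bins=[18, 40, df["age"].max()],
    labels=["Below 40", "40+"],
    include_lowest=True,
)

# Bin glucose levels and BMI by their standard medical groupings
# (the cut points below are approximate, for illustration)
df["glucose_group"] = pd.cut(
    df["avg_glucose_level"],
    bins=[0, 140, 200, float("inf")],
    labels=["Normal", "Prediabetic", "Diabetic"],
)
df["bmi_group"] = pd.cut(
    df["bmi"],
    bins=[0, 18.5, 25, 30, float("inf")],
    labels=["Underweight", "Healthy", "Overweight", "Obese"],
)
```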


    Applying The Machine Learning:

    With the data set in hand, we used sklearn and imported all the dependencies needed to run our analysis. We dropped the patients classified as "children" and only worked with patients who were 18 and up. Then we applied pd.get_dummies to our data to convert all of the object/string data types to integers, since the machine learning model can't work with non-numeric data. We set the y variable to the ["stroke"] column, assigned the rest of the data set to X, split the data into X_train, X_test, y_train, and y_test, and used the StandardScaler to transform our data into z-scores so that no single feature skews the results. Once scaled, we applied the RandomForestClassifier. To fine-tune the model, we searched for the best hyper-parameters with RandomizedSearchCV, then applied the .best_params_ specifications, giving us our final model. The sketch below outlines this workflow.
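
A minimal sketch of the modeling workflow, continuing from the cleaned data above; the input filename and the hyper-parameter ranges are illustrative, not the exact values we tuned over:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Cleaned data from the ETL step (hypothetical filename)
df = pd.read_csv("brain_stroke_clean.csv")

# One-hot encode the categorical/string columns
encoded = pd.get_dummies(df)

# Target is the "stroke" column; everything else is a feature
y = encoded["stroke"]
X = encoded.drop(columns=["stroke"])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Scale features to z-scores so no single column dominates
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Baseline random forest, then a randomized search over a few hyper-parameters
rf = RandomForestClassifier(random_state=1)
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(rf, param_grid, n_iter=10, cv=5, random_state=1)
search.fit(X_train_scaled, y_train)

# Refit a final model with the best parameters found
best_rf = RandomForestClassifier(**search.best_params_, random_state=1)
best_rf.fit(X_train_scaled, y_train)
print(best_rf.score(X_test_scaled, y_test))
```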

    (Figure: feature importances from the final model)


    Creating The fUZZbEED Page:


    Using a template from themewagon.com and adapting the quiz code from Gauri Khandke, we developed the front page of fUZZbEED to be not only educational but also relatable. The quiz is now a 3-question quiz with 3 possible responses per question and no time limit; instead of adding 1 point per correct answer, each response adds a different number of points corresponding to its risk-assessment value, with a maximum of 9 points available (see the sketch below). Following the quiz is an analysis of our model as well as a few interactive Tableau charts that correlate with our findings. We wanted to keep everything casual, emulating the older websites that became popular for their endless quizzes and listicles, while still providing information to those in search of it.
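
To make the scoring idea concrete, here is a minimal sketch of the logic (written in Python for consistency with the rest of this README; the live quiz itself is implemented in JavaScript). The questions, responses, and point values below are hypothetical placeholders, not the ones on the site:

```python
# Each response carries a risk weight of 1-3; with 3 questions the maximum score is 9.
# Questions and weights below are made-up examples for illustration only.
questions = [
    {"question": "How old are you?",
     "responses": {"Under 40": 1, "40-60": 2, "Over 60": 3}},
    {"question": "Do you smoke?",
     "responses": {"Never": 1, "Formerly": 2, "Currently": 3}},
    {"question": "Have you been told you have high blood pressure?",
     "responses": {"No": 1, "Not sure": 2, "Yes": 3}},
]

def score_quiz(answers):
    """Sum the risk weights of the chosen responses (3 questions x up to 3 points = 9 max)."""
    return sum(q["responses"][a] for q, a in zip(questions, answers))

print(score_quiz(["Over 60", "Currently", "Yes"]))  # -> 9
```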

    Click Here To Visit The Site: fUZZbEED


    (Screenshot: the fUZZbEED quiz)


    Contributors:

    Tanisha Cooper

    GitHub: https://github.com/TanishaCooper

    LinkedIn: https://www.linkedin.com/in/tanisha-cooper-5b3743197/


    Diandra McNeill

    GitHub: https://github.com/dmcneill0711

    LinkedIn: https://www.linkedin.com/in/diandra-mcneill-765410233/


    Anna Pettigrew

    GitHub: https://github.com/annapettigrew

    LinkedIn: https://www.linkedin.com/in/anna-pettigrew/