cmmalone/malone_OpenDataSciCon: repository for code related to the end-to-end data analysis in Python workshop from the Open Data Science Conference 2015

Welcome to the repository hosting code for the end-to-end data analysis workflows in Python workshop!

Author: Katie Malone, data scientist @ Civis Analytics

Getting started: there are two IPython notebooks containing all of the relevant code:

african_wells.ipynb (starter code, plus some relevant explanations and links)
african_wells_solutions.ipynb (worked solutions)

The first contains only prompts and will hopefully be the only one you need. If you get stuck, or if you are working through this workshop asynchronously, the second may be a useful reference.

To work through this example, you will need the training and testing data for the "Pump it Up: Data Mining the Water Table" competition hosted on drivendata.org. In the notebooks, the training features file and training labels file are called wells_features.csv and wells_labels.csv, respectively.
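As a rough sketch of the loading step, the snippet below reads the two training files with pandas and joins them into a single table. It assumes that both files share an id column and that the labels file keeps the target in a column named status_group; check these names against your own download. The function name load_wells and the column name some_feature are hypothetical and do not come from the notebooks.

```python
import io
from typing import IO, Union

import pandas as pd

PathOrBuffer = Union[str, IO[str]]


def load_wells(features_src: PathOrBuffer, labels_src: PathOrBuffer) -> pd.DataFrame:
    """Load the wells features and labels and join them on a shared id column.

    Assumption: both files contain an "id" column, and the labels file stores
    the target in a "status_group" column. Verify against your copy of the data.
    """
    features = pd.read_csv(features_src)
    labels = pd.read_csv(labels_src)
    # An inner merge keeps only wells present in both files;
    # validate="one_to_one" raises an error if either file repeats an id.
    return features.merge(labels, on="id", how="inner", validate="one_to_one")


if __name__ == "__main__":
    # Tiny in-memory stand-ins for wells_features.csv and wells_labels.csv
    # ("some_feature" is a hypothetical column name).
    demo_features = io.StringIO("id,some_feature\n1,10.0\n2,0.0\n")
    demo_labels = io.StringIO("id,status_group\n2,functional\n1,non functional\n")
    print(load_wells(demo_features, demo_labels))
```

With the real files you would call load_wells("wells_features.csv", "wells_labels.csv"). Accepting file-like objects as well as paths makes it easy to try the function on a few in-memory rows before pointing it at the full CSVs.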

Software requirements are as follows:

Python 3 (2.x might work with minimal changes, but there are no guarantees)
IPython
pandas
numpy
scipy
scikit-learn (imported as sklearn)
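If you are unsure whether your environment is ready, a quick check like the one below can tell you which of the listed packages Python cannot find. It is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and it only checks that each package can be located, not which version is installed.

```python
import importlib.util
import sys

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["IPython", "pandas", "numpy", "scipy", "sklearn"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3,):
        print("Warning: this workshop targets Python 3.")
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```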

If you are starting from scratch and have none of the above installed, consider getting them via Anaconda, which includes all of them (and more!) in an easy-to-use bundle.

Finally, as you approach the GridSearchCV section near the bottom of the notebook, you may want to port your notebook workflow into a Python script that you can run from your terminal. Anecdotally, I have found that some commands ran very slowly in the notebook but faster when run as a script.
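A script version of the grid search step might look roughly like the sketch below, saved under a hypothetical name such as run_grid_search.py and run with "python run_grid_search.py". The random forest model, its parameter grid, and the synthetic stand-in data are assumptions for illustration, not the workflow from the notebooks. Note that current scikit-learn releases expose GridSearchCV in sklearn.model_selection; releases from around 2015 had it in sklearn.grid_search.

```python
"""run_grid_search.py: hypothetical script version of a notebook grid search."""
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def run_grid_search(X, y, n_jobs=1):
    """Fit a cross-validated grid search over a small, illustrative grid."""
    # Hypothetical grid; adjust it to your own features and time budget.
    param_grid = {"n_estimators": [10, 25], "max_depth": [None, 5]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid=param_grid,
        cv=3,            # 3-fold cross-validation for each parameter combination
        n_jobs=n_jobs,   # set to -1 to use all CPU cores
    )
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # Synthetic stand-in data; replace with your prepared wells features and labels.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    search = run_grid_search(X, y)
    print("Best parameters:", search.best_params_)
    print("Best CV accuracy: %.3f" % search.best_score_)
```

Keeping the work inside a function guarded by if __name__ == "__main__" lets you import run_grid_search back into a notebook for inspection while still running the full search from the terminal.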

Once you have a workflow you're satisfied with, I strongly suggest submitting it to drivendata.org and getting involved in the competition!
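When you get to that point, you will need to turn your test-set predictions into a CSV file. The sketch below assumes a two-column format with an id column and a status_group column; this is an assumption, so check the competition's official submission format on drivendata.org before uploading. The helper name write_submission is hypothetical.

```python
import pandas as pd


def write_submission(ids, predictions, path=None):
    """Build a two-column submission CSV from ids and predicted labels.

    Assumption: the competition expects columns named "id" and "status_group".
    If path is None, the CSV text is returned instead of being written to disk.
    """
    submission = pd.DataFrame({"id": ids, "status_group": predictions})
    # index=False keeps pandas' row index out of the file.
    return submission.to_csv(path, index=False)
```

For example, write_submission(test_ids, model.predict(X_test), "submission.csv") would write the file, where test_ids, model, and X_test stand for whatever your own workflow produces.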
