hdf5-pydata-munich

Introduction to HDF5 in Python.

If you are just curious and want to have a look at the notebook without installing anything, go to http://nbviewer.jupyter.org/ and type jackdbd/hdf5-pydata-munich in the search bar.

Installation

Create a Python 3.5 virtual environment (at the time of writing, Bokeh seems to have some issues with Python 3.6), then install the dependencies:

pip install -r requirements.txt

Usage

# start the notebook server
jupyter notebook --port 8085
# open your browser and go to:
# http://localhost:8085/notebooks/hdf5_in_python.ipynb

Instructions to build the HDF5 file used in the notebook

Visit the NYC Taxi & Limousine Commission website and download the CSV files from the 2015 Yellow taxi dataset (TLC Trip Record Data). You can also download just one month (e.g. January) to try these snippets out.
Place the CSV files here: hdf5-pydata-munich/data/nyctaxi/2015/<your-file-here>.csv
Create the HDF5 file which contains all the tables (1 table per month) with:

cd snippets
python create_taxi_table.py

This creates the HDF5 file NYC-yellow-taxis-10k.h5.
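The "one table per month" layout can be sketched with PyTables as follows. This is a minimal illustration, not the contents of create_taxi_table.py: the column subset in `TaxiTrip`, the group name `yellow_taxis_2015`, the month names and the output file name are all assumptions made for the example.

```python
import os
import tempfile

import tables as tb


class TaxiTrip(tb.IsDescription):
    # Hypothetical subset of columns; the real script may define more.
    passenger_count = tb.UInt8Col(pos=0)
    trip_distance = tb.Float32Col(pos=1)
    total_amount = tb.Float32Col(pos=2)


def create_monthly_tables(path, months):
    """Create an HDF5 file with one empty table per month."""
    with tb.open_file(path, mode="w", title="NYC yellow taxis (sketch)") as h5:
        # Group name is an assumption made for this example.
        group = h5.create_group("/", "yellow_taxis_2015", "One table per month")
        for month in months:
            h5.create_table(group, month, TaxiTrip, title=month)


# Write to a temporary directory so the sketch does not touch the repository.
path = os.path.join(tempfile.mkdtemp(), "taxi-sketch.h5")
create_monthly_tables(path, ["january", "february"])

# Reopen the file read-only and list the tables it contains.
with tb.open_file(path, mode="r") as h5:
    table_names = sorted(
        node.name for node in h5.walk_nodes("/", classname="Table")
    )
print(table_names)
```

Declaring the columns up front in an `IsDescription` subclass fixes the row layout, so rows can later be appended to each table without redefining the schema.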

Store a sample of each CSV file in the tables with:

python append_to_taxi_table.py

This reads a chunk of 10000 rows from each CSV file that you downloaded and stores the results in the HDF5 file NYC-yellow-taxis-10k.h5, so the file holds only a small sample of the original dataset. To store the entire dataset (~12 million rows per month), remove the break statement in append_to_taxi_table.py.
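The effect of that break statement can be illustrated with a small standard-library sketch. It is not the code in append_to_taxi_table.py (which is not shown here and may well use a different CSV reader); the function names, the `sample_only` flag and the in-memory CSV are hypothetical and exist only for this example.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def read_rows(csv_file, chunk_size=10000, sample_only=True):
    """Collect rows chunk by chunk; stop after the first chunk if sampling."""
    reader = csv.DictReader(csv_file)
    kept = []
    for chunk in iter_chunks(reader, chunk_size):
        kept.extend(chunk)  # a real script would append the chunk to a table
        if sample_only:
            break  # without this break, every chunk of the file is processed
    return kept


# Tiny in-memory CSV standing in for one month of trip records.
fake_csv = "passenger_count,trip_distance\n" + "".join(
    f"1,{i}.5\n" for i in range(25)
)
sample = read_rows(io.StringIO(fake_csv), chunk_size=10)
everything = read_rows(io.StringIO(fake_csv), chunk_size=10, sample_only=False)
print(len(sample), len(everything))
```

Reading in fixed-size chunks keeps memory use bounded regardless of the size of the CSV file, which matters once the full monthly files are processed.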

To view the structure of the tables you can use an HDF5 viewer like HDFView, HDF Compass or ViTables.

Create a huge HDF5 file

If you want to play around with a huge HDF5 file, I created a snippet that generates some synthetic data. You can run it with:

python create_huge_hdf5_file.py

This takes roughly 5 minutes to run and creates the HDF5 file pytables-clinical-study.h5, which should be around 5 GB in size. You can tweak the code just a little bit to create even bigger files.
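As a rough idea of how such a file grows, the sketch below appends batches of random rows to a PyTables table. It is not create_huge_hdf5_file.py: the `Measurement` columns, the table name and the batch parameters are assumptions, and the sizes are kept tiny so it runs in well under a second. The uncompressed table size is roughly the number of rows times the row size, so raising `n_batches` or `batch_size` scales the file accordingly.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Measurement(tb.IsDescription):
    # Hypothetical columns for a synthetic clinical-study-like table.
    patient_id = tb.UInt32Col(pos=0)
    value = tb.Float64Col(pos=1)


def write_synthetic(path, n_batches, batch_size, seed=0):
    """Append n_batches batches of random rows; return (rows, bytes per row)."""
    rng = np.random.default_rng(seed)
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "measurements", Measurement)
        for _ in range(n_batches):
            # Build one batch as a structured array matching the table layout.
            batch = np.empty(batch_size, dtype=table.dtype)
            batch["patient_id"] = rng.integers(0, 1000, size=batch_size)
            batch["value"] = rng.normal(size=batch_size)
            table.append(batch)
        table.flush()
        return table.nrows, table.rowsize


path = os.path.join(tempfile.mkdtemp(), "synthetic-sketch.h5")
nrows, rowsize = write_synthetic(path, n_batches=3, batch_size=1000)
print(f"{nrows} rows, about {nrows * rowsize} bytes of raw table data")
```

Writing in batches rather than building the whole dataset in memory is what makes it practical to generate files larger than the available RAM.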
