[REVIEW] Full 2015 Data Update #14

Merged
merged 3 commits into from
Feb 3, 2020

gumdropsteve
Owner

@gumdropsteve gumdropsteve commented Feb 3, 2020

2015 Taxi Data Download & Processing Update

Users can now download & pre-process all 12 months of 2015 NYC Taxi (yellow cab) data. Total download size is ~20.07 GB before processing and ~18.94 GB (135,216,505 rows) after processing.

NOTE: taxi_dashboard.ipynb does NOT yet point to this new data. This will be implemented soon, but open issues, such as how single-GPU users can work with the full dataset, need to be addressed first.

New Files

  • data_download.ipynb
    • based on HoloViz taxi_preprocessing_example.py
    • downloads & processes all 12 months of 2015 NYC taxi data
    • uses BlazingSQL & NumPy to configure data for use with Datashader / HoloViews
      • single node; processes 1 month at a time so anyone w/ a compatible GPU can run it
      • tested w/ 16GB Tesla T4 on AWS, runs end-to-end in 7-8 min
      • final visualization under "Extra" (at end) uses data thru August (8/12 months)
        • this was the largest span that processed w/o the kernel crashing
  • sql_check.py
    • based on RAPIDS sql_check.py
    • checks for installation of BlazingSQL & installs via Anaconda if not found
    • called in data_download.ipynb imports section if BSQL not found & user wants to install
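
The month-at-a-time approach described for data_download.ipynb can be sketched roughly as follows. This is only an illustration in plain NumPy, not the notebook's code: it assumes that "configuring data for Datashader / HoloViews" means projecting longitude/latitude to Web Mercator meters, and the names `lnglat_to_web_mercator`, `monthly_names`, and `process_month`, as well as the file-name pattern, are hypothetical.

```python
import numpy as np

# Sphere radius used by the Web Mercator (EPSG:3857) projection, in meters.
EARTH_RADIUS_M = 6378137.0


def lnglat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator meters."""
    lon = np.asarray(lon, dtype=np.float64)
    lat = np.asarray(lat, dtype=np.float64)
    easting = np.radians(lon) * EARTH_RADIUS_M
    northing = np.log(np.tan(np.pi / 4 + np.radians(lat) / 2)) * EARTH_RADIUS_M
    return easting, northing


def monthly_names(year=2015):
    """Hypothetical file names, one per month; the real names may differ."""
    return [f"yellow_tripdata_{year}-{month:02d}.csv" for month in range(1, 13)]


def process_month(lon, lat):
    """Stand-in for one month's processing: project the coordinates only."""
    return lnglat_to_web_mercator(lon, lat)


if __name__ == "__main__":
    # Made-up coordinates near Manhattan, standing in for one month of rows.
    sample_lon = np.array([-73.99, -73.97])
    sample_lat = np.array([40.73, 40.76])
    for name in monthly_names():
        # Only one month's arrays are alive at a time, which bounds peak memory.
        x, y = process_month(sample_lon, sample_lat)
        print(name, x.shape, y.shape)
```

Handling one month per iteration keeps peak memory tied to the largest single month rather than the whole year, which is the reason given above for the single-node design.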

Related

Resolves #13

gumdropsteve and others added 3 commits February 2, 2020 11:54
# Initial progress 
* NYC Taxi Dashboard update (#7)
* [TEST] updated to process all Q1 data + rm elif cases & ChristmasNYC
* wildcard data path; map if/elifs; sql input errors
* need to find 'from download_sample_data import bar as progressbar' then this is ready
* first 100k rows converted 2015
* should work, for testing with hv
* for testing w/ ds/hv
* data seems to be the issue; original conversion not displaying properly either
#### Merge adjustments related to #12 
* Delete test_bsql_converted_taxi.csv
No longer in use
* Delete test_bsql_conversion.ipynb
No longer in use
* Delete bsql_preprocessing_taxi.py
No longer in use -> switch to `download_data.ipynb` of `data/download` branch
@gumdropsteve gumdropsteve added the enhancement New feature or request label Feb 3, 2020
@gumdropsteve gumdropsteve self-assigned this Feb 3, 2020
@gumdropsteve gumdropsteve merged commit a0ebada into master Feb 3, 2020
@gumdropsteve gumdropsteve deleted the data/download branch February 3, 2020 08:11
gumdropsteve added a commit that referenced this pull request Feb 3, 2020
Merge pull request #16 from gumdropsteve/master
Development

Successfully merging this pull request may close these issues.

More data please
1 participant