Commit

Merge pull request #48 from SANDAG/visualizer

RSM Visualizer

AshishKuls authored Aug 31, 2023
2 parents fe07461 + a601c67 commit c371429
Showing 117 changed files with 126,192 additions and 1 deletion.
5 changes: 4 additions & 1 deletion .gitignore
@@ -6,6 +6,9 @@ rsm.egg-info
_version.py
.DS_Store
test/data/*
notebooks/**/*.omx
visualizer/simwrapper/data/processed/pipeline.log
visualizer/simwrapper/data/external
*.pyc
sandag_rsm.egg-info/*
site/*
4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "visualizer/Data-Pipeline-Tool"]
path = visualizer/Data-Pipeline-Tool
url = https://github.com/SANDAG/Data-Pipeline-Tool.git
branch = rsm-visualizer
27 changes: 27 additions & 0 deletions visualizer/README.md
@@ -0,0 +1,27 @@
# RSM Visualizer
Model Results Visualizer using SimWrapper for Rapid Strategic Model (RSM)

## How to Setup
- Download the data for the 'donor_model' and 'rsm_base' scenarios from the shared folder (contact Joe from SANDAG or Arash from WSP) and place it in the 'visualizer\simwrapper\data\external' folder.

- The visualizer is set up to compare three scenarios: the Donor (full) Model, the RSM Baseline, and an RSM Scenario. Each scenario folder in the external directory should have 'input' and 'report' sub-folders.

- For each scenario folder, the 'report' sub-folder holds the files generated by the data exporter step of the model; for RSM scenarios, the 'input' sub-folder only needs the mgra_crosswalk.csv and households.csv files. The expected layout is sketched below.
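
A sketch of the expected layout of the external folder, based on the scenario paths in 'config/scenarios.yaml' (folder names may differ if the configuration is changed):

```
visualizer\simwrapper\data\external\
├── donor_model\
│   ├── input\
│   └── report\
├── rsm_base\
│   ├── input\
│   └── report\
├── rsm_scen\
│   ├── input\
│   └── report\
└── shapefile\
```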

## Configuration
- The 'config/scenarios.yaml' file specifies the user configuration for the three scenarios. It does not need to be modified unless configuration changes are desired.

## How to Run
- Open an Anaconda prompt and change the directory to the visualizer folder in your local RSM repository.

- Run the process-scenarios script by typing the command below and pressing Enter.

`python process_scenarios.py`

- Processing the scenarios through the pipeline will take some time.

- Next, open https://simwrapper.github.io/site/ in a web browser.

- Click the 'Enter Site' button, then click 'add local folder' and add the simwrapper directory (visualizer\simwrapper) to run the SimWrapper Visualizer for RSM.

16 changes: 16 additions & 0 deletions visualizer/bin/run-pipeline.bat
@@ -0,0 +1,16 @@
@echo on

SET SETTINGS_FILE=%1
SET DATA_PIPELINE_PATH=%2

:: change the directory to data-pipeline-tool folder
cd /d %DATA_PIPELINE_PATH%

:: create conda environment using the environment.yml
CALL conda env create -f environment.yml

:: activate the conda environment
CALL conda activate sandag-rsm-visualizer

:: run the script for data pipeline tool
python run.py %SETTINGS_FILE%
11 changes: 11 additions & 0 deletions visualizer/bin/run-visulizer-support.bat
@@ -0,0 +1,11 @@
@echo on

SET VISUALIZER_PATH=%1
SET CONFIG=%2

:: activate the conda environment (created in run-pipeline.bat call)
CALL conda activate sandag-rsm-visualizer

:: run the support script.
cd /d %VISUALIZER_PATH%
python visualizer_support.py %CONFIG%
17 changes: 17 additions & 0 deletions visualizer/config/config_visualizer_support.yml
@@ -0,0 +1,17 @@
inputs:
  shapefile_dir: simwrapper\data\external\shapefile
  cross_reference_mgra_file_name: mgra_crosswalk.csv
  mode_summary_file_name: trip_mode_summary.csv
  vmt_summary_file_name: vmt_summary.csv
  compared_scenarios_dir: simwrapper\data\processed\all_runs
  intrazonal_distance_mode_file_name: intrazonal_distance_by_mode_summary.csv
  trip_od_summary_file_name: trip_od_summary.csv
  zero_car_summary_file_name: zero_car_households_summary.csv
parameters:
  rsm_scenario_list:
    - rsm_before_calibration
    - rsm_calibrated
  base_scenario_list:
    - donor_model
outputs:
  total_vmt_file_name: vmt_total_summary.csv
22 changes: 22 additions & 0 deletions visualizer/config/scenarios.yaml
@@ -0,0 +1,22 @@

rsm_scenarios:
  rsm_base:
    name: "run_base"
    input: \simwrapper\data\external\rsm_base\input
    report: \simwrapper\data\external\rsm_base\report
    output: \simwrapper\data\processed\rsm_base\

  rsm_scen:
    name: "rsm_scen"
    input: \simwrapper\data\external\rsm_scen\input
    report: \simwrapper\data\external\rsm_scen\report
    output: \simwrapper\data\processed\rsm_scen\

base_scenarios:
  donor_model:
    name: "donor_model"
    input: \simwrapper\data\external\donor_model\input
    report: \simwrapper\data\external\donor_model\report
    output: \simwrapper\data\processed\donor_model\

shapefiles: \simwrapper\data\external\shapefile
8 changes: 8 additions & 0 deletions visualizer/pipeline/.gitignore
@@ -0,0 +1,8 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Outputs
output/*
!output/.gitkeep
101 changes: 101 additions & 0 deletions visualizer/pipeline/README.md
@@ -0,0 +1,101 @@
# Data Pipeline Tool

The Data Pipeline Tool aims to aid in building data pipelines that ingest, transform, and summarize data by taking advantage of how readily such pipelines can be parameterized. Rather than coding from scratch, configure a few files and the tool will figure out the rest.

## Background

Data pipelines vary widely across projects in terms of the data that is used, how the data is transformed, and the summaries that are eventually required and produced. However, the fundamental techniques for developing these pipelines tend to overlap: in Python, for example, one may repeatedly use *pd.read_csv()* to read in data or *pd.merge()* to combine data sets. Ultimately, these pipelines can, to a large degree, be parameterized to minimize redundancy in implementation, which is precisely the reason this tool was created.
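
As a minimal illustration of the pattern being parameterized (file paths are hypothetical; the column names are borrowed from the summaries this visualizer produces):

```python
import pandas as pd

# Extract: the same read pattern is repeated for every data source.
trips = pd.read_csv("data/trips.csv")
households = pd.read_csv("data/households.csv")

# Transform: the same merge/group/aggregate pattern is repeated for every summary.
merged = pd.merge(trips, households, on="hhId", how="left")
mode_share = merged.groupby("tripMode")["weightTrip"].sum()

# Load: the same write pattern is repeated for every output table.
mode_share.to_csv("output/trip_mode_summary.csv")
```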

## Configuring A Pipeline

To configure your data pipeline, you will need to edit the following files in the `config` directory:

- `settings.yaml`: Main settings file for the tool. Controls the overall flow of the data pipeline.

- `processor.csv`: Contains expressions that are used to process data before or after summarization.

- `expressions.csv`: Contains expressions to summarize data and controls how summary tables are written upon tool completion.

- `user_added_functions.py`: Contains user-defined functions that can be called in the processor.

The following describes the contents of these files and how they can be edited.

### `settings.yaml`

This file consists of the following properties; see [config/settings.yaml](config/settings.yaml) for a full example, and a minimal sketch after the list below.

- `extract`: Root property -- controls data extraction (reading). Note: Multiple data sources can be specified by the user.
  - `filepath` (str): File path to the data source
  - `test_size` (int): Number of rows to read from the input data -- for testing purposes. Leave empty to read all rows.
  - `data`: List of files at the specified data *filepath* to read

- `transform`: Root property -- controls data processing.
  - `processor` (str): File path to the processor specification file
  - `expressions` (str): File path to the summary expressions specification file
  - `steps`: Lists the processing steps to execute. Note: The user can create as many steps as necessary. The specified order of processing, concatenating, and merging will be followed for each step.
    - `name` (str): User-defined name of the processing step
    - `process` (bool): True or False -- whether to run the processor. Note: Only the processor expressions corresponding to a step will be executed
    - `summarize` (bool): True or False -- whether to run the summarizer. Note: This property should only be set **once**. Once set, only the tables resulting from the summary expressions will be available for post-processing.
    - `concat`: Controls data concatenation
      - `table_name` (str): User-defined name of the resulting table after concatenation
      - `include`: List of data set names to concatenate. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
    - `merge`: Controls data merging
      - `table_name` (str): User-defined name of the resulting table after the merge
      - `include`: List of data set names to merge. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
      - `merge_cols`: List of columns to merge the two data sets on. Note: The order of the columns must match the order specified in *include*.
      - `merge_type` (str): Merge type -- 'left', 'right', 'inner', and 'outer' merges are supported

- `load`: Root property -- controls results loading/writing
  - `outdir` (str): File path of the directory to write results to
  - `empty_fill` (str or numeric): Value used to fill missing values in the output results
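
A minimal sketch of how these properties might fit together (file, table, and step names are hypothetical, and the exact schema may differ; consult [config/settings.yaml](config/settings.yaml) for the authoritative example):

```yaml
extract:
  - filepath: data/
    test_size:          # leave empty to read all rows
    data:
      - trips.csv
      - households.csv

transform:
  processor: config/processor.csv
  expressions: config/expressions.csv
  steps:
    - name: prepare
      process: True
      summarize: False
      merge:
        table_name: trips_households
        include: [trips, households]
        merge_cols: [hhId]
        merge_type: left
    - name: summarize
      process: False
      summarize: True

load:
  outdir: output/
  empty_fill: 0
```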

### `processor.csv`

This file consists of the following fields; see [config/processor.csv](config/processor.csv) for a full example, and a short sketch after the list below.

- `Description`: User-specified description of what the processing row accomplishes
- `Step`: User-defined *processing step* name that the row belongs to
- `Type`: Processing type of the row
  - `column`: Generate a new field from a combination of fields, or a transformation of a field, in *Table* as defined by *Func*. Note: The user does not need to specify *In Col* if used.
  - `rename`: Rename field(s) as defined by the dictionary in *Func*. Note: The user does not need to specify *In Col* or *Out Col* if used.
  - `replace`: Replace values in a field as defined by the dictionary in *Func*
  - `bin`: Bin values in a field into discrete values as defined by the intervals in *Func*
  - `cap`: Cap values in a field to a maximum value specified in *Func*
  - `apply`: Apply a Pandas Series apply() function to every element in a field as defined by *Func*. Note: The function should be written as if writing directly within apply().
  - `sum`: Take the row-wise sum of multiple columns as specified by the comma-delimited names in *In Col*. Note: The user does not need to specify *Func* if used.
  - `skim`: Query skim (.omx) origin-destination pairs as specified by the comma-delimited pairs in *In Col* and the skim matrix specified in *Func*
  - `raw`: Evaluate a raw Python expression as defined by *Func*. Note: The user does not need to specify *In Col*, *Out Col*, or *Table* if used.
- `Table`: Name of the table to evaluate the processor row on. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
- `Out Col`: Field name of the processing result -- added to *Table*
- `In Col`: Field name of the field in *Table* to apply processing to
- `Func`: Function/expression to use for processing
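
A couple of hypothetical rows illustrating the format (column order here follows the field list above; consult [config/processor.csv](config/processor.csv) for the authoritative header):

```
Description,Step,Type,Table,Out Col,In Col,Func
Flag intrazonal trips,prepare,column,trips,isIntrazonal,,originMGRA == destinationMGRA
Standardize household id column,prepare,rename,households_orig,,,"{'hhid': 'hhId'}"
```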

### `expressions.csv`

This file consists of the following fields; see [config/expressions.csv](config/expressions.csv) for an example.

- `Description`: User-specified description of what the summarization row accomplishes
- `Out Table`: User-defined summary table name to add the result to. Note: The unique set of table names in this column will be written out upon the tool's completion.
- `Out Col`: Field name of expression result -- added to *Out Table*
- `In Table`: Name of the table to evaluate the expression on. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
- `Filter`: Pandas query filter to apply to *In Table* before evaluating expression
- `In Col`: Field name of the field in *In Table* to apply the expression to
- `Func`: Pandas Series method to apply to *In Col* as defined by the [Pandas API](https://pandas.pydata.org/docs/reference/api/pandas.Series.html). Users can also specify a custom Python expression using the *Out Col* names of expressions previously evaluated (much like a measure in PowerBI) -- for such cases, *In Col*, *Filter*, and *In Table* do not need to be specified.
- `Group`: Comma delimited field names of fields in *In Table* to use for group aggregations

### `user_added_functions.py`

Any function defined in this script can be called in the processor.
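
For instance, a hypothetical helper (the name and logic here are illustrative only, not part of the tool):

```python
# user_added_functions.py -- hypothetical example of a user-defined function
def mgra_to_district(mgra_id):
    """Map an MGRA id to a coarse district id, assuming 1,000 MGRAs per district."""
    return mgra_id // 1000
```

A processor row of Type `apply` could then reference `mgra_to_district` from its *Func* column.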

## Running A Pipeline

To run the configured pipeline, do the following:

1. Open Anaconda 3 Prompt
2. Create an Anaconda environment using the provided environment.yml file: `conda env create -f environment.yml`
3. Activate the newly created environment: `conda activate pipeline`
4. Change directory to the project folder
5. Configure the data pipeline as described in *Configuring A Pipeline*
6. Run the tool by executing the following: `python run.py`
7. When the process finishes, all resulting summary files are written to the output directory specified in `settings.yaml`
24 changes: 24 additions & 0 deletions visualizer/pipeline/config/expressions.csv
@@ -0,0 +1,24 @@
Description,Out Table,Out Col,In Table,Filter,In Col,Func,Group
#,,,,,,,
Mode Share,trip_mode_summary,num_trips,trips,,weightTrip,sum,tripMode
Intrazonal summary,intrazonal_distance_by_mode_summary,distance,trips,originMGRA == destinationMGRA,distanceDrive,sum,tripMode
Number of transit boardings,transit_summary,EA,transit_onoff,TOD=='EA',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,AM,transit_onoff,TOD=='AM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,MD,transit_onoff,TOD=='MD',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,PM,transit_onoff,TOD=='PM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,EV,transit_onoff,TOD=='EV',BOARDINGS,sum,MODE
Trips by TOD and Purpose,trip_tod_purpose_summary,EA,trips,departTimeFiveTod == 'EA',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,AM,trips,departTimeFiveTod == 'AM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,MD,trips,departTimeFiveTod == 'MD',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,PM,trips,departTimeFiveTod == 'PM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,EV,trips,departTimeFiveTod == 'EV',weightTrip,sum,tripPurposeDestination
Trips by Origin and Destination,trip_od_summary,flows,trips,,weightTrip,sum,"originMGRA,destinationMGRA"
Zero Car Households,zero_car_households_summary,household_numbers,households,autos == 0,hhId,count,cluster_id
household by MGRA,households_sample_mgra_summary,sampled_households,households,,hhId,count,cluster_id
household sample by MGRA,households_mgra_summary,original_households,households_orig,,hhid,count,cluster_id
Network Summaries,network_summary,vmt_total,network,ifc_desc == 'Freeway',vmt_total,sum,ID
Network Summaries,network_summary,flow_total,network,ifc_desc == 'Freeway',flow_total,sum,ID
Network Summaries,network_summary,voc,network,ifc_desc == 'Freeway',voc,sum,ID
VMT by Class,vmt_summary,vmt_total,network,,vmt_total,sum,ifc_desc
VMT Auto by Class,vmt_summary,vmt_auto,network,,vmt_auto,sum,ifc_desc
VMT Truck by Class,vmt_summary,vmt_truck,network,,vmt_truck,sum,ifc_desc
22 changes: 22 additions & 0 deletions visualizer/pipeline/config/expressions_donor_model.csv
@@ -0,0 +1,22 @@
Description,Out Table,Out Col,In Table,Filter,In Col,Func,Group
#,,,,,,,
Mode Share,trip_mode_summary,num_trips,trips,,weightTrip,sum,tripMode
Intrazonal summary,intrazonal_distance_by_mode_summary,distance,trips,originMGRA == destinationMGRA,distanceDrive,sum,tripMode
Number of transit boardings,transit_summary,EA,transit_onoff,TOD=='EA',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,AM,transit_onoff,TOD=='AM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,MD,transit_onoff,TOD=='MD',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,PM,transit_onoff,TOD=='PM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,EV,transit_onoff,TOD=='EV',BOARDINGS,sum,MODE
Trips by TOD and Purpose,trip_tod_purpose_summary,EA,trips,departTimeFiveTod == 'EA',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,AM,trips,departTimeFiveTod == 'AM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,MD,trips,departTimeFiveTod == 'MD',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,PM,trips,departTimeFiveTod == 'PM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,EV,trips,departTimeFiveTod == 'EV',weightTrip,sum,tripPurposeDestination
Trips by Origin and Destination,trip_od_summary,flows,trips,,weightTrip,sum,"originMGRA,destinationMGRA"
Zero Car Households,zero_car_households_summary,household_numbers,households,autos == 0,hhId,count,homeMGRA
Network Summaries,network_summary,vmt_total,network,ifc_desc == 'Freeway',vmt_total,sum,ID
Network Summaries,network_summary,flow_total,network,ifc_desc == 'Freeway',flow_total,sum,ID
Network Summaries,network_summary,voc,network,ifc_desc == 'Freeway',voc,sum,ID
VMT by Class,vmt_summary,vmt_total,network,,vmt_total,sum,ifc_desc
VMT Auto by Class,vmt_summary,vmt_auto,network,,vmt_auto,sum,ifc_desc
VMT Truck by Class,vmt_summary,vmt_truck,network,,vmt_truck,sum,ifc_desc