Commit

Merge pull request #48 from SANDAG/visualizer

RSM Visualizer

AshishKuls authored Aug 31, 2023
2 parents fe07461 + a601c67 commit c371429
Showing 117 changed files with 126,192 additions and 1 deletion.
5 changes: 4 additions & 1 deletion .gitignore
@@ -6,6 +6,9 @@ rsm.egg-info
_version.py
.DS_Store
test/data/*
notebooks/**/*.omx
visualizer/simwrapper/data/processed/pipeline.log
visualizer/simwrapper/data/external
*.pyc
sandag_rsm.egg-info/*
site/*
4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "visualizer/Data-Pipeline-Tool"]
path = visualizer/Data-Pipeline-Tool
url = https://github.com/SANDAG/Data-Pipeline-Tool.git
branch = rsm-visualizer
27 changes: 27 additions & 0 deletions visualizer/README.md
@@ -0,0 +1,27 @@
# RSM Visualizer
Model Results Visualizer using SimWrapper for Rapid Strategic Model (RSM)

## How to Setup
- Download the data for the 'donor_model' and 'rsm_base' scenarios from the shared folder (contact Joe from SANDAG or Arash from WSP) and place it in the 'visualizer\simwrapper\data\external' folder.

- The visualizer is set up to compare three scenarios: the Donor (full) Model, the RSM Baseline, and an RSM Scenario. Each scenario folder in the external directory should have 'input' and 'report' sub-folders.

- For each scenario folder, the 'report' sub-folder holds the files generated by the data exporter step of the model; for RSM scenarios, the 'input' sub-folder only needs the mgra_crosswalk.csv and households.csv files. The expected layout is sketched below.
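
A sketch of the expected layout of the external folder, based on the scenario paths in 'config/scenarios.yaml' (folder names may differ if the configuration is changed):

```
visualizer\simwrapper\data\external\
├── donor_model\
│   ├── input\
│   └── report\
├── rsm_base\
│   ├── input\
│   └── report\
├── rsm_scen\
│   ├── input\
│   └── report\
└── shapefile\
```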

## Configuration
- The 'config/scenarios.yaml' file specifies the user configuration for the three scenarios. It does not need to be modified unless configuration changes are desired.

## How to Run
- Open an Anaconda prompt and change the directory to the visualizer folder in your local RSM repository.

- Run the process-scenarios script by typing the command below and pressing Enter.

`python process_scenarios.py`

- Processing the scenarios through the pipeline will take some time.

- Next, open https://simwrapper.github.io/site/ in a web browser.

- Click the 'Enter Site' button, then click 'add local folder' and add the simwrapper directory (visualizer\simwrapper) to run the SimWrapper Visualizer for RSM.

16 changes: 16 additions & 0 deletions visualizer/bin/run-pipeline.bat
@@ -0,0 +1,16 @@
@echo on

SET SETTINGS_FILE=%1
SET DATA_PIPELINE_PATH=%2

:: change the directory to data-pipeline-tool folder
cd /d %DATA_PIPELINE_PATH%

:: create conda environment using the environment.yml
CALL conda env create -f environment.yml

:: activate the conda environment
CALL conda activate sandag-rsm-visualizer

:: run the script for data pipeline tool
python run.py %SETTINGS_FILE%
11 changes: 11 additions & 0 deletions visualizer/bin/run-visulizer-support.bat
@@ -0,0 +1,11 @@
@echo on

SET VISUALIZER_PATH=%1
SET CONFIG=%2

:: activate the conda environment (created in run-pipeline.bat call)
CALL conda activate sandag-rsm-visualizer

:: run the support script.
cd /d %VISUALIZER_PATH%
python visualizer_support.py %CONFIG%
17 changes: 17 additions & 0 deletions visualizer/config/config_visualizer_support.yml
@@ -0,0 +1,17 @@
inputs:
  shapefile_dir: simwrapper\data\external\shapefile
  cross_reference_mgra_file_name: mgra_crosswalk.csv
  mode_summary_file_name: trip_mode_summary.csv
  vmt_summary_file_name: vmt_summary.csv
  compared_scenarios_dir: simwrapper\data\processed\all_runs
  intrazonal_distance_mode_file_name: intrazonal_distance_by_mode_summary.csv
  trip_od_summary_file_name: trip_od_summary.csv
  zero_car_summary_file_name: zero_car_households_summary.csv
parameters:
  rsm_scenario_list:
    - rsm_before_calibration
    - rsm_calibrated
  base_scenario_list:
    - donor_model
outputs:
  total_vmt_file_name: vmt_total_summary.csv
22 changes: 22 additions & 0 deletions visualizer/config/scenarios.yaml
@@ -0,0 +1,22 @@

rsm_scenarios:
  rsm_base:
    name: "run_base"
    input: \simwrapper\data\external\rsm_base\input
    report: \simwrapper\data\external\rsm_base\report
    output: \simwrapper\data\processed\rsm_base\

  rsm_scen:
    name: "rsm_scen"
    input: \simwrapper\data\external\rsm_scen\input
    report: \simwrapper\data\external\rsm_scen\report
    output: \simwrapper\data\processed\rsm_scen\

base_scenarios:
  donor_model:
    name: "donor_model"
    input: \simwrapper\data\external\donor_model\input
    report: \simwrapper\data\external\donor_model\report
    output: \simwrapper\data\processed\donor_model\

shapefiles: \simwrapper\data\external\shapefile
8 changes: 8 additions & 0 deletions visualizer/pipeline/.gitignore
@@ -0,0 +1,8 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Outputs
output/*
!output/.gitkeep
101 changes: 101 additions & 0 deletions visualizer/pipeline/README.md
@@ -0,0 +1,101 @@
# Data Pipeline Tool

The Data Pipeline Tool aims to aid in building data pipelines that ingest, transform, and summarize data by taking advantage of how readily such pipelines can be parameterized. Rather than coding from scratch, configure a few files and the tool will figure out the rest.

## Background

Data pipelines vary widely across projects in terms of the data that is used, how the data is transformed, and the summaries that are eventually required and produced. However, the fundamental techniques for developing these pipelines tend to overlap: in Python, for example, one may repeatedly use *pd.read_csv()* to read in data or *pd.merge()* to combine data sets. Ultimately, these pipelines can, to a large degree, be parameterized to minimize redundancy in implementation, which is precisely the reason this tool was created.
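
As a minimal illustration of the pattern being parameterized (file paths are hypothetical; the column names are borrowed from the summaries this visualizer produces):

```python
import pandas as pd

# Extract: the same read pattern is repeated for every data source.
trips = pd.read_csv("data/trips.csv")
households = pd.read_csv("data/households.csv")

# Transform: the same merge/group/aggregate pattern is repeated for every summary.
merged = pd.merge(trips, households, on="hhId", how="left")
mode_share = merged.groupby("tripMode")["weightTrip"].sum()

# Load: the same write pattern is repeated for every output table.
mode_share.to_csv("output/trip_mode_summary.csv")
```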

## Configuring A Pipeline

To configure your data pipeline, you will need to edit the following files in the `config` directory:

- `settings.yaml`: Main settings file for the tool. Controls the overall flow of the data pipeline.

- `processor.csv`: Contains expressions that are used to process data before or after summarization.

- `expressions.csv`: Contains expressions to summarize data and controls how summary tables are written upon tool completion.

- `user_added_functions.py`: Contains user-defined functions that can be called in the processor.

The following describes the contents of these files and how they can be edited.

### `settings.yaml`

This file consists of the following properties; see [config/settings.yaml](config/settings.yaml) for a full example, and a minimal sketch after the list below.

- `extract`: Root property -- controls data extraction (reading). Note: Multiple data sources can be specified by the user.
  - `filepath` (str): File path to the data source
  - `test_size` (int): Number of rows to read from the input data -- for testing purposes. Leave empty to read all rows.
  - `data`: List of files at the specified data *filepath* to read

- `transform`: Root property -- controls data processing.
  - `processor` (str): File path to the processor specification file
  - `expressions` (str): File path to the summary expressions specification file
  - `steps`: Lists the processing steps to execute. Note: The user can create as many steps as necessary. The specified order of processing, concatenating, and merging will be followed for each step.
    - `name` (str): User-defined name of the processing step
    - `process` (bool): True or False -- whether to run the processor. Note: Only the processor expressions corresponding to a step will be executed
    - `summarize` (bool): True or False -- whether to run the summarizer. Note: This property should only be set **once**. Once set, only the tables resulting from the summary expressions will be available for post-processing.
    - `concat`: Controls data concatenation
      - `table_name` (str): User-defined name of the resulting table after concatenation
      - `include`: List of data set names to concatenate. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
    - `merge`: Controls data merging
      - `table_name` (str): User-defined name of the resulting table after the merge
      - `include`: List of data set names to merge. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
      - `merge_cols`: List of columns to merge the two data sets on. Note: The order of the columns must match the order specified in *include*.
      - `merge_type` (str): Merge type -- 'left', 'right', 'inner', and 'outer' merges are supported

- `load`: Root property -- controls results loading/writing
  - `outdir` (str): File path of the directory to write results to
  - `empty_fill` (str or numeric): Value used to fill missing values in the output results
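
A minimal sketch of how these properties might fit together (file, table, and step names are hypothetical, and the exact schema may differ; consult [config/settings.yaml](config/settings.yaml) for the authoritative example):

```yaml
extract:
  - filepath: data/
    test_size:          # leave empty to read all rows
    data:
      - trips.csv
      - households.csv

transform:
  processor: config/processor.csv
  expressions: config/expressions.csv
  steps:
    - name: prepare
      process: True
      summarize: False
      merge:
        table_name: trips_households
        include: [trips, households]
        merge_cols: [hhId]
        merge_type: left
    - name: summarize
      process: False
      summarize: True

load:
  outdir: output/
  empty_fill: 0
```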

### `processor.csv`

This file consists of the following fields; see [config/processor.csv](config/processor.csv) for a full example, and a short sketch after the list below.

- `Description`: User-specified description of what the processing row accomplishes
- `Step`: User-defined *processing step* name that the row belongs to
- `Type`: Processing type of the row
  - `column`: Generate a new field from a combination of fields, or a transformation of a field, in *Table* as defined by *Func*. Note: The user does not need to specify *In Col* if used.
  - `rename`: Rename field(s) as defined by the dictionary in *Func*. Note: The user does not need to specify *In Col* or *Out Col* if used.
  - `replace`: Replace values in a field as defined by the dictionary in *Func*
  - `bin`: Bin values in a field into discrete values as defined by the intervals in *Func*
  - `cap`: Cap values in a field to a maximum value specified in *Func*
  - `apply`: Apply a Pandas Series apply() function to every element in a field as defined by *Func*. Note: The function should be written as if writing directly within apply().
  - `sum`: Take the row-wise sum of multiple columns as specified by the comma-delimited names in *In Col*. Note: The user does not need to specify *Func* if used.
  - `skim`: Query skim (.omx) origin-destination pairs as specified by the comma-delimited pairs in *In Col* and the skim matrix specified in *Func*
  - `raw`: Evaluate a raw Python expression as defined by *Func*. Note: The user does not need to specify *In Col*, *Out Col*, or *Table* if used.
- `Table`: Name of the table to evaluate the processor row on. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
- `Out Col`: Field name of the processing result -- added to *Table*
- `In Col`: Field name of the field in *Table* to apply processing to
- `Func`: Function/expression to use for processing
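
A couple of hypothetical rows illustrating the format (column order here follows the field list above; consult [config/processor.csv](config/processor.csv) for the authoritative header):

```
Description,Step,Type,Table,Out Col,In Col,Func
Flag intrazonal trips,prepare,column,trips,isIntrazonal,,originMGRA == destinationMGRA
Standardize household id column,prepare,rename,households_orig,,,"{'hhid': 'hhId'}"
```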

### `expressions.csv`

This file consists of the following fields; see [config/expressions.csv](config/expressions.csv) for an example.

- `Description`: User-specified description of what the summarization row accomplishes
- `Out Table`: User-defined summary table name to add the result to. Note: The unique set of table names in this column will be written out upon the tool's completion.
- `Out Col`: Field name of expression result -- added to *Out Table*
- `In Table`: Name of the table to evaluate the expression on. Note: Names are either those loaded in *extract* without the file extension or the user-defined names resulting from a previous concatenation or merge.
- `Filter`: Pandas query filter to apply to *In Table* before evaluating expression
- `In Col`: Field name of the field in *In Table* to apply the expression to
- `Func`: Pandas Series method to apply to *In Col* as defined by the [Pandas API](https://pandas.pydata.org/docs/reference/api/pandas.Series.html). Users can also specify a custom Python expression using the *Out Col* names of expressions previously evaluated (much like a measure in PowerBI) -- for such cases, *In Col*, *Filter*, and *In Table* do not need to be specified.
- `Group`: Comma delimited field names of fields in *In Table* to use for group aggregations

### `user_added_functions.py`

Any function defined in this script can be called in the processor.
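
For instance, a hypothetical helper (the name and logic here are illustrative only, not part of the tool):

```python
# user_added_functions.py -- hypothetical example of a user-defined function
def mgra_to_district(mgra_id):
    """Map an MGRA id to a coarse district id, assuming 1,000 MGRAs per district."""
    return mgra_id // 1000
```

A processor row of Type `apply` could then reference `mgra_to_district` from its *Func* column.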

## Running A Pipeline

To run the configured pipeline, do the following:

1. Open Anaconda 3 Prompt
2. Create an Anaconda environment using the provided environment.yml file: `conda env create -f environment.yml`
3. Activate the newly created environment: `conda activate pipeline`
4. Change directory to the project folder
5. Configure the data pipeline as described in *Configuring A Pipeline*
6. Run the tool by executing the following: `python run.py`
7. When the process finishes, all resulting summary files are written to the output directory specified in `settings.yaml`
24 changes: 24 additions & 0 deletions visualizer/pipeline/config/expressions.csv
@@ -0,0 +1,24 @@
Description,Out Table,Out Col,In Table,Filter,In Col,Func,Group
#,,,,,,,
Mode Share,trip_mode_summary,num_trips,trips,,weightTrip,sum,tripMode
Intrazonal summary,intrazonal_distance_by_mode_summary,distance,trips,originMGRA == destinationMGRA,distanceDrive,sum,tripMode
Number of transit boardings,transit_summary,EA,transit_onoff,TOD=='EA',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,AM,transit_onoff,TOD=='AM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,MD,transit_onoff,TOD=='MD',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,PM,transit_onoff,TOD=='PM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,EV,transit_onoff,TOD=='EV',BOARDINGS,sum,MODE
Trips by TOD and Purpose,trip_tod_purpose_summary,EA,trips,departTimeFiveTod == 'EA',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,AM,trips,departTimeFiveTod == 'AM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,MD,trips,departTimeFiveTod == 'MD',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,PM,trips,departTimeFiveTod == 'PM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,EV,trips,departTimeFiveTod == 'EV',weightTrip,sum,tripPurposeDestination
Trips by Origin and Destination,trip_od_summary,flows,trips,,weightTrip,sum,"originMGRA,destinationMGRA"
Zero Car Households,zero_car_households_summary,household_numbers,households,autos == 0,hhId,count,cluster_id
household by MGRA,households_sample_mgra_summary,sampled_households,households,,hhId,count,cluster_id
household sample by MGRA,households_mgra_summary,original_households,households_orig,,hhid,count,cluster_id
Network Summaries,network_summary,vmt_total,network,ifc_desc == 'Freeway',vmt_total,sum,ID
Network Summaries,network_summary,flow_total,network,ifc_desc == 'Freeway',flow_total,sum,ID
Network Summaries,network_summary,voc,network,ifc_desc == 'Freeway',voc,sum,ID
VMT by Class,vmt_summary,vmt_total,network,,vmt_total,sum,ifc_desc
VMT Auto by Class,vmt_summary,vmt_auto,network,,vmt_auto,sum,ifc_desc
VMT Truck by Class,vmt_summary,vmt_truck,network,,vmt_truck,sum,ifc_desc
22 changes: 22 additions & 0 deletions visualizer/pipeline/config/expressions_donor_model.csv
@@ -0,0 +1,22 @@
Description,Out Table,Out Col,In Table,Filter,In Col,Func,Group
#,,,,,,,
Mode Share,trip_mode_summary,num_trips,trips,,weightTrip,sum,tripMode
Intrazonal summary,intrazonal_distance_by_mode_summary,distance,trips,originMGRA == destinationMGRA,distanceDrive,sum,tripMode
Number of transit boardings,transit_summary,EA,transit_onoff,TOD=='EA',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,AM,transit_onoff,TOD=='AM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,MD,transit_onoff,TOD=='MD',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,PM,transit_onoff,TOD=='PM',BOARDINGS,sum,MODE
Number of transit boardings,transit_summary,EV,transit_onoff,TOD=='EV',BOARDINGS,sum,MODE
Trips by TOD and Purpose,trip_tod_purpose_summary,EA,trips,departTimeFiveTod == 'EA',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,AM,trips,departTimeFiveTod == 'AM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,MD,trips,departTimeFiveTod == 'MD',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,PM,trips,departTimeFiveTod == 'PM',weightTrip,sum,tripPurposeDestination
Trips by TOD and Purpose,trip_tod_purpose_summary,EV,trips,departTimeFiveTod == 'EV',weightTrip,sum,tripPurposeDestination
Trips by Origin and Destination,trip_od_summary,flows,trips,,weightTrip,sum,"originMGRA,destinationMGRA"
Zero Car Households,zero_car_households_summary,household_numbers,households,autos == 0,hhId,count,homeMGRA
Network Summaries,network_summary,vmt_total,network,ifc_desc == 'Freeway',vmt_total,sum,ID
Network Summaries,network_summary,flow_total,network,ifc_desc == 'Freeway',flow_total,sum,ID
Network Summaries,network_summary,voc,network,ifc_desc == 'Freeway',voc,sum,ID
VMT by Class,vmt_summary,vmt_total,network,,vmt_total,sum,ifc_desc
VMT Auto by Class,vmt_summary,vmt_auto,network,,vmt_auto,sum,ifc_desc
VMT Truck by Class,vmt_summary,vmt_truck,network,,vmt_truck,sum,ifc_desc