Introduction

Welcome to the CalCORVID (California Clustering for Operational Real-time Visualization of Infectious Diseases) dashboard repository. This repository provides sample data, code for a RShiny dashboard displaying spatiotemporal cluster results (CalCORVID), and functions to preprocess your data for display in the dashboard. Although CalCORVID and the accompanying functions can be directly implemented as a final product, we encourage users to customize and build on the provided code base.

🛈 Note This application is written in the R programming language and developed in the RStudio integrated development environment (IDE). Download R (from CRAN) and RStudio (from Posit) to get started.

Getting Started

⚠️ Warning The current iteration of this dashboard displays only spatiotemporal cluster results from SaTScan software. You must analyze your data in SaTScan before using this repository.

Repository Structure

Repository Home:

  • app.R: Contains code for the user interface (UI) and server functions.
  • global.R: Sourced by app.R to load libraries, load data, and preprocess the data, which improves dashboard processing time.
  • /R/ folder: R scripts with relevant functions including generating dashboard files, cleaning input data, and running SaTScan within R.
  • /data/ folder: Stores all SaTScan outputs and the data files generated by global.R that the dashboard uses.
    • This folder contains files corresponding to two sample datasets: 1) The California vaccination analysis in the displayed dashboard example, and 2) simulated sample data in the test_data subfolder for users to get familiar with using the dashboard.

/R/:

  • satscan_run_example.R: Script to run SaTScan using R with the rsatscan package. This generates the example data displayed on the dashboard, but is commented out as the results are already provided in the repository. However, this script can be adapted by users who are interested in running SaTScan within R.
  • check_data.R: Script containing functions that:
    • clean_data(): Check if newest SaTScan outputs are in the correct format.
    • combine_datasets(): Merge cluster center (*.col) and location ID (*.gis) files over a specified time frame.
    • clean_combined_datasets(): Reformat the combined dataset after merging with Social Vulnerability Index (SVI) calculations for dashboard display.
  • generate_map.R: Script containing functions that:
    • generate_svi_vars(): Calculate average SVI percentiles for the geographic unit of analysis for each cluster.
    • generate_county_shapes(): Create a geojson shapefile containing county boundaries for the specified state. This function will also calculate the centroid of each county polygon to display centered county label names on the leaflet map.
    • generate_state_coords(): Generate a CSV file with geographic coordinates to orient the leaflet map to the specified state (more useful if detecting clusters over a larger geographic area).
    • generate_cluster_coords(): Generate a CSV file with geographic coordinates to orient the leaflet map to the detected clusters (more useful if detecting clusters over a smaller geographic area).
  • generate_test_data.R: Script containing the code used to generate simulated test data at the California census tract-level.
  • run_test_data.R: Script to run SaTScan using R with the rsatscan package for the simulated test data.
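As a point of reference for satscan_run_example.R and run_test_data.R, the shape of a SaTScan run from R with the rsatscan package looks roughly like the sketch below. File names, paths, and parameter values are placeholders, not the repository's actual settings, and you may need to enable additional output options (e.g., the cluster- and location-information files) in your parameter file.

```r
# Minimal rsatscan sketch (illustrative only; adapt paths and parameters).
library(rsatscan)

td <- tempdir()

# Reset SaTScan parameters, then set the ones this analysis needs.
ss.options(reset = TRUE)
ss.options(list(
  CaseFile           = "CAvax.cas",
  CoordinatesFile    = "CAvax.geo",
  CoordinatesType    = 1,   # 1 = latitude/longitude
  PrecisionCaseTimes = 3    # 3 = day precision
))

# Write the parameter file and run SaTScan.
# sslocation must point to your local SaTScan installation.
write.ss.prm(td, "CAvax")
result <- satscan(td, "CAvax", sslocation = "C:/Program Files/SaTScan")

# The $col and $gis data frames are the two outputs the dashboard needs.
write.csv(result$col, file.path(td, "CAvax_col_20240123.csv"), row.names = FALSE)
write.csv(result$gis, file.path(td, "CAvax_gis_20240123.csv"), row.names = FALSE)
```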

/data/:

  • CAvax_combgiscol_fnl.csv: Final version of the example dataset for testing this dashboard, generated by clean_combined_datasets().
  • coords/: Folder containing CSV file/s of generated centroids given the state of analysis using generate_state_coords() or generate_cluster_coords().
  • county_boundary/: Folder containing geojson files of county boundaries given the state of analysis using generate_county_shapes().
  • giscol_files/: Folder containing merged cluster center files (*.col) and location IDs (*.gis) using combine_datasets() that are aggregated over a given time period.
  • satscan_output/: Folder to store required raw SaTScan outputs (cluster center - *.col and location ID files - *.gis).
  • svi/: Folder containing Social Vulnerability Index (SVI) scores for the given geographic unit and state.
  • test_data/: Folder containing relevant files for the simulated test data (see "Build and Test" section below) for users to test generating their own dashboard files.

Using the Dashboard

Obtaining Dashboard Input Files -- in SaTScan software

Required files: This dashboard requires the following two SaTScan output files, each with the listed columns:

  • Cluster centers (*.col file): LOC_ID*, LATITUDE, LONGITUDE, RADIUS, START_DATE, END_DATE, OBSERVED, EXPECTED, CLUSTER
  • Location IDs (*.gis file): LOC_ID*, CLUSTER

* The LOC_ID variables must be expressed as FIPS codes (census tract or county) or ZIP codes to calculate average Social Vulnerability Index values for the dashboard tooltip.

If you already have these two output files and corresponding columns, you can skip to the Dashboard Implementation section.
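If you want to confirm your files are usable before wiring them into the dashboard, a quick sanity check like the one below (hypothetical, not part of the repository; adjust the file paths to your own outputs) verifies the required columns are present:

```r
# Hypothetical pre-flight check that the *.col and *.gis CSVs contain the
# columns the dashboard expects. File paths here are examples only.
col_required <- c("LOC_ID", "LATITUDE", "LONGITUDE", "RADIUS", "START_DATE",
                  "END_DATE", "OBSERVED", "EXPECTED", "CLUSTER")
gis_required <- c("LOC_ID", "CLUSTER")

col_df <- read.csv("data/satscan_output/CAvax_col_20240123.csv")
gis_df <- read.csv("data/satscan_output/CAvax_gis_20240123.csv")

stopifnot(all(col_required %in% names(col_df)),
          all(gis_required %in% names(gis_df)))
```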

If you do not have these files: Follow the steps below to obtain them.

  1. Analyze your data using SaTScan software, which is available here or through the rsatscan package. This dashboard is currently designed to display only circular clusters.
  2. Establish the nomenclature and organization of the result files so that analyses of the same input data are found in the same folder and are easily identifiable. For example, our sample dataset contains vaccination data from California so we name all our output files with CAvax, and all of our SaTScan outputs can be found in the /data/satscan_output folder.
  3. Save the cluster center and location information results in CSV format by either checking the "Cluster Information" and "Location Information" output options in the SaTScan software or saving the $col and $gis objects from running rsatscan::satscan in the same folder. Similarly, you need to establish a nomenclature to distinguish these files in the output folder. We use _col_ and _gis_, resulting in files called CAvax_col_20240123.csv and CAvax_gis_20240123.csv. If you are saving the files from the SaTScan software, you may need to convert from a text file (.txt).

Dashboard implementation -- within this repository

  1. Fork this Repository
  2. Modify the necessary parameter values on lines 35 to 75 of the global.R file.
    • Change any file paths and subfolder names necessary under the "File Paths and Subfolders" section.
    • Provide the nomenclature used for your SaTScan outputs under the "Input Files" section.
    • Provide the desired parameter values for the dashboard functions given in the R folder. For example, if your analysis is at the census tract level instead of the zip code level, change level="zcta" to level="tract".
  3. After specifying the parameter values, running the app.R file will automatically run the provided functions in the global.R file in the following order: clean_data(), combine_datasets(), generate_svi_vars(), and clean_combined_df(). These functions check whether your files are in the correct format, combine them to create a historical file, calculate Social Vulnerability Index (SVI) averages for each cluster, and clean the dataset for the dashboard display. More specifically:
    • clean_data(): Checks whether the most recently run dataset is in the correct CSV format.
    • combine_datasets(): If the most recent dataset is in the correct format, aggregates it over the historical period specified in the parameters above. The default is time_value=10 and time_unit="days", so the aggregated dataset will contain results from the past 10 days.
    • generate_svi_vars(): Given the geographic level of analysis (level="zcta", level="tract", level="county") and state of analysis (state="CA"), calculates average Social Vulnerability Index (SVI) percentiles for each cluster.
    • clean_combined_df(): Reformats the combined dataset (removing unnecessary columns, renaming columns) containing SVI information for map and table displays on the RShiny dashboard.
  4. If this is your first time running app.R for a given state (the default is state="CA"), the global.R file will also run either generate_state_coords() or generate_cluster_coords(), depending on the zoom_level specified, to generate coordinates that center the leaflet map on the state of analysis or on the detected clusters. The default for the CAvax data is zoom_level=state because the clusters are spread across a large geographic area. The global.R file will also run generate_county_shapes(), which generates the county boundaries of the state of analysis. If these files have already been generated, they will simply be read in.
  5. Deploy the dashboard.
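Pulling the parameters named above together, the block you edit in global.R might look roughly like the sketch below. The variable names are taken from this README, but the exact names, ordering, and defaults in your copy of global.R may differ.

```r
# Illustrative global.R parameter block (approximate; check your own copy).

# File Paths and Subfolders
satscan_output_folder_name <- "satscan_output"  # subfolder under /data/

# Input Files
model <- "CAvax"         # prefix used in your SaTScan output file names

# Dashboard Functions
state      <- "CA"       # state of analysis
level      <- "zcta"     # geographic unit: "zcta", "tract", or "county"
zoom_level <- "state"    # center the map on the "state" or on the "cluster"s
time_value <- 10         # with time_unit, aggregate results over the
time_unit  <- "days"     # past 10 days
```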

Build and Test

We suggest forking this repository, cloning into your local environment, and trying to run the app.R and global.R files with the provided sample data. After becoming familiar with the structure of the dashboard and the underlying functions, try plugging in your SaTScan results and modifying simpler features like the dashboard theme. We also provide a simulated test dataset detailed below for testing purposes.
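Once your fork is cloned locally, launching the dashboard is a standard Shiny run from the repository root; per the repository structure described above, app.R sources global.R, so the preprocessing functions run automatically:

```r
# Launch the dashboard from the repository root.
# install.packages("shiny")  # if not already installed
library(shiny)
runApp(".")  # or runApp("path/to/your/CalCORVID/clone")
```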

Example using simulated test data

  1. We provide a simulated dataset in the data/test_data folder, which contains census tract-level analyses for the state of California over a period of 14 days. Poisson-distributed counts are randomly assigned to each census tract and date combination in the R/generate_test_data.R file. This allows us to create case and geography files (ca_tract_case.csv, ca_tract_geo.csv) for the space-time permutation model. We then use the SaTScan software to run the case and geography files in R/run_test_data.R to generate example results. The sample results are in the same data/test_data folder called CAtest_col_20240221.csv and CAtest_gis_20240221.csv, following the nomenclature guidelines provided in the previous section.

  2. Modify the necessary parameter values on lines 35 to 75 in the global.R file.

    • File paths:
      • Change satscan_output_folder_name = "satscan_output" to satscan_output_folder_name = "test_data"
    • Input files:
      • Change model <- "CAvax" to model <- "CAtest"
    • Dashboard functions:
      • Change zoom_level <- "cluster" to zoom_level <- "state"
      • Change level="zcta" to level="tract"
      • Note: time_value and time_unit parameters are relevant only when aggregating cluster results over a given time period. For instance, if you run SaTScan analyses daily, the default setting of time_value=10 and time_unit="days" will aggregate the last 10 days of data to display on the dashboard.
  3. Run the app.R file, which will automatically run the provided functions in the global.R file in the following order: clean_data(), combine_datasets(), generate_svi_vars(), and clean_combined_df(). These functions are detailed in the previous section. The final dataset the dashboard uses will be output in the main data/ folder as CAtest_combgiscol_fnl.csv.

  4. Deploy the dashboard.
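The simulation described in step 1 can be sketched as follows. This is a hedged approximation of what R/generate_test_data.R does, not the script itself: the tract IDs, Poisson rate, date range, and coordinates below are illustrative placeholders.

```r
# Sketch of simulating test data: Poisson counts for each census tract and
# date combination over 14 days (values here are illustrative only).
set.seed(42)
tracts <- sprintf("06001%06d", 1:100)   # placeholder 11-digit CA tract FIPS codes
dates  <- seq(as.Date("2024-02-08"), by = "day", length.out = 14)

# Case file: one row per tract-date with a random Poisson count.
case_file <- expand.grid(LOC_ID = tracts, DATE = dates)
case_file$CASES <- rpois(nrow(case_file), lambda = 2)

# Geography file: one coordinate pair per tract (placeholder coordinates
# drawn uniformly from California's approximate bounding box).
geo_file <- data.frame(LOC_ID    = tracts,
                       LATITUDE  = runif(length(tracts), 32.5, 42.0),
                       LONGITUDE = runif(length(tracts), -124.4, -114.1))

write.csv(case_file, "ca_tract_case.csv", row.names = FALSE)
write.csv(geo_file,  "ca_tract_geo.csv",  row.names = FALSE)
```

These two files correspond to the case and geography inputs the space-time permutation model expects, matching the ca_tract_case.csv and ca_tract_geo.csv files described in step 1.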

Deploying your dashboard

Since every organization has its own internal requirements, we are unable to provide specific guidance on deploying CalCORVID. However, we have provided resources below based on feedback.

Contribute

We can't wait to see what you do with this. Please fork the repository, make your edits, and send the changes you'd like to see back to us as pull requests. Potential extensions include:

  • Incorporating different spatial resolutions in addition to the state-level display
  • Including other socioeconomic variables
  • Supporting other spatiotemporal clustering methods
  • Displaying line lists with each cluster
  • Supporting non-circular clusters
  • Incorporating a quantitative method to track clusters over time (e.g., Moran's I)

A gift from California with love. “Together, all things are possible.” -- Cesar Chavez

Resources