The MIDRC Diversity Calculator is a tool for comparing the demographic representativeness of biomedical datasets. It uses the Jensen-Shannon distance (JSD) to quantify how closely the demographic distribution of one dataset matches that of another. It also supports monitoring representativeness over time by evaluating historical data. MIDRC developed the tool and uses it to assess how well the data in its open data commons represent the US population. Users can also apply it to other representativeness questions, such as assessing the similarity of demographic distributions across multiple attributes in different biomedical datasets.
- Jensen-Shannon Distance (JSD) Calculation: Uses the JSD measure to assess the representativeness of data.
- Comparative Analysis: Enables comparisons between different datasets to evaluate demographic diversity.
- Biomedical Focus: Tailored to biomedical data, so that comparisons stay relevant to the field.
- Historical Data: Assesses data over time to monitor changes in representativeness.
The methodology behind the Diversity Calculator is based on the 2023 paper by Whitney et al. titled "Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons"[1]. This paper provides the theoretical foundation for using JSD in evaluating demographic representativeness.
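As a rough illustration of the measure itself (not the application's own code), the sketch below computes a JSD between two sets of category counts using scipy. The function name `jsd_from_counts`, the example counts, and the choice of base 2 (which bounds the distance between 0 and 1) are assumptions made for this example.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def jsd_from_counts(counts_a, counts_b):
    """Jensen-Shannon distance between two sets of category counts.

    Hypothetical helper for illustration. scipy normalizes each input to
    sum to 1, so raw counts can be passed directly. The distance is the
    square root of the Jensen-Shannon divergence.
    """
    p = np.asarray(counts_a, dtype=float)
    q = np.asarray(counts_b, dtype=float)
    # base=2 is an assumption here; it keeps the result in the range [0, 1].
    return float(jensenshannon(p, q, base=2))


# Illustrative counts for one attribute with three categories.
dataset_counts = [120, 300, 80]
reference_counts = [200, 250, 50]
print(jsd_from_counts(dataset_counts, reference_counts))
```

A value of 0 means the two distributions are identical; larger values mean the dataset is less similar to the reference.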
Technologies used with this application
- Python
- PySide6
- numpy
- scipy
- pandas
A requirements.txt file is provided for installing these dependencies.
First, configure your own jsdconfig.yaml file to select which data to load by default. There is a jsdconfig-example.yaml file provided that may be copied over or used as a template for your own config file.
- The filename needs to be specified, and a human-readable name should be provided for use in the plots and figures.
- Please see the Generating custom Excel files section for additional information.
- On your first run, you may use cp jsdconfig-example.yaml jsdconfig.yaml to load the MIDRC data.
To start the application, run python main.py
- Select the files you wish to compare in the drop-down menus.
- A checkbox is provided next to the drop-down menus to select whether additional plots should be shown for each individual file selected.
- Note: displaying plots for two or more files simultaneously may require a 4K monitor.
- Use the provided MIDRC, CDC, and Census Excel files as examples of how to prepare your custom data.
- For each date, cumulative sums are expected.
- Each attribute should have its own sheet which will be automatically parsed by the application.
- Column names within each sheet are parsed and compared between files
- Where there is a matching column name within a worksheet of the same name, the JSD will be calculated using those values.
- A Date column is expected, and it should be sorted. If your data does not have multiple dates and has no date column, please see how the census data is loaded in the example config file.
- The list of attributes shown in the GUI contains the worksheets whose names are identical in both files. If an attribute is missing, please check the spelling of the worksheet names.
- The remove column name text config parameter exists because of how the MIDRC data is generated. There is a (CUSUM) suffix that needs to be removed to compare it to CDC and Census data.
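The rules above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the helper names `strip_suffix` and `jsd_by_date`, the column names, and the counts are hypothetical, both sheets are assumed to share identical dates, and base 2 is assumed for the JSD. In practice each DataFrame would come from the same-named worksheet of two Excel files.

```python
import pandas as pd
from scipy.spatial.distance import jensenshannon


def strip_suffix(df, text):
    """Return a copy of df with `text` removed from every column name."""
    return df.rename(columns=lambda name: name.replace(text, "").strip())


def jsd_by_date(sheet_a, sheet_b, date_col="Date"):
    """JSD for each date, using only the columns present in both sheets."""
    shared = [c for c in sheet_a.columns if c != date_col and c in sheet_b.columns]
    if not shared:
        raise ValueError("The two sheets have no matching column names.")
    # Assumption: both sheets contain the same dates, so an inner join aligns them.
    merged = sheet_a.merge(sheet_b, on=date_col, suffixes=("_a", "_b"))
    merged = merged.sort_values(date_col)
    counts_a = merged[[f"{c}_a" for c in shared]].to_numpy(dtype=float)
    counts_b = merged[[f"{c}_b" for c in shared]].to_numpy(dtype=float)
    # Each row holds cumulative counts; scipy normalizes every row to proportions.
    values = jensenshannon(counts_a, counts_b, base=2, axis=1)
    return pd.Series(values, index=merged[date_col], name="JSD")


# Hypothetical worksheet contents (cumulative counts per date).
midrc = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-02-01"]),
    "Female (CUSUM)": [40, 100],
    "Male (CUSUM)": [60, 100],
})
census = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-02-01"]),
    "Female": [51, 51],
    "Male": [49, 49],
})

result = jsd_by_date(strip_suffix(midrc, "(CUSUM)"), census)
print(result)
```

Without the `strip_suffix` step, no column names would match between the two sheets and no JSD could be computed, which is the situation the config parameter addresses.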
The plots and figures should be movable, adjustable, resizable, or hidden.
To see the list of available dock widgets, right-click on any menu or title bar area, that is, either the main window menu bar or the title bar of any dock widget. This is useful if you hide one of the docked widgets and wish to view it again.
Keyboard commands may be used to copy the calculated JSD values (and dates), which can then be pasted into Excel or a notebook as tab-delimited data.
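For use in a notebook, copied tab-delimited text can be parsed with pandas. The sketch below is only an illustration: the helper `parse_pasted_jsd`, the presence of a header row, the column names, and the values are assumptions, and the actual layout of the copied data may differ.

```python
import io

import pandas as pd


def parse_pasted_jsd(text):
    """Parse tab-delimited text into a DataFrame, treating column 0 as dates."""
    return pd.read_csv(io.StringIO(text), sep="\t", parse_dates=[0])


# Hypothetical pasted contents; real headers and values will differ.
pasted = "Date\tJSD\n2023-01-01\t0.078\n2023-02-01\t0.007\n"

table = parse_pasted_jsd(pasted)
print(table)
```

If the data is still on the clipboard, `pd.read_clipboard(sep="\t")` may be a shorter route to the same result.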
- Python 3.9 or higher
- Git
How to clone the project
git clone https://github.com/MIDRC/MIDRC_Diversity_Calculator.git
You may install the project dependencies using pip:
cd MIDRC_Diversity_Calculator
python -m pip install --upgrade pip
pip install -r requirements.txt
How to start the project
cd MIDRC_Diversity_Calculator
cp jsdconfig-example.yaml jsdconfig.yaml
python main.py
Robert Tomek, Maryellen Giger, Heather Whitney
Natalie Baughan, Kyle Myers, Karen Drukker, Judy Gichoya, Brad Bower, Weijie Chen, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Sanmi Koyejo, Rui Sá, Berkman Sahiner, Zi Zhang
- Co-leads
- Karen Drukker
- Judy Wawira Gichoya
- AAPM
- Weijie Chen
- Kyle Myers
- Heather Whitney
- ACR
- Jayashree Kalpathy-Cramer
- RSNA
- Zi Jill Zhang
- NIH
- Rui Sá
- Brad Bower
- MIDRC Central (University of Chicago)
- Maryellen Giger
- Nick Gruszauskas
- Katie Pizer
- Robert Tomek
- Project Manager
- Emily Townley
git clone https://github.com/MIDRC/MIDRC_Diversity_Calculator.git
git checkout -b feature/NAME
- Open a Pull Request explaining the problem solved or the feature added. If there are visual changes, attach screenshots, then wait for the review!