Corex Dashboard

This repository contains a Bokeh app for exploring topic models generated with the Anchored Correlation Explanation (CorEx) package. The dataset is a collection of scientific abstracts related to energy storage; see here for more information.

The app is hosted on Heroku here. Because it runs on Heroku's free tier, the server may take a moment to start up.

Using locally with your own dataset

Installation

The preferred way to install the required packages is with Anaconda. In an Anaconda prompt, run conda env create -f environment.yml followed by conda activate corex-dashboard.

You can also install the requirements with pip using python -m pip install -r requirements_full.txt. The file requirements.txt omits some packages because it is used when building the Heroku app.

Preparing input data

The dashboard can be used with the example dataset by default. Follow these instructions to use your own text data.

The folder example_data contains example files used by the dashboard. All of the files in this folder are generated from input_data.csv, except anchor_default.txt, which is an optional list of default anchor words to use when generating models.

To start, create a folder called data and generate an input_data.csv file in it (e.g. with a script). The first column of the CSV file must be a unique integer index for each document, called ID. The file has the following additional columns:

title (required) - title of each document
processed_text (required) - space separated list of strings corresponding to the text of each document
url (optional) - a URL used to generate a hyperlink to a given text
prob (optional) - In my case this field was generated by Microsoft Academic as a metric of how highly ranked a paper is in terms of citations, and was an exponentially formatted number less than 1. It is used in one of the displays to show highly ranked papers in a given topic
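
The README does not include a script for this step. As a minimal sketch, assuming gendata.py reads a plain comma-separated file with the column layout above, the following uses only Python's csv module to write data/input_data.csv and sanity-check it. The helper names (simple_process, write_input_data, validate_input_data) are hypothetical, and simple_process is only a placeholder for whatever tokenization, stop-word removal, or lemmatization you actually want in processed_text.

```python
import csv
import os
import re

REQUIRED = ["title", "processed_text"]
OPTIONAL = ["url", "prob"]


def simple_process(text):
    """Hypothetical placeholder preprocessing: lowercase and keep alphabetic tokens."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def write_input_data(records, path):
    """Write a list of dicts to CSV with an integer ID as the first column."""
    # Only emit optional columns that at least one record actually provides.
    optional = [c for c in OPTIONAL if any(c in r for r in records)]
    fieldnames = ["ID"] + REQUIRED + optional
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for i, rec in enumerate(records):
            writer.writerow({"ID": i, **rec})


def validate_input_data(path):
    """Return a list of problems found in the CSV (an empty list means none)."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        if not header or header[0] != "ID":
            problems.append("first column must be ID")
        for col in REQUIRED:
            if col not in header:
                problems.append(f"missing required column {col}")
        seen = set()
        # Row numbers count the header as row 1.
        for row_no, row in enumerate(reader, start=2):
            raw_id = row.get("ID") or ""
            if not raw_id.isdigit() or raw_id in seen:
                problems.append(f"row {row_no}: ID must be a unique integer")
            seen.add(raw_id)
            prob = row.get("prob")
            if prob:
                try:
                    # float() also parses exponential notation such as 1.2E-05.
                    if float(prob) >= 1:
                        problems.append(f"row {row_no}: prob should be < 1")
                except ValueError:
                    problems.append(f"row {row_no}: prob is not a number")
    return problems


if __name__ == "__main__":
    records = [
        {"title": "Lithium-ion cathode coatings",
         "processed_text": simple_process("Lithium-ion cathode coatings improve cycle life."),
         "prob": "1.2E-05"},
        {"title": "Solid-state electrolytes",
         "processed_text": simple_process("Solid-state electrolytes for safer batteries.")},
    ]
    os.makedirs("data", exist_ok=True)
    out_path = os.path.join("data", "input_data.csv")
    write_input_data(records, out_path)
    print(validate_input_data(out_path))  # prints []
```

Missing optional values are written as empty cells, so documents without a url or prob can sit alongside ones that have them.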

Processing the data

Run python gendata.py to process the input texts into a form ready for the dashboard. This involves bigram generation with gensim, vectorization with CountVectorizer, and generating the display text.
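
gendata.py performs these steps with gensim and scikit-learn's CountVectorizer. To illustrate what they produce, here is a dependency-free sketch; it is not the code in gendata.py. merge_bigrams and vectorize are hypothetical names, and merge_bigrams uses a plain frequency threshold instead of gensim's Phrases scoring, joining pairs with an underscore like gensim's default delimiter.

```python
from collections import Counter


def merge_bigrams(docs, min_count=2, sep="_"):
    """Join adjacent token pairs seen at least min_count times across the corpus.

    A simplified, frequency-only stand-in for gensim's Phrases model.
    """
    pair_counts = Counter()
    for tokens in docs:
        pair_counts.update(zip(tokens, tokens[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}

    merged_docs = []
    for tokens in docs:
        out, i = [], 0
        while i < len(tokens):
            # Greedily merge a frequent pair and skip past both of its tokens.
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
                out.append(tokens[i] + sep + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged_docs.append(out)
    return merged_docs


def vectorize(docs, binary=False):
    """Build a sorted vocabulary and a dense document-term count matrix."""
    vocab = sorted({tok for tokens in docs for tok in tokens})
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = []
    for tokens in docs:
        row = [0] * len(vocab)
        for tok in tokens:
            row[index[tok]] += 1
        if binary:
            # Keep only presence/absence of each word.
            row = [min(c, 1) for c in row]
        matrix.append(row)
    return vocab, matrix


texts = [
    "lithium ion battery cathode",
    "solid state lithium ion electrolyte",
    "battery cathode coating",
]
docs = merge_bigrams([t.split() for t in texts], min_count=2)
vocab, X = vectorize(docs, binary=True)
print(docs)   # [['lithium_ion', 'battery_cathode'], ['solid', 'state', 'lithium_ion', 'electrolyte'], ['battery_cathode', 'coating']]
print(vocab)  # ['battery_cathode', 'coating', 'electrolyte', 'lithium_ion', 'solid', 'state']
print(X)      # [[1, 0, 0, 1, 0, 0], [0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 0, 0]]
```

The binary option is included because CorEx topic models are typically fit on a binary document-word matrix (word present or absent); this README does not say whether gendata.py binarizes its counts.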

Running the dashboard

Run bokeh serve dashboard.py --show and a browser window should open showing the dashboard.

Select a combination of unsupervised topics and anchor words, define a model name, and press 'Generate/Overwrite Model' to generate a model. A text display will indicate when the model is done being fit. Then refresh the page to populate the model selection dropdown with the new model.

Anchor words for each topic should be separated by spaces, and each topic should be on its own line; see example_data/anchor_default.txt for an example of the format. The anchor strength slider determines how tightly each topic is constrained to its anchor words. You can check whether a word is in your vocabulary with the 'Check word' input (press Enter and it will turn green or red).
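
The anchor format described above maps naturally onto a list of lists of words. A minimal sketch of parsing it and flagging words missing from the vocabulary (similar in spirit to the 'Check word' input) follows; parse_anchors and check_anchors are hypothetical helpers, and skipping blank lines is an assumption rather than documented dashboard behavior.

```python
def parse_anchors(text):
    """Parse anchor text: one topic per line, anchor words separated by spaces."""
    topics = []
    for line in text.splitlines():
        words = line.split()
        if words:  # assumption: blank lines are skipped rather than treated as topics
            topics.append(words)
    return topics


def check_anchors(topics, vocab):
    """Map each topic index to its anchor words that are not in the vocabulary."""
    vocab = set(vocab)
    missing = {}
    for i, words in enumerate(topics):
        absent = [w for w in words if w not in vocab]
        if absent:
            missing[i] = absent
    return missing


anchor_text = """lithium_ion cathode
electrolyte solid

supercapacitor"""
vocab = ["battery_cathode", "cathode", "electrolyte", "lithium_ion", "solid"]
anchors = parse_anchors(anchor_text)
print(anchors)                        # [['lithium_ion', 'cathode'], ['electrolyte', 'solid'], ['supercapacitor']]
print(check_anchors(anchors, vocab))  # {2: ['supercapacitor']}
```

In the corextopic package, a list of lists like this is what Corex.fit accepts through its anchors argument, together with anchor_strength; the dashboard's slider presumably controls that value, but this README does not say so explicitly.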

Select a model and press 'Generate graph' to display it. Note that this also populates the model generation controls with that model's data, so you can tweak it.

The Jupyter notebook graph_plots.ipynb can also be used to explore and make plots from the generated models.

aspitarl/corex_dashboard

Folders and files

Latest commit

History

Repository files navigation

Corex Dashboard

Using locally with your own dataset

Installation

Preparing input data

Processing the data

Running the dashboard

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages