Create an initial dashboard that provides a predtimechart-based forecast visualization component for the hub. The thinking is that this will allow us to get something practical into the hands of hubverse users relatively quickly.
Considerations:
- Client side only, i.e., nothing server-side. This will greatly simplify new hub onboarding.
- Others: @todo
A Python application to create predtimechart JSON files is available from this repository and can be installed in a fresh Python environment via pip:
pip install git+https://github.com/hubverse-org/hub-dashboard-predtimechart
The application can then be run from the command line:
hub_predtimechart --help
The major parts of this project are:
- Forecast visualization component: To visualize forecast data, we will generalize the predtimechart JavaScript component to work with Hubverse hubs. Details:
  - specific component changes: @todo
- Visualization data files: This project will configure predtimechart to load its data from `.json` files that will be generated from hub forecast files, an approach similar to how viz.covid19forecasthub.org works (GitHub repo). This requires us to write a program (we will use Python) to generate those `.json` files, like the R files here. The `.json` files will be stored in the AWS S3 bucket for each hub, akin to how hubverse-transform saves its generated `.parquet` files to S3. Our initial constraints:
  - `output_type`: To start we will only support hubs that contain `quantile` forecasts (please see Output types in the docs).
  - intervals: @todo
- Predtimechart configuration: Predtimechart is configured via a JavaScript options object that specifies settings like `available_as_ofs`, `task_ids`, `models`, etc. Our current thinking is that this object will be generated from hub configuration files.
  - generation details (reference_date -> as_of/selected date, horizon, target_date: x axis, task id vars -> dropdowns, ...): @todo
- Server/Dashboard: We will write a simple dashboard page providing a link to the forecast visualization (predtimechart) page. Our initial thought is to implement this via a straightforward S3 static website (i.e., a self-contained `index.html` file, perhaps with some JavaScript to access basic hubverse admin information to orient the viewer such as hub name, tasks summary, etc.). Two comparable sites are https://respicast.ecdc.europa.eu/ (especially) and https://covid19forecasthub.org/ . See [Dashboard architecture] below for details.
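As a rough illustration of the visualization data step described above, the following minimal sketch pivots long-format quantile forecast rows into one JSON-ready dictionary with a list per quantile level. The input column names (`target_end_date`, `output_type`, `output_type_id`, `value`) follow common hubverse model-output conventions; the helper name `quantile_rows_to_json` and the `q<level>` output keys are assumptions made for illustration, not the project's confirmed file format.

```python
import json
from collections import defaultdict

# Quantile levels the visualization initially expects (stored as strings here).
QUANTILE_LEVELS = ("0.025", "0.25", "0.5", "0.75", "0.975")


def quantile_rows_to_json(rows):
    """Pivot long-format quantile rows into a dict of parallel lists.

    Hypothetical helper: the output layout is an assumption, not a documented format.
    """
    by_date = defaultdict(dict)  # target date -> {quantile level -> value}
    for row in rows:
        if row["output_type"] != "quantile":
            continue  # only quantile forecasts are supported initially
        by_date[row["target_end_date"]][row["output_type_id"]] = float(row["value"])

    # Keep only dates that have every required quantile level.
    dates = sorted(d for d, q in by_date.items() if all(level in q for level in QUANTILE_LEVELS))
    result = {"target_end_date": dates}
    for level in QUANTILE_LEVELS:
        result[f"q{level}"] = [by_date[d][level] for d in dates]
    return result


# Made-up example rows: five quantile rows plus one mean row that gets skipped.
rows = [
    {"target_end_date": "2022-11-05", "output_type": "quantile", "output_type_id": level, "value": value}
    for level, value in zip(QUANTILE_LEVELS, ("10", "20", "30", "40", "50"))
]
rows.append({"target_end_date": "2022-11-05", "output_type": "mean", "output_type_id": "", "value": "31"})
print(json.dumps(quantile_rows_to_json(rows)))
```

The sketch uses only the standard library so it stays independent of whichever data frame library the real generator uses.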
Initially the visualization will have these limitations:
- Only one round block in `tasks.json > rounds` can be plotted.
- Only one `model_tasks` group within that round block can be plotted, and only `model_tasks` groups with `quantile` output_types will be considered.
- The following quantile levels (`output_type_id`s) are present in the data: 0.025, 0.25, 0.5, 0.75, 0.975
- The hub has `reference_date`|`origin_date` and `target_date`|`target_end_date` task IDs in `tasks.json > rounds > model_tasks > task_ids`.
- Model metadata must contain a boolean `designated_model` field.
- In the specified `model_tasks` object within the specified `rounds` object, all objects in the `target_metadata` list must have the same single key in the `target_keys` object.
- Only forecast data will be plotted, not target data.
- We assume all hub files have been validated.
- For the `task_ids` entry in predtimechart config option generation, we use `value` for both `value` and `text`, rather than asking the user to provide a mapping from `value` to `text`. A solution is to require that mapping in `predtimechart-config.yml`.
- The `initial_as_of` and `current_date` config fields are the last of `hub_config.fetch_reference_dates`.
- The `initial_task_ids` config field is the first `task_ids` `value`.
- Target data generation: The app `generate_target_json_files.py` is limited to hubs that store their target data as a .csv file in the `target-data` subdirectory. That file is specified via the `target_data_file_name` field in the hub's `predtimechart-config.yml` file. We expect the file has these columns: `date`, `value`, and `location`.
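The option-generation defaults listed above can be sketched roughly as follows. The option names (`available_as_ofs`, `task_ids`, `models`, `initial_as_of`, `current_date`, `initial_task_ids`) come from the text; the function `build_options`, its parameters, and the exact shape of each value are assumptions for illustration and may not match what predtimechart actually expects.

```python
def build_options(reference_dates, task_id_values, model_ids):
    """Assemble a predtimechart-style options dict (hypothetical shape).

    reference_dates: chronologically ordered date strings
    task_id_values: {task id name -> list of allowed values}
    model_ids: list of model names
    """
    task_ids = {
        # Use each value for both `value` and `text` (no user-supplied mapping yet).
        name: [{"value": v, "text": v} for v in values]
        for name, values in task_id_values.items()
    }
    last_date = reference_dates[-1]  # the last reference date drives both date fields
    return {
        "available_as_ofs": list(reference_dates),
        "task_ids": task_ids,
        "models": list(model_ids),
        "initial_as_of": last_date,
        "current_date": last_date,
        # The first value of each task id is the initial selection.
        "initial_task_ids": {name: values[0] for name, values in task_id_values.items()},
    }


# Made-up inputs for demonstration only.
options = build_options(
    reference_dates=["2022-10-22", "2022-10-29"],
    task_id_values={"location": ["US", "01"]},
    model_ids=["example-model"],
)
print(options["initial_as_of"], options["initial_task_ids"])
```

If a `value` to `text` mapping is later required in `predtimechart-config.yml`, only the `task_ids` comprehension would need to change.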
Some visualization-related information must be configured for each hub, including:
- which interval levels to show. initially: None, 50%, 95%
- which round block in `tasks.json` to use
- reference_date column name
- target_date column name
- name of boolean field for model inclusion. initially we will assume it is `designated_model`
- names of hub models to be listed first: `initial_checked_models` (a predtimechart option)
- others: @todo
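One way to picture the per-hub settings listed above is as a small validated record. This is only a sketch: apart from `designated_model` and `initial_checked_models`, which appear in the text, the class name `VizConfig` and every field name are hypothetical and do not describe the actual schema of `predtimechart-config.yml`.

```python
from dataclasses import dataclass


@dataclass
class VizConfig:
    """Hypothetical per-hub visualization settings; field names are illustrative."""

    round_index: int               # which round block in tasks.json to use
    reference_date_col_name: str   # e.g. "reference_date" or "origin_date"
    target_date_col_name: str      # e.g. "target_date" or "target_end_date"
    initial_checked_models: list   # hub models to be listed first
    model_inclusion_field: str = "designated_model"
    interval_levels: tuple = (None, 50, 95)  # None means "no interval shown"

    def __post_init__(self):
        # Basic sanity checks; a real loader would validate much more.
        if self.round_index < 0:
            raise ValueError("round_index must be non-negative")
        if not self.initial_checked_models:
            raise ValueError("initial_checked_models must name at least one model")


# Made-up values for demonstration only.
config = VizConfig(
    round_index=0,
    reference_date_col_name="reference_date",
    target_date_col_name="target_end_date",
    initial_checked_models=["example-model"],
)
print(config)
```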
Our initial thinking is an approach where we provide a fixed layout (e.g., a menubar at top and a content area in the middle, such as found at https://respicast.ecdc.europa.eu/ ) that allows limited customization specified by convention via markdown files (some with specific names) placed in directories with specific names. Details:
- Configurable content is specified via markdown files located in a directory named `hub-website` (say) in the root hub directory.
- The site layout is a single column (100% width) with two rows: A menubar/header at the top, and a content area taking up the rest of the vertical space.
- The menubar contains these items (from left to right): Home (brand image/text), "Forecasts", "Evaluations", "Background", "Community", "Get in touch".
- The content area depends on the selected menu item:
  - Home: Content is loaded from `hub-website/home.md`.
  - "Forecasts": Content is the predtimechart visualization.
  - "Evaluations": @todo
  - "Background", "Community", "Get in touch": @todo loaded from specific files under `hub-website` such as `background.md`, etc.
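The convention-based mapping from menu items to content could look something like the sketch below. The `hub-website` directory, `home.md`, and `background.md` come from the text; the function `content_source`, the `"predtimechart"` marker string, and the file-naming rule for multi-word items (for example `get-in-touch.md`) are assumptions, since those details are still marked @todo.

```python
from pathlib import PurePosixPath

MENU_ITEMS = ("Home", "Forecasts", "Evaluations", "Background", "Community", "Get in touch")


def content_source(menu_item, website_dir="hub-website"):
    """Return where a menu item's content comes from (hypothetical convention)."""
    if menu_item not in MENU_ITEMS:
        raise KeyError(f"unknown menu item: {menu_item!r}")
    if menu_item == "Forecasts":
        return "predtimechart"  # rendered by the visualization component, not a file
    if menu_item == "Evaluations":
        return None  # not yet specified (@todo)
    # Assumed rule: lowercase the item name and join words with hyphens.
    file_name = menu_item.lower().replace(" ", "-") + ".md"
    return str(PurePosixPath(website_dir) / file_name)


for item in MENU_ITEMS:
    print(item, "->", content_source(item))
```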
We plan to primarily use https://github.com/hubverse-org/example-complex-forecast-hub for development unit tests.
- How/when will file generation be triggered? This applies to both `.json` visualization files and the predtimechart configuration object. For example, an admin UI, GitHub Action on schedule, round close, etc.
- Is this a good time to remove predtimechart's user ensemble, if desired?
- Is this an opportunity to set up some kind of general purpose notification service for interested parties (e.g., hub admins) that informs them when, say, the viz is configured or updated, viz data files are updated, etc.?
- Dashboard: Do we want to allow users to add menu items that link to pages with content loaded from .md files? For example, should we support a `hub-website/menus` directory where users can put files that become menu items with the file name (capitalized, say) and content generated from the file?
- Generation/scheduling: We will need a flag to indicate whether we want to regenerate forecast json files for all past weeks, or only for the present week.
- Where is the source data coming from - GitHub vs. S3?
- Which model output formats will we support? The hub docs mention CSV and parquet, but note that others (e.g., zipped files) might be supported.
- Regarding naming the .json files, should we be influenced by Arrow's partitioning scheme, where it names intermediate directories according to filtering?
- We might need separate apps to update config options vs. visualization data (json files) for the case where the user has changed `predtimechart-config.yml` independent of a round closing.
- Should we filter out `hub_config.horizon_col_name == 0`?
- Should `forecast_data_for_model_df()`'s `quantile_levels` be stored in a config file somewhere?
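To make two of the open questions above more concrete, here is a minimal sketch of what a "regenerate all vs. present week only" flag and a partition-style naming scheme might look like. Both helpers (`dates_to_regenerate`, `partitioned_json_path`), the partition keys, and the `forecast.json` file name are hypothetical; the `key=value` directory style mirrors Hive-style partitioning as used by Arrow, and nothing here reflects a decision the project has made.

```python
def dates_to_regenerate(reference_dates, regenerate_all=False):
    """Return every reference date, or only the latest one (hypothetical helper)."""
    ordered = sorted(reference_dates)  # ISO YYYY-MM-DD strings sort chronologically
    return ordered if regenerate_all else ordered[-1:]


def partitioned_json_path(model_id, reference_date, file_name="forecast.json"):
    """Build a key=value directory path in the style of Hive/Arrow partitioning."""
    return f"model_id={model_id}/reference_date={reference_date}/{file_name}"


# Made-up dates and model name for demonstration only.
all_dates = ["2022-10-29", "2022-10-22", "2022-11-05"]
for date in dates_to_regenerate(all_dates, regenerate_all=False):
    print(partitioned_json_path("example-model", date))
```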
Use the following to create a local dev setup using pyenv and pipenv, which we assume are already installed.
$ cd <this repo>
$ pyenv versions # you should see this repo's .python-version set
$ pipenv --python $(pyenv which python)
$ pipenv install pip-tools # for `pip-compile`
$ pipenv run pip-compile --extra=dev --output-file=requirements/requirements.txt pyproject.toml
$ cd <this repo>
$ pipenv install -r requirements/requirements.txt -e .
$ cd <this repo>
$ pipenv run python -m pytest
$ pipenv run python src/hub_predtimechart/app.py