Add a script to retrieve model-output and target-data row_counts from hub repos #7

Open
wants to merge 5 commits into
base: main

@bsweger (Collaborator) commented Dec 5, 2024

Background

This PR adds a script that can be run to generate row count statistics for a given list of Hubverse GitHub repos: get_hub_stats.py.

Directions for running it are at the beginning of the script (required: a personal GitHub token and uv).

Output

The script saves data in data/hub_stats, including file-level counts for each hub it processes. The output of most interest is likely hub_stats_summary.csv, which aggregates row counts by hub and file type (model-output and target-data).

The summary .csv is updated each time the script is run.
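
For reviewers who want a picture of the aggregation described above, here is a minimal sketch of rolling file-level counts up to one row per hub and file type. It is not taken from get_hub_stats.py: the function name `summarize_row_counts`, the field names `hub`, `file_type`, and `row_count`, the file names, the second hub name, and all numbers are assumptions made up for illustration.

```python
from collections import defaultdict


def summarize_row_counts(file_rows):
    """Sum file-level row counts into one total per (hub, file_type).

    `file_rows` is an iterable of dicts with the hypothetical keys
    "hub", "file_type", and "row_count".
    """
    totals = defaultdict(int)
    for row in file_rows:
        # int() tolerates counts that were read from a CSV as strings
        totals[(row["hub"], row["file_type"])] += int(row["row_count"])
    return [
        {"hub": hub, "file_type": file_type, "row_count": count}
        for (hub, file_type), count in sorted(totals.items())
    ]


# Made-up sample values, not real hub statistics
file_level = [
    {"hub": "cdcepi/FluSight-forecast-hub", "file_type": "model-output",
     "file": "a.csv", "row_count": 100},
    {"hub": "cdcepi/FluSight-forecast-hub", "file_type": "model-output",
     "file": "b.csv", "row_count": "250"},
    {"hub": "cdcepi/FluSight-forecast-hub", "file_type": "target-data",
     "file": "t.csv", "row_count": 40},
    {"hub": "example-org/example-hub", "file_type": "model-output",
     "file": "c.csv", "row_count": 10},
]

summary = summarize_row_counts(file_level)
for line in summary:
    print(line)
```

Keeping the file-level rows as the source of truth and deriving the summary from them each time would match the note that the summary .csv is updated on every run, though the actual script may build it differently.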

This script captures row counts for all model-output and target-data files in a list of Hubverse-compatible hubs.
This commit also adds stats for cdcepi/FluSight-forecast-hub.
This changeset also adds a summary .csv file that groups file counts by hub and directory (e.g., model-output/target-data).
This changeset ensures that the script considers all files with the same name when counting rows (for example, some hubs use the same target-data file names in different subdirectories).
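
One way to avoid dropping files that share a name is to key counts on each file's path relative to the directory root rather than on its bare file name. The sketch below illustrates that idea on a local temporary directory. It is an assumption-laden example, not the script's actual approach: `count_csv_rows`, `row_counts_by_path`, the `2023`/`2024` subdirectories, and the sample contents are all hypothetical, and it handles only CSV files with a header row.

```python
import csv
import tempfile
from pathlib import Path


def count_csv_rows(path):
    """Count data rows in a CSV file, excluding the header line."""
    with path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header if the file has one
        return sum(1 for _ in reader)


def row_counts_by_path(root, pattern="*.csv"):
    """Map each matching file's path (relative to root) to its row count.

    Keying on the relative path keeps files with the same name in
    different subdirectories as separate entries.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): count_csv_rows(p)
        for p in sorted(root.rglob(pattern))
    }


# Build a throwaway directory with two same-named files (made-up data)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "target-data"
    for subdir, n_rows in [("2023", 2), ("2024", 3)]:
        d = root / subdir
        d.mkdir(parents=True)
        lines = ["location,value"] + [f"US,{i}" for i in range(n_rows)]
        (d / "target.csv").write_text("\n".join(lines) + "\n")
    counts = row_counts_by_path(root)

total_rows = sum(counts.values())
print(counts, total_rows)
```

A dictionary keyed on `p.name` alone would keep only one of the two `target.csv` entries here, which is the kind of undercount the changeset describes fixing.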

The commit also includes updated data for the hubs we've already tallied (and renames the "team" column to the more accurate "model_id").
@bsweger bsweger requested a review from elray1 December 5, 2024 21:36