Caching issue/PR data #18

Open
choldgraf opened this issue Dec 1, 2019 · 1 comment

Comments

@choldgraf
Member

Sometimes it's useful to store issues, PRs, etc. if you want to analyze them later. This wouldn't be useful for generating changelogs (since for those you want to make sure you've got the latest activity), but it could be useful for generating datasets that one can analyze with, e.g., https://github.com/choldgraf/jupyter-activity-snapshot.

Perhaps this could keep a cache folder in ~/data_github_activity that would keep this data over time. A few points / questions:

  • It could either be a single CSV file for all the data, a couple of CSV files for different types of data (e.g., issues.csv, prs.csv, comments.csv), or sub-folders for different GitHub orgs/repos
  • When new data is downloaded, it could do simple joins on these CSV files and then drop the duplicates based on the unique ID of that item
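To make the second point concrete, here is a rough sketch of what "join the new download with the cached CSV, then drop duplicates by unique ID" could look like. Everything named here is an assumption for illustration: `update_cache` is a hypothetical helper, not part of github-activity, and the `id` column name is a guess at what the unique identifier would be called. It uses only the standard library `csv` module; the same idea could be done with a concat plus drop-duplicates in pandas.

```python
import csv
from pathlib import Path


def update_cache(cache_file, new_rows, id_field="id"):
    """Merge new_rows into the CSV at cache_file, keeping one row per ID.

    Hypothetical helper: `id_field` is an assumed column name.
    """
    cache_file = Path(cache_file)
    merged = {}

    # Load whatever earlier runs already cached, if anything.
    if cache_file.exists():
        with cache_file.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                merged[row[id_field]] = row

    # Newly downloaded rows replace cached rows with the same ID,
    # so the most recent download wins.
    for row in new_rows:
        merged[str(row[id_field])] = {k: str(v) for k, v in row.items()}

    if not merged:
        return []

    # Use the union of all columns, in first-seen order.
    fieldnames = []
    for row in merged.values():
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(merged.values())

    return list(merged.values())
```

With the folder proposed above, a call might look like `update_cache(Path.home() / "data_github_activity" / "issues.csv", downloaded_issues)`, where `downloaded_issues` is a list of dicts. One file per data type (issues.csv, prs.csv, comments.csv) would just mean calling it once per file.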

@consideRatio what do you think about this? Useful or unnecessary complexity?

@consideRatio
Collaborator

Hmmm, I don't want to influence you much on this, as I represent a very specific need (mainly changelog generation), but I think it's not out of scope for the github-activity project to allow output to CSV, JSON, etc., which are more suitable to process from disk than a markdown file.

I can imagine we could do some nice things from this. Perhaps putting out systematic metrics for releases that could be fun to look at between the projects etc.

How long since the last release, how many PRs, how large the PRs were, how many people contributed, was that an increase or decrease, etc., assuming we start to analyze the data more.
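As a rough illustration of the kind of per-release metrics listed above, something like the following could run over cached PR rows. This is only a sketch: `release_metrics` and the column names `merged_at`, `author`, and `lines_changed` are hypothetical, and it assumes every timestamp is a plain ISO 8601 string in the same format with no timezone suffix.

```python
from datetime import datetime


def release_metrics(prs, previous_release, this_release):
    """Summarize PRs merged after previous_release and up to this_release.

    Hypothetical helper: the row keys used here are assumed column names.
    """
    start = datetime.fromisoformat(previous_release)
    end = datetime.fromisoformat(this_release)

    # Keep only PRs that were actually merged inside the release window.
    merged = [
        pr for pr in prs
        if pr["merged_at"]
        and start < datetime.fromisoformat(pr["merged_at"]) <= end
    ]

    total_lines = sum(int(pr["lines_changed"]) for pr in merged)
    return {
        "days_since_last_release": (end - start).days,
        "n_prs": len(merged),
        "n_contributors": len({pr["author"] for pr in merged}),
        "mean_lines_changed": total_lines / len(merged) if merged else 0.0,
    }
```

Running this for two consecutive releases and comparing the resulting dicts would answer the "increase or decrease" question.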
