
Cache external URI calls #58

Merged: glasnt merged 6 commits into master from demo/full_comment_cache on Jan 7, 2016
Conversation

@glasnt (Collaborator) commented on Nov 29, 2015

I've rearranged the external calls, including the pagination calls, so that I can memoise the JSON response of each API call, keyed by its unique URL.

The cache dict is stored as a local JSON file so it can be recalled on subsequent runs of the script.

Additionally, this captures all external calls, so re-runs of the script make no external calls (testable by disabling the local network after a full successful run and re-running).
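A minimal sketch of the scheme, assuming the `requests` library (the file name `cache_file.json` appears later in this thread; the helper name `get_json` is illustrative, not the PR's actual code):

```python
import json
import os

import requests

CACHE_FILE = "cache_file.json"  # local JSON store, reloaded on each run

# Load whatever a previous run left behind.
if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE) as f:
        _cache = json.load(f)
else:
    _cache = {}

def get_json(url):
    """Return the JSON response for url, memoised by the URL itself."""
    if url not in _cache:
        _cache[url] = requests.get(url).json()
        with open(CACHE_FILE, "w") as f:
            json.dump(_cache, f)  # persist, so re-runs need no network
    return _cache[url]
```

Because each page of a paginated response has its own URL (`?page=2`, and so on), keying on the full URL caches the pagination calls for free.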

Thanks to @aurynn, @tveastman and @ncoghlan for the tips to get this happening :)

OMG this is an awesome bit of python that I had no idea you could use before

This creates a function that stores the output of the function `get_code_commentors` for an entire repo call. This means that repeated calls do not have to re-fetch commentors.

Results are saved to file in JSON format, so as not to introduce yet another key-value store.

This isn't as effective as it could be, because it stores the full result as a list of people. To be properly effective, results should be stored on a per-pull/issue basis, so as to allow incremental checks (e.g. 5000 tokens used against a repo with 5000+ issues means multiple runs spaced out; storing the results of previous runs builds up a store of data, so the final run completes 'instantly' rather than having to use any requests), as sketched below.
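A hedged sketch of that per-issue variant, reusing `get_json` from the sketch above (`issue_count` and the URL shape are assumptions for illustration, not the PR's actual code):

```python
def get_code_commentors_incremental(repo, issue_count):
    """Per-issue caching: each issue's comments cost at most one API
    request across *all* runs, so interrupted runs still make progress."""
    people = set()
    for n in range(1, issue_count + 1):
        url = "https://api.github.com/repos/%s/issues/%d/comments" % (repo, n)
        for comment in get_json(url):  # cache hit if any prior run fetched it
            people.add(comment["user"]["login"])
    return people
```

Once every issue URL is in the store, a final run touches the network zero times and just assembles the set.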

Thanks to @aurynn, @tveastman and @ncoghlan for their assistance :D
@glasnt (Collaborator, Author) commented on Nov 29, 2015

Resolves #49

@SvenDowideit (Contributor) commented:

I'm kinda wary - if I grok the code right, you're caching without checking whether the cache is invalid (i.e. if there are more comments etc. now).

That said, it totally depends on what the goals for this codebase, and specifically for this PR, are - if it's saving on API calls during development for now, then 👍

@glasnt (Collaborator, Author) commented on Nov 30, 2015

@SvenDowideit yeah, this is a rudimentary fix for now. Proper cache invalidation would be a Big Thing(tm) and would have to be introduced alongside #34.

'Mutating cache': running octohatrack.py with a sufficiently full cache_file.json for a repo was resulting in an ever-growing cache. This was reproducible even when the script was running on a machine with no network access.

The results for a pull's comments were increasing on subsequent runs: a result with 3 comments would grow to 6, then 9.

Fact: for a PR in GitHub, the content of its comments appears at both the pulls/ID/comments and issues/ID/comments endpoints. The counts for these will be the same.

Speculation: the results for the issues were being appended to the results for the pulls, hence an additive rather than multiplicative increase in the results.

I have no idea *how* it was mutating, but removing the `+=` of the results (looping through them once instead), and processing the results of the two memoised calls separately, means the cache no longer mutates.
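A plausible mechanism, as a minimal reproduction (illustrative only, not the PR's actual code): a memoised function hands back a reference to the cached list, so `+=` on the "returned" value extends the cached object in place.

```python
_cache = {}

def memoise(fn):
    def wrapper(key):
        if key not in _cache:
            _cache[key] = fn(key)
        return _cache[key]  # a reference to the cached list, not a copy
    return wrapper

@memoise
def get_comments(key):
    return ["a", "b", "c"]

results = get_comments("pulls/1/comments")
results += get_comments("issues/1/comments")  # same content, second endpoint
# += is list.extend, so the cached pulls entry itself now holds 6 items;
# persist it to disk and the next run grows it to 9, and so on.
print(get_comments("pulls/1/comments"))  # ['a', 'b', 'c', 'a', 'b', 'c']
```

Returning a copy (`list(_cache[key])`) from the memoiser would also avoid this; the PR instead stops `+=`-ing onto memoised results.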
@SvenDowideit (Contributor) commented:

oh go on, mergify it!

glasnt added a commit that referenced this pull request Jan 7, 2016
glasnt merged commit 4439d31 into master on Jan 7, 2016
glasnt deleted the demo/full_comment_cache branch on March 5, 2016