-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache external URI calls #58
Conversation
OMG this is an awesome bit of python that I had no idea you could use before This creates a function that stores the output of the function `get_code_commentors` for an entire repo call. This means that repeated calls do not have to re-get commentors. Results are saved to file in a json format, as to not introduce yet-another-keyvalue-store. This isn't as effective as it could be, because it stores the full result as a list of people. For proper effectiveness, results should be stored on a per pull/issue bases, as so to allow for incremental checks (e.g. 5000 tokens used for a repo with 5000+ issues will mean multiple runs spaced out. Storing the results of previous runs will build up a store of data, which on the final run will just complete 'instantly' rather than have to use any requests) Thanks to @aurryn, @tveastman and @ncoghlan for their assistance :D
Resolves #49 |
I'm kinda wary - if i grok the code right, you're caching without checking if the cache is invalid (ie, if there are more comments etc now) That said, it totally depends on what the goal for this codebase and specifically for this PR are - if its saving on API calls during development for now, then 👍 |
@SvenDowideit yeah, this is a rudimentary fix for now. The cache invalidation would be a Big Thing(tm) and would have to be introduced alongside #34. |
'Mutating cache': running octohatrack.py with a sufficiently full cache_file.json for a repo was resulting in an increating cache. This was reproducible even when the script was running on a machine lacking network. The results for a pull/comments were increasing on subsequent runs. A result with 3 comments would increase to 6, then 9. Fact: for a PR in github, content for comments appears at both the pulls/ID/comments and issues/ID/comments endpoints. The counters for these will be the same. Speculation: the results for the issues were being appended to the results for the pulls. Hence a constant increase and not a multiplicative increase in the results. I have no idea *how* it was mutating, but removing the `+=` of the results to loop through once, and processing the results of two memoise'd calls separately means the cache no longer mutates.
oh go on, mergify it! |
I've rearranged the external calls, including the pagination calls, so that I can memoise via a unique url the JSON response of the API calls.
The cache dict is stored as a local JSON file so it can be recalled on subsequent runs of the script.
Additionally, this captures all external calls, so re-runs of the script do not call externally (testable by disabling local network after a full successful run and re-running)
Thanks to @aurynn, @tveastman and @ncoghlan for the tips to get this happening :)