
octohat as a service #34

Closed · glasnt opened this issue Sep 24, 2015 · 11 comments

glasnt commented Sep 24, 2015

*dun dun dun*

glasnt commented Sep 24, 2015

Some initial notes from brainstorms with @freakboy3742:

Octohat as a service will be interesting, because of the rate-limiting issue.

Per GitHub's rate limiting, you get 60 requests/hour unauthenticated, or 5,000 requests/hour authenticated. Authentication requires an unscoped personal access token. Trying to circumvent the rate limiting is bad.

I know I'm probably being slightly wasteful with requests at the moment and could cut back on duplicate/unneeded calls, but it would be interesting to see how many requests are required to get the full scope of a project. It'd be something like count(users) + count(issues) + overhead, where overhead is fixed and the number of users depends on who's found in the issues.
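
A back-of-envelope sketch of that estimate (Python, since that's what octohatrack is written in; the issue/user counts and the fixed overhead below are made-up placeholders, and the one-request-per-item assumption is only illustrative):

```python
# Rough request budget for one full pass over a repo, following the
# count(users) + count(issues) + overhead estimate above.
RATE_LIMIT_AUTH = 5000   # requests/hour with a personal access token
RATE_LIMIT_ANON = 60     # requests/hour unauthenticated

def estimated_requests(num_issues, num_users, overhead=10):
    """Assume one request per issue (for its comments), one per user, plus fixed overhead."""
    return num_issues + num_users + overhead

reqs = estimated_requests(num_issues=1200, num_users=300)
print(f"~{reqs} requests: "
      f"{reqs / RATE_LIMIT_AUTH:.1f}h authenticated, "
      f"{reqs / RATE_LIMIT_ANON:.1f}h unauthenticated")
```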

Also, for sufficiently large repos, you can't get all the information in one pass. I've already implemented --limit to check only the last x issues, but ideally we'd want a full check of a project, then update it at some regular interval.

I'd also like nice things like octohat.com/user/repo endpoints, where you could trigger a build and then come back later to see the data and when it was last updated.
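
A very rough sketch of what those endpoints could look like (Flask is an arbitrary choice, octohat.com is still hypothetical, and the contributor walk and storage are stubbed out with an in-memory dict):

```python
from datetime import datetime, timezone

from flask import Flask, jsonify

app = Flask(__name__)
RESULTS = {}  # in-memory stand-in for a real datastore


@app.route("/<user>/<repo>", methods=["GET"])
def show(user, repo):
    """Return cached results and when they were last updated, if a build has run."""
    key = f"{user}/{repo}"
    if key not in RESULTS:
        return jsonify({"status": "not built yet, POST to trigger"}), 404
    return jsonify(RESULTS[key])


@app.route("/<user>/<repo>", methods=["POST"])
def trigger(user, repo):
    """Kick off a build (inline here, for the sketch) and record the timestamp."""
    key = f"{user}/{repo}"
    RESULTS[key] = {
        "contributors": [],  # would come from the existing octohatrack API walk
        "last_updated": datetime.now(timezone.utc).isoformat(),
    }
    return jsonify({"status": "queued", "check_back": f"/{key}"}), 202
```

In practice the POST handler would hand the slow walk off to a background worker rather than doing it inline.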

Now, this is assuming I stick with the API-driven version.

What I could do is use the GitHub Archive. There are a few problems with this approach:

  • the archive is good for getting changes in a day, but historical parsing will be harder. If I want to use just the GitHub Archive, I'd have to parse all data ever collected. Plus, the data formats change over time; see the mothballing of the Open Source Report Card project.
  • the data is event-driven, so ensuring that all the event types (CommitComment, IssueComment, etc.) match up to what's expected will be difficult.

I could take a hybrid approach: collect all the information once from the API, then update at x frequency from the archives.
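
A sketch of the archive half of that hybrid (the hourly-dump URL format is per gharchive.org, the repo name is just an example, and mapping event types back to octohatrack's idea of a contribution is exactly the open problem from the second bullet above):

```python
import gzip
import io
import json

import requests

ARCHIVE_URL = "https://data.gharchive.org/{date}-{hour}.json.gz"


def events_for_repo(date, hour, full_repo_name):
    """Yield (event type, actor login) pairs for one repo from one hourly dump."""
    resp = requests.get(ARCHIVE_URL.format(date=date, hour=hour), timeout=60)
    resp.raise_for_status()
    with gzip.open(io.BytesIO(resp.content), "rt", encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("repo", {}).get("name") == full_repo_name:
                yield event["type"], event["actor"]["login"]


for etype, login in events_for_repo("2016-02-01", 12, "glasnt/octohatrack"):
    print(etype, login)
```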

glasnt commented Oct 15, 2015

The hacktoberfest verify checker from @erikaheidi appears to do just what I want - GitHub grokking as a service \o/

https://github.com/erikaheidi/hacktoberfest-verify

Gotta think about capping things for the web model, for usability and not-exploding-the-server-ness

edunham commented Feb 3, 2016

To deal with rate limiting, could you just make the service's users plug in their own API tokens? This could be done pretty transparently by having them OAuth-login to GitHub when they start using the service, a la https://nightli.es/
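
Something like this minimal sketch, where every API call is made with the end user's own token so the 5,000 requests/hour budget is per user rather than shared (the token value is obviously a placeholder; /rate_limit is GitHub's own endpoint for checking what's left):

```python
import requests

GITHUB_API = "https://api.github.com"


def github_session(user_token):
    """Build a requests session that authenticates as the end user."""
    session = requests.Session()
    session.headers["Authorization"] = f"token {user_token}"
    session.headers["Accept"] = "application/vnd.github.v3+json"
    return session


def remaining_quota(session):
    """Ask GitHub how much of this user's rate limit is left."""
    resp = session.get(f"{GITHUB_API}/rate_limit")
    resp.raise_for_status()
    core = resp.json()["resources"]["core"]
    return core["remaining"], core["limit"]


s = github_session("users-oauth-token-goes-here")
print(remaining_quota(s))
```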

glasnt commented Feb 7, 2016

Oh yes, of course! Setting up octohatrack as an application would solve the authentication and rate-limiting concerns from a centralisation standpoint!

glasnt commented Feb 7, 2016

Now the question is the time range of results.

I'm tempted to say something like "Only look at the last month, or 20 issues", then detail the exact CLI commands to run to get the full version. I'm very worried about the bounce rate when it takes minutes to go through a big repo in its entirety, not to mention the larger-scale caching issue.

software-opal commented

Could adding some degree of asynchrony (say through asyncio) improve the speed, possibly with the option of providing partial results as they become available? The major speed problem is loading the issues and all the comments associated with them.

We could also implement various caching policies to speed up re-requesting a repository:

  • Cache based on ETag or Last-Modified headers, which would prevent re-downloading chunks of data and save rate-limited requests (see the sketch after this list).
  • By checking the 'updated_at' key (on an issue), we can further reduce the number of requests needed. If the updated_at is the same as our cached one, then we know that no new comments have been made, so we don't even need to check them.
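
A sketch of the conditional-request idea from the first bullet (the dict-based cache is a stand-in for whatever the service would actually persist); a 304 reply means nothing changed, and GitHub doesn't count 304s against the rate limit:

```python
ETAG_CACHE = {}   # url -> (etag, cached_json)


def cached_get(session, url):
    """Fetch url, reusing the cached body when GitHub answers 304 Not Modified."""
    headers = {}
    if url in ETAG_CACHE:
        headers["If-None-Match"] = ETAG_CACHE[url][0]
    resp = session.get(url, headers=headers)
    if resp.status_code == 304:
        return ETAG_CACHE[url][1]          # unchanged, reuse cached data
    resp.raise_for_status()
    ETAG_CACHE[url] = (resp.headers.get("ETag"), resp.json())
    return ETAG_CACHE[url][1]
```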

To speed up providing information about recent contributors, we could sort the issues by 'updated' so we get the most recently updated ones first, prioritising loading those before the others.
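
The issues list endpoint already supports this ordering via sort=updated, so a refresh could stop paging as soon as it reaches an issue it has already cached. A minimal sketch (first page only, reusing the authenticated session from the earlier sketches):

```python
def recently_updated_issues(session, owner, repo, per_page=100):
    """Fetch the first page of issues for owner/repo, most recently updated first."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    params = {"state": "all", "sort": "updated", "direction": "desc",
              "per_page": per_page}
    resp = session.get(url, params=params)
    resp.raise_for_status()
    return resp.json()
```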

We could also store information about each contributor's last contribution to provide a quick mechanism to only display recent contributors.

Initial requests for data (where we have nothing cached) could be handed off to a task runner like Celery, with the response indicating that the data is being fetched and the client polling back later (or we could do some websocket magic, which would be cool and interesting but also hard).
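
A sketch of that hand-off, assuming Celery with a Redis broker (the broker URL and the body of the task are placeholders; the real task would call the existing octohatrack walk and write results to the datastore):

```python
from celery import Celery

celery_app = Celery("octohat", broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")


@celery_app.task
def build_repo_report(owner, repo):
    """Do the full (slow) contributor walk for owner/repo and store the result."""
    # ... call the existing octohatrack logic here and persist to the datastore ...
    return {"repo": f"{owner}/{repo}", "status": "done"}
```

The web view would then just call build_repo_report.delay(owner, repo) and immediately return a "being fetched, poll back later" response.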

If we wanted to get really efficient we could simply request the first page of the /contributors and /issues APIs (a sketch follows this list):

  • If there are any changes in /issues, start a task running to update.
  • If there are any changes in /contributors, assess how many:
    • more than a page: pass off a task
    • less than a page: just update the DB and respond.
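
Roughly, that decision could look like this, reusing cached_get and build_repo_report from the sketches above (the last_seen structure and the page-size threshold are hypothetical):

```python
def maybe_enqueue_update(session, owner, repo, last_seen):
    """Compare the first page of /issues and /contributors with what we last saw."""
    base = f"https://api.github.com/repos/{owner}/{repo}"
    issues = cached_get(session, f"{base}/issues?state=all&sort=updated&direction=desc")
    contributors = cached_get(session, f"{base}/contributors")

    if issues != last_seen.get("issues"):
        build_repo_report.delay(owner, repo)            # something changed: full update
    elif contributors != last_seen.get("contributors"):
        if len(contributors) >= 30:                     # a full default page: hand off
            build_repo_report.delay(owner, repo)
        else:
            last_seen["contributors"] = contributors    # small change: update in place
    return last_seen
```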

glasnt commented Feb 10, 2016

Added context here (because I didn't update this issue fast enough):

Lee has been awesome and made a proof of concept at https://github.com/leesdolphin/js-hatrack
All I did was fork this repo and create a gh-pages branch for GitHub hosting, so it's completely usable here: glasnt.github.io/js-hatrack

glasnt commented Feb 10, 2016

@leesdolphin would you like me to make you an organisation contributor?

With that level of access, you can move your proof of concept into the labhr organisation, and we can consolidate our efforts on the 'as a service' model.

I want to mock up some UI that I've been mulling over. Also, I'd like to figure out how to make the OAuth things mentioned by @edunham work, because at the moment throwing raw API keys around is not best practice.

software-opal commented

Yeah, I'd love to move js-labhr under LABHR. And I'd love to help with the aaS model too.

glasnt commented Feb 12, 2016

You should now have enough rights to enact a transfer of the repo into the organisation. Let me know if you need a hand :)

glasnt commented Mar 4, 2016

glasnt closed this as completed Mar 4, 2016