[docstrCoverage] Add docstrCoverage badge service #5340
Conversation
Hello. Thanks for working on this. I'm looking forward to reviewing this as I think it may be a useful pattern for other badges. That said, I guess the first very high-level thing we need to raise here is that our badges usually fetch an API response (JSON/XML/YAML) and extract a value from it. We do restrict the maximum response size we will accept, but usually if the job is "fetch some JSON and extract a number from it" we can reasonably expect that this will be of an acceptable size and this operation can be completed in some sensible amount of time. Once the problem becomes "fetch a zip file, unzip it and extract a number from one file in that zip", it feels like we could easily get into dealing with larger files, longer execution times, more disk space/memory used, etc. I've done no investigations on this myself yet, but...
I'd also welcome feedback from one of the other @badges/shields-maintainers on these issues. I think it's important not to sleepwalk into a big performance problem here. Aside from that, I haven't looked at the service code in any real detail yet, but just doing a really quick first pass, there are several basic things to sort out:
Let's also get those issues sorted, then we can dive into the code review properly.
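For illustration, a minimal sketch of the "usual badge job" described above (fetch a small JSON response and extract a single value); the endpoint URL and the `coverage` field are placeholders, not a real service:

```js
// Minimal sketch: fetch a small JSON response and extract one value.
// The URL and the `coverage` field are placeholders, not a real API.
async function fetchCoverage(jsonUrl) {
  const res = await fetch(jsonUrl); // global fetch, Node 18+
  if (!res.ok) {
    throw new Error(`Upstream returned ${res.status}`);
  }
  const data = await res.json();
  return data.coverage; // e.g. a number between 0 and 100
}
```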
I haven't had a chance to look at this in detail yet either, but I share the same concerns. For some reason, based on the messages I saw in Discord, I thought this would just need a JSON/txt type of file via the GitHub APIs, with some preceding work to identify the correct run/build. I'd have significant concerns about downloading archive files and extracting them.
Since posting last night, I've done a little bit more thinking/reading about this and there are a couple of other points to throw in:
Thank you for taking a look at this PR. I know it's borderline, but I hope we can sort things out. To address your concerns:
Sadly, the GitHub Actions API is only exposed via the JSON (REST) API. If GraphQL were supported, we could skip the second JSON request and directly download the zip file. I also thought of inserting the percentage directly in the artifact name so we could avoid downloading it, but it feels really hacky.
I think they can be arbitrarily large, but the second JSON response contains the size of the artifact we're about to download, so we could manually limit it (since it's only going to contain a 0-100 number, we could limit this to a small number of bytes).
Between 3 and 6 seconds of execution time. This is actually the reason some tests are currently failing.
I'm not saving files to disk; I'm using a Buffer, so there should be no problems on that front.
I'm not sure about this; I'll take a deeper look at it. Regarding the basic things to sort out, I'll update the PR as soon as possible. Surfacing an artifact through the CI build and reading from it was something we discussed on Discord, and to me it seems the most sensible/secure way to achieve a dynamic badge without a real external service. What do you think about this?
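As an illustration of the in-memory approach described above, here is a rough sketch with a size guard applied before the download. The zip library (adm-zip), the size cap, and the assumption that the artifact contains a single small text file are all assumptions for the sketch, not necessarily what this PR does:

```js
const AdmZip = require('adm-zip'); // assumed zip library, works entirely in memory

const MAX_ARTIFACT_BYTES = 10 * 1024; // illustrative cap: the file only holds a 0-100 number

// `sizeInBytes` comes from the preceding JSON response;
// `archiveUrl` is the artifact's archive_download_url.
async function readPercentageFromArtifact(archiveUrl, sizeInBytes, token) {
  if (sizeInBytes > MAX_ARTIFACT_BYTES) {
    throw new Error('Artifact too large, refusing to download');
  }
  // Artifact downloads need authentication and go through a redirect,
  // which fetch follows by default.
  const res = await fetch(archiveUrl, {
    headers: { Authorization: `token ${token}` },
  });
  const buffer = Buffer.from(await res.arrayBuffer()); // never written to disk
  const zip = new AdmZip(buffer);
  // Assume the zip contains a single text file holding the percentage.
  const text = zip.getEntries()[0].getData().toString('utf8');
  return parseFloat(text);
}
```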
So this in itself is a big issue. The most common place where our badges are viewed is GitHub. GitHub proxies images via its own cache and there's a hard limit of 4 seconds it will wait for a response. If the cache calls our endpoint and it takes more than 4 seconds, GitHub won't render the image, even if it returns a valid image after 4.1 seconds. So if the best-case scenario is 3 seconds, this badge would probably fail to render in one of the places it's most likely to be viewed.
I think it would be fine to remove the second call and infer the zip URL from the first response. The other option would be to call https://api.github.com/repos/fabiosangregorio/telereddit/actions/artifacts, which does give us the size up-front. Then we could key off the artifact name instead of the workflow name. That would get us down to 2 API calls and we'd know the size up-front (allowing us to 'fail fast' easily if we think the zip is too big). We don't get to do the... As you say, it would be really nice if this was available via the v4 API.
Let's make sure the core idea is sound first. I don't want to give you a bunch of "busy work" if there just isn't a way forward here.
I do think this could be a useful pattern, which is why I'm trying to suggest ways we can find a solution rather than just saying "no", but we do need to find a performant way to do it if we're going to merge this.
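A sketch of that two-call idea: list the repository's artifacts and fail fast on size_in_bytes before the zip is ever fetched. The artifact name, size limit, and authentication details here are placeholders:

```js
// Sketch of the "list artifacts, then download" flow suggested above.
// The artifact name and the size limit are illustrative only.
async function findArtifact(owner, repo, artifactName, token) {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/actions/artifacts`,
    { headers: { Authorization: `token ${token}` } },
  );
  const { artifacts } = await res.json();
  const artifact = artifacts.find(a => a.name === artifactName);
  if (!artifact) {
    throw new Error('No artifact with that name found');
  }
  if (artifact.size_in_bytes > 10 * 1024) {
    // Fail fast before downloading the zip at all.
    throw new Error('Artifact too large');
  }
  return artifact.archive_download_url;
}
```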
Thank you again for the celerity and for the research. Sorry for the late replies, but I'm on holiday :) 😎
This is not entirely true. We only want to fetch the runs for a specific branch that were successful. From the docs, this can be achieved using parameters such as
I think the response headers for the download request would contain the size of the file.
This would mean we could have the user name the artifact
I think I wasn't clear on this. I meant that I had also thought of inserting the percentage directly in the artifact name, so we wouldn't have to download it in order to get the percentage (something like...). I think leveraging the name will make the performance acceptable. If you agree, I'll go ahead and make the modifications when I get back.
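To illustrate the branch filtering mentioned above: the workflow-runs listing endpoint does accept branch and status query parameters, so the first call could look roughly like this (owner, repo, branch, and token are placeholders):

```js
// List only the successful workflow runs on a given branch.
// The repository and branch names here are placeholders.
async function listSuccessfulRuns(owner, repo, branch, token) {
  const params = new URLSearchParams({ branch, status: 'success' });
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/actions/runs?${params}`,
    { headers: { Authorization: `token ${token}` } },
  );
  const { workflow_runs } = await res.json();
  return workflow_runs;
}
```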
Why is the artifact zipped?
I see what you mean now. Yes - agreed. As far as possible, the procedure should be that the user dumps the output of a thing to an artifact (in this case the output of
Yeah. Let's have a go at:
...and see what the performance is like. My instinct is that it should be in a sensible ballpark, but let's see how it goes.
There's clearly been a lot of great discussion in this thread, but I confess I'm having trouble following the current/latest plan, particularly as it relates to retrieving the information from GitHub. Here's what I think I've gathered (please let me know if I'm wrong).
If I do indeed have that correct, then one question I'd have is whether we'd still be able to support branch filtering so that users could have a coverage badge based on their default branch.
Gah. That's a good point that I hadn't thought of - |
If I'm correct, we agreed to have the branch name in the artifact name, in a format like...
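The exact format discussed is not preserved in this thread, but purely as a hypothetical illustration, a naming convention that encodes both the branch and the percentage could be parsed without downloading anything:

```js
// Hypothetical naming convention only; the actual format was not
// settled in this thread. Example name: "docstr-coverage__main__87"
function parseArtifactName(name) {
  const match = name.match(/^docstr-coverage__(.+)__(\d+(?:\.\d+)?)$/);
  if (!match) {
    return undefined;
  }
  return { branch: match[1], percentage: parseFloat(match[2]) };
}
```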
I suppose that could functionally get the job done, but I do have some concerns about a solution that's not really deterministic and necessitates education and communication outside of the Shields ecosystem. We can try to document that requirement for being able to use this badge as best we can (in the modal badge builder, somewhere within our repo, etc.), but a large portion of our user base will miss those and I suspect we'd end up fielding a lot of "why isn't this working right" questions, especially if this pattern was extended to support other report-related information published from a Workflow/Job run.
I feel like one of the biggest gaps still in place today with GitHub Actions is the absence of basic test results/code coverage reporting that most other CI/pipeline platforms seem to support these days. Perhaps in parallel it'd be worth seeing if GitHub have any plans to support this more natively.
I'm not massively keen on "this works, but you have to adopt a specific naming convention" either.
Hmm. I'm not convinced this would be any easier with Travis/CircleCI/etc. They're all fairly reliant on reporting to external services like coveralls/codecov/etc. I think we could arrive at a reasonable solution here, but the REST endpoints that exist just spread what we need across a few different API calls. Just for the lolz, I opened a thread on the GitHub community forum asking about support for action artifacts in the GraphQL API (https://github.community/t/artifacts-with-the-v4-graphql-api/124640/2), as that would probably allow us to solve that.
Travis is in much the same boat, though Circle provides native support for test results, as does AppVeyor (https://circleci.com/docs/2.0/collect-test-data/#api). Other platforms we integrate with, like Jenkins and Azure DevOps, make both test results and coverage natively available, and GitLab Pipelines makes the data available as well. I'm not saying that GHA is alone in not really having support here, just that they're definitely on the lower end of the spectrum in terms of features/support for test artifacts.
I must admit that I've not followed the discussions here. My understanding is that the badge would require multiple API calls, which would take too long to render, and that cutting down those API calls would force us to infer information such as branch filtering using error-prone conventions. Do we see an acceptable way forward on this one, or should we drop it?
@PyvesB you are correct. Unless GitHub introduces support for GraphQL APIs on GitHub Actions (no evidence of it so far), the render time of this badge would be untenable. Feel free to decide whether to leave this PR open in the hope of this feature becoming available, or to close it.
Thanks for confirming @fabiosangregorio! I'll go ahead and close it; it can always be reopened with a single button click if/when the time comes. Regardless, thanks for your interest in the Shields.io project. 😉
This PR adds the badge for docstr-coverage (#5339), a documentation coverage package for Python.