Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add code coverage support #3759

Merged
merged 24 commits into from
Jan 30, 2023
Merged

add code coverage support #3759

merged 24 commits into from
Jan 30, 2023

Conversation

jmayclin
Copy link
Contributor

@jmayclin jmayclin commented Jan 14, 2023

This commit adds cmake configuration to support source-based code coverage. It also adds the CI configuration necessary to upload code coverage to codecov.io during CI runs.

Description of changes:

We already use source-based code coverage in our codebase for fuzzing coverage information, but that is only enabled through make, which we are trying to move away from.

Coverage is enabled with the COVERAGE cmake option. Note that compiling to a static archive .a seems to break something, and LLVM will report that the coverage information is corrupted. To get around this we just compile to a shared library .so.

cmake . -Bbuild \
    -DCOVERAGE=ON \
    -DBUILD_SHARED_LIBS=ON

Each process then writes out profiling information to a file, which is specified by the LLVM_PROFILE_FILE environment variable. We set LLVM_PROFILE_FILE="ut_%p.profraw" in our CI, which causes each process to write to it's own file. Otherwise the profiles would get overwritten.

The CI configuration is a bit of a break from tradition, since it doesn't rely on the omniscript (s2n_codebuild.sh). Rather than adding configuration to the run_unit_tests method in the omniscript, I thought that it would be neater to try and keep the complexity contained to the coverage specific stuff. It also has the nice advantage of fitting more neatly into codebuild's different phases.

Infrastructure added for CI:

  • parameter store -> holds codecov.io token
  • codebuild generalbatch service role -> added permission to call ssm::getParameters for the code cov token resource
  • s3 bucket -> holds html coverage reports + 30 day lifecycle policy
  • codebuild generalbatch service role -> added permission to write/list s3 bucket
  • cloudfront distribution -> public access for html coverage reports
  • s3 bucket policy -> gives cloudfront access

CallOuts

This PR doesn't currently include any code cov configuration. Their documentation isn't exactly great (Can I upload an lcov info file? Idk, read the source code and find out) so my current approach is to add some minimal integration in this PR, and modify it as needed once we have a better idea of what it looks like.

We don't run CI on changes to our buildspec's, so the buildspec has a txt extension. There will need to be a follow-up PR to change this to ayml extension. I preferred this txt approach since that way the buildspec can be reviewed alongside the relevant changes.

I think we'll also need to go in and modify the codebuild jobs manually for these changes to take effect. The will require editing the build spec for the s2nGeneralBatch job to include

    - identifier: s2nUnitCoverage
      buildspec: codebuild/spec/buildspec_unit_coverage.yml
      env:
        privileged-mode: true
        compute-type: BUILD_GENERAL1_LARGE
        image: 024603541914.dkr.ecr.us-west-2.amazonaws.com/docker:ubuntu22codebuild
        variables:
          S2N_LIBCRYPTO: openssl-1.1.1

I included this in the buildspec_omnibus, but it looks like our actual CI jobs have diverged from the buildspecs in that folder.

Also, the s3 upload (and publicly visible coverage report) currently is just under the latest/ path, which means that there won't be specifically visible reports through cloudfront: https://dx1inn44oyl7n.cloudfront.net/index.html

Additionally, codecov.io is being nasty and doesn't count any lines with missed branches as covered. Which means that our coverage score is really low, because we use lots of macros and really test all the branches on every line for the macros. Not quite sure what the solution is here. So LLVM reports lines coverage of ~90% but codecov says something more like 65%.

Testing:

I reran the build to ensure that the coverage information remained relatively constant from run to run. There are some branch differences, but generally the code coverage remain very close from run to run. There were some branch and region count differences, but most of these looked extremely nasty, e.g. branches in __typeof(n) macros. While I'm sure it's possible to make our unit tests totally reproducible, I expect it to be rather expensive to do so.

I also manually kicked off a CI job to confirm that the coverage reports got written to S3, and that the coverage report was uploaded to codecov.io.

Not sure about the exact codecov configuration.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Ubuntu and others added 16 commits January 6, 2023 02:18
I think the link/compile options might be too broad. Need to only enable
them when LLVM is the compiler. Otherwise should probably just enable
the coverage flag. Or perhaps the coverage flag should always be
enabled.

The reports are not particularly nice looking. We should add an
additional to link the summary table to the line level coverage. I think
if we just shove all of the things into relative html paths that should
be do-able.
This commit adds a few files that will be necessary for code coverage in
CI runs. It breaks a little bit from tradition by avoiding the
omni-script, but I think that is quite justifiable in this case since
the unit test setup in the omniscript (s2n_codebuild.sh) doesn't work
for me anyways.

I will be attempting to slot most functionality into the correct
codebuild phases, since it will make the codebuild logs a bit nicer to
look at.
This commit should add the functionality necessary to run the proper
build and upload reports to code cov and s3. Once they are uploaded to
s3 they will be fronted by a cloudfront distribution which will allow
them to be publicly accessible.
post-build is not valid. it should be post_build. :( In my defence, I
feel like codebuild should really be supporting arbitrary phase names
anyways.
I always forget about this part. Silly inodes shneakily making
themselves part of the source control. I'm also sourcing the environment
setup. It would be nice to minimally separate those out rather than
chaining them together, but that feels like a task for a different time.
I didn't actually do that in the last commit even though I thought that
I did.
I should go take a nap. These silly mistakes are irritating me.
I'm kinda just copying the fuzz setup at this point. There are path
difficulties in the build, and it seems somewhat reasonable that this
might solve it? Uncertain.
I don't think that the fuzzers get their clang from any of our CI stuff
:( But it is stabbing me in the back because I don't think the fuzzer
specs require cmake, but yours truly most certainly does. Two build
images will make this go faster right?

amend: nvm, I'm just sticking with the one
Should just be setting the CC environment variable. Unfortunately, we
still haven't found clang so not sure how that's supposed to work.
So, it looks like our source control isn't really authoritative here.
It's more of a "best-effort" to track our CI jobs. The previous fuzz
jobs that I was trying to copy were outdated. Hoping these ones are a
bit more in style.
I made progress that time, but so many logs were generated that I
couldn't see what the original error message was. Now we should abort on
failure.
Talked to Doug, and it looks like the LatestClang isn't necessarily
expected to happen. I was expecting that to set the CC/CXX variables,
but since that isn't going to be the case we can just set it in the
buildspec.
My build ended up failing a random openssl test :( not quite sure why,
and it is going to be very painful to debug in CI. Trying to focus on
getting coverage actually working, so punting that task off to some
other day.
The last run was able to successfully go through my manually invoked CI
run. Moving towards pushing this into CR. I didn't find out that code
cov really doesn't like some of the branch misses in the macros's, and
our overall codebase coverage is significantly lower than codecov
because of that. There doesn't seem to be a way to turn it off either.
@github-actions github-actions bot added the s2n-core team label Jan 14, 2023
@jmayclin jmayclin requested a review from lrstewart January 14, 2023 01:46
@jmayclin jmayclin marked this pull request as ready for review January 14, 2023 01:47
@jmayclin jmayclin requested a review from dougch as a code owner January 14, 2023 01:47
@jmayclin jmayclin requested a review from maddeleine January 17, 2023 20:43
Copy link
Contributor

@maddeleine maddeleine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the end result of this PR? Is it that a coverage report will be generated per PR? How will a reviewer/author navigate from the PR to seeing that report?

@jmayclin
Copy link
Contributor Author

jmayclin commented Jan 17, 2023

end result of this PR?

CMake configuration to support code coverage & CI job that runs unit tests, calculates code coverage, then uploads that to S3/CloudFront & CodeCov.io.

Is it that a coverage report will be generated per PR?

Yes, each time the GeneralBatch coverage job is run, the coverage will be computed / uploaded. I expect that codecov will handle deduping for each PR

How will a reviewer/author navigate from the PR to seeing that report?

My expectation is that the codecov app will post a link onto the PR: https://github.com/apps/codecov
We already have the app installed onto this repository, and my understanding is that the CodeCov app will automatically post the coverage information, but the documentation is a little opaque on that, and doesn't really cover the CodeBuild CI use case. I followed the quick-start steps here: https://docs.codecov.com/docs/quick-start

Also worth calling out that the codecov reports should all be stored individually (per-pr) here: https://app.codecov.io/gh/aws/s2n-tls while the current cloudfront report are just "last write wins" in the S3 bucket. I think that we can get the reports to write to the PR prefix by using the CODEBUILD_SOURCE_VERSION mentioned in codebuild documentation but I figured that was more of a v2 task.

So the exact shape of the codecov.io integration is a bit uncertain right now, but next steps are

  1. manually change the CI to include relevant buildspec
  2. observe default/current codecov.io settings
  3. make relevant codecov.io config changes.
    We can try and frontload the config changes, but I expect "observe it" is going to be more efficient than trying to read through their documentation, although it will be a bit more noisy.

@maddeleine
Copy link
Contributor

My expectation is that the codecov app will post a link onto the PR: https://github.com/apps/codecov We already have the app installed onto this repository, and my understanding is that the CodeCov app will automatically post the coverage information, but the documentation is a little opaque on that, and doesn't really cover the CodeBuild CI use case. I followed the quick-start steps here: https://docs.codecov.com/docs/quick-start

Currently codecov just reports a little badge in our Readme. So technically we are already using codecov. I assume the reports would show up automatically, but I don't think we know... 🤔 You're probably right. The best thing to do to figure it out is get this running in our CI and go from there.

Copy link
Contributor

@dougch dougch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been splitting out the buildspec changes so as to allow the codebuild checks to run (or include forced/over-ridden job links)

codebuild/spec/buildspec_unit_coverage.yml Show resolved Hide resolved
codebuild/spec/buildspec_unit_coverage.yml Outdated Show resolved Hide resolved
@jmayclin jmayclin force-pushed the code-coverage branch 2 times, most recently from 7e6000e to 02b7948 Compare January 23, 2023 18:25
@codecov
Copy link

codecov bot commented Jan 23, 2023

Codecov Report

Merging #3759 (02b7948) into main (15bdf99) will increase coverage by 0.52%.
The diff coverage is 65.38%.

❗ Current head 02b7948 differs from pull request most recent head 3fbd377. Consider uploading reports for the commit 3fbd377 to get more accurate results

@@            Coverage Diff             @@
##             main    #3759      +/-   ##
==========================================
+ Coverage   68.56%   69.08%   +0.52%     
==========================================
  Files         233      233              
  Lines       21913    21630     -283     
  Branches     7757     7676      -81     
==========================================
- Hits        15024    14944      -80     
+ Misses       1898     1769     -129     
+ Partials     4991     4917      -74     
Impacted Files Coverage Δ
tls/extensions/s2n_client_alpn.c 60.60% <ø> (+9.32%) ⬆️
tls/extensions/s2n_client_key_share.c 65.21% <ø> (+0.83%) ⬆️
tls/extensions/s2n_client_max_frag_len.c 84.21% <ø> (+20.21%) ⬆️
tls/extensions/s2n_client_pq_kem.c 61.29% <ø> (+9.93%) ⬆️
tls/extensions/s2n_client_renegotiation_info.c 70.90% <ø> (+3.66%) ⬆️
tls/extensions/s2n_client_sct_list.c 100.00% <ø> (+46.15%) ⬆️
tls/extensions/s2n_client_server_name.c 66.66% <ø> (+8.88%) ⬆️
tls/extensions/s2n_client_session_ticket.c 90.90% <ø> (+19.48%) ⬆️
tls/extensions/s2n_client_signature_algorithms.c 100.00% <ø> (+60.00%) ⬆️
tls/extensions/s2n_client_status_request.c 80.95% <ø> (+17.98%) ⬆️
... and 34 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

This commit addresses some of the PR feedback. I don't think I actually
need to be doing any of the third party test dependency things.
@jmayclin
Copy link
Contributor Author

Sorry for the ugly commit history, I had some sort of dirty merge/rebase happen, but should be good now.

CodeBuild jobs won't run for yml files. But I still want this to be
reviewed with the code build configuration stuff.
Copy link
Contributor

@dougch dougch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

CMakeLists.txt Outdated Show resolved Hide resolved
codebuild/spec/buildspec_unit_coverage.txt Outdated Show resolved Hide resolved
Comment on lines 17 to 19

# upload to codecov.io
curl -Os https://uploader.codecov.io/latest/linux/codecov
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If codecov isn't accurate for our codebase, is this a useful step?

But I'm torn on whether it IS useful, from your description. The problem is that some single-line macros are very important and both branches SHOULD be tested. But some single-line macros aren't important and should be literally impossible to fail or test, like most of our extremely paranoid null checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth on this, but I've mostly arrived at "I can't fault CodeCov.io for pedantically accurate coverage". I think it is still nice to have from the diff perspective, but don't have a strong opinion on that.

One other option that we could consider is modifying the unit_test_coverage.info to lie about our branch coverage, so that CodeCov.io reporting would be equivalent to the reporting at https://dx1inn44oyl7n.cloudfront.net/index.html

Copy link
Contributor

@lrstewart lrstewart Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my point is that the LLVM is arguably just as inaccurate as codecov.io, just in our favor. If I'm understanding the behavior you're describing, then codecov.io isn't being pedantic, it's being more accurate. We can't assume all single-line macros don't need to be tested any more than we can assume all single-line macros do need to be tested. We've got a weird mix.

- comment for fatal error
- declare cmake option
- add libcrypto root thing for build script

Previously I think we were just using the default libcrypto from the
environment.
@jmayclin jmayclin requested a review from lrstewart January 25, 2023 18:51
@jmayclin
Copy link
Contributor Author

jmayclin commented Jan 26, 2023

Had a chat with team members and arrived at the following conclusion

codecov.io requires all branches to be checked before a line account, including those in macros. codecov.io probably isn't the best fit for us, because of our paranoid usage of safety and assertion macros, which we never expect to be hit.

So the route forward is to rely a little bit more heavily on the lcov reports. To do that we want

  • coverage report for each PR to be uploaded to it's own path
  • summary dumped to logs
  • link to coverage report outputted in logs.

- remove the codecoverage upload after talking to Lindsay
- add branch coverage to the lcov report
- upload the text summary to s3
- upload the report to s3
- s3 uploads are done to a specific PR path
@jmayclin
Copy link
Contributor Author

#3781 tracks proposed future enhancement for code coverage

@jmayclin jmayclin enabled auto-merge (squash) January 28, 2023 03:03
@jmayclin jmayclin merged commit 4bd1505 into aws:main Jan 30, 2023
@jmayclin jmayclin deleted the code-coverage branch December 22, 2023 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants