Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up BAM, CRAM 3.1 and 4.0 testing in test/test.pl's test_view. #1308

Closed
wants to merge 1 commit into from

Conversation

jkbonfield
Copy link
Contributor

@jkbonfield jkbonfield commented Jul 8, 2021

Many of the SAM test files cover corner cases that don't change with 3.1 or 4.0, so we don't need to rerun every test in every format variation. Additionally we don't need to use compression for every test using BAM.

These have been tested using gcov to validate the total code coverage doesn't change beyond about 5 lines of code.

  • Only run 3.1/4.0 tests on a subset of SAM files.

  • Only test 3.1/4.0 when running with threads. (Second call to test_view). TODO: Likely this could be applied to 3.0 tests too to
    limit which files are tested in a threaded environment.

  • Drop some of the 2.1 tests. If we can go from SAM -> 2.1 -> SAM then it's good enough to validate encode and decode, with no need for 2.1 to 3 interchange checks.

  • Drop some of the 3.0/3.1/4.0 interchange tests. As with 2.1, we only need to know we can go to/from a common SAM format to test the code paths.

  • Don't test compressed BAM on all files. 1 is enough, and then uncompressed for the rest (lvl 1).

  • Don't test uncompressed CRAM on all files. Just do it for 1. We do want compressed mostly though as CRAM has a lot of codecs to explore.

  • No Point in BAM -> CRAM -> BAM -> SAM when BAM -> CRAM -> SAM is sufficient.

  • Drop the multi-ref CRAM testing for v3.1 and v4.0 as it's the same code path tested in v3.0 anyway. This does change coverage marginally, but it's trivial (3 lines reallocing MD in cram_decode.c).

  • Only do multi-ref CRAM testing on CRAM files with > 1 ref and when threaded. (Albeit with a minor 2 line testing difference to cram_io.c.)

  • Reduce the number of CRAM profiles tested. Do full 4 profiles on one file per v3.1 and v4.0. Otherwise a single profile.

The inpact of this is time spent in the test_view subroute drops from

real    2m3.525s
user    1m20.362s
sys     0m26.804s

to

real    0m21.618s
user    0m12.925s
sys     0m4.148s

This single function dominated all testing (which on the same system took 2m37s for the entire test suite), so the impact on overall time is considerable.

Many of the SAM test files cover corner cases that don't change with
3.1 or 4.0, so we don't need to rerun every test in every format
variation.  Additionally we don't need to use compression for every
test using BAM.

These have been tested using gcov to validate the total code coverage
doesn't change.

- Only run 3.1/4.0 tests on a subset of SAM files.

- Only test 3.1/4.0 when running with threads. (Second call to
  test_view).  TODO: Likely this could be applied to 3.0 tests too to
  limit which files are tested in a threaded environment.

- Drop some of the 2.1 tests.  If we can go from SAM -> 2.1 -> SAM
  then it's good enough to validate encode and decode, with no need
  for 2.1 to 3 interchange checks.

- Drop some of the 3.0/3.1/4.0 interchange tests.  As with 2.1, we
  only need to know we can go to/from a common SAM format to test the
  code paths.

- Don't test compressed BAM on all files.  1 is enough, and then
  uncompressed for the rest (lvl 1).

- Don't test uncompressed CRAM on all files.  Just do it for 1.  We do
  want compressed mostly though as CRAM has a lot of codecs to explore.

- No Point in BAM -> CRAM -> BAM -> SAM when BAM -> CRAM -> SAM is
  sufficient.

- Drop the multi-ref CRAM testing for v3.1 and v4.0 as it's the same
  code path tested in v3.0 anyway.  This does change coverage
  marginally, but it's trivial (3 lines reallocing MD in cram_decode.c).

- Only do multi-ref CRAM testing on CRAM files with > 1 ref and when
  threaded. (Albeit with a minor 2 line testing differnce to cram_io.c.)

- Reduce the number of CRAM profiles tested.  Do full 4 profiles on
  one file per v3.1 and v4.0.  Otherwise a single profile.

The inpact of this is time spent in the test_view subroute drops from

    real    2m3.525s
    user    1m20.362s
    sys     0m26.804s

to

    real    0m21.618s
    user    0m12.925s
    sys     0m4.148s

This single function dominated all testing (which on the same system
took 2m37s for the entire test suite), so the impact on overall time
is considerable.
@valeriuo
Copy link
Contributor

Merged as d16bed5

@valeriuo valeriuo closed this Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants