Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up BAM, CRAM 3.1 and 4.0 testing in test/test.pl's test_view. #1308

Closed
wants to merge 1 commit into from

Commits on Jul 8, 2021

  1. Speed up BAM, CRAM 3.1 and 4.0 testing in test/test.pl's test_view.

    Many of the SAM test files cover corner cases that don't change with
    3.1 or 4.0, so we don't need to rerun every test in every format
    variation.  Additionally we don't need to use compression for every
    test using BAM.
    
    These have been tested using gcov to validate the total code coverage
    doesn't change.
    
    - Only run 3.1/4.0 tests on a subset of SAM files.
    
    - Only test 3.1/4.0 when running with threads. (Second call to
      test_view).  TODO: Likely this could be applied to 3.0 tests too to
      limit which files are tested in a threaded environment.
    
    - Drop some of the 2.1 tests.  If we can go from SAM -> 2.1 -> SAM
      then it's good enough to validate encode and decode, with no need
      for 2.1 to 3 interchange checks.
    
    - Drop some of the 3.0/3.1/4.0 interchange tests.  As with 2.1, we
      only need to know we can go to/from a common SAM format to test the
      code paths.
    
    - Don't test compressed BAM on all files.  1 is enough, and then
      uncompressed for the rest (lvl 1).
    
    - Don't test uncompressed CRAM on all files.  Just do it for 1.  We do
      want compressed mostly though as CRAM has a lot of codecs to explore.
    
    - No Point in BAM -> CRAM -> BAM -> SAM when BAM -> CRAM -> SAM is
      sufficient.
    
    - Drop the multi-ref CRAM testing for v3.1 and v4.0 as it's the same
      code path tested in v3.0 anyway.  This does change coverage
      marginally, but it's trivial (3 lines reallocing MD in cram_decode.c).
    
    - Only do multi-ref CRAM testing on CRAM files with > 1 ref and when
      threaded. (Albeit with a minor 2 line testing differnce to cram_io.c.)
    
    - Reduce the number of CRAM profiles tested.  Do full 4 profiles on
      one file per v3.1 and v4.0.  Otherwise a single profile.
    
    The inpact of this is time spent in the test_view subroute drops from
    
        real    2m3.525s
        user    1m20.362s
        sys     0m26.804s
    
    to
    
        real    0m21.618s
        user    0m12.925s
        sys     0m4.148s
    
    This single function dominated all testing (which on the same system
    took 2m37s for the entire test suite), so the impact on overall time
    is considerable.
    jkbonfield committed Jul 8, 2021
    Configuration menu
    Copy the full SHA
    23a4916 View commit details
    Browse the repository at this point in the history