Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow fingerprinting of SAM files that only have a partial dictionary match to the haplotype map #1955

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented Apr 3, 2024

Despite efforts to the contrary, FingerprintChecker cannot fingerprinting sam files that have a mismatching dictionary from the one stated in the haplotype map (it works for vcfs). This is because when fingerprinting sam files, the SamLocusIterator is given an IntervalList to query, and this IL has a mismatching dictionary which isn't supported.

This PR resolves this by creating a new IL just before querying the Iterator using the proper dictionary.

Description

Give your PR a concise yet descriptive title
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • [x ] Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on github actions

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

…ting sam files that have a mismatching dictionary from the one stated in the haplotype map (it works for vcfs). This is because when fingerprinting sam files, the SamLocusIterator is given an IntervalList to query, and this IL has a mismatching dictionary which isn't supported.

This PR resolves this by creating a new IL just before querying the Iterator using the proper dictionary.
Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yfarjoun! Just what I think should be one very quick request, otherwise looks good to me.

@Test(dataProvider = "bamFilesRGs")
public void testCrossCheckRGs_with_short_dictionary(final File file1, final File file2, final boolean expectAllMatch, final int expectedRetVal, final int expectedNMetrics) throws IOException {

File metrics = File.createTempFile("Fingerprinting", "NA1291.RG.crosscheck_metrics");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


if (file1.getName().endsWith(SamReader.Type.CRAM_TYPE.fileExtension())) {
args.add("R=" + referenceForCrams);
args.add("HAPLOTYPE_MAP=" + HAPLOTYPE_MAP_FOR_CRAMS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a HAPLOTYPE_MAP_SHORT_FOR_CRAMS as well just so we're testing that the functionality works for both bams and crams?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@yfarjoun yfarjoun requested a review from kachulis April 5, 2024 00:59
@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 5, 2024

Hi @kachulis I'm not sure what the failure it about, but I'm pretty sure it's not due to my changes....

Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the error is related to the new data provider. I think you've ended up duplicating some tests in a way I'm not sure you intended.

private Object[][] haplotypeMaps(){
return new Object[][]{{HAPLOTYPE_MAP}, {HAPLOTYPE_MAP_SHORT}};
}
@DataProvider(name = "bamFilesRGs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has gotten a little twisted around. You're testing the both full and short dictionary here, but only the full dictionary crams, and then you also have testCrossCheckRGs_with_short_dictionary, where you basically do the same thing, expect hardcoded instead of via the data provider. I think that's also where the test failure is coming from, bamFilesRGs now includes the haplotypeMaps, but testCrossCheckRGs_with_short_dictionary doesn't include the haplotypeMaps as a parameter input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😊 huh. I saw other errors before. yes, something got strange.

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 5, 2024

OK. back to you again @kachulis . thanks for your patience.

Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks @yfarjoun!

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 5, 2024

I've seems to have lost my merge privileges....be my guest!

@kockan kockan merged commit 0495b01 into broadinstitute:master Apr 5, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants