NEW: filter_kraken2_classifications action #226

Open
wants to merge 7 commits into
base: main

Contributor

@cherman2 cherman2 commented Jan 6, 2025

Hi Bok Lab!

I was recently talking to a collaborator whose area of expertise is metagenomic contamination and spurious hits. They suggested that I should be performing low-abundance filtering on my data to remove spurious hits, and that the Kraken reports are the best place to filter, so that Bracken isn't estimating taxa that were spuriously assigned.

@colinvwood and I developed a method for filtering the Kraken reports by abundance! This method also filters the output files so that the reports and outputs do not get out of sync.
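
For readers who want a feel for the idea, here is a minimal sketch of filtering a report by relative abundance and then dropping the matching records from the output file. It is not the code in this PR: `ReportRow`, `filter_report`, and `filter_output` are hypothetical names, the report is reduced to three fields, and the output is assumed to use Kraken 2's default tab-separated layout with a numeric taxon ID in the third column.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    clade_reads: int  # reads assigned to this taxon or any descendant
    taxon_id: int
    name: str


def filter_report(rows, threshold):
    """Keep rows whose share of the sample's reads is at least `threshold`."""
    # Total reads = unclassified (taxon 0) + root (taxon 1) clade counts.
    total = sum(r.clade_reads for r in rows if r.taxon_id in (0, 1))
    if total == 0:
        return []
    return [r for r in rows if r.clade_reads / total >= threshold]


def filter_output(lines, kept_taxa):
    """Keep output records whose taxon id survived the report filter."""
    # Kraken 2 output columns: C/U flag, read id, taxon id, length, LCA map.
    return [ln for ln in lines if int(ln.split("\t")[2]) in kept_taxa]


report = [
    ReportRow(100, 0, "unclassified"),
    ReportRow(900, 1, "root"),
    ReportRow(899, 2, "Bacteria"),
    ReportRow(1, 9606, "Homo sapiens"),
]
output = [
    "C\tread1\t2\t150\t2:116",
    "C\tread2\t9606\t150\t9606:116",
    "U\tread3\t0\t150\t0:116",
]

kept = filter_report(report, threshold=0.01)
synced = filter_output(output, {r.taxon_id for r in kept})
```

Because a clade count can never be larger than its parent's, a taxon that falls below the threshold takes all of its descendants with it, so the filtered report stays a valid tree in this sketch.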

Just as a side note: I used this code on one of my datasets and found that filtering at 0.0001 (which means a taxon has to have roughly 15-60 hits or it gets discarded) retains my diversity signal but minimizes the low-abundance feature overlap that was happening in my samples!
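
As a rough check on those numbers, assuming the threshold is applied as a fraction of each sample's total reads, a cutoff of 15-60 hits at 0.0001 would correspond to samples of roughly 150,000 to 600,000 reads. The helper below is a hypothetical illustration of that arithmetic, not part of the PR:

```python
import math
from fractions import Fraction


def min_hits(total_reads, threshold="0.0001"):
    """Smallest read count that still meets the relative-abundance cutoff."""
    # Fraction avoids float rounding surprises right at the cutoff.
    return math.ceil(total_reads * Fraction(threshold))


for depth in (150_000, 600_000):
    print(depth, min_hits(depth))
```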

Let me know if y'all have any questions!


codecov bot commented Jan 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.62%. Comparing base (b6d068a) to head (51a3b7f).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #226      +/-   ##
==========================================
+ Coverage   95.60%   95.62%   +0.01%     
==========================================
  Files          34       34              
  Lines        1956     2010      +54     
  Branches      226      235       +9     
==========================================
+ Hits         1870     1922      +52     
- Misses         48       49       +1     
- Partials       38       39       +1     


@misialq misialq requested a review from Copilot January 14, 2025 09:54
Contributor

@Copilot Copilot AI left a comment

Copilot reviewed 3 out of 13 changed files in this pull request and generated 1 comment.

Files not reviewed (10)
  • q2_moshpit/kraken2/tests/data/abundance-filter/outputs-only-unclassified/sample1.output.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/outputs-only-unclassified/sample2.output.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/outputs/sample1.output.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/outputs/sample2.output.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/reports-only-unclassified/sample1.report.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/reports-only-unclassified/sample2.report.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/reports-w-unclassified/sample1.report.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/reports-w-unclassified/sample2.report.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/reports/sample1.report.txt: Language not supported
  • q2_moshpit/kraken2/tests/data/abundance-filter/reports/sample2.report.txt: Language not supported
Comments suppressed due to low confidence (1)

q2_moshpit/kraken2/classification.py:278

  • [nitpick] Consider including the sample_id in the error message for better debugging.
raise ValueError("All Taxonomic bins were filtered by the"
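
One way the nitpick could be addressed is sketched below. The helper name `ensure_taxa_remain` and the message wording are illustrative assumptions, not the text used in `classification.py`:

```python
def ensure_taxa_remain(sample_id, n_remaining):
    """Raise a descriptive error if filtering removed every taxon."""
    if n_remaining == 0:
        # Naming the sample makes the failure easier to trace in large runs.
        raise ValueError(
            f"No taxa remained after abundance filtering for sample "
            f"'{sample_id}'."
        )
```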

report_df = report.view(pd.DataFrame)

if (len(report_df) == 1):
raise ValueError("Kraken2 abundance filtering can not be preformed"

Copilot AI Jan 14, 2025


The word 'preformed' should be 'performed'.

Suggested change
- raise ValueError("Kraken2 abundance filtering can not be preformed"
+ raise ValueError("Kraken2 abundance filtering can not be performed"

Contributor

@misialq misialq left a comment

Hey @cherman2, this looks great - thanks! 🏅

Please see the very minor cosmetic comment below.

parameter_descriptions={},
output_descriptions={},
name='Filter Kraken2 Classifications by Abundance',
description='...',
Contributor

Should we just put the text from the "name" field here, or did you want to keep the ...?

Comment on lines +1819 to +1821
input_descriptions={},
parameter_descriptions={},
output_descriptions={},
Contributor

Please provide these missing descriptions 🙏
