NEW: filter_kraken2_classifications action #226

cherman2 · 2025-01-06T22:52:05Z

Hi Bok Lab!

I was recently talking to a collaborator whose area of expertise is metagenomic contamination and spurious hits. They suggested that I should be performing low-abundance filtering on my data to remove spurious hits, and that the Kraken reports are the best place to filter, so that Bracken isn't estimating taxa that were spuriously assigned.

@colinvwood and I developed a method that filters the Kraken reports by abundance! It also filters the output files so that the reports and outputs do not get out of sync.

Just as a side note: I used this code on a dataset of mine and found that filtering at 0.0001 (which means a taxon has to have roughly 15-60 hits or it gets discarded) retains my diversity signal but minimizes the low-abundance feature overlap that was happening in my samples!

Let me know if y'all have any questions!
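For readers who want to see the idea in miniature, here is a rough sketch of relative-abundance filtering on Kraken2 report rows. It is not the code from this PR: `ReportRow`, `filter_report`, and `min_reads_needed` are hypothetical names, and the sketch assumes that the cutoff is the threshold multiplied by the sample's total reads (unclassified plus root clade). Under that assumption, a cutoff of 15-60 hits at 0.0001 corresponds to samples of roughly 150,000-600,000 reads. The sketch does not recompute parent clade counts and does not touch the per-read output files.

```python
from dataclasses import dataclass


@dataclass
class ReportRow:
    """One line of a Kraken2 report, reduced to the fields used here."""
    clade_reads: int  # fragments covered by the clade rooted at this taxon
    rank: str         # Kraken2 rank code, e.g. "U", "R", "D", "S"
    taxid: int
    name: str


def min_reads_needed(total_reads, threshold):
    """Smallest clade read count that survives a relative threshold."""
    return total_reads * threshold


def filter_report(rows, threshold):
    """Keep rows whose clade holds at least `threshold` of all reads.

    Assumption: total reads = unclassified row ("U") + root row ("R").
    """
    total = sum(r.clade_reads for r in rows if r.rank in ("U", "R"))
    if total == 0:
        return []
    return [r for r in rows if r.clade_reads / total >= threshold]


# Made-up example data: 200,000 reads in total.
example_rows = [
    ReportRow(1_000, "U", 0, "unclassified"),
    ReportRow(199_000, "R", 1, "root"),
    ReportRow(150_000, "S", 101, "Species A"),
    ReportRow(10, "S", 102, "Species B"),   # 10 / 200,000 is below 0.0001
    ReportRow(25, "S", 103, "Species C"),   # 25 / 200,000 is above 0.0001
]

kept = filter_report(example_rows, threshold=0.0001)
kept_names = [r.name for r in kept]
```

Because a parent's clade count is always at least as large as any of its children's, a parent is never dropped while one of its children is kept, so the filtered report stays a valid tree in this sketch.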

codecov · 2025-01-06T23:01:28Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.62%. Comparing base (b6d068a) to head (51a3b7f).
Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #226      +/-   ##
==========================================
+ Coverage   95.60%   95.62%   +0.01%     
==========================================
  Files          34       34              
  Lines        1956     2010      +54     
  Branches      226      235       +9     
==========================================
+ Hits         1870     1922      +52     
- Misses         48       49       +1     
- Partials       38       39       +1


Copilot

Copilot reviewed 3 out of 13 changed files in this pull request and generated 1 comment.

Files not reviewed (10)

q2_moshpit/kraken2/tests/data/abundance-filter/outputs-only-unclassified/sample1.output.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/outputs-only-unclassified/sample2.output.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/outputs/sample1.output.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/outputs/sample2.output.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/reports-only-unclassified/sample1.report.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/reports-only-unclassified/sample2.report.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/reports-w-unclassified/sample1.report.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/reports-w-unclassified/sample2.report.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/reports/sample1.report.txt: Language not supported
q2_moshpit/kraken2/tests/data/abundance-filter/reports/sample2.report.txt: Language not supported

Comments suppressed due to low confidence (1)

q2_moshpit/kraken2/classification.py:278

[nitpick] Consider including the sample_id in the error message for better debugging.

raise ValueError("All Taxonomic bins were filtered by the"

Copilot · 2025-01-14T09:55:40Z

q2_moshpit/kraken2/classification.py

+        report_df = report.view(pd.DataFrame)
+
+        if (len(report_df) == 1):
+            raise ValueError("Kraken2 abundance filtering can not be preformed"


The word 'preformed' should be 'performed'.

Suggested change

raise ValueError("Kraken2 abundance filtering can not be preformed"

raise ValueError("Kraken2 abundance filtering can not be performed"

misialq

Hey @cherman2, this looks great - thanks! 🏅

Please see the very minor cosmetic comment below.

misialq · 2025-01-14T10:01:22Z

q2_moshpit/plugin_setup.py

+    parameter_descriptions={},
+    output_descriptions={},
+    name='Filter Kraken2 Classifications by Abundance',
+    description='...',


Should we just reuse the text from the "name" field here, or did you want to keep the ...?

misialq · 2025-01-14T10:02:07Z

q2_moshpit/plugin_setup.py

+    input_descriptions={},
+    parameter_descriptions={},
+    output_descriptions={},


Please provide these missing descriptions 🙏

cherman2 added 6 commits December 11, 2024 15:17

first implementation of abundance filtering

5db9bc2

make method action

158e8bd

add typematch

5fdd514

fix typematch

1d89d72

fix typematch for outputs

59965d1

enable filtering more than 0.01

916cb3c

Merge branch 'main' into chloe-filter-abundance

51a3b7f

misialq requested a review from Copilot January 14, 2025 09:54

Copilot AI reviewed Jan 14, 2025


misialq requested changes Jan 14, 2025

