[FEATURE REQUESTS] - post here for suggestions/feature requests #6

bluenote-1577 · 2023-12-16T22:41:08Z

Feature requests

Purpose: this is a place to easily log suggestions/feature requests. For example:

"I want to display XXX output as an option!"
"I want to be able to combine database sketches!"

Give a rationale and provide concise/clear instructions if possible. Opinions are welcome too.

You're welcome to email me or open another issue. This thread is to aggregate suggestions without the hassle of opening another issue.

Current feature requests

Here are some current feature requests.

Originally posted by @jolespin in bluenote-1577/skani#23 (comment)

~~Option for renaming samples. Sylph currently fixes each sample sketch to the read names.~~ done in v0.5.0
Command line options for inspecting database sketches.
Command line option to append/merge databases.
~~Line-delimited file for database sketches for sylph profile/query~~ done in v0.5.0

@fplaza #6 (comment)

~~Save read length while sketching so the user does not have to provide it to compute true coverage.~~ done in v0.5.0

#7

Different ways of sketching reads and groupings

fplazaonate · 2023-12-17T08:33:22Z

Hi Jim,
Here is a suggestion:
Save read length while sketching so the user does not have to provide it to compute true coverage.

astrovsky01 · 2024-04-05T19:04:37Z

Is there, or could there be, a way to write unassigned reads to a file?

bluenote-1577 · 2024-04-05T19:40:32Z

Hi @astrovsky01,

This is unfortunately not possible due to the way sylph works. It doesn't classify each read; it operates on the ensemble of reads. This means it cannot output unassigned reads; it can only estimate their percentage.

jolespin · 2024-07-12T17:33:36Z

@bluenote-1577 I'm not sure how the backend algorithm works but is it possible to add Align_fraction_ref to Sylph output similar to Skani? Would be useful to know how much of the genomes being profiled are covered.

bluenote-1577 · 2024-07-12T18:05:55Z

@jolespin Hi Josh, unfortunately this isn't possible. This is because skani actually tries to get a pseudo-ish alignment, but sylph doesn't do anything like that. I agree it would be very nice if it were possible though...

jolespin · 2024-07-12T19:31:29Z

Ok that's good to know! Would finding the overlap in kmers do the trick or is it way more complicated than that?

jolespin · 2024-07-23T01:32:08Z

Also one more question, does sylph allow for outputting abundance instead of relative abundance?

bluenote-1577 · 2024-07-23T03:48:47Z

@jolespin sorry for the late response:

There may be something that could be done for pseudo-alignment overlapping k-mers ... but it's a very nontrivial algorithmic thing :)
What do you mean by abundance instead of relative abundance? Sylph outputs coverage (Est_cov) if that's what could be helpful
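To make the k-mer overlap question concrete, here is a deliberately naive sketch of what "finding the overlap in k-mers" could mean: the fraction of a reference's distinct k-mers that appear in at least one read. This is an illustration only, not how sylph or skani works, and it is not an aligned fraction. The function names and toy sequences are hypothetical, and the sketch ignores reverse complements, sequencing errors, repeats, and subsampling, which is part of why the real problem is nontrivial.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_containment(reference, reads, k):
    """Fraction of the reference's distinct k-mers found in at least one read."""
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(ref_kmers & read_kmers) / len(ref_kmers)


# Toy data (hypothetical): the reads cover both ends of the reference
# but miss the k-mers spanning the junction between them.
reference = "ACGTACGGTCA"
reads = ["ACGTACG", "GGTCA"]
containment = kmer_containment(reference, reads, k=4)
print(containment)
```

A value like this says how many reference k-mers were seen, not which stretches of the genome are covered or at what identity.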

jolespin · 2024-07-23T17:16:45Z

> There may be something that could be done for pseudo-alignment overlapping k-mers ... but it's a very nontrivial algorithmic thing :)

I can imagine that is quite complicated. Themisto just popped up on my radar so I'm going to give this a try soon.

> What do you mean by abundance instead of relative abundance? Sylph outputs coverage (Est_cov) if that's what could be helpful

Is the Est_cov what you use before normalizing the Taxonomic abundance?

Taxonomic_abundance: normalized taxonomic abundance as a percentage. Coverage-normalized - same as MetaPhlAn abundance
https://github.com/bluenote-1577/sylph/wiki/Output-format

I'm mostly curious about how some of my compositionally valid network analysis (https://github.com/jolespin/ensemble_networkx) differs between coverage-normalized and unnormalized data, but it's definitely not a critical assessment. Just a bit of curiosity.

jolespin · 2024-10-01T19:25:06Z

> This is unfortunately not possible due to the way sylph works. It doesn't classify each read. It operates on the ensemble of reads. This means it can not output unassigned reads, only estimate the percentage of.

I'm looking at the docs now and I'm not sure which field indicates the % of reads/k-mers not aligned/overlapping (sorry if that's the wrong term) with the k-mers in the database/sketch. If this isn't currently available, would it be possible to add this metric? It would greatly benefit my workflow when determining whether or not I want to assemble/bin genomes from a metagenomic assembly.

bluenote-1577 · 2024-10-01T19:33:04Z

@jolespin

You'll notice that the "Sequence abundance" column doesn't sum to 100% if the -u option is specified (unless sylph determines that 100% of your reads are classified at species level). The sum of this column indicates the % of reads classified at species level. That is, the sequence abundance is scaled by the % of classified reads.

So without -u, you'll get

Species 1 50%
Species 2 50%

But with -u, if 10% of the reads come from species-level detected genomes, you'll get

Species 1 5%
Species 2 5%

Let me know if that makes sense. I didn't want to add a new column because it doesn't really make sense... but this is a bit non-obvious.
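The scaling in the example above can be written out as a few lines of arithmetic. This is only a sketch of the relationship described in this comment, not sylph's code; the function name and the dictionary layout are made up for illustration.

```python
def scale_by_classified(relative_pct, classified_pct):
    """Rescale percentages that sum to 100 so that they sum to classified_pct."""
    return {name: pct * classified_pct / 100.0 for name, pct in relative_pct.items()}


# Values taken from the example in this comment.
without_u = {"Species 1": 50.0, "Species 2": 50.0}  # sums to 100%
with_u = scale_by_classified(without_u, classified_pct=10.0)  # 10% of reads classified
print(with_u)
```

Summing the scaled values gives back the classified percentage, which is why the column total under -u can be read as the % of reads classified at species level.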

jolespin · 2024-10-01T19:37:48Z

Excellent! Love that functionality. So essentially, I can just run -u and then w/ Pandas do something like X["unclassified"] = 100 - X.sum(axis=1) if I want the % of unclassified reads in the table. Alternatively, I can do (X/X.sum(axis=1).values.reshape(-1,1)) * 100 before I add the unclassified column to get the relative abundance of the taxa.
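For anyone wanting to try this, here is a minimal pandas sketch of both steps. It assumes a hypothetical table X with samples as rows, taxa as columns, and sequence abundance (%) from a run with -u as values; the sample names and numbers are invented. The two results are kept in separate frames so the renormalization never sees the unclassified column.

```python
import pandas as pd

# Hypothetical input: rows are samples, columns are taxa,
# values are sequence abundance (%) from a run with -u.
X = pd.DataFrame(
    {"Species 1": [5.0, 30.0], "Species 2": [5.0, 50.0]},
    index=["sample_A", "sample_B"],
)

# Row sums give the % of reads classified at species level per sample.
classified = X.sum(axis=1)

# Relative abundance among classified taxa (each row sums to 100).
# A sample with no classified reads would produce NaN here.
relative = X.div(classified, axis=0) * 100

# Original table plus the % of unclassified reads.
with_unclassified = X.copy()
with_unclassified["unclassified"] = 100 - classified

print(relative)
print(with_unclassified)
```

`X.div(classified, axis=0)` is equivalent to the `reshape(-1,1)` form above and keeps the row alignment explicit.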

bluenote-1577 · 2024-10-01T19:41:45Z

@jolespin yes exactly. BTW, I discovered the "discussion" feature in GitHub. I think we can migrate the questions there, as well as suggestions, perhaps keeping this thread for specific feature requests.

bluenote-1577 added the enhancement New feature or request label Dec 16, 2023
