Output of `umi_tools extract` not compatible with `umi_tools count_tab` #651

eachanjohnson · 2024-07-06T11:55:03Z

The command `umi_tools extract` results in read names being suffixed with the pattern `_[cell barcode]_[UMI]`. See the docs here for an example.

However, `umi_tools count_tab` expects read names suffixed with the pattern `_[UMI]_[cell barcode]`. See the docs here.

As a result, pipelines naively expecting to use the output of `umi_tools extract` for `umi_tools count_tab` (after e.g. a `cut | sort` manipulation) will have incorrect output.

This does not seem to be simply a documentation error. On this line, `umi_tools count_tab` counts the barcodes using `sam_methods.get_gene_count_tab()`, which by default uses the `sam_methods.get_cell_umi_read_string()` function, returning the tuple `(read_id.split(sep)[-1].encode('utf-8'), read_id.split(sep)[-2].encode('utf-8'))`. For the output read names from `extract`, this corresponds to `(UMI, cell barcode)`. But then this output is unpacked here as `cell, umi = bc_getter(read_id)`. So the cell barcode and UMI are swapped around.
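A minimal sketch of the swap, to make the reasoning above concrete. The function here is a standalone stand-in written from the description in this comment, not a copy of the UMI-tools source, and the read name and barcode sequences are made up for illustration.

```python
def get_cell_umi_read_string(read_id, sep="_"):
    """Stand-in for the getter described above.

    Returns (last field, second-to-last field) of the read name,
    each encoded as UTF-8 bytes.
    """
    fields = read_id.split(sep)
    return fields[-1].encode("utf-8"), fields[-2].encode("utf-8")


# Hypothetical read name in the style written by `umi_tools extract`:
# <read name>_<cell barcode>_<UMI>
true_cell = "ACGTACGT"
true_umi = "TTGCA"
read_id = f"READ1_{true_cell}_{true_umi}"

# Unpacked in the order described above: cell first, then UMI.
cell, umi = get_cell_umi_read_string(read_id)

# The variable named `cell` now holds the UMI, and `umi` holds the cell barcode.
print(cell, umi)
```

With a read name of this shape, `cell` ends up holding the UMI and `umi` the cell barcode, so any per-cell counts built on them would be keyed by the wrong field.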

Apologies if I've missed a step, and this behaviour is intended. I thought I should point it out to save others some trouble in future.


IanSudbery · 2024-07-08T10:43:25Z

Thanks for this, it does indeed seem that you are correct. @TomSmithCGAT - any thoughts? Did we switch the order at some point and forget to propagate it through to `count_tab`?

IanSudbery · 2024-08-08T10:30:48Z

Since #654 has been merged, can we close this issue?

eachanjohnson mentioned this issue Jul 31, 2024

Make count_tab compatible with umi_tools extract #654

Merged

eachanjohnson closed this as completed Aug 8, 2024
