Output of umi_tools extract not compatible with umi_tools count_tab #651

Closed
eachanjohnson opened this issue Jul 6, 2024 · 2 comments

Comments

@eachanjohnson
Contributor

The command umi_tools extract results in read names being suffixed with the pattern _[cell barcode]_[UMI]. See the docs here for an example.

However, umi_tools count_tab expects read names suffixed with the pattern _[UMI]_[cell barcode]. See the docs here.

As a result, pipelines naively expecting to use the output of umi_tools extract for umi_tools count_tab (after e.g. a cut | sort manipulation) will have incorrect output.

This does not seem to be simply a documentation error. On this line, umi_tools count_tab counts the barcodes using sam_methods.get_gene_count_tab(), which by default uses the sam_methods.get_cell_umi_read_string() function, returning the tuple (read_id.split(sep)[-1].encode('utf-8'), read_id.split(sep)[-2].encode('utf-8')). For the output read names from extract, this corresponds to (UMI, cell barcode). But then this output is unpacked here as cell, umi = bc_getter(read_id). So the cell barcode and UMI are swapped around.
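To make the swap concrete, here is a minimal sketch. It assumes a simplified stand-in for the getter that returns only the tuple quoted above; it is not the actual umi_tools source, and the read name is a made-up example in the extract layout.

```python
def get_cell_umi_read_string(read_id, sep="_"):
    """Simplified stand-in (an assumption, not the real umi_tools code).

    Returns the tuple quoted in this issue:
    (last field, second-to-last field) of the read name.
    """
    fields = read_id.split(sep)
    return fields[-1].encode("utf-8"), fields[-2].encode("utf-8")


# Hypothetical read name as written by extract: _[cell barcode]_[UMI]
read_id = "READ1_ACGTACGT_TTGCA"

# Unpacked the same way as quoted in this issue
cell, umi = get_cell_umi_read_string(read_id)

print("cell:", cell)  # holds the last field, i.e. the UMI
print("umi:", umi)    # holds the second-to-last field, i.e. the cell barcode
```

With a read name in the extract layout, the variable named cell ends up holding the UMI and the variable named umi ends up holding the cell barcode, which is the mismatch described above.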

Apologies if I've missed a step, and this behaviour is intended. I thought I should point it out to save others some trouble in future.

@IanSudbery
Member

Thanks for this, it does indeed seem that you are correct. @TomSmithCGAT - any thoughts? Did we switch the order at some point and forget to propagate it through to count_tab?

@IanSudbery
Member

Since #654 has been merged, can we close this issue?


2 participants